I can buy that it is sometimes useful to think about x-risk in terms of a partition of the worlds we could be in, the probability of each part in the partition, and the probability of x-risk in each part. For this to be useful in decision-making, I think we’d want the partition to sort of “carve reality at its joints” in a way that’s relevant to the decisions we’d like to make. I’m generally unconvinced that the partition given here achieves this.
My best attempt at grokking the partition here is that worlds are grouped according to something like the “intrinsic difficulty” of alignment, with the remaining uncertainty being over our actions to tackle alignment. But I don’t see a good reason to think that the calculation methodology used in the post would give us such a partition. Perhaps there is another natural way to interpret the partition given, but I don’t see it.
For a more concrete argument against this distribution of probabilities capturing something useful, let’s consider the following two respondents. The first respondent is certain about the “intrinsic difficulty” of alignment, thinking we just have a probability of 50% of surviving. Maybe this first respondent is certain that our survival is determined by an actual coinflip happening in 2040, or whatever. The other respondent thinks there is a 50% chance we are in a world in which alignment is super easy, in which we have a 99% chance of survival, and a 50% chance we are in a world in which alignment is super hard, in which we have a 1% chance of survival. Both respondents will answer 50% when we ask them what their p(doom) is, but they clearly have very different views about the probability distribution on the “intrinsic difficulty” of alignment.
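To make the two-respondent example concrete, here is a minimal Python sketch. The representation is my own assumption for illustration: each respondent is a list of (probability of being in a world, chance of survival in that world) pairs, and `p_doom` and `spread` are hypothetical helper names, not anything taken from the post.

```python
# Illustrative only: each respondent's beliefs are modeled as a list of
# (probability_of_world, survival_chance_in_that_world) pairs.
# This encoding and the helper names are assumptions, not the post's method.

respondent_1 = [(1.0, 0.50)]               # certain: survival is a coinflip
respondent_2 = [(0.5, 0.99), (0.5, 0.01)]  # unsure: easy world vs hard world


def p_doom(beliefs):
    """Overall probability of doom: 1 minus the weighted survival chance."""
    survival = sum(p_world * p_survive for p_world, p_survive in beliefs)
    return 1.0 - survival


def spread(beliefs):
    """Standard deviation of the survival chance across the worlds."""
    mean = sum(p_world * p_survive for p_world, p_survive in beliefs)
    variance = sum(p_world * (p_survive - mean) ** 2
                   for p_world, p_survive in beliefs)
    return variance ** 0.5


for name, beliefs in [("respondent 1", respondent_1),
                      ("respondent 2", respondent_2)]:
    print(name, "p(doom):", p_doom(beliefs), "spread:", spread(beliefs))
```

Both respondents report the same p(doom), but the spread over world-level survival chances is zero for the first and large for the second. A single p(doom) answer carries no information about that spread.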
Now, insofar as the above makes sense, it’s probably accurate to say that most respondents’ views on most of the surveyed questions are a lot like respondent 2, with a lot of uncertainty about the “intrinsic difficulty” involved, or whatever the relevant parameter is that the analysis hopes to partition according to. However, the methodology used would give the same results whether the people we surveyed were all like respondent 1 or all like respondent 2. (In fact, my vague intuition is that the best attempt to philosophically ground the methodology would assume that everyone is like respondent 1.) This seems strange, because as far as I can intuitively capture what the distribution over probabilities is hoping to achieve, it seems that it should be very different in the two cases. Namely, if everyone is like respondent 1, the distribution should be much more concentrated on certain kinds of worlds than if everyone is like respondent 2.
Note that the question about the usefulness of the partition is distinct from whether one can partition the worlds into groups with the given conditional probabilities of x-risk. If I think a coin lands heads in 50% of the worlds, the math lets me partition all the possible worlds into 50% where the coin has a 0% probability of landing heads, and 50% where the coin has a 100% probability of landing heads. Alternatively, the math also lets me partition all possible worlds into 50% where the coin has 50% probability of landing heads, and 50% where the coin has 50% probability of landing heads. What I’m doubting is that either distribution would be helpful here, and that the distribution given in the post is helpful for understanding x-risk.
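The coin example can be checked in a few lines of Python. This is only a sketch: a partition is represented as (weight of part, probability of heads within that part) pairs, which is my own encoding, and `marginal_heads` is a hypothetical helper that applies the law of total probability.

```python
# Illustrative only: a partition of the possible worlds is a list of
# (weight_of_part, probability_of_heads_within_that_part) pairs.

def marginal_heads(partition):
    """Law of total probability: sum over parts of weight * P(heads | part)."""
    total_weight = sum(weight for weight, _ in partition)
    if abs(total_weight - 1.0) > 1e-12:
        raise ValueError("partition weights must sum to 1")
    return sum(weight * p_heads for weight, p_heads in partition)


# Half the worlds have heads impossible, half have heads certain.
extreme_partition = [(0.5, 0.0), (0.5, 1.0)]

# Every world has a fair coin; the two parts are indistinguishable.
uniform_partition = [(0.5, 0.5), (0.5, 0.5)]

print(marginal_heads(extreme_partition))
print(marginal_heads(uniform_partition))
```

Both partitions reproduce the same 50% overall probability of heads, so consistency with the overall number cannot tell us which partition, if either, is the useful one.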