The main assumption of this post seems to be not only that the true values of the parameters are independent, but that a given person's estimates of the stages are independent. This is a judgment call I'm weakly against.
Suppose you put equal weight on the opinions of Aida and Bjorn. Aida gives 10% for each of the 6 stages, and Bjorn gives 99%, so that Aida has an overall x-risk probability of 10^-6 and Bjorn has around 94%.
If you just take the arithmetic mean of their overall estimates, it's like saying "we might be in worlds where Aida is correct, or worlds where Bjorn is correct".
But if you take the geometric mean or decompose into stages, as in this post, it's like saying "we're probably in a world where each of the bits of evidence Aida and Bjorn have towards each proposition is independently 50% likely to be valid, so Aida and Bjorn are each the more correct one on 2-4 stages".
These give you vastly different results: 47% vs 0.4%. Which one is right? I think there are two related arguments to be made against the geometric mean, although they don't push me all the way towards using the arithmetic mean (a worked version of the two numbers follows the arguments below):
1. Aida and Bjorn's wildly divergent estimates probably come from some underlying difference in their models of the world, not from independent draws. In this case, where Aida is more optimistic than Bjorn on each of the 6 stages, it is unlikely that the gap is due to independent draws. I think this kind of multidimensional difference in optimism between alignment researchers is actually happening, so any model should take this into account.
2. If we learn that Bjorn was wrong about stage 1, then we should put less weight on his estimates for stages 2-6. (My guess is there's some copula that corresponds to a theoretically sensible way to update away from Bjorn's position while treating his opinions as partially correlated, but I don't know enough statistics.)
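As a sanity check on those two numbers, here is a minimal sketch of the arithmetic (reading the 0.4% as the geometric mean of odds of the two overall estimates; the names and figures are just the example above):

```python
import math

# Aida: 10% on each of the 6 stages; Bjorn: 99% on each.
p_aida = 0.10 ** 6    # overall x-risk ~1e-6
p_bjorn = 0.99 ** 6   # overall x-risk ~0.941

# Arithmetic mean of the two overall estimates.
arith = (p_aida + p_bjorn) / 2

# Geometric mean of the two overall estimates, taken in odds space.
odds_a = p_aida / (1 - p_aida)
odds_b = p_bjorn / (1 - p_bjorn)
pooled_odds = math.sqrt(odds_a * odds_b)
geo = pooled_odds / (1 + pooled_odds)

print(f"arithmetic mean: {arith:.1%}")   # ~47.1%
print(f"geomean of odds: {geo:.2%}")     # ~0.40%
```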
This is unquestionably the strongest argument against the SDO method as it applies to AI Risk, and therefore the biggest limitation of the essay. There is a really good chance that many of the parameters in the Carlsmith Model are correlated in real life (since basically everything is correlated with everything else by some mechanism), so the important question is whether they are independent enough that what I've got here is still plausible. I offer some thoughts on the issue in Section 5.1.
To the best of my knowledge, there is no work making a very strong theoretical claim that any particular element of the Carlsmith Model will be strongly correlated with any other element. I have seen people suggest mechanisms with the implicit claim that if AI is more revolutionary than we expect, there will be correlation between our incentive to deploy it, our desire to expose it to high-impact inputs, and our inability to stop it once it tries to disempower us. But I'm pretty confident the validity check in Section 4.3.3 demonstrates that correlation between some parameters doesn't fundamentally alter conclusions about distributions, although it would alter the exact point estimates reached.
Practically, I don't think there is strong evidence that people's parameters are correlated across estimates to a degree that will significantly alter results. Below is the correlation matrix for the Full Survey estimates, with p<0.05 highlighted in green. Obviously I'm once again leaning on the assumption that a survey about AI Risk is a good proxy for the actual AI Risk, which I think is another weakness of the essay.
This doesn't spark any major concerns for me: there is more correlation than would be expected by chance, but it seems to be mostly contained within the "Alignment turns out to be easy" step, and as discussed above the mechanism still functions if one or two steps are removed because they are indistinguishable from preceding steps. The fact that there is more positive than negative correlation in the table is some evidence of the "general factor of optimism" you describe (because the "optimistic" view is that we won't deploy AI until we know it is safe, so we'd expect negative correlation on this factor in the table). Overall I think my assumption of independence is reasonable, in the sense that the results are likely to be robust to the sorts of correlations I have empirically observed and seen theoretically accounted for; however, I do agree with you that if there is a critical flaw in the essay it is likely to be found here.
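The matrix itself is an image in the original post; for readers who want to reproduce the check, a minimal sketch of the computation, with random placeholder data standing in for the actual survey responses:

```python
import numpy as np
from scipy import stats

# Placeholder data: rows are respondents, columns are the 6 stage probabilities.
rng = np.random.default_rng(0)
survey = rng.uniform(0.05, 0.95, size=(50, 6))

# Pairwise Pearson correlations between stages, flagging p < 0.05
# (the green-highlighted cells in the matrix referred to above).
for i in range(6):
    for j in range(i + 1, 6):
        r, p = stats.pearsonr(survey[:, i], survey[:, j])
        flag = "  <-- p<0.05" if p < 0.05 else ""
        print(f"stage {i+1} vs stage {j+1}: r={r:+.2f}, p={p:.3f}{flag}")
```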
I don't quite follow your logic where you conclude that if estimates are correlated then the simple mean is preferred; my exploration of the problem suggests that if estimates are correlated to a degree significant enough to affect my overall conclusion, then you stop being able to use conventional statistics at all and have to do something fancy like microsimulation. Anecdata: in the specific example you give, my intuition is that 0.4% really is a better summary of our knowledge, since otherwise we round off Aida's position to "approximately 1%", which is several orders of magnitude incorrect. Although, as I say above, in the situation you describe both summary estimates are misleading in different ways and we should look at the distribution, which is the key point I was trying to make in the essay.
Thanks. It looks reassuring that the correlations aren't as large as I thought. (How much variance is in the first principal component in log odds space though?) And yes, I now think the arguments I had weren't so much for arithmetic mean as against total independence / geometric mean, so I'll edit my comment to reflect that.
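The PC1 question can be answered mechanically once the survey matrix is to hand; a minimal sketch, again with random placeholder data standing in for the actual responses:

```python
import numpy as np

# Placeholder data: rows are respondents, columns are the 6 stage probabilities.
rng = np.random.default_rng(0)
responses = rng.uniform(0.05, 0.95, size=(100, 6))

# Move to log odds space, centre, and take the SVD.
log_odds = np.log(responses / (1 - responses))
centered = log_odds - log_odds.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)

# Squared singular values are proportional to variance explained;
# a dominant PC1 would suggest a single "general factor of optimism".
explained = s**2 / np.sum(s**2)
print(f"PC1 share of variance: {explained[0]:.1%}")
```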
If the estimates for the different components were independent, then wouldn't the distribution of synthetic estimates be the same as the distribution of individual people's estimates?
Multiplying Alice's p1 x Bob's p2 x Carol's p3 x ... would draw from the same distribution as multiplying Alice's p1 x Alice's p2 x Alice's p3 x ..., if estimates to the different questions are unrelated.
So you could see how much non-independence affects the bottom-line results just by comparing the synthetic distribution with the distribution of individual estimates (treating each individual as one data point and multiplying their 6 component probabilities together to get their p(existential catastrophe)).
Insofar as the 6 components are not independent, the question of whether to use synthetic estimates or just look at the distribution of individuals' estimates comes down to: 1) how much value there is in increasing the effective sample size by using synthetic estimates, and 2) whether the non-independence that exists is something you want to erase by scrambling together different people's component estimates (because it mainly reflects reasoning errors) or something you want to maintain by looking at individual estimates (because it reflects the structure of the situation).
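A minimal sketch of this comparison (placeholder data standing in for the survey; `survey[i, j]` is respondent i's probability for stage j):

```python
import numpy as np

rng = np.random.default_rng(0)
survey = rng.uniform(0.05, 0.95, size=(50, 6))  # placeholder data

# Individual estimates: each respondent's own six probabilities multiplied.
individual = survey.prod(axis=1)

# Synthetic estimates: each stage drawn from a randomly chosen respondent.
n_synth = 100_000
rows = rng.integers(0, survey.shape[0], size=(n_synth, 6))
synthetic = survey[rows, np.arange(6)].prod(axis=1)

# Under independence the two distributions should roughly coincide;
# a gap between them indicates within-person correlation across stages.
for name, x in [("individual", individual), ("synthetic", synthetic)]:
    geomean = np.exp(np.log(x).mean())
    print(f"{name}: geomean={geomean:.2%}, mean={x.mean():.2%}")
```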
In practice these numbers wouldn't perfectly match even if there was no correlation, because there is some missing survey data that the SDO method ignores (naturally, you can't sample data that doesn't exist). In principle I don't see why we shouldn't use this as a good rule-of-thumb check for unacceptable correlation.
The synthetic distribution gives a geomean of 1.6% and a simple mean of around 9.6%, as per the essay.
The distribution of all survey responses multiplied together (as per Alice p1 x Alice p2 x Alice p3) gives a geomean of approx 2.3% and a simple mean of approx 17.3%.
I'd suggest that this implies the SDO method's weakness to correlated results is potentially depressing the actual result by about 50%, give or take. I don't think that's either obviously small enough not to matter or obviously large enough to invalidate the whole approach, although my instinct is that when talking about order-of-magnitude uncertainty, a 50% error in the point estimate would not be a showstopper.
Jaime Sevilla (who usually argues in favor of using geometric mean of odds over arithmetic mean of probabilities) makes a similar point here:

I currently believe that the geometric mean of odds should be the default option for aggregating forecasts. In the two large-scale empirical evaluations I am aware of [1][2], it surpasses the mean of probabilities and the median (*). It is also the only method that makes the group aggregate behave as a Bayesian, and (in my opinion) it behaves well with extreme predictions.

If you are not aggregating all-considered views of experts, but rather aggregating models with mutually exclusive assumptions, use the mean of probabilities.
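For reference, a minimal sketch of geometric-mean-of-odds pooling (my illustration of the rule described in the quote, not code from either post); applied to the Aida/Bjorn example above it recovers the ~0.4% figure:

```python
import numpy as np

def pool_geomean_odds(probs):
    """Pool probability forecasts via the geometric mean of their odds."""
    p = np.asarray(probs, dtype=float)
    mean_log_odds = np.mean(np.log(p / (1 - p)))
    pooled_odds = np.exp(mean_log_odds)
    return pooled_odds / (1 + pooled_odds)

print(f"{pool_geomean_odds([1e-6, 0.9415]):.2%}")  # ~0.40%
```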
Strong endorsement for pushing against unjustified independence assumptions.
I'm having a harder time thinking about how it applies to AI specifically, but I think it's a common problem in general, e.g. in forecasting.