This is unquestionably the strongest argument against the SDO method as it applies to AI Risk, and therefore the biggest limitation of the essay. There is a really good chance that many of the parameters in the Carlsmith Model are correlated in real life (since basically everything is correlated with everything else by some mechanism), so the important question is whether they are independent enough that what I’ve got here is still plausible. I offer some thoughts on the issue in Section 5.1.
To the best of my knowledge, there is no work making a very strong theoretical claim that any particular element of the Carlsmith Model will be strongly correlated with any other element. I have seen people suggest mechanisms with the implicit claim that if AI is more revolutionary than we expect then there will be correlation between our incentive to deploy it, our desire to expose it to high-impact inputs and our inability to stop it once it tries to disempower us. But I’m pretty confident the validity check in Section 4.3.3 demonstrates that correlation between some parameters doesn’t fundamentally alter conclusions about distributions, although it would alter the exact point estimates reached.
Practically, I don’t think there is strong evidence that people’s parameters are correlated across estimates to a degree that will significantly alter results. Below is the correlation matrix for the Full Survey estimates with p<0.05 highlighted in green. Obviously I’m once again leaning on the argument that a survey of AI Risk is the same thing as the actual AI Risk, which I think is another weakness of the essay.
This doesn’t spark any major concerns for me: there is more correlation than would be expected by chance, but it seems to be mostly contained within the ‘Alignment turns out to be easy’ step, and as discussed above the mechanism still functions if one or two steps are removed because they are indistinguishable from preceding steps. The fact that there is more positive than negative correlation is some evidence of the ‘general factor of optimism’ which you describe (because the ‘optimistic’ view is that we won’t deploy AI until we know it is safe, so we’d expect negative correlation on this factor in the table). Overall I think my assumption of independence is reasonable in the sense that the results are likely to be robust to the sorts of correlations I have empirically observed and theoretically seen accounted for; however, I do agree with you that if there is a critical flaw in the essay it is likely to be found here.
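For anyone wanting to reproduce a matrix like this, a minimal pure-Python sketch is below. It uses a permutation test for the p-values rather than the usual t-distribution, so nothing beyond the standard library is needed. The two columns of numbers are invented for illustration; the real Full Survey estimates for each pair of Carlsmith steps would be substituted in.

```python
import random
from statistics import mean

def pearson_r(xs, ys):
    # Plain Pearson correlation coefficient.
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def perm_pvalue(xs, ys, n_perm=2000, seed=0):
    # Two-sided permutation p-value: how often does a shuffled pairing
    # produce a correlation at least as extreme as the observed one?
    rng = random.Random(seed)
    r_obs = abs(pearson_r(xs, ys))
    ys = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ys)
        if abs(pearson_r(xs, ys)) >= r_obs:
            hits += 1
    return hits / n_perm

# Invented per-respondent estimates for two hypothetical survey steps.
p_step_a = [0.9, 0.7, 0.95, 0.5, 0.8, 0.6, 0.85, 0.4]
p_step_b = [0.8, 0.5, 0.9, 0.3, 0.7, 0.4, 0.75, 0.2]

r = pearson_r(p_step_a, p_step_b)
p = perm_pvalue(p_step_a, p_step_b)
```

Running every pair of steps through `pearson_r`/`perm_pvalue` and flagging cells with p < 0.05 gives the kind of matrix described above.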
I don’t quite follow the logic by which you conclude that if estimates are correlated then the simple mean is preferred; my exploration of the problem suggests that if estimates are correlated to a degree significant enough to affect my overall conclusion then you stop being able to use conventional statistics at all and have to do something fancy like microsimulation. Anecdata: in the specific example you give, my intuition is that 0.4% really is a better summary of our knowledge, since otherwise we round off Aida’s position to ‘approximately 1%’, which is several orders of magnitude incorrect. Although, as I say above, in the situation you describe both summary estimates are misleading in different ways and we should look at the distribution, which is the key point I was trying to make in the essay.
Thanks. It looks reassuring that the correlations aren’t as large as I thought. (How much variance is in the first principal component in log odds space though?) And yes, I now think the arguments I had weren’t so much for arithmetic mean as against total independence / geometric mean, so I’ll edit my comment to reflect that.
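On the principal-component question: one way to compute the share of log-odds variance captured by the first component, using only the standard library (power iteration on the covariance matrix), is sketched below. The respondent rows are invented for illustration; the actual six-step survey estimates would be substituted.

```python
import math
import random

def log_odds(p):
    # Map a probability to log-odds space.
    return math.log(p / (1 - p))

def top_pc_variance_fraction(rows, iters=200, seed=0):
    # rows: one list of component probabilities per respondent.
    X = [[log_odds(p) for p in row] for row in rows]
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    X = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    cov = [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1)
            for b in range(d)] for a in range(d)]
    # Power iteration converges on the leading eigenvector.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the top eigenvalue; the trace is the
    # total variance, so their ratio is the first PC's share.
    lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
              for a in range(d))
    return lam / sum(cov[a][a] for a in range(d))

# Invented survey: six component estimates per respondent.
rows = [
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
    [0.5, 0.4, 0.6, 0.3, 0.7, 0.2],
    [0.8, 0.9, 0.5, 0.7, 0.6, 0.5],
    [0.3, 0.2, 0.4, 0.5, 0.3, 0.1],
    [0.7, 0.6, 0.8, 0.4, 0.5, 0.3],
]

frac = top_pc_variance_fraction(rows)
```

A `frac` close to 1 would indicate a strong ‘general factor of optimism’; a value near 1/6 would indicate roughly independent components.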
If the estimates for the different components were independent, then wouldn’t the distribution of synthetic estimates be the same as the distribution of individual people’s estimates?
Multiplying Alice’s p1 x Bob’s p2 x Carol’s p3 x … would draw from the same distribution as multiplying Alice’s p1 x Alice’s p2 x Alice’s p3 … , if estimates to the different questions are unrelated.
So you could see how much non-independence affects the bottom-line results just by comparing the synthetic distribution with the distribution of individual estimates (treating each individual as one data point and multiplying their 6 component probabilities together to get their p(existential catastrophe)).
Insofar as the 6 components are not independent, the question of whether to use synthetic estimates or just look at the distribution of individuals’ estimates comes down to 1) how much value is there in increasing the effective sample size by using synthetic estimates and 2) is the non-independence that exists something that you want to erase by scrambling together different people’s component estimates (because it mainly reflects reasoning errors) or is it something that you want to maintain by looking at individual estimates (because it reflects the structure of the situation).
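A minimal sketch of the proposed check, with invented survey rows standing in for the real data: each synthetic estimate multiplies components drawn from independently chosen respondents, while each individual estimate multiplies one respondent’s own six components. The rows here are deliberately positively correlated (some respondents are pessimistic across the board), which is enough to push the two distributions apart.

```python
import math
import random
from statistics import mean

# Invented survey: each row is one respondent's six component probabilities.
survey = [
    [0.9, 0.8, 0.7, 0.9, 0.6, 0.5],
    [0.5, 0.3, 0.6, 0.4, 0.5, 0.2],
    [0.8, 0.7, 0.5, 0.8, 0.7, 0.6],
    [0.2, 0.4, 0.3, 0.5, 0.2, 0.1],
    [0.7, 0.6, 0.8, 0.7, 0.4, 0.3],
]

# Individual estimates: Alice p1 x Alice p2 x ... (one point per person).
individual = [math.prod(row) for row in survey]

# Synthetic estimates: an independently chosen random respondent
# supplies each of the six components.
rng = random.Random(0)
synthetic = [math.prod(rng.choice(survey)[j] for j in range(6))
             for _ in range(10_000)]

mean_ind, mean_syn = mean(individual), mean(synthetic)
```

With positively correlated rows, `mean_ind` exceeds `mean_syn`: scrambling respondents breaks the correlation, so the synthetic simple mean lands below the individual one.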
In practice these numbers wouldn’t perfectly match even if there were no correlation, because there is some missing survey data that the SDO method ignores (naturally, you can’t sample data that doesn’t exist). In principle, though, I don’t see why we shouldn’t use this as a good rule-of-thumb check for unacceptable correlation.
The synth distribution gives a geomean of 1.6% and a simple mean of around 9.6%, as per the essay.
The distribution of all survey responses multiplied together (as per Alice p1 x Alice p2 x Alice p3) gives a geomean of approx 2.3% and a simple mean of approx 17.3%.
I’d suggest that this implies the SDO method’s weakness to correlated results is potentially depressing the actual result by about 50%, give or take. I don’t think that’s either obviously small enough not to matter or obviously large enough to invalidate the whole approach, although my instinct is that when talking about order-of-magnitude uncertainty, a ~50% error in the point estimate would not be a showstopper.
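For concreteness, the reduction implied by the two pairs of summary statistics quoted above works out to roughly 30% for the geomean and roughly 45% for the simple mean, consistent with the ‘about 50%, give or take’ characterisation:

```python
# Arithmetic check on the summary statistics quoted above.
geomean_drop = 1 - 1.6 / 2.3   # reduction in the geometric mean, ~30%
mean_drop = 1 - 9.6 / 17.3     # reduction in the simple mean, ~45%
```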