The main assumption of this post seems to be not only that the true values of the parameters are independent, but that a given person's estimates of the stages are independent. This is a judgment call I'm weakly against.
Suppose you put equal weight on the opinions of Aida and Bjorn. Aida gives 10% for each of the 6 stages, and Bjorn gives 99%, so that Aida has an overall x-risk probability of 10^-6 and Bjorn has around 94%.
If you just take the arithmetic mean of their overall estimates, it's like saying "we might be in worlds where Aida is correct, or worlds where Bjorn is correct".
But if you take the geometric mean or decompose into stages, as in this post, it's like saying "we're probably in a world where each of the bits of evidence Aida and Bjorn have towards each proposition is independently 50% likely to be valid, so Aida and Bjorn are each the more correct one on 2-4 stages".
These give you vastly different results: 47% vs 0.4%. Which one is right? I think there are two related arguments to be made against the geometric mean, although they don't push me all the way towards using the arithmetic mean (a worked version of the two numbers follows the arguments below):
1. Aida and Bjorn's wildly divergent estimates probably come from some underlying difference in their models of the world, not from independent draws. In this case, where Aida is more optimistic than Bjorn on each of the 6 stages, it is unlikely that the gap is due to independent draws. I think this kind of multidimensional difference in optimism between alignment researchers is actually happening, so any model should take this into account.
2. If we learn that Bjorn was wrong about stage 1, then we should put less weight on his estimates for stages 2-6. (My guess is there's some copula that corresponds to a theoretically sensible way to update away from Bjorn's position while treating his opinions as partially correlated, but I don't know enough statistics.)
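As a sanity check on those two numbers, here is a minimal sketch of the arithmetic (reading the 0.4% as the geometric mean of odds of the two overall estimates; the names and figures are just the example above):

```python
import math

# Aida: 10% on each of the 6 stages; Bjorn: 99% on each.
p_aida = 0.10 ** 6    # overall x-risk ~1e-6
p_bjorn = 0.99 ** 6   # overall x-risk ~0.941

# Arithmetic mean of the two overall estimates.
arith = (p_aida + p_bjorn) / 2

# Geometric mean of the two overall estimates, taken in odds space.
odds_a = p_aida / (1 - p_aida)
odds_b = p_bjorn / (1 - p_bjorn)
pooled_odds = math.sqrt(odds_a * odds_b)
geo = pooled_odds / (1 + pooled_odds)

print(f"arithmetic mean: {arith:.1%}")   # ~47.1%
print(f"geomean of odds: {geo:.2%}")     # ~0.40%
```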
This is unquestionably the strongest argument against the SDO method as it applies to AI Risk, and therefore the biggest limitation of the essay. There is a really good chance that many of the parameters in the Carlsmith Model are correlated in real life (since basically everything is correlated with everything else by some mechanism), so the important question is whether they are independent enough that what I've got here is still plausible. I offer some thoughts on the issue in Section 5.1.
To the best of my knowledge, there is no work making a very strong theoretical claim that any particular element of the Carlsmith Model will be strongly correlated with any other element. I have seen people suggest mechanisms with the implicit claim that if AI is more revolutionary than we expect, there will be correlation between our incentive to deploy it, our desire to expose it to high-impact inputs, and our inability to stop it once it tries to disempower us. But I'm pretty confident the validity check in Section 4.3.3 demonstrates that correlation between some parameters doesn't fundamentally alter conclusions about distributions, although it would alter the exact point estimates reached.
Practically, I don't think there is strong evidence that people's parameters are correlated across estimates to a degree that will significantly alter results. Below is the correlation matrix for the Full Survey estimates, with p<0.05 highlighted in green. Obviously I'm once again leaning on the assumption that a survey about AI Risk is a good proxy for the actual AI Risk, which I think is another weakness of the essay.
This doesn't spark any major concerns for me: there is more correlation than would be expected by chance, but it seems to be mostly contained within the "Alignment turns out to be easy" step, and as discussed above the mechanism still functions if one or two steps are removed because they are indistinguishable from preceding steps. The fact that there is more positive than negative correlation in the table is some evidence of the "general factor of optimism" you describe (because the "optimistic" view is that we won't deploy AI until we know it is safe, so we'd expect negative correlation on this factor in the table). Overall I think my assumption of independence is reasonable, in the sense that the results are likely to be robust to the sorts of correlations I have empirically observed and seen theoretically accounted for; however, I do agree with you that if there is a critical flaw in the essay it is likely to be found here.
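The matrix itself is an image in the original post; for readers who want to reproduce the check, a minimal sketch of the computation, with random placeholder data standing in for the actual survey responses:

```python
import numpy as np
from scipy import stats

# Placeholder data: rows are respondents, columns are the 6 stage probabilities.
rng = np.random.default_rng(0)
survey = rng.uniform(0.05, 0.95, size=(50, 6))

# Pairwise Pearson correlations between stages, flagging p < 0.05
# (the green-highlighted cells in the matrix referred to above).
for i in range(6):
    for j in range(i + 1, 6):
        r, p = stats.pearsonr(survey[:, i], survey[:, j])
        flag = "  <-- p<0.05" if p < 0.05 else ""
        print(f"stage {i+1} vs stage {j+1}: r={r:+.2f}, p={p:.3f}{flag}")
```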
I don't quite follow your logic where you conclude that if estimates are correlated then the simple mean is preferred; my exploration of the problem suggests that if estimates are correlated to a degree significant enough to affect my overall conclusion, then you stop being able to use conventional statistics at all and have to do something fancy like microsimulation. Anecdata: in the specific example you give, my intuition is that 0.4% really is a better summary of our knowledge, since otherwise we round off Aida's position to "approximately 1%", which is several orders of magnitude incorrect. Although, as I say above, in the situation you describe both summary estimates are misleading in different ways and we should look at the distribution, which is the key point I was trying to make in the essay.
Thanks. It looks reassuring that the correlations aren't as large as I thought. (How much variance is in the first principal component in log odds space though?) And yes, I now think the arguments I had weren't so much for arithmetic mean as against total independence / geometric mean, so I'll edit my comment to reflect that.
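The PC1 question can be answered mechanically once the survey matrix is to hand; a minimal sketch, again with random placeholder data standing in for the actual responses:

```python
import numpy as np

# Placeholder data: rows are respondents, columns are the 6 stage probabilities.
rng = np.random.default_rng(0)
responses = rng.uniform(0.05, 0.95, size=(100, 6))

# Move to log odds space, centre, and take the SVD.
log_odds = np.log(responses / (1 - responses))
centered = log_odds - log_odds.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)

# Squared singular values are proportional to variance explained;
# a dominant PC1 would suggest a single "general factor of optimism".
explained = s**2 / np.sum(s**2)
print(f"PC1 share of variance: {explained[0]:.1%}")
```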
If the estimates for the different components were independent, then wouldn't the distribution of synthetic estimates be the same as the distribution of individual people's estimates?
Multiplying Alice's p1 x Bob's p2 x Carol's p3 x ... would draw from the same distribution as multiplying Alice's p1 x Alice's p2 x Alice's p3 x ..., if estimates to the different questions are unrelated.
So you could see how much non-independence affects the bottom-line results just by comparing the synthetic distribution with the distribution of individual estimates (treating each individual as one data point and multiplying their 6 component probabilities together to get their p(existential catastrophe)).
Insofar as the 6 components are not independent, the question of whether to use synthetic estimates or just look at the distribution of individuals' estimates comes down to: 1) how much value there is in increasing the effective sample size by using synthetic estimates, and 2) whether the non-independence that exists is something you want to erase by scrambling together different people's component estimates (because it mainly reflects reasoning errors) or something you want to maintain by looking at individual estimates (because it reflects the structure of the situation).
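A minimal sketch of this comparison (placeholder data standing in for the survey; `survey[i, j]` is respondent i's probability for stage j):

```python
import numpy as np

rng = np.random.default_rng(0)
survey = rng.uniform(0.05, 0.95, size=(50, 6))  # placeholder data

# Individual estimates: each respondent's own six probabilities multiplied.
individual = survey.prod(axis=1)

# Synthetic estimates: each stage drawn from a randomly chosen respondent.
n_synth = 100_000
rows = rng.integers(0, survey.shape[0], size=(n_synth, 6))
synthetic = survey[rows, np.arange(6)].prod(axis=1)

# Under independence the two distributions should roughly coincide;
# a gap between them indicates within-person correlation across stages.
for name, x in [("individual", individual), ("synthetic", synthetic)]:
    geomean = np.exp(np.log(x).mean())
    print(f"{name}: geomean={geomean:.2%}, mean={x.mean():.2%}")
```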
In practice these numbers wouldn't perfectly match even if there was no correlation, because there is some missing survey data that the SDO method ignores (naturally, you can't sample data that doesn't exist). In principle I don't see why we shouldn't use this as a good rule-of-thumb check for unacceptable correlation.
The synthetic distribution gives a geomean of 1.6% and a simple mean of around 9.6%, as per the essay.
The distribution of all survey responses multiplied together (as per Alice p1 x Alice p2 x Alice p3) gives a geomean of approx 2.3% and a simple mean of approx 17.3%.
I'd suggest that this implies the SDO method's weakness to correlated results is potentially depressing the actual result by about 50%, give or take. I don't think that's either obviously small enough not to matter or obviously large enough to invalidate the whole approach, although my instinct is that when talking about order-of-magnitude uncertainty, a 50% error in the point estimate would not be a showstopper.
Jaime Sevilla (who usually argues in favor of using geometric mean of odds over arithmetic mean of probabilities) makes a similar point here:

I currently believe that the geometric mean of odds should be the default option for aggregating forecasts. In the two large-scale empirical evaluations I am aware of [1][2], it surpasses the mean of probabilities and the median (*). It is also the only method that makes the group aggregate behave as a Bayesian, and (in my opinion) it behaves well with extreme predictions.

If you are not aggregating all-considered views of experts, but rather aggregating models with mutually exclusive assumptions, use the mean of probabilities.
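For reference, a minimal sketch of geometric-mean-of-odds pooling (my illustration of the rule described in the quote, not code from either post); applied to the Aida/Bjorn example above it recovers the ~0.4% figure:

```python
import numpy as np

def pool_geomean_odds(probs):
    """Pool probability forecasts via the geometric mean of their odds."""
    p = np.asarray(probs, dtype=float)
    mean_log_odds = np.mean(np.log(p / (1 - p)))
    pooled_odds = np.exp(mean_log_odds)
    return pooled_odds / (1 + pooled_odds)

print(f"{pool_geomean_odds([1e-6, 0.9415]):.2%}")  # ~0.40%
```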
Strong endorsement for pushing against unjustified independence assumptions.
I'm having a harder time thinking about how it applies to AI specifically, but I think it's a common problem in general, e.g. in forecasting.