Can someone give me the TLDR on the implications of these results, given that Samotsvety's group seemingly had much higher odds for AI catastrophe? I didn't read the exact definitions they used for catastrophe, but:
Samotsvety’s group (n=13) gave “What’s your probability of misaligned AI takeover by 2100, barring pre-APS-AI catastrophe?” at 25%
(source https://forum.effectivealtruism.org/posts/EG9xDM8YRz4JN4wMN/samotsvety-s-ai-risk-forecasts)
Whereas XPT gave "AI Catastrophic risk (>10% of humans die within 5 years)" for year 2100 at 2.13%
Even without having read the exact definitions of "misaligned AI takeover", and knowing that Samotsvety's prediction was conditional on no pre-APS-AI catastrophe happening first, this still seems like a very large discrepancy. I know that Samotsvety's group was much smaller (n=13 vs n=88). Given the discrepancy in the risk predictions, how much weight should we give to Samotsvety's other predictions on AI timelines?
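For scale, here's a rough back-of-the-envelope comparison of the two headline numbers as stated. It ignores the definitional differences between the two questions, so treat it as an order-of-magnitude check only:

```python
# Rough comparison of the two headline forecasts. The resolution criteria
# differ, so this is not an apples-to-apples comparison.
samotsvety = 0.25   # P(misaligned AI takeover by 2100 | no pre-APS-AI catastrophe)
xpt = 0.0213        # P(AI catastrophe: >10% of humans die within 5 years, by 2100)

def odds(p: float) -> float:
    return p / (1 - p)

print(f"probability ratio: {samotsvety / xpt:.1f}x")              # ~11.7x
print(f"odds ratio:        {odds(samotsvety) / odds(xpt):.1f}x")   # ~15.3x
```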
Good question.
There’s a little bit on how to think about the XPT results in relation to other forecasts here (not much). Extrapolating from there to Samotsvety in particular:
Reasons to favour XPT (superforecaster) forecasts:
Larger sample size
The forecasts were incentivised (via reciprocal scoring, a bit more detail here)
The most accurate XPT forecasters in terms of reciprocal scoring also gave the lowest probabilities on AI risk (and reciprocal scoring accuracy may correlate with actual accuracy)
Speculative reasons to favour Samotsvety forecasts:
(Guessing) They’ve spent longer on average thinking about it
(Guessing) They have deeper technical expertise than the XPT superforecasters
I also haven’t looked in detail at the respective resolution criteria, but at first glance the forecasts seem relatively hard to compare directly. (I agree with you, though, that the discrepancy is large enough to suggest a substantial disagreement if the two groups were to forecast the same question; I just expect it will be hard to work out how large.)
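For what it's worth, if you did want to combine the two headline numbers despite the comparability problems, one common aggregation approach is a weighted geometric mean of odds. The sketch below is purely illustrative: the pooling method and the weights are my own assumptions, not anything from either report, and any pooled number inherits the ambiguity in the differing resolution criteria:

```python
import math

def pool_geo_mean_odds(probs, weights):
    """Weighted geometric mean of odds, converted back to a probability."""
    log_odds = [w * math.log(p / (1 - p)) for p, w in zip(probs, weights)]
    pooled_odds = math.exp(sum(log_odds) / sum(weights))
    return pooled_odds / (1 + pooled_odds)

xpt, samotsvety = 0.0213, 0.25

# Illustrative weightings only -- not a recommendation.
for w_xpt in (0.5, 0.7, 0.9):
    pooled = pool_geo_mean_odds([xpt, samotsvety], [w_xpt, 1 - w_xpt])
    print(f"weight on XPT = {w_xpt:.1f} -> pooled p ~ {pooled:.3f}")
# Gives roughly 0.078, 0.047, and 0.028 respectively.
```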