One potential reason for the observed difference in expert and superforecaster estimates: even though they’re nominally participating in the same tournament, for the experts, this is a much stranger choice than it is for the superforecasters, who presumably have already built up an identity where it makes sense to spend a ton of time and deep thought on a forecasting tournament, on top of their day job and other life commitments. I think there’s some evidence for this in the dropout rates, which were 19% for the superforecasters but 51% (!) for the experts, suggesting that experts were especially likely to second-guess their decision to participate. (Also, see the discussion in Appendix 1 of the difficulties in recruiting experts: it seems like it was pretty hard to find non-superforecasters who were willing to commit to a project like this.)
So, the subset of experts who take the leap and participate in the study anyway are selected for something like “openness to unorthodox decisions/beliefs,” roughly equivalent to the Big Five personality trait of openness (or other related traits). I’d guess that each participant’s level of openness is a major driver (maybe even the largest driver?) of whether they accept or dismiss arguments for 21st-century x-risk, especially from AI.
Ways you could test this:
Test the Big Five personality traits of all participants. My guess is that the experts would have a higher average openness than the superforecasters, but the difference would be even greater if comparing the average openness of the “AI-concerned” group (highest) to the “AI skeptics” (lowest). These personality-level differences seem to match well with the groups’ object-level disagreements on AI risk, which mostly didn’t center on timelines and instead centered on disagreements about whether to take the inside or outside view on AI.
I’d also expect the “AI-concerned” to have higher neuroticism than the “AI skeptics,” since I think high/low neuroticism maps closely to something like a strong global prior that the world is/isn’t dangerous. This might explain the otherwise strange finding that “although the biggest area of long-run disagreement was the probability of extinction due to AI, there were surprisingly high levels of agreement on 45 shorter-run indicators when comparing forecasters most and least concerned about AI risk.”
When trying to compare the experts and superforecasters to the general population, don’t rely on a poll of random people, since completing a poll is much less weird than participating in a forecasting tournament. Instead, try to recruit a third group of “normal” people who are neither experts nor superforecasters, but have a similar opportunity cost for their time, to participate in the tournament. For example, you might target faculty and PhD candidates at US universities working on non-x-risk topics. My guess is that the subset of people in this population who decide “sure, why not, I’ll sign up to spend many hours of my life rigorously arguing with strangers about the end of the world” would be pretty high on openness, and thus pretty likely to predict high rates of x-risk.
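To make the first of these tests concrete, here is a rough sketch of how the group comparison could be run once openness scores exist. Everything in it is an assumption on my part: the function names are hypothetical, the 1-5 scale is just one common way of scoring a Big Five inventory, and the numbers are invented placeholders, not data from the tournament. It computes a standardized mean difference (Cohen's d) and a one-sided permutation test, which avoids assuming the scores are normally distributed.

```python
import random
from statistics import mean, stdev


def cohens_d(a, b):
    """Standardized mean difference between two groups, using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """One-sided permutation test.

    Estimates how often a random relabeling of participants produces a
    mean difference (a minus b) at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if mean(pooled[:len(a)]) - mean(pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Placeholder openness scores on a 1-5 scale.
# Invented for illustration only; NOT real tournament data.
ai_concerned = [4.4, 4.1, 4.6, 3.9, 4.3, 4.5]
ai_skeptics = [3.6, 3.9, 3.4, 4.0, 3.7, 3.5]

d = cohens_d(ai_concerned, ai_skeptics)
p = permutation_p_value(ai_concerned, ai_skeptics)
print(f"Cohen's d = {d:.2f}, one-sided permutation p = {p:.4f}")
```

The same two functions would work for the expert vs. superforecaster comparison, or for neuroticism instead of openness. With groups this small the result would say very little, so a real analysis would need the full participant pool and some correction for testing several traits at once.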
I bring all this up in part because, although Appendix 1 includes a caveat that “those who signed up cannot be claimed to be a representative of [x-risk] experts in each of these fields,” I don’t think there was discussion of specific ways they are likely to be non-representative. I expect most people to forget about this caveat when drawing conclusions from this work, and instead conclude there must be generalizable differences between superforecaster and expert views on x-risk.
Also, I think it would be genuinely valuable to learn the extent to which personality differences do or don’t drive differences in long-term x-risk assessments in such a highly analytical environment with strong incentives for accuracy. If personality differences really are a large part of the picture, it might help resolve the questions presented at the end of the abstract:
“The most pressing practical question for future work is: why were superforecasters so unmoved by experts’ much higher estimates of AI extinction risk, and why were experts so unmoved by the superforecasters’ lower estimates? The most puzzling scientific question is: why did rational forecasters, incentivized by the XPT to persuade each other, not converge after months of debate and the exchange of millions of words and thousands of forecasts?”
Fwiw, despite the tournament feeling like a drag at points, I think I kept at it due to a mix of: a) having committed to it and wanting to fulfill the commitment (which I suppose is conscientiousness), b) strongly sharing the motivations for having more forecasting, and c) having the money as a reward for good performance and for just keeping at it.