Excited to have the full results of your survey released soon! :) I read a few paragraphs of it when you sent me a copy, though I haven’t read the full paper.
Your “probability of an existential catastrophe due to AI” got mean 0.23 and median 0.1. Notably, this includes misuse risk along with accident risk, so it’s especially striking that it’s lower than my survey’s Q2, “[risk from] AI systems not doing/optimizing what the people deploying them wanted/intended”, which got mean ~0.401 and median 0.3.
Looking at different subgroups’ answers to Q2 (see the sketch after this list for how these per-group summaries are computed):
MIRI: mean 0.8, median 0.7.
OpenAI: mean ~0.207, median 0.26. (A group that wasn’t in your survey.)
No affiliation specified: mean ~0.446, median 0.35. (Might or might not include MIRI people.)
All respondents other than ‘MIRI’ and ‘no affiliation specified’: mean 0.278, median 0.26.
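To be concrete about what those summary numbers are, here’s a minimal Python sketch of the per-group breakdown; the response values below are made-up placeholders for illustration, not the actual survey data:

```python
from statistics import mean, median

# Q2 answers (probabilities in [0, 1]) keyed by affiliation.
# NOTE: these values are made-up placeholders, not the real survey responses.
responses = {
    "MIRI": [0.7, 0.8, 0.9],
    "OpenAI": [0.15, 0.26, 0.3],
    "No affiliation specified": [0.2, 0.35, 0.6],
    "Other": [0.1, 0.26, 0.4],
}

for group, answers in responses.items():
    print(f"{group}: mean {mean(answers):.3f}, median {median(answers):.3f}")

# 'All respondents other than MIRI and no-affiliation-specified' is the same
# computation pooled over the remaining groups' answers.
pooled = responses["OpenAI"] + responses["Other"]
print(f"Pooled: mean {mean(pooled):.3f}, median {median(pooled):.3f}")
```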
Even the latter group is surprisingly high. A priori, I’d have expected MIRI’s inclusion to matter less than the fact that the overall (non-MIRI) target populations of the two surveys are very different:
My survey was sent to FHI, MIRI, DeepMind, CHAI, Open Phil, OpenAI, and ‘recent OpenAI’.
Your survey was sent to four of those groups (FHI, MIRI, CHAI, Open Phil), subtracting OpenAI, ‘recent OpenAI’, and DeepMind. Yours was also sent to CSER, Mila, Partnership on AI, CSET, CLR, FLI, AI Impacts, GCRI, and various independent researchers recommended by these groups. So your survey has fewer AI researchers, more small groups, and more groups that don’t have AGI/TAI as their top focus.
You attempted to restrict your survey to people “who have taken time to form their own views about existential risk from AI”, whereas I attempted to restrict to anyone “who researches long-term AI topics, or who has done a lot of past work on such topics”. So I’d naively expect my population to include more people who (e.g.) work on AI alignment but haven’t thought a bunch about risk forecasting; and I’d naively expect your population to include more people who have spent a day carefully crafting an AI x-risk prediction, but primarily work in biosecurity or some other area. That’s just a guess on my part, though.
Overall, your methods for choosing who to include seem super reasonable to me -- perhaps more natural than mine, even. Part of why I ran my survey was just the suspicion that there’s a lot of disagreement between orgs and between different types of AI safety researcher, such that it makes a large difference which groups we include. I’d be interested in an analysis of that question; eyeballing my chart, it looks to me like there is a fair amount of disagreement like that (even if we ignore MIRI).
Oh, your survey also frames the questions very differently, in a way that seems important to me. You give multiple-choice questions like:
Which of these is closest to your estimate of the probability that there will be an existential catastrophe due to AI (at any point in time)?
0.0001%
0.001%
0.01%
0.1%
0.5%
1%
2%
3%
4%
5%
6%
7%
8%
9%
10%
15%
20%
25%
30%
35%
40%
45%
50%
55%
60%
65%
70%
75%
80%
85%
90%
95%
100%
… whereas I just asked for a probability.
Overall, you give fourteen options for probabilities below 10%, and two options above 90%. (One of which is the dreaded-by-rationalists “100%”.)
By giving many fine gradations of ‘AI x-risk is low probability’ without giving as many gradations of ‘AI x-risk is high probability’, you’re communicating that low-probability answers are more normal/natural/expected.
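Just to make the asymmetry explicit, here’s a throwaway Python tally of the options as listed in the question quoted above:

```python
# Answer options (in %) as listed in the survey question quoted above.
options = [0.0001, 0.001, 0.01, 0.1, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9,
           10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,
           80, 85, 90, 95, 100]

below_10 = [x for x in options if x < 10]
above_90 = [x for x in options if x > 90]
print(len(options), len(below_10), len(above_90))  # 33 total, 14 below 10%, 2 above 90%
```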
The low probabilities are also listed first, which is a natural choice but could still have a priming effect. (Anchoring to 0.0001% and adjusting from that point, versus anchoring to 95%.) On my screen’s resolution, you have to scroll down three pages to even see numbers as high as 65% or 80%. I lean toward thinking ‘low probabilities listed first’ wasn’t a big factor, though.
My survey’s also a lot shorter than yours, so I could imagine it filtering for respondents who are busier, lazier, less interested in the topic, less interested in helping produce good survey data, etc.