Fascinating results! I really appreciate the level of thought and precision you all put into the survey questions.
Were there any strong correlations between which of the five scenarios respondents considered more likely?
Survey results (Q2, Q1) by affiliation (hover for spoilers):
OpenAI: ~21%, ~13%
FHI: ~27%, ~19%
DeepMind: (no respondents declared this affiliation)
CHAI/Berkeley: 39%, 39%
MIRI: 80%, 70%
Open Philanthropy: ~35%, ~16%
Some reasons I can imagine for focusing on 90+% loss scenarios:
You might just have the empirical view that very few things would cause ‘medium-sized’ losses of a lot of the future’s value. It could then be useful to define ‘existential risk’ to exclude medium-sized losses, so that when you talk about ‘x-risks’ people fully appreciate just how bad you think these outcomes would be.
‘Existential’ suggests a threat to the ‘existence’ of humanity, i.e., an outcome about as bad as human extinction. (Certainly a lot of EAs -- myself included, when I first joined the community! -- misunderstand x-risk and think it’s equivalent to extinction risk.)
After googling a bit, I now think Nick Bostrom’s conception of existential risk (at least as of 2012) is similar to Toby’s. In https://www.existential-risk.org/concept.html, Nick divides up x-risks into the categories “human extinction, permanent stagnation, flawed realization, and subsequent ruination”, and says that in a “flawed realization”, “humanity reaches technological maturity” but “the amount of value realized is but a small fraction of what could have been achieved”. This only makes sense as a partition of x-risks if all x-risks reduce value to “a small fraction of what could have been achieved” (or reduce the future’s value to zero).
I still think that the definition of x-risk I proposed is a bit more useful, and I think it’s a more natural interpretation of phrasings like “drastically curtail [Earth-originating intelligent life’s] potential” and “reduce its quality of life (compared to what would otherwise have been possible) permanently and drastically”. Perhaps I should use a new term, like hyperastronomical catastrophe, when I want to refer to something like ‘catastrophes that would reduce the total value of the future by 5% or more’.
Oh, your survey also frames the questions very differently, in a way that seems important to me. You give multiple-choice questions like:
Which of these is closest to your estimate of the probability that there will be an existential catastrophe due to AI (at any point in time)?
0.0001%, 0.001%, 0.01%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%
… whereas I just asked for a probability.
Overall, you give fourteen options for probabilities below 10%, and two options above 90%. (One of which is the dreaded-by-rationalists “100%”.)
By giving many fine gradations of ‘AI x-risk is low probability’ without giving as many gradations of ‘AI x-risk is high probability’, you’re communicating that low-probability answers are more normal/natural/expected.
The low probabilities are also listed first, which is a natural choice but could still have a priming effect. (Anchoring to 0.0001% and adjusting from that point, versus anchoring to 95%.) On my screen’s resolution, you have to scroll down three pages to even see numbers as high as 65% or 80%. I lean toward thinking ‘low probabilities listed first’ wasn’t a big factor, though.
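(As a concrete check on that count, here’s a minimal sketch in Python that tallies the option spacing from the list quoted above; the 10%/90% cutoffs are just the ones I used in this comment, not anything taken from your survey design.)

```python
# Answer options from the survey question quoted above, as percentages.
options = [0.0001, 0.001, 0.01, 0.1, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9,
           10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,
           80, 85, 90, 95, 100]

below_10 = [x for x in options if x < 10]   # fine gradations at the low end
above_90 = [x for x in options if x > 90]   # coarse gradations at the high end

print(len(options))                  # 33 options in total
print(len(below_10), len(above_90))  # 14 below 10%, 2 above 90%
print(options[len(options) // 2])    # middle option in the list: 20 (not 50)
```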
My survey’s also a lot shorter than yours, so I could imagine it filtering for respondents who are busier, lazier, less interested in the topic, less interested in helping produce good survey data, etc.
I have sometimes wanted to draw a sharp distinction between scenarios where 90% of humans die vs. ones where 40% of humans die; but that’s largely because the risk of subsequent extinction or permanent civilizational collapse seems much higher to me in the 90% case. I don’t currently see a similar discontinuity in ‘90% of the future lost vs. 40% of the future lost’, either in ‘the practical upshot of such loss’ or in ‘the kinds of scenarios that tend to cause such loss’. But I’ve also spent a lot less time than Toby thinking about the full range of x-risk scenarios.
Excited to have the full results of your survey released soon! :) I read a few paragraphs of it when you sent me a copy, though I haven’t read the full paper.
Your “probability of an existential catastrophe due to AI” got mean 0.23 and median 0.1. Notably, this includes misuse risk along with accident risk, so it’s especially striking that it’s lower than my survey’s Q2, “[risk from] AI systems not doing/optimizing what the people deploying them wanted/intended”, which got mean ~0.401 and median 0.3.
Looking at different subgroups’ answers to Q2:
MIRI: mean 0.8, median 0.7.
OpenAI: mean ~0.207, median 0.26. (A group that wasn’t in your survey.)
No affiliation specified: mean ~0.446, median 0.35. (Might or might not include MIRI people.)
All respondents other than ‘MIRI’ and ‘no affiliation specified’: mean 0.278, median 0.26.
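(For transparency about how I’m aggregating here, a minimal sketch in Python of the subgroup breakdown. The numbers in `responses` are made-up placeholders for illustration, not the actual survey answers, which I’m not reproducing individually.)

```python
from statistics import mean, median

# Q2 probabilities keyed by declared affiliation.
# NOTE: made-up placeholder values, NOT the real survey data.
responses = {
    "MIRI":          [0.9, 0.8, 0.75, 0.7, 0.65],
    "OpenAI":        [0.05, 0.15, 0.26, 0.30, 0.27],
    "not specified": [0.1, 0.35, 0.5, 0.6, 0.68],
}

# Per-group mean and median (the statistics reported above).
for group, probs in responses.items():
    print(group, round(mean(probs), 3), median(probs))

# "All respondents other than 'MIRI' and 'no affiliation specified'":
rest = [p for group, probs in responses.items()
        if group not in ("MIRI", "not specified")
        for p in probs]
print(round(mean(rest), 3), median(rest))
```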
Even the latter group is surprisingly high. A priori, I’d have expected MIRI’s inclusion on its own to matter less than the fact that the overall (non-MIRI) target populations are very different for the two surveys:
My survey was sent to FHI, MIRI, DeepMind, CHAI, Open Phil, OpenAI, and ‘recent OpenAI’.
Your survey was sent to four of those groups (FHI, MIRI, CHAI, Open Phil), omitting OpenAI, ‘recent OpenAI’, and DeepMind. Yours was also sent to CSER, Mila, Partnership on AI, CSET, CLR, FLI, AI Impacts, GCRI, and various independent researchers recommended by these groups. So your survey has fewer AI researchers, more small groups, and more groups that don’t have AGI/TAI as their top focus.
You attempted to restrict your survey to people “who have taken time to form their own views about existential risk from AI”, whereas I attempted to restrict to anyone “who researches long-term AI topics, or who has done a lot of past work on such topics”. So I’d naively expect my population to include more people who (e.g.) work on AI alignment but haven’t thought a bunch about risk forecasting; and I’d naively expect your population to include more people who have spent a day carefully crafting an AI x-risk prediction, but primarily work in biosecurity or some other area. That’s just a guess on my part, though.
Overall, your methods for choosing who to include seem super reasonable to me -- perhaps more natural than mine, even. Part of why I ran my survey was just the suspicion that there’s a lot of disagreement between orgs and between different types of AI safety researcher, such that it makes a large difference which groups we include. I’d be interested in an analysis of that question; eyeballing my chart, it looks to me like there is a fair amount of disagreement like that (even if we ignore MIRI).
People might also cluster more if we did the exact same survey again, but asking them to look at the first survey’s results.
Yeah, a big part of why I left the term vague is that I didn’t want people to get hung up on those details when many AGI catastrophe scenarios are extreme enough to swamp those details. E.g., focusing on whether the astronomical loss threshold is 80% vs. 50% is beside the point if you think AGI failure almost always means losing 98+% of the future’s value.
I might still do it differently if I could re-run the survey, however. It would be nice to have a number, so we could more easily do EV calculations.
Then perhaps it’s good that I didn’t include my nonstandard definition of x-risk, and we can expect the respondents to be at least somewhat closer to Ord’s definition.
I do find it odd to say that ‘40% of the future’s value is lost’ isn’t an x-catastrophe, and in my own experience it’s much more common that I’ve wanted to draw a clear line between ‘40% of the future is lost’ and ‘0.4% of the future is lost’ than between 90% and 40%. I’d be interested to hear about cases where Toby or others found it illuminating to sharply distinguish 90% and 40%.
I can imagine some longtermists thinking that getting 90% of the possible value is basically an existential win
What’s the definition of an “existential win”? I agree that this would be a win, and would involve us beating some existential risks that currently loom large. But I also think this would be an existential catastrophe. So if “win” means “zero x-catastrophes”, I wouldn’t call this a win.
Bostrom’s original definition of existential risk talked about things that “drastically curtail [the] potential” of “Earth-originating intelligent life”. Under that phrasing, I think losing 10% of our total potential qualifies.
I think you’re implicitly agreeing with my comment that losing 0.1% of the future is acceptable, but I’m unsure if this is endorsed.
?!? What does “acceptable” mean? Obviously losing 0.1% of the future’s value is very bad, and should be avoided if possible!!! But I’d be fine with saying that this isn’t quite an existential risk, by Bostrom’s original phrasing.
If you were to redo the survey for people like me, I’d have preferred a phrasing that says more like “a drastic reduction (>X%) of the future’s value.”
Agreed, I’d probably have gone with a phrasing like that.
(I considered just saying “existential risk” without defining the term, but I worried that people sometimes conflate existential risk with things like “extinction risk” or “risk that we’ll lose the entire cosmic endowment”.)
I’d also be interested in hearing if others found this confusing. The intent was a large relative change in the future’s value—hence the word “overall”, and the mirroring of some language from Bostrom’s definition of existential risk. I also figured that this would be clear from the fact that the survey was called “Existential risk from AI” (and this title was visible to all survey respondents).
None of the respondents (and none of the people who looked at my drafts of the survey) expressed confusion about this, though someone could potentially misunderstand without commenting on it (e.g., because they didn’t notice there was another possible interpretation).
Example of why this is important: given the rate at which galaxies are receding from us, my understanding is that every day we delay colonizing the universe loses us hundreds of thousands of stars. Thinking on those scales, almost any tiny effect today can have enormous consequences in absolute terms. But the concept of existential risk correctly focuses our attention on the things that threaten a large fraction of the future’s value.
By the way, I find the “less than maximum potential” operationalizations to call for especially high probability estimates
Yeah, I deliberately steered clear of ‘less than maximum potential’ in the survey (with help from others’ feedback on my survey phrasing). Losing a galaxy is not, on its own, an existential catastrophe, because one galaxy is such a small portion of the cosmic endowment (even though it’s enormously important in absolute terms). In contrast, losing 10% of all reachable galaxies would be a clear existential catastrophe.
I don’t know the answer, though my initial guess would have been that (within the x-risk ecosystem) “Unusually ‘optimistic’ people being for some reason unusually likely to have given public, quantitative estimates before” is a large factor. I talked about this here. I’d guess the cause is some combination of:
There just aren’t many people giving public quantitative estimates, so noise can dominate.
Noise can also be magnified by social precedent; e.g., if the first person to give a public estimate happened to be an optimist by pure coincidence, that on its own might encourage other optimists to speak up more and pessimists less, which could then cascade.
For a variety of dangerous and novel things, if you say ‘this risk is low-probability, but still high enough to warrant concern’, you’re likelier to sound like a sober, skeptical scientist, while if you say ‘this risk is high-probability’, you’re likelier to sound like a doomsday-prophet crackpot. I think this is an important part of the social forces that caused many scientists and institutions to understate the risk of COVID in Jan/Feb 2020.
Causing an AI panic could have a lot of bad effects, such as (paradoxically) encouraging racing, or (less paradoxically) inspiring poorly-thought-out regulatory interventions. So there’s more reason to keep quiet if your estimates are likelier to panic others. (Again, this may have COVID parallels: I think people were super worried about causing panics at the outset of the pandemic, though I think this made a lot less sense in the case of COVID.)
This bullet point would also skew the survey optimistic, unless people give a lot of weight to ‘it’s much less of a big deal for me to give my pessimistic view here, since there will be a lot of other estimates in the mix’.
Alternatively, maybe pessimists mostly aren’t worried about starting a panic, but are worried about other people accusing them of starting a panic, so they’re more inclined to share their views when they can be anonymous?
Intellectuals in the world at large tend to assume a “default” view along the lines of ‘the status quo continues; things are pretty safe and stable; to the extent things aren’t safe or stable, it’s because of widely known risks with lots of precedent’. If you have a view that’s further from the default, you might be more reluctant to assert that view in public, because you expect more people to disagree, ask for elaborations and justifications, etc. Even if you’re happy to have others criticize and challenge your view, you might not want to put in the extra effort of responding to such criticisms or preemptively elaborating on your reasoning.
For various reasons, optimism about AI seems to correlate with optimism about public AI discourse. E.g., some people are optimists about AI outcomes in part because they think the world is more competent/coordinated/efficient/etc. overall; which could then make you expect fewer downsides and more upside from public discourse.
Of course, this is all looking at only one of several possible explanations for ‘the survey results here look more pessimistic than past public predictions by the x-risk community’. I focus on these to explain one of the reasons I expected to see an effect like this. (The bigger reason is just ‘I talked to people at various orgs over the years and kept getting this impression’.)
You’re very welcome! I was relying on the shapes to make things clear (in all charts, circle = technical safety researcher, square = strategy researcher), but I’ve now added text to clarify.
Thanks for registering your predictions, Michael!
Predicted mean survey answer: 14%
Predicted median survey answer: 6%
Results (hover to read):
Mean answer for Q1 was ~30.1%, median answer 20%.
My impression is that people at MIRI would probably have a mean x-risk from AI estimate of ~50%, while people at the other places you mentioned would have a mean estimate of ~10% and a median of 8%.
Looking only at people who declared their affiliation: MIRI people’s mean probability for x-catastrophes from “AI systems not doing/optimizing what the people deploying them wanted/intended” was 80% (though I’m not sure this is what you mean by “x-risk from AI” here), with median 70%.
People who declared a non-MIRI affiliation had a mean Q2 probability of 27.8%, median 26%.
With (even) less confidence, I’d say people at MIRI would give a mean of 40% to question 1, and people elsewhere would give a mean of 7% and a median of 5%.
For Q1, MIRI-identified people gave mean 70% (and median 80%). Non-MIRI-identified people gave mean ~18.7%, median 10%.
I’m guessing MIRI people will be something like a quarter of your respondents.
5/27 of respondents who specified an affiliation said they work at MIRI (~19%). (By comparison, 17/~117 ~= 15% of recipients work at MIRI.)
I’ve added six prediction interfaces, for people to give their own probability for each Q, their guess at the mean survey respondent answer, and their guess at the median answer.
Though in the world where the credible range of estimates is 1-10%, and 80% of the field believed the probability was >10% (my prediction from upthread), that would start to get into ‘something’s seriously wrong with the field’ territory from my perspective; that’s not a small disagreement.
(I’m assuming here, as I did when I made my original prediction, that they aren’t all clustered around 15% or whatever; rather, I’d have expected a lot of the field to give a much higher probability than 10%.)