I just wanted to give major kudos for evaluating a prediction you made and very publicly sharing the results even though they were not fully in line with your prediction.
Thanks for sharing this! I’ve now added these results to my database of existential risk estimates.
What follows are some reflections on similarities or differences between these survey results and the other estimates I’ve collected in that database. But really most estimates aren’t directly comparable, since people use different operationalisations, have different caveats, etc. I’ll mostly fail to mention these issues and just compare the numbers anyway. Take this with oodles of salt, and check the database for details.
These estimates seem notably higher than:
Carlsmith’s estimate (though he’s only talking about before 2070 and about one type of AI x-risk scenario)
Ord’s estimate (though he was focusing on the next 100 years and thought TAI/AGI has a substantial chance of coming later)
my estimate
the GCR conference’s estimate (that focused just on extinction and just on “superintelligent AI”, but I’m guessing they would’ve at that time thought that was >50% of the risk, so their full x-risk number would be similar)
the Grace et al. survey estimate
Pamlin & Armstrong’s estimate
Garfinkel’s estimate
Shah’s estimate (though that’s only for “adversarial optimization against humans”)
Fodor’s estimate
Christiano’s estimate (though that focuses on only one type of alignment failure)
These numbers seem a little higher than Cotra’s estimate
These numbers are similar to, or a little lower than, Armstrong’s estimate
These numbers are lower than Shlegeris’s estimate and Tallinn’s estimate
I find it surprising that this survey’s results are not near the middle of the distribution of previous estimates. (Specifically, they’re approximately the fourth highest out of approximately 15 estimates—but I would’ve been similarly surprised by them being approximately the fourth lowest.)
(This surprise is reflected in my predicted survey results being substantially lower than the real results, though I did at least correctly predict that the estimates from MIRI people would be much higher than other people’s estimates.)
Part of why I find this surprising is that:
This was a survey of 44 people, not just one new person giving their estimate, so you’d think the mean and median results would be similar to the mean and median from previous estimates
Most of the previous estimates are from people working at the same orgs this survey was sent to, so you’d think this is sampling from a similar population
Do you have any thoughts on where this difference might come from?
Some possibilities:
Different operationalisations
Many of the previous estimates being about the next 100 years or similar timeframes, combined with people thinking TAI was unlikely to be developed in that time (but in some cases, I know the people don’t think that’s very unlikely)
Random noise
Your survey for some reason attracting unusually “pessimistic” people
Unusually “optimistic” people being for some reason unusually likely to have given public, quantitative estimates before
None of those things seems likely to explain a difference of this size and direction, though, so I currently still feel confused.
I find it plausible that there’s some perceived pressure to not give unreasonably-high-seeming probabilities in public, so as to not seem weird (as Rob hypothesized in the discussion here, which inspired this survey). This could manifest both as “unusually ‘optimistic’ people being unusually likely to give public, quantitative estimates” and “people being prone to downplay their estimates when they’re put into the spotlight.”
Personally, I’ve noticed the latter effect a couple of times when I was talking to people who I thought would be turned off by high probabilities for TAI. I didn’t do it on purpose, but after two conversations I noticed that the probabilities I gave for TAI in 10 years, or things similar to that, seemed uncharacteristically low for me. (I think it’s natural for probability estimates to fluctuate between elicitation attempts, but if the trend is quite strong and systematically goes in one direction, then that’s an indicator of some type of bias.)
I also remember that I felt a little uneasy about giving my genuine probabilities in a survey of alignment- and longtermist-strategy researchers in August 2020 (by an FHI research scholar), out of concerns of making myself or the community seem a bit weird. I gave my true probabilities anyway (I think it was anonymized), but I felt a bit odd for thinking that I was giving 65% to things that I expected a bunch of reputable EAs to only give 10% to. (IIRC, the survey questions were quite similar to the wording in this post.)
(By the way, I find the “less than maximum potential” operationalizations to call for especially high probability estimates, since it’s just a priori unlikely that humans set things up in perfect ways, and I do think that small differences in the setup can have huge effects on the future. Maybe that’s an underappreciated crux between researchers – which could also include some normative subcruxes.)
Thanks, this and Rob’s comment are interesting.
But I think these explanations would predict “public, attributed estimates will tend to be lower than estimates from anonymised surveys (e.g., this one) and/or nonpublic estimates”. But that’s not actually the data we’re observing. There were 3 previous anonymised surveys (from the 2008 GCR conference, Grace et al., and the 2020 survey you mention), and each had notably lower mean/median estimates than this survey does for somewhat similar questions.[1]
Maybe the theory could be “well, that part was just random noise—it’s just 4 surveys, so it’s not that surprising for this one to just happen to give the highest estimate—and then the rest is because people are inclined against giving high estimates when it’ll be public and attributed to them”.
But that has a slight epicycle/post-hoc-reasoning flavour. Especially because, similar to the points I raised above:
each survey had a decent number of participants (so you’d think the means/medians would be close to the means/medians the relevant populations as a whole would give)
the surveys were sampling from somewhat similar populations (most clearly for the 2020 survey and this one, and less so for the 2008 one—due to a big time gap—and the Grace et al. one)
So this still seems pretty confusing to me.
I’m inclined to think the best explanation would be that there’s something distinctive about this survey that meant either people with high estimates were overrepresented or people were more inclined than they’d usually be to give high estimates.[2] But I’m not sure what that something would be, aside from Rob’s suggestions that “respondents who were following the forum discussion might have been anchored in some way by that discussion, or might have had a social desirability effect from knowing that the survey-writer puts high probability on AI risk. It might also have made a difference that I work at MIRI.” But I wouldn’t have predicted in advance that those things would have as big an effect as seems to be happening here.
I guess it could just be a combination of three sets of small effects (noise, publicity/attribution selecting for lower estimates, and people being influenced by knowing this survey was from Rob).
[1] One notable difference is that the GCR conference attendees were just estimating human extinction by 2100 as a result of “superintelligent AI”. Maybe they thought that only accounted for less than 25% of total x-risk from AI (because there could also be later, non-extinction, or non-superintelligence x-risks from AI). But that seems unlikely to me, based on my rough impression of what GCR researchers around 2008 tended to focus on.
[2] I don’t think the reason I’m inclined to think this is trying to defend my previous prediction about the survey results or wanting a more optimistic picture of the future. That’s of course possible, but seems unlikely, and there are similarly plausible biases that could push me in the opposite direction (e.g., I’ve done some AI-related work in the past and will likely do more in future, so higher estimates make my work seem more important).
I mostly just consider the FHI research scholar survey to be relevant counterevidence here, because 2008 is indeed really far away and because I think EA researchers reason quite differently than the domain experts in the Grace et al. survey.
When I posted my above comment, I realized that I hadn’t seen the results of the FHI survey! I’d have to look it up to say more, but one hypothesis I already have: the FHI research scholars survey was sent to a broader audience than Rob’s current one (e.g., it was sent to me and some of my former colleagues), and people with lower levels of expertise tend to defer more to what they consider to be the expert consensus, which might itself be affected by the possibility of public-facing biases.
Of course, I’m also just trying to defend my initial intuition here. :)
Edit: Actually, I can’t find the results of that FHI RS survey; I can only find this announcement. I’d be curious if anyone knows more about the results of that survey – when I filled it out I thought it was well designed and I felt quite curious about people’s answers!
I helped run the other survey mentioned, so I’ll jump in here with the relevant results and my explanation for the difference. The full results will be coming out this week.
Results
We asked participants to estimate the probability of an existential catastrophe due to AI (see definitions below). We got:
mean: 0.23
median: 0.1
Our question isn’t directly comparable with Rob’s, because we don’t condition on the catastrophe being “as a result of humanity not doing enough technical AI safety research” or “as a result of AI systems not doing/optimizing what the people deploying them wanted/intended”. However, that means that our results should be even higher than Rob’s.
Also, we operationalise existential catastrophe/risk differently, though I think the operationalisations are similar to the point that they wouldn’t affect my estimate. Nonetheless:
it’s possible that some respondents mistook “existential catastrophe” for “extinction” in our survey, despite our clarifications (survey respondents often don’t read the clarifications!)
while “the overall value of the future will be drastically less than it could have been” and “existential catastrophe” are intended to be basically the same, the former intuitively “sounds” more likely than the latter, which might have affected some responses.
My explanation
I think it’s probably a combination of things, including this difference in operationalisation, random noise, and Rob’s suggestion that “respondents who were following the forum discussion might have been anchored in some way by that discussion, or might have had a social desirability effect from knowing that the survey-writer puts high probability on AI risk. It might also have made a difference that I work at MIRI.”
I can add a bit more detail to how it might have made a difference that Rob works at MIRI:
In Rob’s survey, 5/27 of respondents who specified an affiliation said they work at MIRI (~19%)
In our survey, 1/43 of respondents who specified an affiliation said they work at MIRI (~2%)
(Rob’s survey had 44 respondents in total, ours had 75)
Definitions from our survey
Other results from our survey
We also asked participants to estimate the probability of an existential catastrophe due to AI under two other conditions.
Within the next 50 years
mean: 0.12
median: 0.05
In a counterfactual world where AI safety and governance receive no further investment or work from people aligned with the ideas of “longtermism”, “effective altruism” or “rationality” (but there are no other important changes between this counterfactual world and our world, e.g. changes in our beliefs about the importance and tractability of AI risk issues).
mean: 0.32
median: 0.25
Excited to have the full results of your survey released soon! :) I read a few paragraphs of it when you sent me a copy, though I haven’t read the full paper.
Your “probability of an existential catastrophe due to AI” got mean 0.23 and median 0.1. Notably, this includes misuse risk along with accident risk, so it’s especially striking that it’s lower than my survey’s Q2, “[risk from] AI systems not doing/optimizing what the people deploying them wanted/intended”, which got mean ~0.401 and median 0.3.
Looking at different subgroups’ answers to Q2:
MIRI: mean 0.8, median 0.7.
OpenAI: mean ~0.207, median 0.26. (A group that wasn’t in your survey.)
No affiliation specified: mean ~0.446, median 0.35. (Might or might not include MIRI people.)
All respondents other than ‘MIRI’ and ‘no affiliation specified’: mean 0.278, median 0.26.
Even the latter group is surprisingly high. A priori, I’d have expected that MIRI on its own would matter less than ‘the overall (non-MIRI) target populations are very different for the two surveys’:
My survey was sent to FHI, MIRI, DeepMind, CHAI, Open Phil, OpenAI, and ‘recent OpenAI’.
Your survey was sent to four of those groups (FHI, MIRI, CHAI, Open Phil), subtracting OpenAI, ‘recent OpenAI’, and DeepMind. Yours was also sent to CSER, Mila, Partnership on AI, CSET, CLR, FLI, AI Impacts, GCRI, and various independent researchers recommended by these groups. So your survey has fewer AI researchers, more small groups, and more groups that don’t have AGI/TAI as their top focus.
You attempted to restrict your survey to people “who have taken time to form their own views about existential risk from AI”, whereas I attempted to restrict to anyone “who researches long-term AI topics, or who has done a lot of past work on such topics”. So I’d naively expect my population to include more people who (e.g.) work on AI alignment but haven’t thought a bunch about risk forecasting; and I’d naively expect your population to include more people who have spent a day carefully crafting an AI x-risk prediction, but primarily work in biosecurity or some other area. That’s just a guess on my part, though.
Overall, your methods for choosing who to include seem super reasonable to me -- perhaps more natural than mine, even. Part of why I ran my survey was just the suspicion that there’s a lot of disagreement between orgs and between different types of AI safety researcher, such that it makes a large difference which groups we include. I’d be interested in an analysis of that question; eyeballing my chart, it looks to me like there is a fair amount of disagreement like that (even if we ignore MIRI).
Oh, your survey also frames the questions very differently, in a way that seems important to me. You give multiple-choice questions like:
… whereas I just asked for a probability.
Overall, you give fourteen options for probabilities below 10%, and two options above 90%. (One of which is the dreaded-by-rationalists “100%”.)
By giving many fine gradations of ‘AI x-risk is low probability’ without giving as many gradations of ‘AI x-risk is high probability’, you’re communicating that low-probability answers are more normal/natural/expected.
The low probabilities are also listed first, which is a natural choice but could still have a priming effect. (Anchoring to 0.0001% and adjusting from that point, versus anchoring to 95%.) On my screen’s resolution, you have to scroll down three pages to even see numbers as high as 65% or 80%. I lean toward thinking ‘low probabilities listed first’ wasn’t a big factor, though.
My survey’s also a lot shorter than yours, so I could imagine it filtering for respondents who are busier, lazier, less interested in the topic, less interested in helping produce good survey data, etc.
Yeah, I deliberately steered clear of ‘less than maximum potential’ in the survey (with help from others’ feedback on my survey phrasing). Losing a galaxy is not, on its own, an existential catastrophe, because one galaxy is such a small portion of the cosmic endowment (even though it’s enormously important in absolute terms). In contrast, losing 10% of all reachable galaxies would be a clear existential catastrophe.
I don’t know the answer, though my initial guess would have been that (within the x-risk ecosystem) “Unusually ‘optimistic’ people being for some reason unusually likely to have given public, quantitative estimates before” is a large factor. I talked about this here. I’d guess the cause is some combination of:
There just aren’t many people giving public quantitative estimates, so noise can dominate.
Noise can also be magnified by social precedent; e.g., if the first person to give a public estimate happened to be an optimist by pure coincidence, that on its own might encourage other optimists to speak up more and pessimists less, which could then cascade.
For a variety of dangerous and novel things, if you say ‘this risk is low-probability, but still high enough to warrant concern’, you’re likelier to sound like a sober, skeptical scientist, while if you say ‘this risk is high-probability’, you’re likelier to sound like a doomsday-prophet crackpot. I think this is an important part of the social forces that caused many scientists and institutions to understate the risk of COVID in Jan/Feb 2020.
Causing an AI panic could have a lot of bad effects, such as (paradoxically) encouraging racing, or (less paradoxically) inspiring poorly-thought-out regulatory interventions. So there’s more reason to keep quiet if your estimates are likelier to panic others. (Again, this may have COVID parallels: I think people were super worried about causing panics at the outset of the pandemic, though I think this made a lot less sense in the case of COVID.)
This bullet point would also skew the survey optimistic, unless people give a lot of weight to ‘it’s much less of a big deal for me to give my pessimistic view here, since there will be a lot of other estimates in the mix’.
Alternatively, maybe pessimists mostly aren’t worried about starting a panic, but are worried about other people accusing them of starting a panic, so they’re more inclined to share their views when they can be anonymous?
Intellectuals in the world at large tend to assume a “default” view along the lines of ‘the status quo continues; things are pretty safe and stable; to the extent things aren’t safe or stable, it’s because of widely known risks with lots of precedent’. If you have a view that’s further from the default, you might be more reluctant to assert that view in public, because you expect more people to disagree, ask for elaborations and justifications, etc. Even if you’re happy to have others criticize and challenge your view, you might not want to put in the extra effort of responding to such criticisms or preemptively elaborating on your reasoning.
For various reasons, optimism about AI seems to correlate with optimism about public AI discourse. E.g., some people are optimists about AI outcomes in part because they think the world is more competent/coordinated/efficient/etc. overall; which could then make you expect fewer downsides and more upside from public discourse.
Of course, this is all looking at only one of several possible explanations for ‘the survey results here look more pessimistic than past public predictions by the x-risk community’. I focus on these to explain one of the reasons I expected to see an effect like this. (The bigger reason is just ‘I talked to people at various orgs over the years and kept getting this impression’.)
The Elicit predictions (each pair is (Q1, Q2)):
Actual: mean: (~0.3, ~0.4), median: (0.2, 0.3)
elifland: mean: (0.23, 0.33), median: (0.15, 0.25)
WilliamKiely: mean: (0.18, 0.45), median: (0.1, 0.3)
Ben Pace: mean: (0.15, 0.17), median: (0.15, 0.17)
bsokolowsky: mean: (0.69, 0.40), median: (0.64, 0.37)
MichaelA: mean: (0.14, N/A), median: (0.06, N/A)
SamClarke: mean: (N/A, N/A), median: (0.05, 0.05)
Scoring the Elicit predictions by their errors (prediction minus actual), we get:
elifland: mean: (-0.07, -0.07), median: (-0.05, -0.05) - sum of |errors|: 0.24
WilliamKiely: mean: (-0.12, 0.05), median: (-0.1, 0) - sum of |errors|: 0.27
Ben Pace: mean: (-0.15, -0.23), median: (-0.1, -0.15) - sum of |errors|: 0.63
bsokolowsky: mean: (0.39, 0), median: (0.44, 0.07) - sum of |errors|: 0.90
MichaelA: mean: (-0.16, N/A), median: (-0.24, N/A) - sum of |errors|: 0.40 (Q1 only)
SamClarke: mean: (N/A, N/A), median: (-0.15, -0.25) - sum of |errors|: 0.40 (medians only)
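For concreteness, here’s a minimal sketch of that scoring rule in Python, with the figures above hard-coded (N/A entries are simply skipped): each forecaster’s score is the sum of |prediction − actual| over whichever of the four numbers (Q1/Q2 mean and median) they supplied. Because the actual means are only given approximately, and a couple of the error figures above seem to use slightly different reference values, the printed scores won’t exactly match every sum listed above.

```python
# Score each Elicit forecaster by the sum of absolute errors against the
# actual survey results. Pairs are (Q1, Q2); None marks an N/A entry.

actual = {"mean": (0.3, 0.4), "median": (0.2, 0.3)}

predictions = {
    "elifland":     {"mean": (0.23, 0.33), "median": (0.15, 0.25)},
    "WilliamKiely": {"mean": (0.18, 0.45), "median": (0.10, 0.30)},
    "Ben Pace":     {"mean": (0.15, 0.17), "median": (0.15, 0.17)},
    "bsokolowsky":  {"mean": (0.69, 0.40), "median": (0.64, 0.37)},
    "MichaelA":     {"mean": (0.14, None), "median": (0.06, None)},
    "SamClarke":    {"mean": (None, None), "median": (0.05, 0.05)},
}

def score(pred, truth):
    """Sum of |prediction - actual| over the predictions the forecaster made."""
    return sum(
        abs(p - a)
        for stat in ("mean", "median")
        for p, a in zip(pred[stat], truth[stat])
        if p is not None
    )

# Print forecasters from best (lowest total error) to worst.
for name, pred in sorted(predictions.items(), key=lambda kv: score(kv[1], actual)):
    print(f"{name}: {score(pred, actual):.2f}")
```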
In retrospect, my forecast that the median response to the first question would be as low as 10% was too ambitious. That would have been surprisingly low for a median.
I think my other forecasts were good. My 18% mean on Q1 was so low only because my median was low. Interestingly my own answer for Q1 was 20%, which was exactly the median response. I forget why I thought the mean and median answer would be lower than mine.
Critiques aside, thanks a lot for doing this survey! :) I expect to find it moderately helpful for improving my own estimates.
Thanks very much for sharing this! Very interesting. I’m sure I will refer back to this article in the future.
One quick question: when you have the two charts—“Separating out the technical safety researchers and the strategy researchers”—could you make explicit which is which? It’s possible to work it out based on the colour of the dots if you try, of course.
You’re very welcome! I was relying on the shapes to make things clear (circle = technical safety researcher in all charts, square = strategy researcher), but I’ve now added text to clarify.
The wide spread in responses is surprising to me. Perhaps future surveys like this should ask people what their inside view is and what their all-things-considered view is. My suspicion/prediction would be that doing that would yield all-things-considered views closer together.
People might also cluster more if we did the exact same survey again, but asking them to look at the first survey’s results.
It was unclear to me upon several rereads whether “drastically less” is meant to be interpreted in relative terms (intuitive notions of goodness that look more like a ratio) or absolute terms (fully taking into account astronomical waste arguments). If the former, this means that, e.g., 99.9% as good as it could’ve been is still a pretty solid future, and would resolve “no.” If the latter, 0.1% of approximately infinity is still approximately infinity.
Would be interested if other people had the same confusion, or if I’m somehow uniquely confused here.
I’d also be interested in hearing if others found this confusing. The intent was a large relative change in the future’s value—hence the word “overall”, and the mirroring of some language from Bostrom’s definition of existential risk. I also figured that this would be clear from the fact that the survey was called “Existential risk from AI” (and this title was visible to all survey respondents).
None of the respondents (and none of the people who looked at my drafts of the survey) expressed confusion about this, though someone could potentially misunderstand without commenting on it (e.g., because they didn’t notice there was another possible interpretation).
Example of why this is important: given the rate at which galaxies are receding from us, my understanding is that every day we delay colonizing the universe loses us hundreds of thousands of stars. Thinking on those scales, almost any tiny effect today can have enormous consequences in absolute terms. But the concept of existential risk correctly focuses our attention on the things that threaten a large fraction of the future’s value.
Sure, but how large is large? You said in a different comment that losing 10% of the future counts as an existential catastrophe, which I think is already debatable (I can imagine some longtermists thinking that getting 90% of the possible value is basically an existential win, and some of the survey respondents thinking that a drastic reduction actually means more like a 30%+ or 50%+ loss). I think you’re implicitly agreeing with my comment that losing 0.1% of the future is acceptable, but I’m unsure if this is endorsed.
If you were to redo the survey for people like me, I’d have preferred a phrasing that says more like
Or alternatively, instead of asking for probabilities, ask something like the following (see the sketch below for the contrast):
> What’s the expected fraction of the future’s value that would be lost?
Though since a) nobody else raised the same issue I did, and b) I’m not a technical AI safety or strategy researcher and thus am outside of your target audience, this might all be a moot point.
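To illustrate the contrast between the threshold-style question (“probability that the future’s value is drastically reduced”) and the expected-fraction question, here’s a minimal sketch with entirely made-up numbers for a hypothetical respondent (none of these figures are anyone’s actual estimates): the threshold question is sensitive to where “drastically less” is drawn, whereas the expected fraction lost is a single threshold-free number.

```python
# A hypothetical respondent's distribution over "fraction of the future's value
# lost due to AI", as (fraction_lost, probability) pairs. All numbers are made up
# purely for illustration.
beliefs = [
    (0.00, 0.50),  # things go roughly as well as they could have
    (0.05, 0.20),  # a modest trajectory change
    (0.40, 0.15),  # a large but not near-total loss
    (0.98, 0.15),  # close to the whole future is lost
]

def p_loss_at_least(threshold):
    """Probability that the fraction of value lost meets or exceeds the threshold."""
    return sum(p for frac, p in beliefs if frac >= threshold)

expected_fraction_lost = sum(frac * p for frac, p in beliefs)

# The threshold question depends on where the "drastic" line is drawn:
print(p_loss_at_least(0.10))             # 0.3
print(p_loss_at_least(0.50))             # 0.15
# ...while the expected-fraction question gives one threshold-free answer:
print(round(expected_fraction_lost, 3))  # 0.217
```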
What’s the definition of an “existential win”? I agree that this would be a win, and would involve us beating some existential risks that currently loom large. But I also think this would be an existential catastrophe. So if “win” means “zero x-catastrophes”, I wouldn’t call this a win.
Bostrom’s original definition of existential risk talked about things that “drastically curtail [the] potential” of “Earth-originating intelligent life”. Under that phrasing, I think losing 10% of our total potential qualifies.
?!? What does “acceptable” mean? Obviously losing 0.1% of the future’s value is very bad, and should be avoided if possible!!! But I’d be fine with saying that this isn’t quite an existential risk, by Bostrom’s original phrasing.
Agreed, I’d probably have gone with a phrasing like that.
So I reskimmed the paper, and FWIW, Bostrom’s original phrasing doesn’t seem obviously sensitive to 2 orders of magnitude by my reading of it. “drastically curtail” feels more like poetic language than setting up clear boundaries.
He does have some lower bounds:
> However, the true lesson is a different one. If what we are concerned with is (something like) maximizing the expected number of worthwhile lives that we will create, then in addition to the opportunity cost of delayed colonization, we have to take into account the risk of failure to colonize at all. We might fall victim to an existential risk, one where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential.[8] Because the lifespan of galaxies is measured in billions of years, whereas the time-scale of any delays that we could realistically affect would rather be measured in years or decades, the consideration of risk trumps the consideration of opportunity cost. For example, a single percentage point of reduction of existential risks would be worth (from a utilitarian expected utility point-of-view) a delay of over 10 million years.
Taking “decades” conservatively to mean “at most ten decades”, this would suggest that something equivalent to a delay of ten decades (100 years) probably does not count as an existential catastrophe. However, this gives a lower bound of 100/10 million * 1%, or 10^-7, far smaller than the 10^-3 I mentioned upthread.
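Spelling out that arithmetic, with Bostrom’s “a single percentage point of reduction of existential risks would be worth … a delay of over 10 million years” as the benchmark and a 100-year delay as the loss being compared against it:

```latex
\[
\frac{100 \ \text{years}}{10^{7} \ \text{years}} \times 1\% \;=\; 10^{-5} \times 10^{-2} \;=\; 10^{-7}
\]
```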
(I agree that “acceptable” is sloppy language on my end, and losing 0.1% of the future’s value is very bad.)
(I considered just saying “existential risk” without defining the term, but I worried that people sometimes conflate existential risk with things like “extinction risk” or “risk that we’ll lose the entire cosmic endowment”.)
I agree that “existential risk” without defining the term would be much worse. It might have a technical definition within longtermism philosophy, but I don’t think the term has the exact same meaning as broadly understood by EAs.
Unfortunately, even the technical definition relies on the words “destroys” or “drastically curtails”, which leave room for interpretation. I would guess that most people interpret those as “destroys the vast majority [of our potential]”, e.g. reduces the EV of the future to 10% of what it could’ve been or lower. But it sounds like Rob interprets it as “reduces the EV by at least 10%”, which I would’ve called an example of a non-existential trajectory change.
Actually, I’ve just checked where I wrote about this before, and saw I quoted Ord saying:
So I think Rob’s “at least 10% is lost” interpretation would indeed be either unusual or out of step with Ord (less sure about Bostrom).
Then perhaps it’s good that I didn’t include my nonstandard definition of x-risk, and we can expect the respondents to be at least somewhat closer to Ord’s definition.
I do find it odd to say that ’40% of the future’s value is lost’ isn’t an x-catastrophe, and in my own experience it’s much more common that I’ve wanted to draw a clear line between ’40% of the future is lost’ and ‘0.4% of the future is lost’, than between 90% and 40%. I’d be interested to hear about cases where Toby or others found it illuminating to sharply distinguish 90% and 40%.
I have sometimes wanted to draw a sharp distinction between scenarios where 90% of humans die vs. ones where 40% of humans die; but that’s largely because the risk of subsequent extinction or permanent civilizational collapse seems much higher to me in the 90% case. I don’t currently see a similar discontinuity in ‘90% of the future lost vs. 40% of the future lost’, either in ‘the practical upshot of such loss’ or in ‘the kinds of scenarios that tend to cause such loss’. But I’ve also spent a lot less time than Toby thinking about the full range of x-risk scenarios.
FWIW, I personally don’t necessarily think we should focus more on 90+% loss scenarios than 1-90% loss scenarios, or even than <1% loss scenarios (though I’d currently lean against that final focus). I see this as essentially an open question (i.e., the question of which kinds of trajectory changes to prioritise increasing/decreasing the likelihood).
I do think Ord thinks we should focus more on 90+% loss scenarios, though I’m not certain why. I think people like Beckstead and MacAskill are less confident about that. (I’m lazily not including links, but can add them on request.)
I have some messy, longwinded drafts on something like this topic from a year ago that I could share, if anyone is interested.
I was just talking about what people take x-risk to mean, rather than what I believe we should prioritise.
Some reasons I can imagine for focusing on 90+% loss scenarios:
You might just have the empirical view that very few things would cause ‘medium-sized’ losses of a lot of the future’s value. It could then be useful to define ‘existential risk’ to exclude medium-sized losses, so that when you talk about ‘x-risks’ people fully appreciate just how bad you think these outcomes would be.
‘Existential’ suggests a threat to the ‘existence’ of humanity, i.e., an outcome about as bad as human extinction. (Certainly a lot of EAs—myself included, when I first joined the community!—misunderstand x-risk and think it’s equivalent to extinction risk.)
After googling a bit, I now think Nick Bostrom’s conception of existential risk (at least as of 2012) is similar to Toby’s. In https://www.existential-risk.org/concept.html, Nick divides up x-risks into the categories “human extinction, permanent stagnation, flawed realization, and subsequent ruination”, and says that in a “flawed realization”, “humanity reaches technological maturity” but “the amount of value realized is but a small fraction of what could have been achieved”. This only makes sense as a partition of x-risks if all x-risks reduce value to “a small fraction of what could have been achieved” (or reduce the future’s value to zero).
I still think that the definition of x-risk I proposed is a bit more useful, and I think it’s a more natural interpretation of phrasings like “drastically curtail [Earth-originating intelligent life’s] potential” and “reduce its quality of life (compared to what would otherwise have been possible) permanently and drastically”. Perhaps I should use a new term, like hyperastronomical catastrophe, when I want to refer to something like ‘catastrophes that would reduce the total value of the future by 5% or more’.
I agree with everything but your final paragraph.
On the final paragraph, I don’t strongly disagree, but:
I think to me “drastically curtail” more naturally means “reduces to much less than 50%” (though that may be biased by me having also heard Ord’s operationalisation for the same term).
At first glance, I feel averse to introducing a new term for something like “reduces by 5-90%”
I think “non-existential trajectory change”, or just “trajectory change”, maybe does an ok job for what you want to say
Technically those things would also cover 0.0001% losses or the like. But it seems like you could just say “trajectory change” and then also talk about roughly how much loss you mean?
It seems like if we come up with a new term for the 5-90% bucket, we would also want a new term for other buckets?
I also mentally noted that “drastically less” was ambiguous, though for the sake of my quick forecasts I decided that whether you meant (or whether others would interpret you as meaning) “5% less” or “90% less” didn’t really matter to my forecasts, so I didn’t bother commenting.
Yeah, a big part of why I left the term vague is that I didn’t want people to get hung up on those details when many AGI catastrophe scenarios are extreme enough to swamp those details. E.g., focusing on whether the astronomical loss threshold is 80% vs. 50% is beside the point if you think AGI failure almost always means losing 98+% of the future’s value.
I might still do it differently if I could re-run the survey, however. It would be nice to have a number, so we could more easily do EV calculations.
I’d be interested in seeing operationalizations at some subset of {1%, 10%, 50%, 90%, 99%}.* I can imagine that most safety researchers will give nearly identical answers to all of them, but I can also imagine large divergences, so there’s decent value of information here.
*Probably can’t do all 5, at least not at once, because of priming effects.
On the premise that a 10% chance of AGI is much more salient than a 50% chance, given the stakes, it would be good to see a survey of a similar set of people with these two questions:
1. Year with 10% chance of AGI.
2. P(doom|AGI in that year)
(Operationalising “doom” as Ord’s definition of “the greater part of our potential is gone and very little remains”, although I pretty much think of it as being paperclipped or equivalent so that ~0 value remains.)