Models such as the Carlsmith one, which treat AI x-risk as highly conjunctive (i.e. lots of things need to happen for an AI existential catastrophe), already seem like they’ll bias results towards lower probabilities (see e.g. this section of Nate’s review of the Carlsmith report). I won’t say more on this since I think it’s been discussed several times already.
What I do want to highlight is that the methodology of this post exacerbates that effect. In principle, you can get reasonable results with such a model if you’re aware of the dangers of highly conjunctive models and sufficiently careful in assigning probabilities.[1] This might at least plausibly be the case for a single person giving probabilities, who has hopefully thought about how to avoid the multiple stage fallacy and spent a lot of time on their probability estimates. But if you just survey a lot of people, you’ll very likely get at least a sizable fraction of respondents who, say, tend to assign probabilities close to 50% because anything else feels overconfident, or who don’t actually condition enough on previous steps having happened, even if the question tells them to. (This isn’t really meant as a critique of people who answered the survey; it’s genuinely hard to give good probabilities for these conjunctive models.) The way the analysis in this post works, if some people give probabilities that are too low, the overall result will also be very low (see e.g. this comment).
I would strongly guess that if you ran exactly the same type of survey and analysis with a highly disjunctive model (e.g. more along the lines of this one by Nate Soares), you would get far higher probabilities of x-risk. To be clear, that would be just as bad: it would likely be an overestimate!
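A small simulation makes the point concrete. All the numbers below (true per-step probabilities, the fraction of respondents who anchor on 50%, the noise model, and geometric-mean aggregation as a stand-in for the post’s analysis) are invented purely for illustration:

```python
import math
import random

random.seed(0)

N_STEPS = 6
N_RESPONDENTS = 1000
ANCHOR_FRACTION = 0.3  # fraction of respondents who drift toward 50% (assumption)

def respondent_estimate(true_p):
    """Most respondents answer near the true value; some anchor on 50%."""
    if random.random() < ANCHOR_FRACTION:
        return 0.5
    return min(max(true_p + random.uniform(-0.05, 0.05), 0.01), 0.99)

def conj(ps):
    # Conjunctive model: every step must happen, so multiply.
    return math.prod(ps)

def disj(ps):
    # Disjunctive model: any one step suffices.
    return 1 - math.prod(1 - p for p in ps)

def survey(true_step_p, combine):
    """Aggregate respondents' combined probabilities by geometric mean."""
    logs = []
    for _ in range(N_RESPONDENTS):
        steps = [respondent_estimate(true_step_p) for _ in range(N_STEPS)]
        logs.append(math.log(combine(steps)))
    return math.exp(sum(logs) / len(logs))

# Conjunctive model with high per-step probabilities (true answer ~0.53):
print(f"conjunctive survey result: {survey(0.9, conj):.3f}  (true {0.9**N_STEPS:.3f})")
# Disjunctive model with low per-step probabilities (true answer ~0.47):
print(f"disjunctive survey result: {survey(0.1, disj):.3f}  (true {1 - 0.9**N_STEPS:.3f})")
```

With these toy numbers, the 30% of respondents who anchor on 50% drag the conjunctive estimate well below the true value, while the same anchoring pushes the disjunctive estimate above it: the same survey noise biases the two model shapes in opposite directions.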
One related aspect I want to address:
> Most models of AI risk are – at an abstract enough level – more like an elimination tournament than a league, at least based on what has been published on various AI-adjacent forums. The AI needs everything to go its way in order to catastrophically depower humanity.
There is a lot of disagreement about whether AI risk is conjunctive or disjunctive (or, more realistically, where it is on the spectrum between the two). If I understand you correctly (in section 3.1), you basically found only one model (Carlsmith) that matched your requirements, which happened to be conjunctive. I’m not sure if that’s just randomness, or if there’s a systematic effect where people with more disjunctive models don’t tend to write down arguments in the style “here’s my model, I’ll assign probabilities and then multiply them”.
If we do want to use a methodology like the one in this post, I think we’d need to take uncertainty over the model itself extremely seriously. E.g. we could come up with a bunch of different models, assign weights to them somehow (e.g. survey people about how good a model of AI x-risk this is), and then do the type of analysis you do here for each model separately. At the end, we average over the probabilities each model gives using our weights. I’m still not a big fan of that approach, but at least it would take into account the fact that there’s a lot of disagreement about the conjunctive vs disjunctive character of AI risk. It would also “average out” the biases that each type of model induces to some extent.
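The averaging step is simple enough to sketch. The model names, weights, and per-model probabilities below are all made up for illustration; in practice the weights would come from something like the survey I describe:

```python
# Hypothetical models of AI x-risk with survey-derived weights
# (every number here is invented for illustration):
models = {
    "conjunctive (Carlsmith-style)": (0.35, 0.05),  # (weight, P(x-risk) under model)
    "disjunctive (Soares-style)":    (0.35, 0.60),
    "hybrid":                        (0.30, 0.25),
}

total_weight = sum(w for w, _ in models.values())
# Weighted average over the per-model probabilities:
p_averaged = sum(w * p for w, p in models.values()) / total_weight
print(f"model-averaged P(x-risk): {p_averaged:.3f}")
```

Note how the average lands well away from the conjunctive model’s 0.05 and the disjunctive model’s 0.60 alike, which is exactly the “averaging out” of model-induced biases described above.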
[1] Though there’s still the issue of disjunctive pathways being completely ignored, and I also think it’s pretty hard to be sufficiently careful.
I think you’re pointing to a real phenomenon here (though I might not call it an “optimism bias”—EAs also tend to be unusually pessimistic about some things).
I have pretty strong disagreements with a lot of the more concrete points in the post though, I’ve tried to focus on the most important ones below.
(I think you may have missed the factor of 0.01, the relative risk reduction you postulated? I get 8 billion * 0.06 * 0.01 * 0.1 * 0.1 = 48,000. So AI safety would look worse by a factor of 100 compared to your numbers.)
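Spelling out the arithmetic (these are the post’s numbers as I understand them, not mine):

```python
# Expected-value calculation using the numbers quoted from the post:
population = 8_000_000_000
p_doom = 0.06              # baseline chance of AI existential catastrophe
relative_reduction = 0.01  # the postulated 1% relative risk reduction
factor_1 = 0.1             # the post's first 10% factor
factor_2 = 0.1             # the post's second 10% factor

lives_saved = population * p_doom * relative_reduction * factor_1 * factor_2
print(f"expected lives saved: {lives_saved:,.0f}")  # 48,000, not 4.8 million
```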
But anyway, I strongly disagree with those numbers, and I’m pretty confused as to what kind of model generates them. Specifically, you seem to be extremely confident that we can’t solve AI x-risk (< 1/10,000 chance if we multiply together the 1% relative reduction with your two 10% chances). On the other hand, you think we’ll most likely be fine by default (94%). So you seem to be saying that there probably isn’t any problem in the first place, but if there is, then we should be extremely certain that it’s basically intractable. This seems weird to me. Why are you so sure that there isn’t a problem which would lead to catastrophe by default, but which could be solved by e.g. 1,000 AI safety researchers working for 10 years? To get to your level of certainty (<1/10,000 is a lot!), you’d need a very detailed model of AI x-risk IMO, more detailed than I think anyone has written about. A lot of the uncertainty people tend to have about AI x-risk comes specifically from the fact that we’re unsure what the main sources of risk are etc., so it’s unclear how you’d exclude the possibility that there are significant sources of risk that are reasonably easy to address.
As to why I’m not convinced by the argument that leads you to the <1/10,000 chance: the methodology of “split my claim into a conjunction of subclaims, then assign reasonable-sounding probabilities to each, then multiply” often just doesn’t work well (there are exceptions, but this certainly isn’t one of them IMO). You can get basically arbitrary results by splitting up the claim in different ways, since humans aren’t very consistent about which probabilities sound “reasonable”.
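A toy illustration of how the split drives the answer. Both decompositions below are meant to describe the same claim, and each stage probability is the kind of number that sounds individually “reasonable”; all values are invented for illustration:

```python
import math

# Two decompositions of the same hypothetical claim:
coarse = [0.5, 0.4]   # two broad stages
fine = [0.7] * 8      # the same claim cut into eight stages,
                      # each assigned an individually plausible 70%

print(f"coarse split: {math.prod(coarse):.3f}")
print(f"fine split:   {math.prod(fine):.3f}")
```

The coarse split gives 0.200 and the fine split gives about 0.058, a factor of ~3.5 apart, even though no single stage probability in either version looks unreasonable on its own. The more stages you add, the lower the product drifts.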
I can’t speak for all longtermists of course, but that is decidedly not an argument I want to make (and FWIW, my impression is that this is not the key objection most longtermists would raise). If you convinced me that our chances of preventing an AI existential catastrophe were <1/10,000, and that additionally we’d very likely die in a few centuries anyway (not sure just how likely you think that is?), then I would probably throw the expected value calculations out the window and start from scratch trying to figure out what’s important. Basically for exactly the reasons you mention: at some point this starts feeling like a Pascal’s mugging, and that seems fishy and confusing.
But I think the actual chances we prevent an AI existential catastrophe are way higher than 1/10,000 (more like 1/10 in terms of the order of magnitude). And I think conditioned on that, our chances of surviving for billions of years are pretty decent (very spontaneous take: >=50%). Those feel like cruxes to me way more than whether we should blindly do expected value calculations with tiny probabilities, because my probabilities aren’t tiny.
I agree it’s possible in a very weak sense, but I think we can say something stronger about just how unlikely this is (over the next millennium or two): nothing like this has happened over the past 65 million years (where I’m counting the asteroid back then as “unstoppable”, even though I think we could stop one like it soon after AGI). So unless you think that alien invasions are reasonably likely to happen soon (but weren’t likely before we sent out radio waves, for example), this scenario seems to be firmly in the “not really worth thinking about” category.
This may seem really nitpicky, but I think it matters when we talk about how likely it is that we’ll continue living for billions of years. You give several scenarios for how things could go badly, but it would be just as easy to list scenarios for how things could go well. Listing very unlikely scenarios, especially on only one side, actively distorts our impression of the overall probabilities.