The Pending Disaster Framing as it Relates to AI Risk

This post tries to explain one particular frame that plays a significant role in why I prioritise AI safety as a cause area. I suspect many other people who are focused on AI Safety share this frame as well, but I can’t recall the last time I’ve heard it articulated.

Here are two possible questions we can ask:

1) How can I have the greatest impact?:

  • This is the question we usually ask as EAs. It’s a fairly good one :-).

  • However, it seems useful to reason about the world in multiple ways as a consistency check.

2) What does the trajectory of the world look like by default?:

  • Many people, myself included, are worried that the default trajectory of AI is uncomfortably likely to be very, very bad for humanity, and soon. We also feel that this reasoning is robust to many different ways the future could pan out.

  • On the other hand, if AI goes well, we should basically expect it to solve global poverty, something EA’s current efforts (a tiny fraction of global efforts) likely couldn’t achieve in a hundred years. Admittedly, if AI ends up going well, EA will also only deserve a small fraction of the credit for this, but it feels like we have the opportunity to play a far more decisive role here.

  • If AI really is on track to be a disaster for humanity, then something feels absurd about distributing bednets to make a marginal difference to global poverty while we’re on the Titanic heading towards an iceberg:

    • Maybe this is just an irrational feeling? “Feels absurd” isn’t a concrete argument. But I would expect this framing to be persuasive to most people, and I’d expect the main reasons for rejecting it to be either that they don’t think things are likely to be that bad or that they don’t trust their own judgement.

    • I believe that “feels absurd” is not just an emotion, but indirectly points to some non-obvious reasons why we might want to prioritise this issue. I observe below that the cost of a mistake is asymmetric, that investing can provide option value, and that near-term crisis spending should generally be allocated faster than non-crisis spending. I further claim that, even without being aware of these explicit arguments, most people implicitly grasp these considerations from their experience with crises, so the “feels absurd” intuition is actually grounded.

  • The downside of mistakenly prioritising global poverty over AI safety is likely much worse than the other way around:

    • This becomes clearer if, instead of assigning a fixed p(doom), we treat it as a probability distribution representing our uncertainty (a toy sketch of this asymmetry appears after the next bullet).

    • Additionally, AI skills are generically useful. Given how powerful current and upcoming AI systems are, time spent building skills in this area wouldn’t be a complete waste even if you later updated towards AI risks being overstated. There would still be opportunities to use these skills for impact, even if that turned out not to be the optimal choice in the world where you stop being concerned about AI risks.
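
Purely to make the asymmetry concrete, here is a minimal Python sketch. Everything in it is a made-up assumption for illustration: the Beta(2, 8) distribution over p(doom), and all of the value and effectiveness numbers.

```python
# Toy illustration of the asymmetric-mistake point above.
# Every number here is an uncalibrated assumption chosen purely for the sketch.
import random

random.seed(0)

def sample_p_doom():
    # Treat p(doom) as a distribution rather than a point estimate.
    # Beta(2, 8) (mean 0.2) is an arbitrary choice for illustration.
    return random.betavariate(2, 8)

VALUE_OF_FUTURE = 1_000   # assumed value lost if doom happens (arbitrary units)
POVERTY_GAIN = 1          # assumed marginal value of the poverty allocation
SAFETY_EFFECT = 0.01      # assumed chance the safety allocation averts doom

N = 100_000
cost_of_skipping_safety = 0.0   # expected loss if doom was real and we funded poverty
cost_of_skipping_poverty = 0.0  # expected loss if doom wasn't real and we funded safety

for _ in range(N):
    p = sample_p_doom()
    cost_of_skipping_safety += p * SAFETY_EFFECT * VALUE_OF_FUTURE
    cost_of_skipping_poverty += (1 - p) * POVERTY_GAIN

print("Expected cost of the 'skip safety' mistake: ", cost_of_skipping_safety / N)
print("Expected cost of the 'skip poverty' mistake:", cost_of_skipping_poverty / N)
```

With these particular invented numbers the first mistake comes out a few times costlier than the second; the point of the sketch is only the shape of the comparison, not the specific outputs.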

  • If we incorporate investment over time into our model, the analysis most likely becomes more favourable to prioritising AI risk. Resource allocation isn’t a single decision: we can adjust over time as new information comes in. This means there is value in allocating resources towards preparatory work in case further investigation turns up unfavourable facts (similar to betting in poker just to stay in the hand until the next card is revealed). When timelines are short, not investing is more likely to cost us option value (a toy calculation follows this bullet).
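
As a purely illustrative sketch of that option-value point, here is a two-stage version of the poker analogy in Python. All parameters (the preparation cost, the chance the risk is later confirmed, the loss sizes) are invented, not calibrated.

```python
# Toy two-stage decision illustrating the option-value point above.
# All parameters are invented for the sketch; this is not a calibrated model.

PREP_COST = 1.0            # cost of preparatory safety work now
P_RISK_CONFIRMED = 0.3     # assumed chance later evidence confirms the risk
LOSS_IF_UNPREPARED = 100.0 # assumed loss if the risk is real and we never prepared
LOSS_REDUCTION = 0.2       # assumed fraction of that loss the preparation averts

# Option 1: pay now to "stay in the hand" until more information arrives.
ev_prepare = -PREP_COST - P_RISK_CONFIRMED * LOSS_IF_UNPREPARED * (1 - LOSS_REDUCTION)

# Option 2: skip preparation; if the risk is confirmed late, it is too late to act.
ev_skip = -P_RISK_CONFIRMED * LOSS_IF_UNPREPARED

print(f"EV if we prepare now: {ev_prepare:.1f}")  # -25.0 with these numbers
print(f"EV if we skip:        {ev_skip:.1f}")     # -30.0 with these numbers
print(f"Option value of preparing: {ev_prepare - ev_skip:.1f}")
```

The preparatory cost only pays off because it preserves the ability to respond once the “next card” is revealed; with long timelines you could instead wait and decide later, which is why short timelines raise the option value of investing now.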

  • Additionally, if we were trying to split resources evenly between global poverty and a near-term x-risk, we’d likely want to spend the x-risk resources faster, given that the later we allocate, the more likely it is (on the x-risk model) that we’ll no longer be able to allocate the resources at all (a toy sketch of this appears after the sub-points below):

    • Of course, a funder pursuing such a strategy would have to be careful to avoid the situation where they’ve spent all of the resources allocated to x-risk early and end up unfairly tapping into the resources reserved for global poverty, due to reckless spending rather than an update in the facts.

    • Even if this crisis isn’t a one-off, the argument generalises as long as such crises are genuinely rare (though many institutions may struggle to define “crisis” tightly enough to keep them rare).
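
As a purely illustrative sketch of the spend-rate point, the snippet below compares an even spend schedule with a front-loaded one when there is some yearly chance that, on the x-risk model, the window for useful spending closes. The hazard rate, budget, and schedules are invented for the example.

```python
# Toy illustration: when there is a yearly chance that further spending becomes
# impossible or moot, an earlier spend schedule delivers more expected useful
# spending. All numbers are invented for the sketch.

HAZARD = 0.10   # assumed yearly chance the spending window closes
YEARS = 10
BUDGET = 100.0

def expected_useful_spend(schedule):
    """Sum each year's spending, weighted by the chance the window is still open."""
    p_open = 1.0
    total = 0.0
    for spend in schedule:
        total += p_open * spend
        p_open *= (1 - HAZARD)
    return total

even = [BUDGET / YEARS] * YEARS                    # spread evenly over the decade
front_loaded = [BUDGET / 2, BUDGET / 2] + [0] * 8  # spend it all in the first two years

print(f"Even schedule:         {expected_useful_spend(even):.1f}")
print(f"Front-loaded schedule: {expected_useful_spend(front_loaded):.1f}")
```

This ignores diminishing returns to spending quickly, which is part of why the reckless-spending worry in the bullet above matters.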

  • Notice how asking this question brings up considerations that might not have come up if we’d just focused on asking about how we could have the greatest impact.

  • I suspect that many people who are worried about AI risk struggle to communicate why, because their position depends on some of these more indirect considerations. They may be in the same position as me: holding beliefs about the frame in which these arguments should take place that feel so obvious to them that they struggle to put them into words:

    • Hopefully, identifying the frame helps with communication between people in different camps regardless of whether this causes more people to agree with me or not.

Counter-considerations:

  • Some people argue that our chances of success are so small that maybe we should just focus on making the world as good as we can for whatever time we have left:

    • I think this felt more persuasive in the past, but we’ve since seen a lot of traction, both in the emergence of several potentially promising alignment directions and on the governance side:

      • For technical alignment, I’ll highlight the rapid progress in interpretability and the rise of model internals techniques like contrast-consistent search. Naively, it would have been easy to throw up our hands at these problems and call them too hard:

        • Some people have criticised the interpretability path as too slow, or as accelerating capabilities more than alignment, but the progress still demonstrates that problems which initially look intractable aren’t necessarily as intractable as they appear.

        • I view even partial progress as valuable for helping to open up and clarify the possibility space. Additionally, it helps us understand which challenges are easier than they appear and which are harder.

      • For governance, I’ll note the CAIS Letter, the UK AI Safety Institute, the Bletchley Declaration, and Biden’s Executive Order on AI.

    • When something is important enough, it often makes sense to just jump in and trust that you’ll have a decent chance of figuring something out, even if you don’t have much of a clue what you’ll do at the start:

      • This is in opposition to the original EA framework, which focused on evidence and actually knowing that we were doing good. But that narrative is most compelling when there aren’t any sufficiently big disasters looming on the horizon, disasters large enough that even a small chance of making things go better is extremely high EV.

      • It’s still very important to be careful with actions that could be negative given things like the Unilateralist’s Curse.

      • This strategy may depend on being fairly competent across a broad range of skills, and may not be worth pursuing otherwise.

    • Then again, you could argue that humans have a bias towards false hope. Some people argue that this is too little too late.

  • Some people have suggested that EA’s AI safety work has been net-negative so far:

    • Many people point to EA’s involvement with OpenAI as an example.

    • That could be the case. However, even if our previous work hasn’t been net-positive, if the default trajectory is as bad as many of us think, then it feels absurd to just throw up our hands and surrender so easily. I’d suggest that a learning curve is to be expected for any cause area, and that the curve is likely to be rougher for an area like AI safety, where we can’t rely on repeatable experiments to guide decisions. This is especially the case when you consider which other actors would gain influence if we withdrew:

      • Then again, it’s always very easy to convince yourself “I’ll do better next time, I’ll do better next time”:

        • Despite our flaws, I still trust in the ability and wisdom of the EA community.

      • Additionally, two experienced “EA-adjacent” people were outplayed in the recent OpenAI board fight. You might argue that this kind of manoeuvring isn’t our strength.

        • Perhaps the stakes are high enough that we need to find a way to win anyway?

        • I agree with Scott Alexander that we shouldn’t over-update on dramatic events.

Notes:

  • Again, I’d encourage you to consider both questions. I believe each provides a valuable lens for figuring out which cause area to prioritise.

  • Obviously, there are other considerations like personal fit, but I’ll leave that one to you to figure out.

  • This post focuses on cause prioritisation from an individual perspective, rather than on what the focus of the EA movement as a whole should be. I’m not making any claims about the latter, because that would raise a whole host of additional considerations. To name one, EA would have a better chance of pivoting than AI-safety-specific groups if evidence emerged of a new Cause X more important than AI safety.

  • I would love to know if anyone has done more rigorous mathematical modelling of any of the effects I’ve discussed.