On presenting the case for AI risk

Epistemic status: Personal anecdotal evidence, not fully thought through to my own satisfaction. I’m posting this anyway because if I wait until I’ve thought it through to my satisfaction then I might never post it at all.

People keep telling me how they’ve had trouble convincing others to care at all about AI risk, or to take the concerns about misaligned AI seriously. This has puzzled me somewhat, because my experience has been significantly different than that.

In my case I mostly talk to engineers who deal with real-world applications of current or near-future AI systems, and when I talk to them about AI safety I actually don’t focus on existential risks at all. Instead I just talk about the risks from the systems we have right now, and the challenges we face in making them safe for today’s safety- and mission-critical systems. So I’ll talk about things like specification gaming, negative side effects, robustness, interpretability, testing and evaluation challenges, security against adversaries, and social coordination failures or races to the bottom. And then at the end I’ll throw in something like, “oh yeah, and obviously if we scale up AI systems to be even more powerful optimizers than they are now, then clearly these problems could have potentially catastrophic consequences.”

And the thing is, I can hardly recall getting any pushback on this, including on the longer-term risks part. In fact, I find that people tend to extrapolate to the existential risks on their own. It really is pretty straightforward: there are huge problems with the safety of today’s AI systems that are at least partly due to their complexity, the complexity of the environments they’re meant to be deployed in, and the fact that powerful optimizers tend to come up with surprising and unforeseen solutions to whatever objectives we give them. So as we build ever more complex, powerful optimizers, and as we attempt to deploy them in ever more complex environments, of course the risks will go way up!

Sometimes I’ll start adding something about the longer-term existential risks and they’ll be like, “whoa, yeah, that’s going to be a huge problem!” And sometimes they do the extrapolation themselves. Sometimes I’ll actually follow that up immediately with reasons people have given for why we shouldn’t be worried… and the people I’m talking to will usually shoot those arguments down on their own! For example, I might say something like, “well, some have argued that it’s not such a worry because we won’t be stupid enough to deploy really powerful systems like that without the proper safeguards...” And they’ll respond with incredulous faces, comments like, “yeah, I think I saw that movie already; it didn’t end well,” and references to obvious social coordination failures (even before the pandemic). Depending on how the conversation goes, I might stop there, or we might get a bit more into the weeds about specific concerns, maybe mention mesa-optimization or the like. But at that point it’s a conversation about details, and I’m not selling them on anything.

Notice that nowhere did I get into anything about philosophy or the importance of the long-term future. I certainly don’t lead with those topics. Of course, if the conversation goes in that direction, which it sometimes does, then I’m happy to go into them; I do have a philosophy degree, after all, and I love discussing those subjects. But I’m pretty sure that leading with those topics would be actively counterproductive for convincing most of the people I talk to that they should pay attention to AI safety at all. In fact, I think the only times I’ve gotten any real pushback were when I did lead with longer-term concerns (because the conversation was about certain projects I was working on related to longer-term risks), or when I was talking to people who were already aware of the philosophical or longtermist arguments and immediately pattern-matched what I was saying to those: “Well, I’m not a utilitarian, so I don’t like Effective Altruism, so I’m not really interested.” Or, “yeah, I’ve heard about Yudkowsky’s arguments and it’s all just fear-mongering from nerds who read too much science fiction.” Never mind that those aren’t even actual arguments; if I’ve gotten to that point, then I’ve already lost the discussion and there’s usually no point continuing it.

Why do I not get pushback? I can think of a few possibilities (not mutually exclusive):

  • As above, by not leading with philosophy, longtermism, effective altruism, or science-fiction-sounding scenarios, I avoid any cached responses to those topics. Instead I lead with issues that nobody can really argue with, since those issues are already here, and then I segue into longer-term concerns as the conversation allows.

  • My main goal in these discussions is not usually to convince people of existential risks from future very advanced AI, but rather that they should be doing more about safety and other risks related to today’s AI. (This gets into a larger topic that I hope to post about soon, about why I think it’s important to promote near-term safety research.) I suspect that if I were only focused on promoting research that was explicitly related to existential risk, then this would make the sell a lot harder.

  • My usual audience is engineers who are working (at least to some degree) on systems with potential safety- and mission-critical applications in the military, space exploration, and public health domains. These people are already quite familiar with the idea that if the systems they come up with are anything less than extremely reliable and trustworthy, then those systems will simply not be deployed. (Contrary to what appears to be a common perception among many EAs, I’ve been told many times by colleagues familiar with the subject that the US military is actually very risk-averse when it comes to deploying potentially unsafe systems in the real world.) So for these people it’s an easy sell: I just need to remind them that if they want any of the cool AI toys they’re working on to actually be deployed in the real world, then we need to do an awful lot more to ensure safety. It’s possible that if I were giving these talks to a different audience, say ML researchers at a commercial startup, then it might be a much harder sell.

Here’s a presentation I gave on this topic a few times, including (in an abridged version) for a Foresight Institute talk. Note that the presentation is slightly out of date, and I would probably do it a bit differently if I were putting it together now. Relatedly, here’s a rough draft of a much longer report along similar lines that I worked on with my colleague I-Jeng Wang as part of a project for the JHU Institute for Assured Autonomy. If anybody is interested in working with me to flesh out or update the presentation and/or report, please email me (aryeh.englander@jhuapl.edu).