On presenting the case for AI risk
Epistemic status: Personal anecdotal evidence, not fully thought through to my own satisfaction. I’m posting this anyway because if I wait until I’ve thought it through to my satisfaction then I might never post it at all.
People keep telling me how they’ve had trouble convincing others to care at all about AI risk, or to take the concerns about misaligned AI seriously. This has puzzled me somewhat, because my experience has been significantly different from that.
In my case I mostly talk to engineers who deal with real-world applications of current or near-future AI systems, and when I talk to them about AI safety I actually don’t focus on existential risks at all. Instead I just talk about the risks from the systems we have right now, and the challenges we face in making them safe for today’s safety- and mission-critical applications. So I’ll talk about things like specification gaming, negative side effects, robustness, interpretability, testing and evaluation challenges, security against adversaries, and social coordination failures or races to the bottom. And then at the end I’ll throw in something about, oh yeah, and obviously if we scale AI systems up into even more powerful optimizers than they are now, then clearly these problems could have catastrophic consequences.
And the thing is, I don’t recall getting much pushback on this at all, including on the longer-term risks. In fact, I find that people tend to extrapolate to the existential risks on their own. It really is pretty straightforward: there are huge problems with the safety of today’s AI systems that are at least partly due to their complexity, the complexity of the environments they’re meant to be deployed in, and the fact that powerful optimizers tend to come up with surprising and unforeseen solutions to whatever objectives we give them. So as we build ever more complex, powerful optimizers, and as we attempt to deploy them in ever more complex environments, of course the risks will go way up!
Sometimes I’ll start adding something about the longer-term existential risks and they’ll be like, “whoa, yeah, that’s going to be a huge problem!” Sometimes I’ll immediately follow that up with reasons people have given why we shouldn’t be worried… and the people I’m talking to will usually shoot those arguments down on their own! For example, I might say something like, “well, some have argued that it’s not such a worry because we won’t be stupid enough to deploy really powerful systems like that without the proper safeguards...” And they’ll respond with incredulous faces, comments like, “yeah, I think I saw that movie already; it didn’t end well,” and references to obvious social coordination failures (even before the pandemic). Depending on how the conversation goes I might stop there, or we might get a bit more into the weeds about specific concerns, maybe mention mesa-optimization or the like. But at that point it’s a conversation about details, and I’m not selling them on anything.
Notice that nowhere did I get into anything about philosophy or the importance of the long-term future. I certainly don’t lead with those topics. Of course, if the conversation goes in that direction, which it sometimes does, then I’m happy to go into those topics; I do have a philosophy degree, after all, and I love discussing those subjects. But I’m pretty sure that leading with those topics would be actively counterproductive in terms of convincing most of the people I’m talking to that they should pay attention to AI safety at all. In fact, I think the only times I’ve gotten any real pushback were when I did lead with longer-term concerns (because the conversation was about certain projects I was working on related to longer-term risks), or when I was talking to people who were already aware of the philosophy or longtermist arguments and immediately pattern-matched what I was saying to that: “Well, I’m not a utilitarian, so I don’t like Effective Altruism, so I’m not really interested.” Or, “yeah, I’ve heard about Yudkowsky’s arguments and it’s all just fear-mongering from nerds who read too much science fiction.” Never mind that those aren’t even actual arguments; if I’ve gotten to that point then I’ve already lost the discussion and there’s usually no point continuing it.
Why do I not get pushback? I can think of a few possibilities (not mutually exclusive):
As above, by not leading with philosophy, longtermism, effective altruism, or science-fiction-sounding scenarios I avoid any cached responses to those topics. Instead I lead with issues that nobody can really argue with since those issues are here already, and then I segue into longer-term concerns as the conversation allows.
My main goal in these discussions is not usually to convince people of existential risks from future very advanced AI, but rather that they should be doing more about safety and other risks related to today’s AI. (This gets into a larger topic that I hope to post about soon, about why I think it’s important to promote near-term safety research.) I suspect that if I were only focused on promoting research that was explicitly related to existential risk, then this would make the sell a lot harder.
My usual audience is engineers who are working (at least to some degree) on systems with potential safety- and mission-critical applications in the military, space exploration, and public health domains. These people are already quite familiar with the idea that if the systems they come up with are anything less than extremely reliable and trustworthy, then those systems will simply not be deployed. (Contrary to what appears to be a common perception among many EAs, I’ve been told many times by colleagues familiar with the subject that the US military is actually very risk-averse when it comes to deploying potentially unsafe systems in the real world.) So for these people it’s an easy sell: I just need to remind them that if they want any of the cool AI toys they’re working on to actually be deployed in the real world, then we need to do an awful lot more to ensure safety. It’s possible that if I were giving these talks to a different audience, say ML researchers at a commercial startup, then it might be a much harder sell.
Here’s a presentation I gave on this topic a few times, including (in an abridged version) for a Foresight Institute talk. Note that the presentation is slightly out of date and I would probably do it a bit differently if I were putting it together now. Relatedly, here’s a rough draft for a much longer report along similar lines that I worked on with my colleague I-Jeng Wang as part of a project for the JHU Institute for Assured Autonomy. If anybody is interested in working with me to flesh out or update the presentation and/or report, please email me (aryeh.englander@jhuapl.edu).
Crossposted and mentioned in:
- On presenting the case for AI risk (LessWrong, 9 Mar 2022)
- Followup on Terminator (12 Mar 2022)
- EA Updates for April 2022 (31 Mar 2022)
- Seeking Survey Responses—Attitudes Towards AI risks (28 Mar 2022)
I too have this experience that most people tend to agree just fine with the case for AI risk when presented this way. Also, as far as I can tell, none of them have changed their actions (unless they were already into EA). Has your experience been different?
EDIT: To be clear, there’s value in getting people to “I’m glad other people are working on this”—it seems to me like this is how things become mainstream. But often I’m not just trying to make an incremental chip at “get AI safety to be mainstream”; instead I want to get some particular action, and in those cases I want to know what strategy I should use.
Thanks for sharing. In particular, I really appreciate you sharing the PowerPoint presentation. It’s excellent and will be a very useful slide deck to repurpose if I, or others I know, ever want to deliver a related talk.
Thanks for writing this up, it’s fantastic to get a variety of perspectives on how different messaging strategies work.
Do you have evidence, or a sense, of whether people you have talked to have changed their actions as a result? I worry that the approach you use is so similar to what people already think that it doesn’t lead to shifts in behavior. (But we need nudges where we can get them.)
I also worry about anchoring on small near-term problems leading to a moral-licensing type effect for safety (and a false sense of security). It’s unclear how likely this is. For example, if people care about AI safety but lack the big picture, they might establish a safety team dedicated to, say, algorithmic bias. If the counterfactual is no safety team, this is likely good. If the counterfactual is a safety team focused on interpretability, this is likely bad. It could also be that “having a safety team” makes an org or the people in it feel more justified in taking risks or investing less in other elements of safety (which seems likely); this would be bad. To me, the cruxes here are something like: What do people do after these conversations? Are the safety things they work on relevant to the big problems? How does safety culture interact with moral licensing or a false sense of security?

I hope this comment didn’t come off aggressively. I’m super excited about this approach, and particularly the way you meet people where they’re at, which is usually a much better strategy than how messaging around this usually comes across.
Yes, I have seen people become more actively interested in joining or promoting projects related to AI safety. More importantly, I think it creates an AI safety culture and mentality. I’ll have a lot more to say about all of this in my (hopefully) forthcoming post on why I think promoting near-term research is valuable.
Strongly agreed that working on the near-term applications of AI safety is underrated by most EAs. Nearly all of the AI safety discussion focuses on advanced RL agents that are not widely deployed in the world today, and it’s possible that such systems will not reach commercial viability soon. Misaligned AI is causing real harms today, and solving those problems would be a great step towards building the technical tools and engineering culture necessary to scale up to aligning more advanced AI.
(That’s just a three sentence explanation of a topic deserving much more detailed analysis, so really looking forward to your post!!)
This is a great post; I’ll try to change the way I talk about AI risk in the future to follow these tips.
I am reminded of blogger Dynomight’s interesting story about how he initially got a bunch of really hostile reactions to a post about ultrasonic humidifiers and air quality, but was able to lightly reframe things using a more conventional tone and the hostility disappeared, even though the message and the vast majority of the content were the same.
In his case, the solution was to add some friendly caveats—personally I think we do this plenty, at least in the semi-formal writing style of most EA Forum posts! But the logic of building “up” from real-world details and extrapolation, rather than building “down” from visions of AI apocalypse (which probably sounds to most people like attempting to justify an arbitrary sci-fi scenario), might be an equally powerful tool for talking about AI risk.
Thanks, I appreciate this post a lot!
Playing the devil’s advocate for a minute, I think one main challenge to this way of presenting the case is something like “yeah, and this is exactly what you’d expect to see for a field in its early stages. Can you tell a story for how these kinds of failures end up killing literally everyone, rather than getting fixed along the way, well before they’re deployed widely enough to do so?”
And there, it seems you do need to start talking about agents with misaligned goals, and the reasons to expect misalignment that we don’t manage to fix?
What I do (assuming I get to that point in the conversation) is deliberately mention points like this, even before trying to argue otherwise. In my experience (which again is just my experience), a good portion of the time the people I’m talking to debunk those counterarguments themselves. And if they don’t, well, then I can start discussing it at that point. But by then it feels like I’ve already established credibility and non-craziness by (a) starting off with noncontroversial topics, (b) opening the more controversial topics with arguments against taking them seriously, and (c) drawing mostly obvious lines of reasoning from (a) to (b) to whatever conclusions they do end up reaching. So long as I don’t go signaling science-fiction-geekiness too much during the conversation, if I end up having to make some particular arguments in the end, those become a pretty easy sell.
Do you have a list of specific examples of these risks?
Some—see the links at the end of the post.
Would be really helpful to have this front and center.
fwiw my friend said he recently explained AI risk to his mom, and her response was “yeah, that makes sense.”