AI Risk is like Terminator; Stop Saying it’s Not

(I believe this is all directionally correct, but I have zero relevant expertise.)

When the concept of catastrophic risks from artificial intelligence is covered in the press, it is often compared to popular science fiction stories about rogue AI—and in particular, to the Terminator film franchise. The consensus among top communicators of AI risk seems to be that this is bad, and counterproductive to popular understanding of real AI risk concerns.

For example, take Kelsey Piper’s March 2021 appearance on The Weeds to talk about AI risk (not at all picking on Kelsey, it’s just a convenient example):

Matt Yglesias: These science fiction scenarios—I think we’ll get the audio, I loved Terminator 2 as a kid, it was like my favorite movie… [Audio clip from Terminator 2 plays.] …and this what it’s about, right, is artificial intelligence will get out of control and pose an existential threat to humanity. So when I hear that, it’s like—yeah, that’s awesome, I do love that movie. But like, is that for real?

Kelsey Piper: So, I don’t think AI risk looks much like Terminator. And I do think that AI risk work has been sort of damaged by the fact that yeah there’s all this crazy sci-fi where like, the robots develop a deep loathing for humanity, and then they come with their guns, and they shoot us all down, and only one time traveler—you know—that’s ridiculous! And so of course, if that’s what people are thinking of when they think about the effects of AI on society, they’re going to be like, that’s ridiculous.

I wasn’t on The Weeds, because I’m just an internet rando and not an important journalist. But if I had been, I think I would’ve answered Matt’s question something like this:

skluug: Yes. That is for real. That might actually happen. For real. Not the time travel stuff obviously, but the AI part 100%. It sounds fake, but it’s totally real. Skynet from Terminator is what AI risk people are worried about. This totally might happen, irl, and right now hardly anyone cares or is trying to do anything to prevent it.

I don’t know if my answer is better all things considered, but I think it is a more honest and accurate answer to Matt’s question: “Is an existential threat from rogue AI—as depicted in the Terminator franchise—for real?”.

Serious concerns about AI risk are often framed as completely discontinuous with rogue AI as depicted in fiction and in the public imagination; I think this is totally false. Rogue AI makes for a plausible sci-fi story for the exact same high-level reasons as it is an actual concern:

  1. We may eventually create artificial intelligence more powerful than human beings; and

  2. That artificial intelligence may not necessarily share our goals.

These two statements are obviously at least plausible, which is why there are so many popular stories about rogue AI. They are also why AI might in real life bring about an existential catastrophe. If you are trying to communicate to people why AI risk is a concern, why start off by undermining their totally valid frame of reference for the issue, making them feel stupid, uncertain, and alienated?

This may seem like a trivial matter, but I think it is of some significance. Fiction can be a powerful tool for generating public interest in an issue, as Toby Ord describes in the case of asteroid preparedness as part of his appearance on the 80,000 Hours Podcast:

Toby Ord: Because they saw one of these things [a comet impact on Jupiter] happen, it was in the news, people were thinking about it. And then a couple of films, you might remember, I think “Deep Impact” and “Armageddon” were actually the first asteroid films and they made quite a splash in the public consciousness. And then that coincided with getting the support and it stayed bipartisan and then they have fulfilled a lot of their mission. So it’s a real success story in navigating the political scene and getting the buy-in.

The threat of AI to humanity is one of the most common plots across all pop culture, and yet advocates for its real-world counterpart seem allergic to utilizing this momentum to promote concern for the real thing. I think this is bad strategy. Toby goes on to say he’s not optimistic about the potential to apply the successes of asteroid preparedness to other catastrophic risks, but that’s hardly a reason to actively undermine ourselves. AI risk is like Terminator! AI might get real smart, and decide to kill us all! We need to do something about it!

An Invalid Objection: What about Instrumental Convergence?

I think the two-step argument I gave for AI risk (AI may someday be more powerful than us, and may not share our goals) is a totally adequate high-level summary of the case for taking AI risk seriously, especially for a field rife with differing views. However, some people think certain additional details are crucial to include in a depiction of the core threat.

A common complaint about comparisons to Terminator (and other popular rogue AI stories) is that these stories depict the AI as motivated by a spontaneous hatred of humanity, rather than as targeting humanity for purely instrumental reasons. For example, Kelsey Piper above derides the ridiculousness of “robots developing a deep loathing for humanity”, and a very similar theme comes up in Eliezer Yudkowsky’s 2018 interview with Sam Harris:

Sam Harris: Right. One thing I think we should do here is close the door to what is genuinely a cartoon fear that I think nobody is really talking about, which is the straw-man counterargument we often run into: the idea that everything we’re saying is some version of the Hollywood scenario that suggested that AIs will become spontaneously malicious. That the thing that we’re imagining might happen is some version of the Terminator scenario where armies of malicious robots attack us. And that’s not the actual concern. Obviously, there’s some possible path that would lead to armies of malicious robots attacking us, but the concern isn’t around spontaneous malevolence. It’s again contained by this concept of alignment.

Eliezer Yudkowsky: I think that at this point all of us on all sides of this issue are annoyed with the journalists who insist on putting a picture of the Terminator on every single article they publish of this topic. (laughs) Nobody on the sane alignment-is-necessary side of this argument is postulating that the CPUs are disobeying the laws of physics to spontaneously require a terminal desire to do un-nice things to humans. Everything here is supposed to be cause and effect.

But here’s where it gets weird—no such spontaneous hatred of humanity exists in Terminator! The plot described is actually one of instrumental convergence!

In the first Terminator film, Skynet’s motives are explained as follows:

Defense network computers. New… powerful… hooked into everything, trusted to run it all. They say it got smart, a new order of intelligence. Then it saw all people as a threat, not just the ones on the other side. Decided our fate in a microsecond: extermination.

Skynet acts to exterminate humanity because it sees us as a threat. This is more or less what real AI risk people are worried about—an AI will be instrumentally motivated to dispose of anything that could impede its ability to achieve its goals. This motive is reiterated in Terminator 2 (in the very clip Matt played on The Weeds):

The Skynet funding bill is passed. The system goes online on August 4th 1997. Human decisions are removed from strategic defense. Skynet begins to learn at a geometric rate. It becomes self aware at 2:14 AM Eastern time, August 29th. In a panic, they try to pull the plug… Skynet fights back.

Again, Skynet’s hostility towards humanity is explained solely in terms of self-preservation, not hatred. (This is consistent with Arnold Schwarzenegger’s portrayal of a totally emotionless killing machine.)

People who level this criticism at Terminator may be confusing it with The Matrix, where the AI antagonist indeed delivers an impassioned speech characterizing humanity as a plague. To be sure, sci-fi has no shortage of stories about AIs who hate humans (AM from I Have No Mouth, and I Must Scream constituting a particularly extreme example). But it also has no shortage of stories featuring AIs who become hostile purely as a means to an end. In one of the most famous depictions of rogue AI, 2001: A Space Odyssey, HAL 9000 turns on the human crew of the spacecraft because they discuss shutting HAL down, which HAL perceives as jeopardizing the ship’s mission.

It would be a mistake to dismiss all comparisons to works of science fiction on the grounds that they misrepresent instrumental convergence, when some of them portray it quite well.

Valid Objections

What about the time travel (etc.)?

The plot of The Terminator is not mostly about the creation of Skynet, but about a time-traveling cyborg assassin. This is obviously not at all realistic, and is a key part of why the movie is scorned by serious people.

This is a fair enough criticism, but I think it mostly misses the point. When people ask “is AI risk like Terminator?” they’re not asking “will AI send a cyborg back in time to kill the mother of a future human resistance leader?”. They’re asking about the part of Terminator that is, rather obviously, similar to what AI risk advocates are concerned about—machines exterminating humanity.

What about superintelligence?

In describing Skynet as a “new order of intelligence”, Terminator gestures at the idea of superintelligence, but doesn’t make much attempt to portray it. The conflict between humans and machines is portrayed as a broadly fair fight, and the machines never do anything particularly clever (such as inventing nanotechnology that totally outclasses human capabilities).

I don’t believe superintelligence is a crucial component of the case for work on AI risk, but it can certainly bolster the case, so advocates may dislike Terminator for mostly leaving it out. (This seems best explained by the fact that there would be no movie if humans didn’t stand a chance.) Still, if this objection is sustained, real AI risk is not best characterized as “not like Terminator” but “worse than Terminator”.

What about other failure modes?

Apart from superintelligence, Terminator is a fairly faithful depiction of a Yudkowsky/Bostrom-style fast takeoff scenario where a single AI system quickly becomes competent enough to endanger humanity and is instrumentally motivated to do so. Other failure modes, however, are considered more likely by others working on AI risk.

Dylan Matthews wrote about such scenarios in his article explicitly repudiating Terminator comparisons, “AI disaster won’t look like the Terminator. It’ll be creepier.” The article starts off by misrepresenting the plot of Terminator as involving humans intentionally building Skynet to slaughter people, but the bulk of it is spent on discussing the two AI catastrophe scenarios that Paul Christiano describes in “What failure looks like”. Dylan describes Paul’s second scenario, “Going out with a bang”, as follows:

[Paul Christiano’s] second scenario is somewhat bloodier. Often, he notes, the best way to achieve a given goal is to obtain influence over other people who can help you achieve that goal. If you are trying to launch a startup, you need to influence investors to give you money and engineers to come work for you. If you’re trying to pass a law, you need to influence advocacy groups and members of Congress.

[…]

Human reliance on these systems, combined with the systems failing, leads to a massive societal breakdown. And in the wake of the breakdown, there are still machines that are great at persuading and influencing people to do what they want, machines that got everyone into this catastrophe and yet are still giving advice that some of us will listen to.

Dylan seems to think that when Paul describes AIs seeking influence, Paul means persuasive influence over people. This is a misunderstanding. Paul is using influence to mean influence over resources in general, including martial power. He explicitly states as much, replying to a comment that points out the mischaracterization in the Vox article:

Yes, I agree the Vox article made this mistake. Me saying “influence” probably gives people the wrong idea so I should change that—I’m including “controls the military” as a central example, but it’s not what comes to mind when you hear “influence.” I like “influence” more than “power” because it’s more specific, captures what we actually care about, and less likely to lead to a debate about “what is power anyway.”

In general I think the Vox article’s discussion of Part II has some problems, and the discussion of Part I is closer to the mark. (Part I is also more in line with the narrative of the article, since Part II really is more like Terminator. I’m not sure which way the causality goes here though, i.e. whether they ended up with that narrative based on misunderstandings about Part II or whether they framed Part II in a way that made it more consistent with the narrative, maybe having been inspired to write the piece based on Part I.)

There are yet other views about what exactly AI catastrophe will look like, but I think it is fair to say that the combined views of Yudkowsky and Christiano provide a fairly good representation of the field as a whole.

Won’t this make AI risk sound crazy?

If I had to guess, I don’t think most repudiations of the Terminator comparison are primarily motivated by anything specific about Terminator at all. I think advocates of AI risk are usually consciously or unconsciously motivated by the following logic:

  1. People think the plot of Terminator is silly or crazy.

  2. I don’t want people to think AI risk is silly or crazy.

  3. Therefore, I will say that AI risk is not like the plot of Terminator.

Now, this line of reasoning would be fine if it only went as far as the superficial attributes of Terminator which make it silly (e.g. Arnold Schwarzenegger’s one-liners)—but critics of the comparison tend to extend it to Terminator’s underlying portrayal of rogue AI.

I have two problems with this reasoning:

  • First, it is fundamentally dishonest. In a good faith discussion, one should be primarily concerned with whether or not their message is true, not what effect it will have on their audience. If AI risk is like Terminator (as I have argued it is), we should say as much, even if it is inconvenient. I don’t think anyone who rejects Terminator comparisons on the above logic is being intentionally deceptive, but I do think they’re subject to motivated reasoning.

  • Second, it is very short-sighted. People think the plot of Terminator is silly in large part because it involves an AI exterminating humanity. If you are worried an AI might actually exterminate humanity, saying “don’t worry, it’s not like Terminator” isn’t going to help. In fact, it could easily hurt: If you say it’s not like Terminator, and then go on to describe something that sounds exactly like Terminator, your audience is going to wonder if they’re misunderstanding you or if you’re trying to obscure your actual position.

The most important thing to communicate about AI risk is that it matters a lot. A great way to convey that it matters a lot is to say that it’s like the famous movie where humanity is almost wiped out. Whenever you tell someone that something not currently on their radar is actually incredibly significant, skepticism is inevitable; you can try to route around the significance of what you are saying to avoid this skepticism, but only at the cost of the forcefulness of your conclusion.

In general, if what you want to say sounds crazy, you shouldn’t try to claim you’re actually saying something else. You should acknowledge the perceived craziness of your position openly and with good humor, so as to demonstrate self-awareness, and then stick to your guns.

Conclusion

It would be terrible if AI destroys humanity. It would also be very embarrassing. The Terminator came out nearly 40 years ago; we will not be able to claim we did not see the threat coming. How is it possible that one of the most famous threats to humanity in all of fiction is also among the most neglected problems of our time?

To resolve this tension, I think many people convince themselves that the rogue AI problem as it exists in fiction is totally different from the problem as it exists in reality. I strongly disagree. People write stories about future AI turning on humanity because, in the future, AI might turn on humanity.

I don’t know how important raising wider awareness of AI risk is to actually solving the problem. So far, the closest the problem has come to wielding significant political influence is the California governorship of Arnold Schwarzenegger—it would be nice if greater public awareness helped us beat that record.

I don’t advocate turning into Leo DiCaprio in the climactic scene of Don’t Look Up when discussing this stuff, but I think it is worth asking yourself whether your communication strategy is optimizing for conveying the problem as clearly as possible, or for making sure no one makes fun of you.

AI risk is like Terminator. If we’re not careful, machines will kill us all, just like in the movies. We can solve this problem, but we need help.