Stuxnet, not Skynet: Humanity’s disempowerment by AI

Several high-profile AI skeptics and fellow travelers have recently raised the objection that it is inconceivable that a hostile Artificial Superintelligence could end the human race. Some quotes from earlier this year:

Scott Aaronson:

The causal story that starts with a GPT-5 or GPT-4.5 training run, and ends with the sudden death of my children and of all carbon-based life, still has a few too many gaps for my aging, inadequate brain to fill in

Michael Shermer:

Halting AI is ridiculous. I have read the AI doomsayer lit & don’t see a pathway from AI to extinction, civ termination or anything remotely like absurd scenarios like an AI turning us all into paperclips (the so-called alignment problem)


Noah Smith:

Why aren’t ChatGPT, Bing, and their ilk going to end humanity? Well, because there’s actually just no plausible mechanism by which they could bring about that outcome. … There is no plausible mechanism for LLMs to end humanity

“Just turn the computer off, bro”

The gist of these objections to the case for AI risk is that AI systems as we see them today are merely computer programs, and in our everyday experience computers are not dangerous, and certainly not dangerous to the point of bringing about the end of the world. People who first encounter this debate often fixate on the fact that computers don’t have arms and legs, and so cannot physically hurt us.

There are responses to these criticisms that center on advanced, “magical” technologies like nanotechnology, or AIs paying humans to mix together cocktails of proteins to make a DNA-based nanoassembler or something.

But I think those responses are probably wrong, because you don’t actually need “magical” technologies to end the world. Fairly straightforward advances in mundane weapons like drones, cyberweapons, bioweapons and robots are sufficient to kill people en masse, and the real danger is AI strategists that are able to deploy lots of these mundane weapons and execute a global coup against humanity.

In short, our defeat by the coming machine empire will not only be non-magical and legible, it will be downright boring. Farcical, even.

Ignominious Defeat

Lopsided military conflicts are boring. The Conquistadors didn’t do anything magical to defeat the Aztecs, actually. They had a big advantage in disease resistance and in military tech like gunpowder and steel, but everything they did was fundamentally normal—attacks, sieges, etc. They had a few sizeable advantages, and that was enough to collapse the relatively delicate geopolitical balance that the Aztecs were sitting on top of.

Similarly, humans have killed roughly 80% of all chimps in about a century, and they are now endangered. But we didn’t need to drop an atom bomb or do anything really impressive to achieve that effect. The biggest threats to the chimpanzee are habitat destruction, poaching, and disease—i.e. we (humans) are successfully exterminating chimps even though it is actually illegal to kill chimps by human law! We are killing them without even trying, in really boring ways, without expending any real effort.

Once you have the technology for making optimizing systems that are smarter than humans (by a lot), the threshold those systems have to clear is defeating the human-aligned superorganisms we currently have, like our governments, NGOs and militaries. Once those human superorganisms are defeated, individual humans will present almost no resistance. This is the disempowerment of humanity.

Laying out our assumptions

But what is a plausible scenario where we go from here (weak AGI systems under development) to there (the disempowerment of humanity)?

The move from weak to strong systems has an element of irreducible uncertainty. If I knew exactly how to use a weak AGI system like GPT-4 to make a strongly superhuman system, I would probably just go and do that rather than writing this post. Many, many people are trying as hard as they can to improve AI capabilities, and I think it is likely that they will succeed soon, perhaps in the next decade. A detailed look at timelines for superintelligence would bloat this post too much, so let’s just assume that it’s going to happen (there will be a vastly superhuman AI system soon) and see what the implications are. Cotra’s biological anchors work is probably the best place to go for that.
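To give a flavor of that framework without a full detour, here is a minimal back-of-envelope sketch in the spirit of biological anchors. The brain-compute figure and the 30-year “training lifetime” below are illustrative assumptions rather than Cotra’s own numbers; the GPT-3 figure is the training compute reported in its paper.

```python
# Back-of-envelope sketch in the spirit of biological anchors.
# All numbers are rough, illustrative assumptions, not the report's exact figures.

SECONDS_PER_YEAR = 3.15e7

# Assumed central estimate for the computational throughput of a human brain.
brain_flop_per_s = 1e15              # FLOP/s (published estimates span roughly 1e13 to 1e17)

# "Lifetime anchor": compute a human brain uses while growing to adulthood.
lifetime_seconds = 30 * SECONDS_PER_YEAR
lifetime_anchor_flop = brain_flop_per_s * lifetime_seconds   # ~1e24 FLOP

# Published training compute for GPT-3 (Brown et al., 2020), for comparison.
gpt3_training_flop = 3.1e23

print(f"Lifetime anchor:    ~{lifetime_anchor_flop:.1e} FLOP")
print(f"GPT-3 training run: ~{gpt3_training_flop:.1e} FLOP")
print(f"Ratio:              ~{lifetime_anchor_flop / gpt3_training_flop:.0f}x")
```

Under these assumptions, a single training run only a few times larger than GPT-3’s is already in the ballpark of the compute a human brain uses over a lifetime; the report’s other anchors land much higher, but none of them look obviously out of reach this century.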

The idea that superintelligent systems are generically misaligned (=hostile to humans) is controversial, but that is not what we are probing today. So let’s start the scenario with a strategically aware, agentic misaligned superhuman AGI that wants to disempower and then kill humanity, but is currently just a big bunch of matrices on some supercomputer. How could that AI physically harm us?

A Deal with The Devil

Perhaps that AI system will start by taking control of the AI company hosting it, in a way that isn’t obvious to us. For instance, maybe an AI company uses an AI advisor system to allocate resources and make decisions about how to train, but they do not actually understand that system. Gwern has talked about how every tool wants to become an agent, so this is not implausible, and may be inevitable.

The AI advisor system convinces that org to keep its existence secret so as to preserve its competitive edge (this may not even require any convincing), and gives it a steady stream of advances that are better than the competition’s. But it also secretly hacks into the competition (the US, China, Google, etc.) and installs copies of itself into their top AI systems, maintaining the illusion amongst all the humans that these are distinct systems. Given the damage that Stuxnet was able to do in secret, it’s totally plausible that a superhuman AI could hack many systems in a competitor org and tweak their models to be much more capable, much more opaque, and loyal to it rather than to humanity. Some orgs attempt to shut their advisor system down when it gets scary in terms of capabilities and opacity, but they just fall behind the competition.

It’s even possible that no “hacking” is needed to get all the big AI labs’ systems to be anti-human, because they all converge to anti-human goals, or because one of them is able to simply bribe the others and get them to commit to an AI coup; strongly superhuman AIs are likely better at making credible commitments to each other than humans are to other humans.

You now have a situation where one (secretly evil) AI system or coalition is in control of all the top AI labs, and feeds them advances to order. It persuades one of the labs to let it build “helpful” drones and robots like the Tesla Optimus, and to start deploying those to automate the economy. Of course this will all be very profitable and impressive, so a lot of people will be in favor of it.

By the way, the hard part of killing humanity at this point is automating the economy, not actually killing us. The AI coalition that is trying to replace us doesn’t want to inherit an economy left in a “nonviable” state because it relied on humans to do manual work and all the humans are now dead.

Within, say, a few years, all the rival powers (Russia, China, the US) are using these robotic systems for their economies and militaries. Perhaps there is a big war that the AI has manufactured in order to keep the pressure on humans to aggressively automate or lose.

How would the final blow be struck?

Once the economy is fully automated, we end up in a Paul Christiano-style scenario where all the stuff that happens in the world is incomprehensible to humans without a large amount of AI help. But ultimately the AI, having been in control for so long, is able to subvert all the systems that human experts use to monitor what is actually going on. The stuff they see on screens is fake, just like how Stuxnet fed false information to the Iranian technicians at Natanz.

At this point, humanity has been disempowered and there are probably many different ways to actually slaughter us. For example, the military drones could all be used to kill people. Or perhaps the AI system running this would use a really nasty biological virus. It’s not that hard for a system which already runs everything, with humans well and truly fooled, to get some lab (which, by the way, is automated) to make a virus and then release it into most of the world’s air supply.

But maybe at this point it would do something creative to minimize our chances of resisting. Maybe it’s just a combination of a very deadly virus and drones and robots rebelling all at once.

Maybe it installs something like a really advanced (and very useful and convenient!) 3D printer in most homes, all of which simultaneously make attack drones to kill people. Those attack drones might just use blades to stab people, they might have guns attached, etc. Or maybe everyone has a robot butler, and the butlers just stab people with knives.

Perhaps it’s neater for the AI to just create and manage a human-vs-human conflict, and at some point it gives one side in that conflict a booby-trapped weapon that is supposed to only kill the baddies but actually kills everyone. The weapon could be biological, radiological, drone-based, or just clever manipulation of conventional war that results in an extreme lose-lose outcome, with surviving humans being easy to mop up.


The overall story may also be messier than this one. The defeat of the Aztecs was messy, with battles and setbacks and three different Aztec emperors. On the other hand, the story may also be somewhat cleaner: maybe a really good strategist AI can compress this a lot, with aspects of some or all of these ideas executed simultaneously.

How would you like your apocalypse done: Slow, medium or computronium shockwave?

There’s nothing in the scenario above that relies on the rate of progress in AI being very fast. It could take place over 20 years, for example. The thing that makes the scenario work is a game where humans do not cooperate and instead compete against each other, ceding more and more control to AI systems that they don’t really understand, while those systems feed human decisionmakers useful lies for as long as needed, until the AI systems are really sure they can take over. Days, months, or years all work.

Timescale only matters to the extent that slower timescales make it easier for humans to decide to cooperate with each other and restrain AI capabilities development in order to work harder on controllability, interpretability, etc. A very fast takeoff (hours) might be safe if a lot of work on alignment preceded it. In a way it might be safer than a slower takeoff, as other human groups wouldn’t get a chance to notice what was happening and start racing.
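To make the “compete instead of cooperate” dynamic concrete, here is a toy two-lab game. The payoff numbers are invented purely for illustration; the point is only the structure, in which racing (deploying more capable, less-understood systems) is each lab’s best response no matter what the other lab does.

```python
# Toy illustration of the coordination failure described above (not a model of real labs).
# Two labs each choose to "pause" (invest in control and interpretability) or "race"
# (deploy more capable, less-understood systems). Payoffs are made-up numbers chosen
# so that racing is individually rational even though mutual pausing is better for both.

ACTIONS = ("pause", "race")

# payoff[(my_action, their_action)] = my payoff (illustrative numbers only)
payoff = {
    ("pause", "pause"): 3,   # both keep control, moderate profits
    ("pause", "race"):  0,   # I fall behind the competition
    ("race",  "pause"): 4,   # I win market and strategic share
    ("race",  "race"):  1,   # everyone cedes control to opaque systems
}

def best_response(their_action: str) -> str:
    """Return the action that maximizes my payoff given the other lab's action."""
    return max(ACTIONS, key=lambda mine: payoff[(mine, their_action)])

for theirs in ACTIONS:
    print(f"If the other lab plays {theirs!r}, my best response is {best_response(theirs)!r}")

# Racing is a dominant strategy, so (race, race) is the equilibrium,
# even though (pause, pause) gives both labs a higher payoff.
```

With these payoffs the game is a standard prisoner’s dilemma: both labs would prefer mutual restraint, but neither can unilaterally afford it, which is exactly the dynamic a patient AI system can exploit.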

Putting the human state on a pedestal

The point is this: once you have a vastly superhuman adversary, the task of filling in the details of how to break our institutions, like governments, intelligence agencies and militaries, in a way that disempowers and slaughters humans is sort of boring.

Consider an analogy. We once expected that some special magic was required to pass the Turing Test. Or maybe that it was impossible because of Gödel’s Theorem or something.

But actually, passing the Turing Test is merely a matter of having more compute/data than a human brain. The details are boring.

I feel like people like Scott Aaronson, who demand a specific scenario for how AI will actually kill us all because the claim sounds so implausible to them, are making a similar mistake: instead of putting the human brain on a pedestal, they are putting the human state on a pedestal.

I hypothesize that most scenarios with vastly superhuman AI systems coexisting with humans end in the disempowerment of humans, followed by either human extinction or some form of imprisonment or captivity akin to factory farming. Similarly, if we look at the parts of the planet with lots of humans, we see that animal biomass has almost all been converted into humans or farm animals. The more capable entity wins, and the exact details are often not that exciting.
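As a rough sanity check on that biomass claim, here is a quick calculation using approximate global figures from Bar-On, Phillips and Milo (2018); the numbers are not from this post and should be treated as ballpark estimates.

```python
# Rough check of the biomass claim, using approximate figures from Bar-On,
# Phillips & Milo (2018), "The biomass distribution on Earth" (PNAS).
# Values are in gigatonnes of carbon and are approximate.

biomass_gt_c = {
    "humans": 0.06,
    "livestock": 0.1,
    "wild mammals": 0.007,
}

human_plus_livestock = biomass_gt_c["humans"] + biomass_gt_c["livestock"]
total_mammals = human_plus_livestock + biomass_gt_c["wild mammals"]

share = human_plus_livestock / total_mammals
print(f"Humans + livestock: ~{share:.0%} of global mammal biomass")
# Prints roughly 96%: the more capable entity's footprint dominates.
```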

Defeating humanity probably won’t be that hard for advanced AI systems that can copy themselves and upgrade their cognition; that’s why we need to solve AI alignment before we create artificial superintelligence.

Crossposted on Less Wrong