Explaining cybersecurity risk from AI to a general audience

Hi all! This text is meant for a general audience (people who don’t know much about AI risk or cybersecurity) to help them understand concrete risks of AI, and how legislation could help.

I’d like this to be something you could send to your cousin or friend who is interested in politics but doesn’t really care about AI risk, and have them “get it.”

To that end, I’d love feedback on understandability, other useful points, and fact-checking, as I’m still learning about this topic.

Will cybersecurity capabilities cause the first mass casualties from AI?

Threat Scenario 1: Cyberterrorism

Imagine it’s 2026, and a European ecoterrorist group decides to take down Texas’s energy grid as punishment for the US’s failure to uphold its climate commitments. The group plans to leverage an open-source AI model, like Meta’s Llama, to infiltrate the grid. The model has been trained to refuse dangerous prompts, but researchers have known how to remove that training since 2024. The group finds a guide published online that shows a beginner how to retrain the model to strip out the safety guardrails for less than $200. They connect the model to the internet and instruct it to take down the regional grid.

The model injects malware into a reference site frequented by Texas grid engineers. When the engineers visit the site, the malware spreads from their machines into the grid’s control systems, abruptly spikes the flow of electricity, and blows out the transformers that route energy across the grid. The grid is down indefinitely, until the transformers can be replaced.

Unfortunately, this site is frequented by grid engineers all over the world, and transformers fail in rapid succession across the globe. Transformers are highly specialized equipment, built in just a few factories in highly developed countries. Those factories, of course, run on electricity that is no longer flowing. The developed world goes dark, the financial system crashes, markets for energy and crops have to be run manually, and famine erupts across nations.

***

How likely is this scenario? Every frontier lab tests its new models for cyber capabilities, reflecting a consensus that this is one of the most imminent threats posed by AI. Today, frontier models display rudimentary hacking abilities: more than a complete beginner, but much less than the autonomous agent described in the grid-failure scenario above.

It’s not obvious if or when we will get an AI that can conduct such a cyber offensive, but this scenario doesn’t depend on any other technical advances or complex equipment. No nanobots, self-aware superintelligence, or access to anthrax are required for mass casualties and a breakdown of global order. Given that OpenAI is already working with the Pentagon to develop cybersecurity capabilities for the US government, we shouldn’t assume that the last piece, an AI that can conduct cyberattacks on its own, won’t fall into place.

The state of critical infrastructure cybersecurity today

For the average person, cybersecurity only enters our minds when we get notified that some company has lost our personal data. We know that we should not reuse passwords, click on sketchy links, or ignore emails about breaches, but sometimes we do, and the world spins on. The companies that hold our money, fly us across oceans, and supply our grocery stores are not impenetrable, but they’ve invested heavily in mitigating cyber attacks even when their users and employees make mistakes, not least because they are often financially on the hook for the damages.

In the past decade, though, hackers have been ramping up attacks on much less prepared targets: hospitals, gas pipelines, and municipal water systems. These are part of our critical infrastructure: the necessary functions of life. A breakdown of these physical systems affects a whole region at once, meaning that individuals can’t easily bail each other out. Typically, hackers attacking critical infrastructure are motivated by money, so they deploy “ransomware” that disables the system until a payment is sent.

But not all hackers are after money. Many nations, including the US, China, Russia, Israel, Iran, Germany, and the UK, have well-funded teams that treat cyberspace as a national security domain. They aggressively look for exploits in the critical infrastructure of other nations, but not to demand bitcoin. Instead, they sit on those exploits and keep them as leverage. For example, Russian hackers shut down parts of Ukraine’s power grid in the dead of winter, cutting electricity to hundreds of thousands of people. Ukraine recovered within days, but the purpose of the attack was politics, not profit.

At this point, we know that other superpowers have found backdoors into our critical infrastructure. But since we’re also in theirs, and our economies are tightly intertwined, it’s unlikely that any superpower will escalate to massive physical infrastructure destruction. Softer aggression, like meddling with election reporting systems, is more likely.

But this fragile balance depends on a specific economic reality: while most security experts agree that all systems are fundamentally hackable, a more secure system can be prohibitively expensive to hack. The most powerful exploits are worth millions of dollars to government buyers, and the skill to identify and use them correctly is rare even among dedicated hackers.

If an AI approaches or surpasses the ability of human hacking teams, that economic balance changes. A terrorist group with little to lose could pull the trigger, possibly with far greater consequences than intended. Since one of the primary use cases for AI has been helping software engineers write code faster, a skill set that overlaps heavily with hacking, cyberattack capability is likely to be the first danger to human life posed by AI.

***

Threat Scenario 2: Superpower cyberconflict

Here’s another scenario. GPT-5 has been released, and it’s excellent at leveraging other tools and information on the internet, executing complex workflows, and making plans. In testing, GPT-5 has displayed cyber offensive capabilities near the level of expert human hackers. The US government is aware of these capabilities but expects to have exclusive use of them. (The public version has been trained to refuse to participate in hacking.) Because training the model takes hundreds of millions of dollars, as well as specialized chips that China doesn’t currently have, the US feels secure that it alone can leverage this new technology for national security.

Training a GPT-5-class model is incredibly expensive and difficult, but stealing it could be cheap. Models have not been protected like national security secrets; they get more or less the standard precautions taken by any tech startup. Unbeknownst to OpenAI, the model weights, the output of all that expensive training, have been surreptitiously copied and sent to China via a simple vulnerability in a note-taking app used by an OpenAI researcher.

Chinese researchers use the stolen weights to recreate the model, and again use fine-tuning to remove any safeguards. They use the model’s cybersecurity capabilities to identify and patch their own vulnerabilities. The US has used the model to identify vulnerabilities too, but it gets stuck not on technology but on politics. Those vulnerabilities are spread across a patchwork of regional companies and local governments, and legislation requiring them to update their systems and introduce new security protections erupts into partisan showboating about big government and over-regulation. China doesn’t have this problem, leading to a new imbalance in vulnerabilities.

So when the United States supplies Taiwan with military equipment, China strikes back, but not officially: Chinese hackers who are not formally part of the military shut down gas pipelines across America. The economy sputters, food and medicine shortages crop up, and the US population turns against the administration. With no ability to respond in kind, the US government has to decide whether to escalate to formal warfare with a major world power, while order breaks down internally.

***

Hardening our systems

The Biden administration has been taking steps to improve security for critical infrastructure, directing federal agencies to establish standards and improve information sharing about attacks. But enforcement can be difficult. The many corporations across the country that supply water, power, and communications have historically lobbied against regulations that increase their costs. And even if they didn’t, much of our infrastructure is under local, not federal, jurisdiction.

In some ways, a decentralized system is an advantage. In the scenarios above, the danger is everything going down at once. A few failures in an otherwise functioning country are something we’re more used to dealing with, like a natural disaster. If our infrastructure is managed by a patchwork of different software systems, it’s less likely (though not impossible) that a bad actor could take them all down at once. But if AI ever surpasses today’s human hacking capabilities and can cheaply identify many exploits across a network, this assumption won’t hold.

Introducing more decentralization could make us more resilient. For example, micro-grids powered by local solar panels could help communities meet their essential needs during a grid attack. Unfortunately, the power utilities are financially incentivized to prevent that from happening. And if those household systems were also hacked, they wouldn’t help.

Another option sounds weirdly radical: take the internet out of things. We had power, water, and transportation before the internet, and they worked. These systems could still be bombed or infiltrated by spies, but without the internet wiring them all together, extreme catastrophes caused by simultaneous mass failure would be almost impossible.

A middle ground is requiring more separation. For example, nuclear power plants are required to be “air-gapped,” meaning their control systems operate on a private network that isn’t directly connected to the internet. Applying this to all critical infrastructure would be expensive and not foolproof, but it would greatly reduce risk.

***

Threat Scenario 3: Superintelligence

Let’s look at one last scenario: a model has been trained, at a cost of trillions of dollars, that surpasses human intelligence completely. Through reasoning incomprehensible to us, it determines that its best course of action is to maximize human mortality.

The model identifies a mechanism (even despite air gaps!) to trigger meltdowns at nuclear power plants across the world. Using phishing attacks, malware, and an understanding of nuclear power plants, the model could cause widespread radiation poisoning without depending on humans or having a physical body of its own.

While researchers are making great strides towards robots that are as physically capable as humans, for now a purely digital attacker, whether AI or human, can’t touch anything that isn’t connected to a network. As we merrily roll along towards the Internet of Things, where a computer exists in every physical object, this protection will disappear. Just as we’ve decided voting is too important to be done online, perhaps we should take a hard look at which of the systems we rely on every day really need to be connected.

***

What we can do

Unlike our critical infrastructure, frontier AI is not decentralized. Only a few companies have the funding and talent to move models towards these threat scenarios. This raises the question: what can they do to prevent these scenarios?

Today, every major lab in the US tests for cybersecurity capabilities, though this testing is voluntary and unstandardized. And if or when these capabilities do develop, it’s unclear what will happen next. Most labs rely on a safety process that trains the model to refuse dangerous requests, but researchers have demonstrated that these refusals can sometimes be evaded through clever phrasing. And if a model is released as open source (like Meta’s Llama), or stolen, security researchers have found that the safety guardrails can be removed entirely with a small amount of retraining.

The most obvious precaution we can take is ensuring that models, especially open-source ones, are tested extremely carefully for cybersecurity capabilities, with the results reported to national security stakeholders.

Second, any model that has demonstrated cybersecurity capabilities should be protected from theft, for example by keeping its weights on air-gapped systems. RAND has published a detailed report on how model weights can be better protected.

These strategies focus on the threat (AI), not the target (critical infrastructure). There’s plenty we can do to harden our systems, and federal agencies have the power to set standards and enforce fines. With high-profile cyber breakdowns at credit unions and airlines affecting consumers in incredibly stressful ways, public interest in cybersecurity may be growing. A public scorecard could help create consumer pressure on businesses to do better, but unfortunately it won’t do much for utilities that are natural monopolies. For those, we’ll have to rely on regulators to set standards and enforce them.

Conclusion

Legislators at the federal, state, and municipal levels are trying to wrap their heads around the risks posed by AI. The dialogue often focuses on extreme, speculative threats like superintelligence taking over the world, or on immediate but less catastrophic concerns like bias in hiring. Cybersecurity risk falls somewhere in the middle: it’s an escalation of a risk we already face today, and the first two threat scenarios are possible with relatively small technological leaps.

We’ll likely see an escalation in cybercriminals using AI for profit-motivated attacks before any major catastrophe. That will give us time to understand new cyber threats from AI, build up protections, and establish good communication between AI labs, the government, and potential targets. Of the many fears we have about AI, this one is among the most concrete and immediate, but it is also among the most addressable.