The Case for AI Safety Advocacy to the Public

tl;dr: Advocacy to the public is a large and neglected opportunity to advance AI Safety. AI Safety as a field is unfamiliar with advocacy, and it has reservations, some founded and others not. A deeper understanding of the dynamics of social change reveals the promise of pursuing outside game strategies to complement the already strong inside game strategies. I support an indefinite global Pause on frontier AI and I explain why Pause AI is a good message for advocacy. Because I’m American and focused on US advocacy, I will mostly be drawing on examples from the US. Please bear in mind, though, that for Pause to be a true solution it will have to be global.

The case for advocacy in general

Advocacy can work

I’ve encountered many EAs who are skeptical about the role of advocacy in social change. While it is difficult to prove causality in social phenomena like this, there is a strong historical case that advocacy has repeatedly brought about its intended social change (whether or not that change ended up being desirable). A few examples:

Even if advocacy only worked a little of the time or only served to tip the balance of larger forces, the stakes of AI risk are so high and AI risk advocacy is currently so neglected that I see a huge opportunity.

We can now talk to the public about AI risk

With the release of ChatGPT and other advances in state-of-the-art artificial intelligence in the last year, the topic of AI risk has entered the Overton window and is no longer dismissed as “sci-fi”. In fact, as Anders Sandberg put it, the Overton window is now moving so fast it’s “breaking the sound barrier”. A poll from the AI Policy Institute and YouGov (released 8/11/23) shows comfortable majorities among US adults on questions about AI x-risk (76% worry about extinction risks from machine intelligence), slowing AI (82% say we should go slowly and deliberately), and government regulation of the AI industry (82% say tech executives can’t be trusted to self-regulate).

What having the public’s support gets us

  • Opinion polls and voters that put pressure on politicians. Constituent pressure on politicians gives the AI Safety community more power to get effective legislation passed– that is, legislation which addresses safety concerns and requires us to compromise less with other interests– and it gives the politicians more power against the AI industry lobby.

  • The ability to leverage external pressure to improve existing strategies. With external pressure, ARC, for example, wouldn’t have to worry as much about being denied access to frontier models. With enough external pressure, the AI companies might be begging ARC to evaluate their work so that the public and the government get off their backs! There isn’t much reason for AI companies to agree to corporate campaigns asking them to adopt voluntary changes now, but those campaigns would be a lot more successful if pledging to them allowed AI companies to improve their image with a skeptical public.

  • The power of the government to slow industry on our side. Usually, it is considered a con of regulation that it slows or inhibits an industry. But, here, that’s the idea! Just to give an example, even policies that require actors to simply enumerate the possible effects of their proposals can immensely slow large projects. The National Environmental Policy Act (NEPA) requires assessments of possible effects on the environment to be written ahead of certain construction projects. It has been estimated (pdf, p. 14-15) that government agencies alone spend $1 billion a year preparing these assessments, and the average time it takes a federal entity to complete an assessment is 3.4 years. (The cost to private industry is not reported but is expected to be commensurate.) In a world where some experts are talking about 5-year AI x-risk timelines, adding a few years to the AI development process would be a godsend.

Social change works best inside and outside the system

I believe that, for a contingent historical reason, EA AI Safety is only exploiting half of the spectrum of interventions. Because AI risk was previously outside the Overton window (and also, I believe, due to the technical predilections and expertise of its members), the well-developed EA AI Safety interventions tend to be “inside game”. Inside game means working within the system you’re trying to change, and outside game means working outside the system to put external pressure on it. The inside-outside game dynamic has been written about before in EA to describe animal welfare tactics (incrementalist and welfarist vs. radical liberationist). I also find it easier to start with the well-fleshed-out example of my former cause area.

Here I’ve laid out some examples of interventions that fall across the inside-outside game spectrum in animal advocacy. I believe the spread in animal advocacy as a whole is fairly well balanced. Effective animal altruist (EAA) interventions tend to sit toward the inside end, and rightly so, as those interventions were more neglected in the existing space when EAAs came on the scene. Animal welfare corporate campaigns and the ballot initiatives Question 3 (Massachusetts) and Proposition 12 (California) are responsible for a huge amount of EA’s impact. I have heard from several people and organizations that they want to create “a Humane League for AI” and run corporate campaigns for AI reforms as a superior alternative to the public advocacy I was proposing, but the Humane League holds protests. Not only that, the Humane League is situated in an ecosystem where, if it doesn’t highlight the conditions animals are kept in, someone more radical, like Direct Action Everywhere, might break into factory farms in the middle of the night, rescue some animals, and film the conditions they were in. Animal agriculture companies are more disposed to make voluntary welfare commitments when doing so is better for them than the alternative. This is how inside-gamers and outside-gamers can work together to get more of the change they want.

Here’s an inside-outside game spectrum I made for AI Safety to match the above one. This time I’ve added what I perceive as the Overton window for interventions within the AI Safety community, which is mostly inside game. This makes sense considering that it was not common knowledge until this year that AI Safety itself was in the Overton window– inside game interventions were the only kind that were tractable.

But now that AI risk is in the Overton window, there is a huge, wide-open opportunity to occupy the outside-game end of the spectrum. Not only are these outside game interventions highly neglected– they are also, in my experience, the ones that the public finds most legible and acceptable. In animal advocacy, the public is often confused by inside game interventions that involve the activists getting their hands dirty, so to speak.

In AI Safety, inside game interventions, like working at an AI company, can confuse the public about the level of danger we are in, because it’s not intuitive to most people that you would help build a dangerous technology to protect us from it. This can lead the public to think inside gamers are insincere, or attempting something like “regulatory capture”.

There’s also a risk of the AI Safety movement getting captured by industry through working inside it or depending on it for a paycheck. I believe AI Safety is already too complacent about the harms of AI companies and too friendly to AI company-adjacent narratives, such as that AI isn’t too dangerous to build because other technologies have been made safely, that (essentially) only their technology can solve alignment, or that cooperation with them to gain access to their models is the best way to pursue alignment (based on private conversations, I believe this is ARC’s approach). Not only does outside game directly synergize with inside game, but advocacy of the vanguard position pushes the Overton window in that direction, further increasing the chance of success through inside game by making those interventions seem more moderate and less controversial.

Pros and potential pitfalls of advocacy

Other pros of advocacy

Besides being the biggest opportunity we have at the moment, there are other pluses to advocacy:

  • We can recruit entirely new people to do it, because it draws on a different skill set. There’s no need to compete with alignment or existing governance work. I myself was not working on AI Safety before Pause advocacy was on the table.

  • It’s relatively financially cheap. The efficacy of many forms of advocacy, like open letters and protests, depends on sweat equity and getting buy-in, which are hard, but they don’t take a lot of money or materials. (Right now I’m fundraising for my own salary and very little else.)

  • Unlike technical alignment or policy initiatives, advocacy is ready to go, because advocacy’s role is not to provide granular policy mechanisms (the letter of the law), but to communicate what we want and why (the spirit of the law).

  • Personally, I feel wrong not to share warnings with the public (when we aren’t on track to fix the problem ourselves in a way that requires secrecy), so raising awareness of the problem along with a viable solution to the problem feels like a good in itself.

  • And, finally, I predict that advocacy activities could be a big morale boost, if we’d let them. Do you remember the atmosphere of burnout and resignation after the “Death with Dignity” post? The feeling of defeat on technical alignment? Well, there’s a new intervention to explore! And it flexes different muscles! And it could even be a good time!

Misconceptions about advocacy

“AI Safety advocacy will reflect negatively on the entire community and harm inside gamers.”

Many times since April, people have expressed their fear to me that the actions of anyone under a banner of AI Safety will reflect on everyone working on the cause and alienate potential allies. The most common fear is that AI companies will not cooperate with the AI Safety community if other people in the community are uncooperative with them, but I’ve also heard concerns about new and inexperienced people getting into AI policy, such as that uncouth advocates will upset diplomatic relationships in Washington. This does seem possible to me, but I can’t help but think the new people are mostly where the current DC AI insiders were ~5 years ago and they will probably catch up.

Funnily enough, even though animal advocates do radical stunts, you do not hear this fear expressed much in animal advocacy. If anything, in my experience, the existence of radical vegans can make it easier for “the reasonable ones” to gain access to institutions. Even just within EAA, the Good Food Institute celebrates meat producer Tyson Foods’ investment in a clean meat startup at the same time that the Humane League targets Tyson in social media campaigns. When the community was much smaller and the idea of AI risk more fringe, it may have been truer that what one member did would be held against the entire group. But today x-risk is becoming a larger and larger topic of conversation that more people have their own opinions on, and the risk that the idea of AI risk gets tainted by what some people do in its name grows smaller.

“The asks are not specific or realistic enough.”

The goal of advocacy is not to formulate ideal policies. The letter of the law is not ultimately in any EA’s hands because, for example, US bills go through democratically elected legislatures that have their own processes of debate and compromise. Suggesting mechanistic policies (the letter of the law) is important work, but it is not sufficient– advocacy means communicating, and demonstrating popular support for, what we want those policies to achieve (the spirit of the law), which gives us the greatest chance of obtaining effective policies that address the real problems.

Downsides to advocacy

The biggest downside I see to advocacy is the potential of politicizing AI risk. Just imagine this nightmare scenario: Trump says he wants to “Pause AI”. Overnight, the phrase becomes useless, just another shibboleth, and now anyone who wants to regulate AI is lumped in with Trump. People who oppose Trump begin to identify with accelerationism because Pause just seems unsavory now. Any discussion of the matter among politicians devolves into dog whistles to their bases. This is a real risk that any cause runs when it seeks public attention, and unfortunately I don’t think there’s much we can do to avoid it. But AI is going to become politicized whether we get involved or not. (I would argue that many of the predominant positions on AI in the community are already markers of grey tribe membership.) One way to mitigate the risk of our message becoming politicized is to have a big tent movement with a diverse coalition under it, making it harder to pigeonhole us.

Downside risks of continuing the status quo

If we change nothing and the AI Safety community remains overwhelmingly focused on technical alignment and people quietly attempting to reach positions of influence in government, I predict:

  • AI labs will control the narrative. If we continue to abdicate advocacy, the public will still receive advocacy messages. They will just be from the AI industry or politicians hoping to spin the issue in their favor. The politics will still get done, just by someone else, with less concern about or insight into AI Safety.

  • The EA AI Safety community will continue to entrench itself in strategies that help AI labs, but much of the influence that was hoped for in return will never materialize. The AI companies do not need the EA community and have little reason beyond beneficence to do what the EA community wants. We should not be entrusting our future to Sam Altman’s beneficence.

  • The EA AI Safety community will continue entrusting much of its alignment-only agenda to racing AI labs, despite suspecting that timelines are likely too short for it to work.

  • It may be too late to initiate a Pause later, even if more EAs conclude one is needed, because the issue has become too politicized or because AI labs have become too involved in writing their own regulation.

  • Society will become “entangled” with advanced AI and grow to like using it, making people less amenable to pausing later, even if the danger is clearer.

The case for advocating AI Pause

My broad goal for AI Safety advocacy is to shift the burden of proof to its rightful place: onto AI companies to prove their product is safe, rather than where it currently seems to be, on the rest of us to prove that AGI is potentially dangerous. There are other paths to victory, but, in my opinion, AI Pause is the best message to get us there. When I say AI Pause, I mean a global, indefinite moratorium on the development of frontier models until it is safe to proceed.

Pros and pitfalls of AI Pause

The public is shockingly supportive of pausing AI development. YouGov did a poll of US adults in the week following the FLI Letter release which showed majority support (58-61% across different framings) for a pause on AI research (Rethink Priorities replicated the poll and got 51%). Since then, support for similar statements has remained high among Americans. The most recent such poll, conducted August 21-27, shows 58% support for a 6-month pause.

Pause is possible to implement by taking advantage of chokepoints in the current development pipeline, such as Nvidia’s near monopoly on cutting-edge AI chips and the large amount of compute needed for training frontier models. It may not always be the case that we have these chokepoints, but by instituting a Pause now, we can slow further changes to the development landscape and have more time to adapt to them.

Because of the substantial resources required, only a small number of actors are trying to develop frontier ML models today, which makes monitoring not only possible but realistic. A future where monitoring and compute limitations are very difficult to enforce is conceivable, but if we start a Pause ASAP, we can bootstrap our way to at least slowing illicit model development indefinitely, if necessary.

Pause is a robust position and message. Advocacy messages have to be short, clear, and memorable. Many related AI Safety messages take far too many words to accurately convey (and even then, there’s always room for debate!). I do not consider “Align AI” a viable message for advocacy because the topic of alignment is nuanced and complex and misunderstanding even subtle aspects could lead to bad policies and bad outcomes. Similarly, a message like “Regulate AI” is confusing because there are many conflicting ways to regulate any aspect of AI depending on the goal. “Pause AI” is a simple and clear ask that is hard to misinterpret in a harmful way, and it entails sensible regulation and time for alignment research.

A Pause would address all AI harms (or keep those that have already arrived from getting worse), from employment displacement and labor issues to misinformation and the manipulation of social reality to weaponization to x-risk. Currently, AI companies are ahead of regulatory authorities and voters, who are still wrapping their heads around new AI developments. By default, the companies are being allowed to proceed until their product is proven too dangerous. The Pause message turns that around and puts the onus on labs to prove that their product is safe enough to develop.

Pause gives more and better alignment research the chance to take place, and it hedges against the possibility that alignment never happens. On balance, I think Pause is not just a good advocacy message but would actually be the best way forward.

There is, however, one major potential harm from a Pause policy that merits mentioning: hardware overhang, specifically overhang due to improvements in training algorithms. If compute is limited but the algorithms for using compute to train models continue to get better, which seems likely and is more difficult to regulate than hardware, then applying those improved algorithms to more compute later could lead to discontinuities in capabilities that are hard to predict. It could mean that the next large training run puts us unexpectedly over the line into unaligned superintelligence. If putting limits on compute could make subsequent increases in training compute more dangerous, that risk needs to be weighed and accounted for. It’s conceivable that this form of overhang could present such a risk that a Pause would be too dangerous. One way of mitigating this possibility is to regulate scheduled, controlled increases in the compute allowed for training, which Jaime Sevilla has referred to as a “moving bright line”. I don’t believe this compute overhang objection defeats Pause because, on balance, I expect Pause to buy us more time for alignment research and for implementing solutions to overhang, such as more tightly controlling the production of hardware or developing a greater understanding of the training process.
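
To make the overhang dynamic concrete, here is a minimal toy sketch in Python. The growth rates and the frozen cap are purely illustrative assumptions of mine, not estimates from this post or from Sevilla’s proposal; the point is only to show how “effective compute” can silently accumulate under a fixed cap when algorithmic efficiency keeps improving.

```python
# Toy illustration of the hardware/algorithmic overhang worry.
# All numbers here are made-up assumptions for illustration only.

HARDWARE_GROWTH = 3.0  # assumed yearly growth in the largest feasible training run
ALGO_GROWTH = 2.0      # assumed yearly improvement in algorithmic efficiency


def effective_compute(year, compute_cap=None):
    """Rough proxy for capability: physical compute times algorithmic efficiency."""
    hardware = HARDWARE_GROWTH ** year
    if compute_cap is not None:
        hardware = min(hardware, compute_cap)  # a Pause freezes allowed training compute
    algorithms = ALGO_GROWTH ** year           # algorithmic progress continues regardless
    return hardware * algorithms


for year in range(6):
    paused = effective_compute(year, compute_cap=1.0)  # cap at the year-0 level
    unpaused = effective_compute(year)                  # no cap
    # The "overhang factor" is the capability jump available to anyone who breaks
    # (or lifts) the cap and applies current algorithms to uncapped hardware.
    print(f"year {year}: paused={paused:8.1f}  unpaused={unpaused:10.1f}  "
          f"overhang factor={unpaused / paused:6.1f}x")
```

Under these toy assumptions, the gap between the capped and uncapped runs grows exponentially, which is the discontinuity worry; a “moving bright line” corresponds to replacing the fixed cap with a slow, scheduled increase so that the gap stays bounded.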

Pause advocacy can be helpful immediately because it doesn’t require us to hammer out exact policies. I have heard the argument that a Pause would be best if it stopped development right at the cusp of superintelligence, so that we could study the models most like superintelligence, and so Pause advocacy should start later. But (1) I don’t know how people think they can identify the line right before superintelligence, up to which development is supposedly safe, (2) the Pause would be much more precarious if the next step after breaching it were superintelligence, so we should aim to stop with some cushion if we want the Pause to work, and (3) it will take an unknown amount of time to win support for and implement a Pause, so it’s risky to try to time its execution precisely.

How AI Pause advocacy can effect change

AI Pause advocacy could reduce p(doom) and p(AI harm) via many paths.

  1. If we advocate Pause, it could lead to a Pause, which would be good (as discussed in the previous section).

    1. Politicians will take note of public opinion via polls, letters, calls, public writing, protests, etc. and will consider Pause proposals safer bets.

    2. Some people in power will become directly convinced that Pause is the right policy and use their influence to advocate it.

    3. Some voters will be convinced that Pause is the right way forward and vote accordingly.

  2. When we advocate for Pause, it pushes the Overton window to make room for many other AI Safety interventions aimed at x-risk, which would also be good if implemented, including alignment work and other regulatory schemes. The Pause message combats memes about safety having to be balanced against “progress”, so it creates more room for other kinds of regulation that are focused on x-risk mitigation.

  3. When we advocate Pause, it shifts the burden of proof from us to prove AI could be dangerous onto those making AI to prove it is safe. This is helpful for many AI Safety strategies, not just Pause.

    1. For example, a rigorous licensing and monitoring regime or regulations that put an economic burden on the AI industry will become more realistic when the public sees AI development as a risky activity, because politicians will have the support/pressure they need to combat the industry lobby.

    2. AI companies may voluntarily adopt stronger safety measures for the sake of their public images and to gain the favor of regulators.

  4. When we advocate Pause, it re-anchors the discussion on a baseline (a more appropriate one, I think) of not developing AGI, instead of on the status quo. This will reduce loss aversion toward capabilities gains that might currently seem inevitable and reduce the ability of opponents to paint AI Safety as “Luddite”.

  5. Much of the public is baffled by the debate about AI Safety, and out of that confusion, AI companies can position themselves as the experts and seize control of the conversation. AI Safety is playing catch-up, and alignment is a difficult topic to teach the masses. Pause is a simple and clear message that the public can understand and get behind; it bypasses complex technical jargon and gets right to the heart of the debate: if AI is so risky to build, why are we building it?

  6. Pause as a message fails safe– it doesn’t pattern-match to anything dangerous the way that alignment proposals involving increased capabilities do. We have to be aware of how little control we will have over the specifics of how a message to the public manifests in policy. More complex and subtle proposals may be more appealing to EAs, but each added bit of complexity that is necessary to get the proposal right makes it more likely to be corrupted.

Audience questions

Comments on these topics would be helpful to me:

  • If you think you’ve identified a double crux with me, please share below!

  • To those working in more “traditional” AI Safety: Where in your work would it be helpful to have public support?

  • If there’s something unappealing to you about advocacy that wasn’t addressed here or previously in the debate, can you articulate it?

This post is part of AI Pause Debate Week. Please see this sequence for other posts in the debate.