We’re Not Ready: thoughts on “pausing” and responsible scaling policies

Views are my own, not Open Philanthropy’s. I am married to the President of Anthropic and have a financial interest in both Anthropic and OpenAI via my spouse.

Over the last few months, I’ve spent a lot of my time trying to help out with efforts to get responsible scaling policies adopted. In that context, a number of people have said it would be helpful for me to be publicly explicit about whether I’m in favor of an AI pause. This post will give some thoughts on these topics.

I think transformative AI could be soon, and we’re not ready

I have a strong default to thinking that scientific and technological progress is good and that worries will tend to be overblown. However, I think AI is a big exception here because of its potential for unprecedentedly rapid and radical transformation.[1]

I think sufficiently advanced AI would present enormous risks to the world. I’d put the risk of a world run by misaligned AI (or an outcome broadly similar to that) between 10% and 90% (so: above 10%) if it is developed relatively soon on something like today’s trajectory. And there are a whole host of other issues (e.g.) that could be just as important, if not more so, and that no one seems to have really begun to get a handle on.

Is that level of AI coming soon, and could the world be “ready” in time? Here I want to flag that timelines to transformative or even catastrophically risky AI are very debatable, and I have tried to focus my work on proposals that make sense even for people who disagree with me on the points below. But my own views are that:

  • There’s a serious (>10%) risk that we’ll see transformative AI[2] within a few years.

  • In that case, it’s not realistic to have sufficient protective measures for the risks in place in time.

  • Sufficient protective measures would require huge advances on a number of fronts, including information security, which could take years to build up, and alignment science, where we can’t put a timeline on the needed breakthroughs given the nascent state of the field. So even decades might or might not be enough time to prepare, even with a lot of effort.

If it were all up to me, the world would pause now—but it isn’t, and I’m more uncertain about whether a “partial pause” is good

In a hypothetical world where everyone shared my views about AI risks, there would (after deliberation and soul-searching, and only if these didn’t change my current views) be a global regulation-backed pause on all investment in and work on (a) general[3] enhancement of AI capabilities beyond the current state of the art, including by scaling up large language models; (b) building more of the hardware (or parts of the pipeline most useful for more hardware) most useful for large-scale training runs (e.g., H100s); (c) algorithmic innovations that could significantly contribute to (a).

The pause would end once it was clear how to progress some amount further with negligible catastrophic risk, and would be reinstituted before going beyond the point of negligible catastrophic risk. (This means another pause might occur shortly afterward. Overall, I think it’s plausible that the right amount of time to be either paused or in a sequence of small scaleups followed by pauses could be decades or more, though this depends on a lot of things.) This would require a strong, science-backed understanding of AI advances such that we could be assured of quickly detecting early warning signs of any catastrophic-risk-posing AI capabilities we didn’t have sufficient protective measures for.

I didn’t have this view a few years ago. Why now?

  • I think today’s state-of-the-art AIs are already in the zone where (a) we can already learn a huge amount (about AI alignment and other things) by studying them; (b) it’s hard to rule out that a modest scaleup from here, or an improvement in “post-training enhancements” (advances that make it possible to do more with an existing AI than before, without having to do a new expensive training run),[4] could lead to models that pose catastrophic risks.

  • I think we’re pretty far from being ready even for early versions of catastrophic-risk-posing models (for example, I think information security is not where it needs to be, and this won’t be a quick fix).

  • If a model’s weights were stolen and became widely available, it would be hard to rule out that model becoming more dangerous later via post-training enhancements. So even training slightly bigger models than today’s state of the art seems to add nontrivially to the risks.

All of that said, I think that advocating for a pause now might lead instead to a “partial pause” such as:

  • Regulation-mandated pauses in some countries and not others, with many researchers going elsewhere to work on AI scaling.

  • Temporary bans on large training runs, but not on post-training improvements or algorithmic improvements or expansion of hardware capacity. In this case, an “unpause”—including via new scaling methods that didn’t technically fall under the purview of the regulatory ban, or via superficially attractive but insufficient protective measures, or via a sense that the pause advocates had “cried wolf”—might lead to extraordinarily fast progress, much faster than the default and with a more intense international race.

  • Regulation with poor enough design and/or enough loopholes as to create a substantial “honor system” dynamic, which might mean that people more concerned about risks become totally uninvolved in AI development while people less concerned about risks race ahead. This in turn could mean a still-worse ratio of progress on AI capabilities to progress on protective measures.

  • No regulation or totally mis-aimed regulation (e.g., restrictions on deploying large language models but not on training them), accompanied by the same dynamic from the previous bullet point.

It’s much harder for me to say whether these various forms of “partial pause” would be good.

To pick a couple of relatively simple imaginable outcomes and how I’d feel about them:

  • If there were a US-legislated moratorium on training runs exceeding a compute threshold in line with today’s state-of-the-art models, with the implicit intention of doing so until there was a convincing and science-backed way of bounding the risks—with broad but not necessarily overwhelming support from the general public—I’d consider this to be probably a good thing. I’d think this even if the ban (a) didn’t yet come with signs of progress on international enforcement; (b) started with only relatively weak domestic enforcement; and (c) didn’t include any measures to slow production of hardware, advances in algorithmic efficiency or post-training enhancements. In this case I would be hopeful about progress on (a) and (b), as well as on protective measures generally, because of the strong signal this moratorium would send internationally about the seriousness of the threat and the urgency of developing a better understanding of the risks, and of making progress on protective measures. I have very low confidence in my take here and could imagine changing my mind easily.

  • If a scaling pause were implemented using executive orders that were likely to be overturned the next time the party in power changed, with spotty enforcement and no effects on hardware and algorithmic progress, I’d consider this pause a bad thing. This is also a guess that I’m not confident in.

Overall, I don’t have settled views on whether it’d be good for me to prioritize advocating for any particular policy.[5] At the same time, if it turns out that there is (or will be) a lot more agreement with my current views than there currently seems to be, I wouldn’t want to be even a small obstacle to big things happening, and there’s a risk that my lack of active advocacy could be confused with opposition to outcomes I actually support.

I feel generally uncertain about how to navigate this situation. For now I am just trying to spell out my views and make it less likely that I’ll get confused for supporting or opposing something I don’t.

Responsible scaling policies (RSPs) seem like a robustly good compromise with people who have different views from mine (with some risks that I think can be managed)

My sense is that people have views all over the map about AI risk, such that it would be hard to build a big coalition around the kind of pause I’d support most.

  • Some people think that the kinds of risks I’m worried about are far off, farfetched or ridiculous.

  • Some people think such risks might be real and soon, but that we’ll make enough progress on security, alignment, etc. to handle the risks—and indeed, that further scaling is an important enabler of this progress (e.g., a lot of alignment research will work better with more advanced systems).

  • Some people think the risks are real and soon, but might be relatively small, and that it’s therefore more important to focus on things like the U.S. staying ahead of other countries on AI progress.

I’m excited about RSPs partly because it seems like people in those categories—not just people who agree with my estimates about risks—should support RSPs. This raises the possibility of a much broader consensus around conditional pausing than I think is likely around immediate (unconditional) pausing. And with a broader consensus, I expect an easier time getting well-designed, well-enforced regulation.

I think RSPs represent an opportunity for wide consensus that pausing under certain conditions would be good, and this seems like it would be an extremely valuable thing to establish.

Importantly, agreeing that certain conditions would justify a pause is not the same as agreeing that they’re the only such conditions. I think agreeing that a pause needs to be prepared for at all seems like the most valuable step, and revising pause conditions can be done from there.

Another reason I am excited about RSPs: I think optimally risk-reducing regulation would be very hard to get right. (Even the hypothetical, global-agreement-backed pause I describe above would be hugely challenging to design in detail.) When I think something is hard to design, my first instinct is to hope for someone to take a first stab at it (or at least at some parts of it), learn what they can about the shortcomings, and iterate. RSPs present an opportunity to do something along these lines, and that seems much better than focusing all efforts and hopes on regulation that might take a very long time to come.

There is a risk that RSPs will be seen as a measure that is sufficient to contain risks by itself—e.g., that governments may refrain from regulation, or simply enshrine RSPs into regulation, rather than taking more ambitious measures. Some thoughts on this:

  • I think it’s good for proponents of RSPs to be open about the sorts of topics I’ve written about above, so they don’t get confused with e.g. proposing RSPs as a superior alternative to regulation. This post attempts to do that on my part. And to be explicit: I think regulation will be necessary to contain AI risks (RSPs alone are not enough), and should almost certainly end up stricter than what companies impose on themselves.

  • In a world where there’s significant political support for regulations well beyond what companies support, I expect that any industry-backed setup will be seen as a minimum for regulation. In a world where there isn’t such political support, I think it would be a major benefit for industry standards to include conditional pauses. So overall, the risk seems relatively low and worth taking.

  • I think it’d be unfortunate to try to manage the above risk by resisting attempts to build consensus around conditional pauses, if one does in fact think conditional pauses are better than the status quo. Actively fighting improvements on the status quo because they might be confused for sufficient progress feels icky to me in a way that’s hard to articulate.

Footnotes

  1. The other notable exception I’d make here is biology advances that could facilitate advanced bioweapons, again because of how rapid and radical the destruction potential is. I default to optimism and support for scientific and technological progress outside of these two cases.

  2. I like this discussion of why improvements on pretty narrow axes for today’s AI systems could lead quickly to broadly capable transformative AI.

  3. People would still be working on making AI better at various specific things (for example, resisting attempts to jailbreak harmlessness training, or just narrow applications like search and whatnot). It’s hard to draw a bright line here, and I don’t think it could be done perfectly using policy, but in the “if everyone shared my views” construction everyone would be making at least a big effort to avoid finding major breakthroughs that were useful for general enhancement of very broad and hard-to-bound suites of AI capabilities.

  4. Examples include improved fine-tuning methods and datasets, new plugins and tools for existing models, new elicitation methods in the general tradition of chain-of-thought reasoning, etc.

  5. I do think that at least someone should be trying it. There’s a lot to be learned from doing this—e.g., about how feasible it is to mobilize the general public—and this could inform expectations about what kinds of “partial victories” are likely.

Crossposted from LessWrong.