How could a moratorium fail?

The debates this week have clarified a number of things in my mind, which has been useful. At the same time, I think there’s a lack of clarity about what was proposed and what the objections are. Given that, I want to summarize my position, and then explain why, despite being convinced that there were some pitfalls I had not considered, the arguments made have not changed my mind overall.

What is being proposed?

The overall proposal is premised on considering future AI systems potentially dangerous. Because of the danger they could pose, such systems should not be trained unless and until each one has been evaluated for likely risks, and should not be released until it has been shown to be safe via testing. (As an aside, this isn’t treating future AI systems like nuclear weapons; it’s treating them like buildings, where builders submit plans for approval before construction, the plans are checked to ensure the building is safe, up to code, and compliant with the law, and, after all of that, inspections are required before it can be used.)

The proposed mechanism for preventing such models from being created is an international treaty, because unilateral action is not helpful. As outlined below, the treaty could fail: it might be rejected, might be accepted but fail to stop some dangerous systems, or might restrict AI more than is ideal. But if it succeeds, there would be an enforceable set of guidelines, mitigating or eliminating the risk of such systems. In addition, such a treaty needs to be backed by a compliance and verification regime, which would also need to be negotiated. This regime should include restrictions that make it unlikely that potentially dangerous AI is being developed outside the review framework.

There are legitimate concerns people might have about the details or the general idea of such a treaty, but the arguments I have seen don’t convince me that pushing for a moratorium like the one outlined isn’t the most critical avenue for the world to pursue in order to mitigate what many agree is an existential risk.

Why do people disagree?

Opponents of a moratorium, including Nora Belrose, have offered many reasons that efforts to negotiate a global treaty preventing dangerously powerful and insufficiently aligned systems would fail—both in their essays and in the comments responding to their own and other essays. Given that, I want to break down some of the objections and clarify why I don’t find them compelling. I think there are three general points being made: that many plans are likely to fail, that failure makes things worse, and that success would be bad. Before addressing those, there’s an often-repeated supposition among those opposed to any such measures that maybe alignment is easy. I think this is possible, especially given sufficient attention from researchers, AI companies, and governments. I’m still skeptical, but I certainly don’t think it’s fair to make plans that depend on the assumption that alignment is impossible, or at least can’t be done given current plans, without defending that position—as Rob Bensinger would do, and has done elsewhere.

Many plans are likely to fail

It seems implausible to opponents of a moratorium that the world could coordinate well enough to stop future malevolent AI. I think this presumes that any efforts are all-or-nothing, and in some cases it is explicitly claimed that we would need a global dictatorship to make it work. But we’re not talking about a thought experiment, we’re talking about an actual process—so we need to look at what actual processes would lead to in order to understand what would happen, and what is worth pursuing. I think the claim that it’s impossible to stop AI misunderstands both how global treaties and regulation work, and how such negotiations fail. Countries generally negotiate treaties based on input from the public, experts, and diplomats. Those treaties may have verification mechanisms that succeed or fail, they may have provisions that are implicitly ignored or explicitly rejected, or they may not be accepted or ratified. In each case, this isn’t the end of the story. Nuclear arms reduction negotiations were extensive and often stalled, but they kept getting restarted because both sides had an incentive to reduce risk. Whatever treaties get negotiated, if any, may not be sufficient to stop dangerous AI development. Moreover, I agree that treaties are nearly certain to fail at their most extreme goals, such as fully stopping progress or completely preventing advances by non-complying nations. But while I agree those failure modes are likely, I think there is essentially no way a treaty overshoots and locks everyone into a position where everyone wants to build mutually beneficial AI but cannot because the treaty prevents it, short of a global government that’s legally or bureaucratically incapable of making changes. That won’t happen because every country wants some degree of sovereignty, because there’s no path for it to happen, and so on.
It’s bizarre to hypothesize that any concrete governance-building process will metastasize into a global dictatorship without anyone noticing and stopping it. But it’s entirely possible that the world fails to make a treaty.

Failure makes things worse

There are a few ways that trying to get a treaty and having it fail could be bad. The first set of failures occurs if a treaty is created but doesn’t achieve its objectives. The second occurs if no treaty is created at all. Perhaps a failed international treaty differentially advantages the bad guys—those who refuse to cooperate or comply with international consensus. Canada doesn’t have nuclear weapons because it followed the rules, but North Korea does. This is a real problem—but I’ll note that most nuclear programs that existed in the 1970s did not continue, including South Africa’s, which was renounced. The Nuclear Non-Proliferation Treaty undoubtedly slowed the process of technology diffusion. Analogously, it seems implausible that any treaty on AI would cause signatory states to renounce all AI research; those states and the world’s researchers would explicitly be trying to develop safety measures and continue making progress on building safe AI—and any treaty would, implicitly or explicitly, allow them to continue if it becomes clear that non-signatory states are racing to build AI. On the other hand, perhaps a treaty bans building powerful processors without controls, those evading the controls gain an advantage, and again no one bothers to relax restrictions for compliant nations. Supposing a failure that precludes these responses assumes, preposterously, that no one has considered this, and that treaties would be created without any means to address it.

Another risk is that if no treaty is created, the attempts to create one could lead to a focus on international rules that distracts from useful work elsewhere. I admit this, and think it requires care. There is definitely room for a combination of industry initiatives to ensure safety and agree not to scale to dangerous levels or allow or pursue dangerous applications, as well as national regulation of safety and of unsafe large models.
Such measures don’t replace an international agreement, but it’s not an either-or question. So I certainly agree that there’s good reason for companies to build industry standards via self-governance, and countries can and should regulate dangerous models internally, whether or not any treaty is pursued or agreed to. These are complementary approaches, not alternatives.

Success may be bad

A temporary pause without any further action is a bad idea. There may be those who disagree, but it seems that even those who support a pause in the most literal, short-term, and naive sense have said that it’s not sufficient. I would go further: I think any simple pause with a fixed term would be useless and damaging. It could create hardware or algorithmic overhang, it wouldn’t actually make companies stop, and so on. But a more permanent moratorium doesn’t mean stopping forever, for several reasons. First, as mentioned above, any treaty will have actual provisions governing what can happen and what cannot. No one seems to be advocating a treaty that doesn’t allow some mechanism for review. I’m very concerned that the criteria will be far too weak rather than too strong—but pushing not to have a treaty certainly doesn’t fix that. In any case, if a model is safe according to the specified criteria, and/or it passes safety review mechanisms, it would be allowed. And if AI labs think the criteria are too strict, they have every ability to push for changes. Second, there are concerns that a treaty with enforcement mechanisms would trigger a nuclear war. This, again, seems to ignore what actually happens in treaty negotiations. Even if a treaty explicitly permits individual member states to act militarily to enforce it, countries need to choose to go to war—treaties aren’t smart contracts on blockchains with automated responses. And even treaties that are widely agreed on but later become irrelevant don’t actually stay in force. Countries acting to enforce a treaty do so by consensus, or they cause international incidents or wars—but that can happen in an escalating arms race anyway, and treaties often provide otherwise-missing mechanisms to resolve disputes—for example, as mentioned above, review of models.
Third, if a treaty is so successful that it actually stops progress in AI, opponents seem to stipulate a dichotomy in which there are only two ways forward: either we have unfettered AI development and safety is solved because safety is easy, or we have global dictatorship via omnisurveillance and an ML-powered boot stomping on humanity forever. (I, in contrast, think that developing AGI under the control of governments is a far riskier proposition in terms of enabling or creating dictatorships!) To address the concern that a dictatorship would result from not pursuing AGI, I’ll again note that governments would need to agree to this. And looking at other domains, even the strongest proposed versions of nuclear arms deals put nuclear weapons under the control of international bodies, and even the strongest proposals for controlling AI don’t include much more than stopping production of high-end GPUs, monitoring manufacturing, and requiring that a very small number of people not do whatever they want in building AI. A global CERN for AI isn’t Emperor Palpatine, or Kang the Conqueror. Even banning chips that allow better video on computers, better animation in movies, and better games on next-gen consoles is not a big inconvenience for most people—and if it is tragic for gamers and AI researchers, the reduced access to compute is still eminently survivable even for them. Next, to address the question of whether alignment is easy, I think we need to not assume that it is. Some opponents are incredibly optimistic that we can solve the problems of AI safety on the fly, but others strongly disagree. If the optimists assume they are correct, they are betting all our lives on their prediction, and I think we should agree not to allow unilateralist risk-taking on this scale. And given that, contra Matthew Barnett, the treaties [being discussed] aren’t permanent.
Once the systems are aligned with our values and agreed to be beneficial, they should and would be deployed—the critical issue, as Holly Elmore points out, is that the burden of proof must be on those building the systems, not on those opposed to them. But once that burden of proof is met, with suitable reviews for safety, these systems would be trained and deployed. To return to the final part of this objection, slowing progress is incredibly costly—I disagree with Paul Graham’s lack of caution, among other things, but agree that AI promises tremendous benefits, and delaying those benefits is not something to do lightly. The counterargument is that if and when we know it’s safe, and there are governance mechanisms in place, the benefits can still be realized; nothing forces us to rush forward and take excess risks. This is a real tradeoff, but given the stakes—uncertain timelines, uncertainty about whether alignment is solvable, irreversible proliferation of models, and all of our lives on the line—my view is that AI requires significant caution.

What I changed my mind about

My biggest surprise was how misleading the terms being used were, and I think many opponents were opposed to something different than what supporters were actually suggesting. Second, I was very surprised to find opposition to the claim that AI might not be safe and could pose serious future risks, largely on the grounds that the systems would be aligned by default—i.e. without any enforced mechanisms for safety. I also found out that there is a non-trivial group that wants to roll back AI progress to before GPT-4 for safety reasons, as opposed to job displacement and copyright reasons. I was convinced by Gerald Monroe that getting a full moratorium is harder than I had previously argued based on an analogy to nuclear weapons. (I was not convinced that it “isn’t going to happen without a series of extremely improbable events happening simultaneously”—largely because I think countries will be motivated to preserve the status quo.) I am mostly convinced by Matthew Barnett’s claim that advanced AI could be delayed by a decade if restrictions are put in place—I was less optimistic, or, as he would put it, pessimistic. As explained above, I was very much not convinced that a policy which was agreed to be irrelevant would remain in place indefinitely. I also hadn’t thought there was any reason to expect a naive pause for a fixed period, but he convinced me that this is more plausible than I had previously thought—and I agree with him, and disagree with Rob Bensinger, about how bad this might be. Lastly, I have been convinced by Nora that the vast majority of the difference in positions is predictive rather than about values. Those optimistic about alignment are against pausing, and in most cases, I think those pessimistic about alignment are open to evidence that specific systems are safe.
This is greatly heartening, because I think that over time we’ll continue to see evidence in one direction or another about what is likely, and if we can stay in a scout mindset, we will (eventually) agree on the path forward.


To conclude on a slightly less hopeful note, I want to re-emphasize another dimension of the discussion: timing. Waiting to put plans in place until the evidence is completely convincing even to skeptics, as the world seems to have done with global warming, is abandoning hope for a solution prematurely. The negotiations needed for a treaty will take time, and it’s far easier to step back from or abandon a treaty if really robustly safe systems are being built, or there is clear progress on safety, and we conclude the risk was overblown. But “robustly safe systems” does not describe what is happening now, and I don’t think we should bet all of our lives on getting this right on the first try. Nor can we afford a foolhardy plan to wait until everyone is more confident that there is a danger, then really quickly get a global moratorium discussed, debated, negotiated, put into effect, and enforced. Opposing enforceable treaties with the ability to ban dangerous projects is pushing to prevent democratic governance from evaluating and possibly stopping risky models. While we’re uncertain, blanket opposition seems unjustifiable. The details of those mechanisms are critical, but they should be debated in detail, not dismissed or opposed a priori because of some specific contingent detail.

This post is part of AI Pause Debate Week. Please see this sequence for other posts in the debate.