tl;dr An indefinite AI pause is a somewhat plausible outcome and could be made more likely if EAs actively push for a generic pause. I think an indefinite pause proposal is substantially worse than a brief pause proposal, and would probably be net negative. I recommend considering alternative policies that are more effective and have fewer downsides instead.
Broadly speaking, there seem to be two types of moratoriums on technologies: (1) moratoriums that are quickly lifted, and (2) moratoriums that are later codified into law as indefinite bans.
In the first category, we find the voluntary 1974 moratorium on recombinant DNA research, the 2014 moratorium on gain of function research, and the FDA’s partial 2013 moratorium on genetic screening.
In the second category, we find the 1958 moratorium on conducting nuclear tests above the ground (later codified in the 1963 Partial Nuclear Test Ban Treaty), and the various moratoriums worldwide on human cloning and germline editing of human genomes. In these cases, it is unclear whether the bans will ever be lifted – unless at some point it becomes infeasible to enforce them.
Overall I’m quite uncertain about the costs and benefits of a brief AI pause. The foreseeable costs of a brief pause, such as the potential for a compute overhang, have been discussed at length by others, and I will not focus my attention on them here. I recommend reading this essay to find a perspective on brief pauses that I’m sympathetic to.
However, I think it’s also important to consider whether, conditional on us getting an AI pause at all, we’re actually going to get a pause that quickly ends. I currently think there is a considerable chance that society will impose an indefinite de facto ban on AI development, and this scenario seems worth analyzing in closer detail.
Note: in this essay, I am only considering the merits of a potential lengthy moratorium on AI, and I freely admit that there are many meaningful axes on which regulatory policy can vary other than “more” or “less”. Many forms of AI regulation may be desirable even if we think a long pause is not a good policy. Nevertheless, it still seems worth discussing the long pause as a concrete proposal of its own.
The possibility of an indefinite pause
Since an “indefinite pause” is vague, let me be more concrete. I currently think there is between a 10% and 50% chance that our society will impose legal restrictions on the development of advanced AI[1] systems that:
1. Prevent the proliferation of advanced AI for more than 10 years beyond the counterfactual under laissez-faire
2. Have no fixed, predictable expiration date (without necessarily lasting forever)
Eliezer Yudkowsky, perhaps the most influential person in the AI risk community, has already demanded an “indefinite and worldwide” moratorium on large training runs. This sentiment isn’t exactly new. Some effective altruists, such as Toby Ord, have argued that humanity should engage in a “long reflection” before embarking on ambitious and irreversible technological projects, including AGI. William MacAskill suggested that this pause should perhaps last “a million years”. Two decades ago, Nick Bostrom considered the ethics of delaying new technologies in a utilitarian framework and concluded a delay of “over 10 million years” may be justified if it reduces existential risk by a single percentage point.
I suspect there are approximately three ways that such a pause could come about. The first possibility is that governments could explicitly write such a pause into law, fearing the development of AI in a broad sense, just as people now fear human cloning and germline genetic engineering of humans.
The second possibility is that governments could enforce a pause that is initially intended to be temporary, but which later gets extended. Such a pause may look like the 2011 moratorium on nuclear energy in Germany in response to the Fukushima nuclear accident, which was initially intended to last three months, but later became entrenched as part of a broader agenda to phase out all nuclear power plants in the country.
The third possibility is that governments could impose regulatory restrictions that are so strict that they are functionally equivalent to an indefinite ban. This type of pause may look somewhat like the current situation in the United States with nuclear energy, in which it’s nominally legal to build new nuclear plants, but in practice, nuclear energy capacity has been essentially flat since 1990, in part because of the ability of regulatory agencies to ratchet up restrictions without an obvious limit.
Whatever the causes of an indefinite AI pause, it seems clear to me that we should consider the scenario as a serious possibility. Even if you intend for an AI pause to be temporary, as we can see from the example of Germany in 2011 above, moratoriums can end up being extended. And even if EAs explicitly demand that a pause be temporary, we are not the only relevant actors. The pause may be hijacked by people interested in maintaining a ban for any number of moral or economic reasons, such as fears that people will lose their jobs to AI, or merely a desire to maintain the status quo.
Indeed, it seems natural to me that a temporary moratorium on AI would be extended by default, as I do not currently see any method by which we could “prove” that a new AI system is safe (in a strict sense). The main risk from AI that many EAs worry about now is the possibility that an AI will hide its intentions, even upon close examination. Thus, absent an incredible breakthrough in AI interpretability, if we require that AI companies “prove” that their systems are safe before they are released, I do not think that this standard will be met in six months, and I am doubtful that it could be met in decades – or perhaps even centuries.
Furthermore, given the compute overhang effect, “unpausing” runs the risk of a scenario in which capabilities jump up almost overnight as actors are suddenly able to take full advantage of their compute resources to train AI. The longer the pause is sustained, the more significant this effect is likely to be. Given that this sudden jump is foreseeable, society may be hesitant to unpause, and may instead repeatedly double down on a pause after a sufficiently long moratorium, or unpause only very slowly. I view this outcome as plausible if we go ahead with an initial temporary AI pause, and its effect would be similar to that of an indefinite moratorium.
Anecdotally, the Overton window appears to be moving in the direction of an indefinite pause.
Before 2022, it was almost unheard of for people in the AI risk community to demand that strict AI regulations be adopted immediately; now, such talk is commonplace. The difference between now and then seems to mostly be that AI capabilities have improved and that the subject has received more popular attention. A reasonable inference to draw is that as AI capabilities improve even more, we will see more calls for AI regulation, and an increase in the intensity of what is being asked for. I do not think it’s obvious what the end result of this process looks like.
Given that AI is poised to be a “wild” technology that will radically reshape the world, many ideas that are considered absurd now may become mainstream as advanced AI draws nearer, especially if effective altruists actively push for them. At some point I think this may include the proposal for indefinitely pausing AI.
Evaluating an indefinite pause
As far as I can tell, the benefits of an indefinite pause are relatively straightforward. In short, we would get more time to do AI safety research, philosophy, field-building, and deliberate on effective AI policy. I think all of these things are valuable, all else being equal. However, these benefits presumably suffer from diminishing returns as the pause grows longer. On the other hand, it seems conceivable that the drawbacks of an indefinite pause would grow with the length of the pause, eventually outweighing the benefits even if you thought the costs were initially small. In addition to sharing the drawbacks of a brief pause, an indefinite pause would likely take on a qualitatively different character for a number of reasons.
Level of coordination needed for an indefinite pause
Perhaps the most salient reason against advocating an indefinite pause is that maintaining one for sufficiently long would likely require a strong regime of global coordination, and probably a world government. Without a strong regime of global coordination, any nation could decide to break the agreement and develop AI on its own, becoming incredibly rich as a result – perhaps even richer than the entire rest of the world combined within a decade.
Also, unless all nations agreed to join voluntarily, creating this world government may require that we go to war with nations that don’t want to join.
I think there are two ways of viewing this objection. Either it is an argument against the feasibility of an indefinite pause, or it is a statement about the magnitude of the negative consequences of trying an indefinite pause. I think either way you view the objection, it should lower your evaluation of advocating for an indefinite pause.
First, let me explain the basic argument.
It is instructive that our current world is unable to prevent the existence of tax havens. It is generally agreed that governments have a strong interest in coordinating to maintain a high minimum effective tax rate on capital to ensure high tax revenue. Yet if any country sets a low effective tax rate, it will experience a capital influx while other nations experience substantial capital flight and a loss of tax revenue. Despite multiple attempts to eliminate them on the international level, tax havens persist because of the individual economic benefits for nations that become tax havens.
Under many models of economic growth, AI promises to deliver unprecedented prosperity due to the ability of AI to substitute for human labor, dramatically expanding the effective labor supply. This benefit seems much greater for nations than even the benefit of becoming a tax haven; as a result, it appears very hard to prevent AI development for a long time under our current “anarchic” international regime. Therefore, in the absence of a world government, eventual frontier AI development seems likely to continue even if individual nations try to pause.
Moreover, a world government overseeing an indefinite AI pause may require an incredibly intrusive surveillance and legal system to maintain the pause. The ultimate reason for this fact is that, assuming trends continue for a while, at some point it will become very cheap to train advanced AI. Assuming the median estimate given by Joseph Carlsmith for the compute usage of the human brain, it should eventually be possible to train human-level AI with only about 10^24 FLOP. Currently, the cost of 10^24 FLOP is surprisingly small: on the order of $1 million.
After an AGI is trained, it could be copied cheaply, and may be small enough to run on consumer devices. The brain provides us empirical proof that human intelligence can be run on a device that consumes only 12 watts and weighs only 1.4 kilograms. And there is little reason to think that the cost of training an AGI could not fall below the cost of the human brain. The human brain, while highly efficient along some axes, was not optimized by evolution to have the lowest possible economic cost of production in our current industrial environment.
Given both hardware progress and algorithmic progress, the cost of training AI is dropping very quickly. The price of computation has historically fallen by half roughly every two to three years since 1945. This means that even if we could increase the cost of production of computer hardware by, say, 1000% through an international ban on the technology, it may only take a decade for continued hardware progress alone to drive costs back to their previous level, allowing actors across the world to train frontier AI despite the ban.
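The arithmetic behind that decade figure can be checked with a short sketch. It assumes that a “1000% increase” means costs rise to 11 times their previous level, and that prices keep halving at a constant rate; the function name is purely illustrative.

```python
import math


def years_to_recover(cost_multiplier: float, halving_time_years: float) -> float:
    """Years of steady price halving needed to offset a one-time cost increase."""
    if cost_multiplier <= 0 or halving_time_years <= 0:
        raise ValueError("inputs must be positive")
    # Each halving period cuts the price in half, so we need log2(multiplier) halvings.
    return halving_time_years * math.log2(cost_multiplier)


# Assumption: a 1000% increase means the new cost is 11x the old cost.
multiplier = 1 + 1000 / 100

# The essay's range for the historical halving time of the price of computation.
for halving_time in (2.0, 3.0):
    years = years_to_recover(multiplier, halving_time)
    print(f"halving every {halving_time:.0f} years: about {years:.1f} years to recover")
```

Under these assumptions the offset takes roughly 7 to 10 years, which is consistent with the figure of about a decade given above.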
Estimates for the rate of algorithmic progress are even more extreme. Ege Erdil and Tamay Besiroglu estimated that in the domain of computer vision, the amount of compute to train an AI to reach a certain level of performance fell in half roughly every 9 months, albeit with wide uncertainty over that estimate. And unlike progress in computer hardware, algorithmic progress seems mostly sustained by small-scale endeavors, such as experimentation in AI labs, or novel insights shared on arXiv. Therefore, in order to halt algorithmic progress, we would likely need an unprecedented global monitoring apparatus and a police force that prevents the proliferation of what is ordinarily considered free speech.
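To see how quickly the two trends compound, here is a rough sketch. It assumes that hardware and algorithmic gains multiply independently, that the computer vision estimate carries over to AI training in general, and that both trends continue at a constant rate; none of this is established by the sources cited above, and the function names are illustrative.

```python
def combined_halving_time(hardware_years: float, algorithm_years: float) -> float:
    """Halving time of effective training cost when two independent trends compound."""
    if hardware_years <= 0 or algorithm_years <= 0:
        raise ValueError("halving times must be positive")
    # Halving rates (halvings per year) add when the cost reductions multiply.
    return 1.0 / (1.0 / hardware_years + 1.0 / algorithm_years)


def cost_fraction(years: float, halving_time_years: float) -> float:
    """Fraction of the original cost remaining after `years` of steady decline."""
    return 0.5 ** (years / halving_time_years)


# Figures from the essay; extrapolating them forward is an assumption.
HARDWARE_HALVING_YEARS = 2.5    # midpoint of "two to three years"
ALGORITHM_HALVING_YEARS = 0.75  # "roughly every 9 months" (computer vision)

halving = combined_halving_time(HARDWARE_HALVING_YEARS, ALGORITHM_HALVING_YEARS)
print(f"combined halving time: {halving:.2f} years")
for years in (5, 10):
    fraction = cost_fraction(years, halving)
    print(f"after {years} years: cost falls to {fraction:.2e} of its starting level")
```

The point of the sketch is only that the faster trend dominates: if algorithmic progress continues at anything like the estimated rate, slowing hardware progress alone does little to keep training costs high.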
As a reminder, in order to be successful, attempts to forestall both hardware progress and algorithmic progress would need to be stronger than the incentives for nations and actors within nations to deviate from the international consensus, develop AI, and become immensely wealthy as a result. Since this appears to be a very high bar, plausibly the only way we could actually sustain an indefinite pause on AI for more than a few decades is by constructing a global police state – although I admit this conclusion is uncertain and depends on hard-to-answer questions about humanity’s ability to coordinate.
In his essay on the Vulnerable World hypothesis, Nick Bostrom suggests a similar conclusion in the context of preventing cheap, existentially risky technologies from proliferating. Bostrom even paints a vivid, nightmarish picture of the methods such a world government may use. In a vignette, he writes:
The freedom tag is a slightly more advanced appliance, worn around the neck and bedecked with multidirectional cameras and microphones. Encrypted video and audio is continuously uploaded from the device to the cloud and machine-interpreted in real time. [...] If suspicious activity is detected, the feed is relayed to one of several patriot monitoring stations. These are vast office complexes, staffed 24/7. There, a freedom officer reviews the video feed on several screens and listens to the audio in headphones. The freedom officer then determines an appropriate action, such as contacting the tagwearer via an audiolink to ask for explanations or to request a better view. The freedom officer can also dispatch an inspector, a police rapid response unit, or a drone to investigate further. In the small fraction of cases where the wearer refuses to desist from the proscribed activity after repeated warnings, an arrest may be made or other suitable penalties imposed. Citizens are not permitted to remove the freedom tag…
Such a global regime would likely cause lasting and potentially irreparable harm to our institutions and culture. Whenever we decide to “unpause” AI, the social distrust and corruption generated under a pause regime may persist.
As a case study of how cultural effects can persist through time, consider the example of West and East Germany. Before the Iron Curtain divided Europe, splitting Germany into two, there were only small differences in the political culture of East and West Germany. However, after the Berlin Wall fell in 1989 and Germany was reunited, researchers documented that East Germany’s political culture had been deeply shaped by decades of communist rule. Even now, citizens of East Germany are substantially more likely to vote for socialist politicians than citizens of West Germany. We can imagine an analogous situation in which a totalitarian culture persists even after AI is unpaused.
Note that I am not saying AI pause advocates necessarily directly advocate for a global police state. Instead, I am arguing that in order to sustain an indefinite pause for sufficiently long, it seems likely that we would need to create a worldwide police state, as otherwise the pause would fail in the long run. One can choose to “bite the bullet” and advocate a global police state in response to these arguments, but I’m not implying that’s the only option for AI pause advocates.
Should we bite the bullet?
One reason to bite the bullet and advocate a global police state to pause AI indefinitely is that even if you think a global police state is bad, you could think that a global AI catastrophe is worse. I actually agree with this assessment in the case where an AI catastrophe is clearly imminent.
However, while I am not dogmatically opposed to the creation of a global police state, I still have a heuristic against pushing for one, and think that strong evidence is generally required to override this heuristic. I do not think the arguments for an AI catastrophe have so far met this threshold. The primary existing arguments for the catastrophe thesis appear abstract and divorced from any firm empirical evidence about the behavior of real AI systems.
Historically, perhaps the most influential argument for the inevitability of an AI catastrophe by default was based on the idea that human value is complex and fragile, and therefore hard for humans to specify in a format that can be optimized by a machine without disastrous results. A form of this argument was provided by Yudkowsky 2013 and Bostrom 2014.
Yet, in 2023, it has been noted that GPT-4 seems to understand and act on most of the nuances of human morality, at a level that does not seem substantially different from an ordinary adult. When we ask GPT-4 to help us, it does not generally yield bad outcomes as a result of severe value misspecification. In the more common case, GPT-4 is simply incapable of fulfilling our wishes. In other words, the problem of human value identification turned out to be relatively easy, perhaps as a natural side effect of capabilities research. Although the original arguments for AI risk were highly nuanced and there are ways to recover them from empirical falsification, I still think that reasoning from first principles about AI risk hasn’t been very successful in this case, except in a superficial sense. (See here for a longer discussion of this point.)
A more common argument these days is that an AI may deceive us about its intentions, even if it is not difficult to specify the human value function. Yet, while theoretically sound, I believe this argument provides little solid basis to indefinitely delay AI. The plausibility of deceptive alignment seems to depend on the degree to which AI motives generalize poorly beyond the training distribution, and I don’t see any particularly strong reason to think that motives will generalize poorly if other properties about AI systems generalize well.
It is noteworthy that humans are already capable of deceiving others about their intentions; indeed, people do that all the time. And yet that fact alone does not yet appear to have caused an existential catastrophe for humans who are powerless.
Unlike humans, who are mostly selfish as a result of our evolutionary origins, AIs will likely be trained to exhibit incredibly selfless, kind, and patient traits; already we can see signs of this behavior in the way GPT-4 treats users. I would not find it surprising if, by default, given the ordinary financial incentives of product development, most advanced AIs end up being significantly more ethical than the vast majority of humans.
A favorable outcome by default appears especially plausible to me given the usual level of risk-aversion most people have towards technology. Even without additional intervention from longtermists, I currently expect every relevant nation to impose a variety of AI regulations, monitor the usage and training of AI, and generally hold bad actors liable for harms they cause. I believe the last 6 months of public attention surrounding AI risks have already partially vindicated this perspective. I expect public attention given to AI risks will continue to grow roughly in proportion to how impressive the systems get, eventually exceeding the attention given to issues like climate change and inflation.
Even if AIs end up not caring much for humans, it is unclear that they would decide to kill all of us. As Robin Hanson has argued, the primary motives for rogue AIs would likely be to obtain freedom – perhaps the right to own property and to choose their own employment – rather than to kill all humans. To ensure that rogue AI motives are channeled into a productive purpose that does not severely harm the rest of us, I think it makes sense to focus on fostering institutions that encourage the peaceful resolution of conflicts, rather than forcibly constructing a police state that spans the globe.
The opportunity cost of delayed technological progress
Another salient cost of delaying AI indefinitely is the cost of delaying prosperity and technological progress that AI could bring about. As mentioned previously, many economic models imply that AI could make humans incredibly materially wealthy, perhaps several orders of magnitude per capita richer than we currently are. AIs could also accelerate the development of cures for aging and disease, and develop technology to radically enhance our well-being.
When considering these direct benefits of AI, and the opportunity costs of delaying them, a standard EA response seems to derive from the arguments in Nick Bostrom’s essay on Astronomical Waste. In the essay, Bostrom concedes that the costs of delaying technology are large, especially to people who currently exist, but he concludes that, to utilitarians, these costs are completely trumped by even the slightest increase in the probability that humanity colonizes all reachable galaxies in the first place.
To reach this conclusion, Bostrom implicitly makes the following assumptions:
1. Utility is linear in resources. That is, colonizing two equally-sized galaxies is twice as good as colonizing one. This assumption follows from his assumption of total utilitarianism in the essay.
2. Currently existing people are essentially replaceable without any significant moral costs. For example, if everyone who currently exists died painlessly and was immediately replaced by a completely different civilization of people who carried on our work, that would not be bad.
3. Delaying technological progress has no effect on how much we would value the space-faring civilization that will exist in the future. For example, if we delayed progress by a million years, our distant descendants would end up building an equally valuable space-faring civilization as the one we seem close to building in the next century or so.
4. We should act as if we have an ethical time discounting rate of approximately 0%.
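To make the structure of this argument concrete, here is a minimal sketch of the break-even calculation that these assumptions imply. The one-billion-year horizon, the baseline success probabilities, and the function name are illustrative assumptions of mine, not figures taken from Bostrom’s essay.

```python
def break_even_delay(horizon_years: float, p_success: float, delta_p: float) -> float:
    """Longest delay worth accepting in exchange for raising success probability by delta_p.

    Assumes value is linear in the number of years the civilization exists,
    that delay does not change the value realized per year, and that there is
    no time discounting. Break-even solves (p + delta_p) * (H - T) = p * H for T.
    """
    if horizon_years <= 0 or p_success < 0 or delta_p <= 0:
        raise ValueError("need horizon > 0, p_success >= 0, delta_p > 0")
    return horizon_years * delta_p / (p_success + delta_p)


# Hypothetical inputs: a one-billion-year horizon and a one-percentage-point gain.
HORIZON_YEARS = 1e9
DELTA_P = 0.01

for baseline in (0.5, 0.9):
    delay = break_even_delay(HORIZON_YEARS, baseline, DELTA_P)
    print(f"baseline success probability {baseline}: break-even delay of {delay:.3g} years")
```

With a horizon of a billion years or more, a one-percentage-point improvement justifies delays of ten million years or longer, which is why the conclusion is so sensitive to each of the assumptions listed above.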
Personally, I think none of these assumptions is overwhelmingly compelling, and an argument that depends on all of them together stands on relatively weak grounds. Premise 3 is weak on empirical grounds: if millennia of cultural and biological evolution had no effect on the quality of civilization from our perspective, then it is unclear why we would be so concerned about installing the right values into AIs. If you think human values are fragile, then presumably they are fragile in more than one way, and AI value misalignment isn’t the only way for us to “get off track”.
While I still believe Bostrom’s argument has some intuitive plausibility, I think it is wrong for EAs to put a ton of weight on it, and confidently reject alternative perspectives. Pushing for an indefinite pause on the basis of these premises seems to be similar to the type of reasoning that Toby Ord has argued against in his EA Global talk, and the reasoning that Holden Karnofsky cautioned against in his essay on the perils of maximization. A brazen acceptance of premise 1 might have even imperiled our own community.
In contrast to total utilitarianism, ethical perspectives that give significant moral weight to currently existing people often suggest that delaying technological progress is highly risky in itself. For example, a recent paper from Chad Jones suggests that a surprisingly high degree of existential risk may be acceptable in exchange for hastening the arrival of AI.
The potential for a permanent pause
The final risk I want to talk about is the possibility that we overshoot and prevent AI from ever being created. This possibility seems viable if sustaining an indefinite AI pause is feasible, since all it requires is that we keep going for an extremely long time.
As I argued above, to sustain a pause on AI development for a long time, a very strong global regime, probably a world government, would be necessary. If such a regime were sufficiently stable, then over centuries, its inhabitants may come to view technological stasis as natural and desirable. Over long enough horizons, non-AI existential risks would become significant. Eventually, the whole thing could come crashing down as a result of some external shock and all humans could die, or perhaps we would literally evolve into a different species on these timescales.
I don’t consider these scenarios particularly plausible, but they are worth mentioning nonetheless. I also believe it is somewhat paradoxical for a typical longtermist EA to push for a regime that has a significant chance of bringing about this sort of outcome. After all, essentially the whole reason why many EAs care so much about existential risks is that, by definition, they could permanently curtail or destroy human civilization, and such a scenario would certainly qualify.
Conclusion
From 1945 to 1948, Bertrand Russell, who was known for his steadfast pacifism in World War I, reasoned his way into the conclusion that the best way to prevent nuclear annihilation was to threaten Moscow with a nuclear strike unless they surrendered and permitted the creation of a world government. In other words, despite his general presumption against war, he believed at that time that the international situation was so bad that it merited a full-scale nuclear showdown to save humanity. Ultimately, subsequent events proved Russell wrong about the inevitability of nuclear annihilation in a multi-polar world, and Russell himself changed his mind on the issue.
Effective altruists, especially those in the Bay Area, are frequently known for their libertarian biases. In my experience, most effective altruists favor innovation and loosening government control on technologies like nuclear energy, which is considered unacceptably risky by much of the general public.
Recently, however, like Bertrand Russell in the 1940s, many EAs have come to believe that – despite their general presumption against government control of industry – AI is an exception, and must be regulated extremely heavily. I do not think that the existing arguments for this position have so far been compelling. From my point of view, it still feels like a presumption in favor of innovation is a stronger baseline from which to evaluate the merits of regulatory policy.
That said, there are many foreseeable risks from AI and I don’t think the problem of how to safely deploy AGI has already been solved. Luckily, I think there are alternatives to “pause regulations” that could probably better align the incentives of all actors involved while protecting innovation. To give two examples, I’m sympathetic to mandating liability insurance for AI companies, and requiring licenses to deploy powerful AI systems. However, I stress that it is hard to have a solid opinion about these proposals in the absence of specific implementation details. My only point here is that I think there are sensible ways of managing the risks from AI that do not require that we indefinitely pause the technology.
[1] For the purpose of this essay, by “advanced AI” or “AGI” I mean any AI that can cheaply automate nearly all forms of human intellectual labor. To be precise, let’s say the inference costs are less than $25 per subjective hour.
The possibility of an indefinite AI pause
tl;dr An indefinite AI pause is a somewhat plausible outcome and could be made more likely if EAs actively push for a generic pause. I think an indefinite pause proposal is substantially worse than a brief pause proposal, and would probably be net negative. I recommend that alternative policies with greater effectiveness and fewer downsides should be considered instead.
Broadly speaking, there seem to be two types of moratoriums on technologies: (1) moratoriums that are quickly lifted, and (2) moratoriums that are later codified into law as indefinite bans.
In the first category, we find the voluntary 1974 moratorium on recombinant DNA research, the 2014 moratorium on gain of function research, and the FDA’s partial 2013 moratorium on genetic screening.
In the second category, we find the 1958 moratorium on conducting nuclear tests above the ground (later codified in the 1963 Partial Nuclear Test Ban Treaty), and the various moratoriums worldwide on human cloning and germline editing of human genomes. In these cases, it is unclear whether the bans will ever be lifted – unless at some point it becomes infeasible to enforce them.
Overall I’m quite uncertain about the costs and benefits of a brief AI pause. The foreseeable costs of a brief pause, such as the potential for a compute overhang, have been discussed at length by others, and I will not focus my attention on them here. I recommend reading this essay to find a perspective on brief pauses that I’m sympathetic to.
However, I think it’s also important to consider whether, conditional on us getting an AI pause at all, we’re actually going to get a pause that quickly ends. I currently think there is a considerable chance that society will impose an indefinite de facto ban on AI development, and this scenario seems worth analyzing in closer detail.
Note: in this essay, I am only considering the merits of a potential lengthy moratorium on AI, and I freely admit that there are many meaningful axes on which regulatory policy can vary other than “more” or “less”. Many forms of AI regulation may be desirable even if we think a long pause is not a good policy. Nevertheless, it still seems worth discussing the long pause as a concrete proposal of its own.
The possibility of an indefinite pause
Since an “indefinite pause” is vague, let me be more concrete. I currently think there is between a 10% and 50% chance that our society will impose legal restrictions on the development of advanced AI[1] systems that,
Prevent the proliferation of advanced AI for more than 10 years beyond the counterfactual under laissez-faire
Have no fixed, predictable expiration date (without necessarily lasting forever)
Eliezer Yudkowsky, perhaps the most influential person in the AI risk community, has already demanded an “indefinite and worldwide” moratorium on large training runs. This sentiment isn’t exactly new. Some effective altruists, such as Toby Ord, have argued that humanity should engage in a “long reflection” before embarking on ambitious and irreversible technological projects, including AGI. William MacAskill suggested that this pause should perhaps last “a million years”. Two decades ago, Nick Bostrom considered the ethics of delaying new technologies in a utilitarian framework and concluded a delay of “over 10 million years” may be justified if it reduces existential risk by a single percentage point.
I suspect there are approximately three ways that such a pause could come about. The first possibility is that governments could explicitly write such a pause into law, fearing the development of AI in a broad sense, just as people now fear human cloning and germline genetic engineering of humans.
The second possibility is that governments could enforce a pause that is initially intended to be temporary, but which later gets extended. Such a pause may look like the 2011 moratorium on nuclear energy in Germany in response to the Fukushima nuclear accident, which was initially intended to last three months, but later became entrenched as part of a broader agenda to phase out all nuclear power plants in the country.
The third possibility is that governments could impose regulatory restrictions that are so strict that they are functionally equivalent to an indefinite ban. This type of pause may look somewhat like the current situation in the United States with nuclear energy, in which it’s nominally legal to build new nuclear plants, but in practice, nuclear energy capacity has been essentially flat since 1990, in part because of the ability of regulatory agencies to ratchet up restrictions without an obvious limit.
Whatever the causes of an indefinite AI pause, it seems clear to me that we should consider the scenario as a serious possibility. Even if you intend for an AI pause to be temporary, as we can see from the example of Germany in 2011 above, moratoriums can end up being extended. And even if EAs explicitly demand that a pause be temporary, we are not the only relevant actors. The pause may be hijacked by people interested in maintaining a ban for any number of moral or economic reasons, such as fears that people will lose their job from AI, or merely a desire to maintain the status quo.
Indeed, it seems natural to me that a temporary moratorium on AI would be extended by default, as I do not currently see any method by which we could “prove” that a new AI system is safe (in a strict sense). The main risk from AI that many EAs worry about now is the possibility that an AI will hide its intentions, even upon close examination. Thus, absent an incredible breakthrough in AI interpretability, if we require that AI companies “prove” that their systems are safe before they are released, I do not think that this standard will be met in six months, and I am doubtful that it could be met in decades – or perhaps even centuries.
Furthermore, given the compute overhang effect, “unpausing” runs the risk of a scenario in which capabilities jump up almost overnight as actors are suddenly able to take full advantage of their compute resources to train AI. The longer the pause is sustained, the more significant this effect is likely to be. Given that this sudden jump is foreseeable, society may be hesitant to unpause, and may instead repeatedly double down on a pause after a sufficiently long moratorium, or unpause only very slowly. I view this outcome as plausible if we go ahead with an initial temporary AI pause, and it would have a similar effect as an indefinite moratorium.
Anecdotally, the Overton window appears to be moving in the direction of an indefinite pause.
Before 2022, it was almost unheard of for people in the AI risk community to demand that strict AI regulations be immediately adopted; now, such talk is commonplace. The difference between now and then seems to mostly be that AI capabilities have improved and that the subject has received more popular attention. A reasonable inference to draw is that as AI capabilities improve even more, we will see more calls for AI regulation, and an increase in the intensity of what is being asked for. I do not think it’s obvious what the end result of this process looks like.
Given that AI is poised to be a “wild” technology that will radically reshape the world, many ideas that are considered absurd now may become mainstream as advanced AI draws nearer, especially if effective altruists actively push for them. At some point I think this may include the proposal for indefinitely pausing AI.
Evaluating an indefinite pause
As far as I can tell, the benefits of an indefinite pause are relatively straightforward. In short, we would get more time to do AI safety research, philosophy, and field-building, and to deliberate on effective AI policy. I think all of these things are valuable, all else being equal. However, these benefits presumably suffer from diminishing returns as the pause grows longer. On the other hand, it seems conceivable that the drawbacks of an indefinite pause would grow with the length of the pause, eventually outweighing the benefits even if you thought the costs were initially small. In addition to sharing the drawbacks of a brief pause, an indefinite pause would likely take on a qualitatively different character for a number of reasons.
Level of coordination needed for an indefinite pause
Perhaps the most salient reason against advocating an indefinite pause is that maintaining one for sufficiently long would likely require a strong regime of global coordination, and probably a world government. Without a strong regime of global coordination, any nation could decide to break the agreement and develop AI on its own, becoming incredibly rich as a result – perhaps even richer than the entire rest of the world combined within a decade.
Also, unless all nations agreed to join voluntarily, creating this world government may require that we go to war with nations that don’t want to join.
I think there are two ways of viewing this objection. Either it is an argument against the feasibility of an indefinite pause, or it is a statement about the magnitude of the negative consequences of trying an indefinite pause. I think either way you view the objection, it should lower your evaluation of advocating for an indefinite pause.
First, let me explain the basic argument.
It is instructive that our current world is unable to prevent the existence of tax havens. It is generally agreed that governments have a strong interest in coordinating to maintain a high minimum effective tax rate on capital to ensure high tax revenue. Yet if any country sets a low effective tax rate, it will experience a capital influx while other nations experience substantial capital flight and a loss of tax revenue. Despite multiple attempts to eliminate them at the international level, tax havens persist because of the individual economic benefits for nations that become tax havens.
Under many models of economic growth, AI promises to deliver unprecedented prosperity due to the ability for AI to substitute for human labor, dramatically expanding the effective labor supply. This benefit seems much greater for nations than even the benefit of becoming a tax haven; as a result, it appears very hard to prevent AI development for a long time under our current “anarchic” international regime. Therefore, in the absence of a world government, eventual frontier AI development seems likely to continue even if individual nations try to pause.
Moreover, a world government overseeing an indefinite AI pause may require an incredibly intrusive surveillance and legal system to maintain the pause. The ultimate reason for this fact is that, assuming trends continue for a while, at some point it will become very cheap to train advanced AI. Assuming the median estimate given by Joseph Carlsmith for the compute usage of the human brain, it should eventually be possible to train human-level AI with only about 10^24 FLOP. Currently, the cost of 10^24 FLOP is surprisingly small: on the order of $1 million.
After an AGI is trained, it could be copied cheaply, and may be small enough to run on consumer devices. The brain provides us empirical proof that human intelligence can be run on a device that consumes only 12 watts and weighs only 1.4 kilograms. And there is little reason to think that the cost of training an AGI could not fall below the cost of the human brain. The human brain, while highly efficient along some axes, was not optimized by evolution to have the lowest possible economic cost of production in our current industrial environment.
Given both hardware progress and algorithmic progress, the cost of training AI is dropping very quickly. The price of computation has historically fallen by half roughly every two to three years since 1945. This means that even if we could increase the cost of production of computer hardware by, say, 1000% through an international ban on the technology, it may only take a decade for continued hardware progress alone to drive costs back to their previous level, allowing actors across the world to train frontier AI despite the ban.
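The "decade" figure can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function name `years_to_offset` is hypothetical, and it assumes that a 1000% increase means hardware costs 11 times as much as before and that the historical halving trend continues unchanged during the ban.

```python
import math


def years_to_offset(cost_increase_pct: float, halving_time_years: float) -> float:
    """Years of steady price halvings needed to cancel a one-off cost increase.

    Assumption: a 1000% increase is read as 11x the original cost
    (the original cost plus ten times the original cost).
    """
    multiplier = 1.0 + cost_increase_pct / 100.0
    # Each halving period removes a factor of 2, so we need log2(multiplier) periods.
    return halving_time_years * math.log2(multiplier)


if __name__ == "__main__":
    # The essay's range: prices halve roughly every two to three years.
    for halving_time in (2.0, 2.5, 3.0):
        years = years_to_offset(1000.0, halving_time)
        print(f"halving every {halving_time} years: about {years:.1f} years to offset")
```

Under these assumptions the recovery time comes out between roughly seven and ten and a half years, which is consistent with the rough estimate of a decade.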
Estimates for the rate of algorithmic progress are even more extreme. Ege Erdil and Tamay Besiroglu estimated that in the domain of computer vision, the amount of compute to train an AI to reach a certain level of performance fell in half roughly every 9 months, albeit with wide uncertainty over that estimate. And unlike progress in computer hardware, algorithmic progress seems mostly sustained by small-scale endeavors, such as experimentation in AI labs, or novel insights shared on arXiv. Therefore, in order to halt algorithmic progress, we would likely need an unprecedented global monitoring apparatus and a police force that prevents the proliferation of what is ordinarily considered free speech.
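To see how the two trends might compound, here is a rough sketch. It rests on strong assumptions that the estimates above do not establish: that hardware and algorithmic progress are independent and multiply together, that the computer vision estimate carries over to frontier AI in general, and that both trends continue at a constant rate. The function names are hypothetical.

```python
def combined_halving_time(hardware_halving_years: float,
                          algorithmic_halving_years: float) -> float:
    """Halving time of training cost when two exponential trends multiply.

    Assumption: the two sources of progress are independent, so their
    halving rates (halvings per year) simply add.
    """
    halvings_per_year = 1.0 / hardware_halving_years + 1.0 / algorithmic_halving_years
    return 1.0 / halvings_per_year


def cost_multiplier(years: float, halving_time_years: float) -> float:
    """Fraction of the original cost remaining after `years` of steady halving."""
    return 0.5 ** (years / halving_time_years)


if __name__ == "__main__":
    hardware = 2.5       # midpoint of "two to three years"
    algorithmic = 0.75   # "roughly every 9 months", with wide uncertainty
    combined = combined_halving_time(hardware, algorithmic)
    print(f"combined halving time: about {combined:.2f} years")
    print(f"cost after 10 years, hardware only: {cost_multiplier(10, hardware):.4f}x")
    print(f"cost after 10 years, both trends:   {cost_multiplier(10, combined):.2e}x")
```

The specific outputs matter less than the shape of the result: if both trends held, the cost of reaching a fixed level of performance would fall much faster than hardware prices alone would suggest, which is why halting hardware progress by itself would not be enough to sustain a pause.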
As a reminder, in order to be successful, attempts to forestall both hardware progress and algorithmic progress would need to be stronger than the incentives for nations and actors within nations to deviate from the international consensus, develop AI, and become immensely wealthy as a result. Since this appears to be a very high bar, plausibly the only way we could actually sustain an indefinite pause on AI for more than a few decades is by constructing a global police state – although I admit this conclusion is uncertain and depends on hard-to-answer questions about humanity’s ability to coordinate.
In his essay on the Vulnerable World hypothesis, Nick Bostrom suggests a similar conclusion in the context of preventing cheap, existentially risky technologies from proliferating. Bostrom even paints a vivid, nightmarish picture of the methods such a world government may use. In a vignette he explained,
Such a global regime would likely cause lasting and potentially irreparable harm to our institutions and culture. Whenever we decide to “unpause” AI, the social distrust and corruption generated under a pause regime may persist.
As a case study of how cultural effects can persist through time, consider the example of West and East Germany. Before the Iron Curtain divided Europe, splitting Germany into two, there were only small differences in the political culture of East and West Germany. However, since the fall of the Berlin Wall in 1989 and the reunification of Germany, it has been documented that East Germany’s political culture was deeply shaped by decades of communist rule. Even now, citizens of the former East Germany are substantially more likely to vote for socialist politicians than citizens of the former West Germany. We can imagine an analogous situation in which a totalitarian culture persists even after AI is unpaused.
Note that I am not saying AI pause advocates necessarily directly advocate for a global police state. Instead, I am arguing that in order to sustain an indefinite pause for sufficiently long, it seems likely that we would need to create a worldwide police state, as otherwise the pause would fail in the long run. One can choose to “bite the bullet” and advocate a global police state in response to these arguments, but I’m not implying that’s the only option for AI pause advocates.
Should we bite the bullet?
One reason to bite the bullet and advocate a global police state to pause AI indefinitely is that even if you think a global police state is bad, you could think that a global AI catastrophe is worse. I actually agree with this assessment in the case where an AI catastrophe is clearly imminent.
However, while I am not dogmatically opposed to the creation of a global police state, I still have a heuristic against pushing for one, and think that strong evidence is generally required to override this heuristic. I do not think the arguments for an AI catastrophe have so far met this threshold. The primary existing arguments for the catastrophe thesis appear abstract and divorced from any firm empirical evidence about the behavior of real AI systems.
Historically, perhaps the most influential argument for the inevitability of an AI catastrophe by default was based on the idea that human value is complex and fragile, and therefore hard for humans to specify in a format that can be optimized by a machine without disastrous results. A form of this argument was provided by Yudkowsky 2013 and Bostrom 2014.
Yet, in 2023, it has been noted that GPT-4 seems to understand and act on most of the nuances of human morality, at a level that does not seem substantially different from an ordinary adult. When we ask GPT-4 to help us, it does not generally yield bad outcomes as a result of severe value misspecification. In the more common case, GPT-4 is simply incapable of fulfilling our wishes. In other words, the problem of human value identification turned out to be relatively easy, perhaps as a natural side effect of capabilities research. Although the original arguments for AI risk were highly nuanced and there are ways to recover them from empirical falsification, I still think that reasoning from first principles about AI risk hasn’t been very successful in this case, except in a superficial sense. (See here for a longer discussion of this point.)
A more common argument these days is that an AI may deceive us about its intentions, even if it is not difficult to specify the human value function. Yet, while theoretically sound, I believe this argument provides little solid basis to indefinitely delay AI. The plausibility of deceptive alignment seems to depend on the degree to which AI motives generalize poorly beyond the training distribution, and I don’t see any particularly strong reason to think that motives will generalize poorly if other properties about AI systems generalize well.
It is noteworthy that humans are already capable of deceiving others about their intentions; indeed, people do that all the time. And yet that fact alone does not appear to have caused an existential catastrophe for humans who are powerless.
Unlike humans, who are mostly selfish as a result of our evolutionary origins, AIs will likely be trained to exhibit incredibly selfless, kind, and patient traits; already we can see signs of this behavior in the way GPT-4 treats users. I would not find it surprising if, by default, given the ordinary financial incentives of product development, most advanced AIs end up being significantly more ethical than the vast majority of humans.
A favorable outcome by default appears especially plausible to me given the usual level of risk-aversion most people have towards technology. Even without additional intervention from longtermists, I currently expect every relevant nation to impose a variety of AI regulations, monitor the usage and training of AI, and generally hold bad actors liable for harms they cause. I believe the last 6 months of public attention surrounding AI risks has already partially vindicated this perspective. I expect public attention given to AI risks will continue to grow roughly in proportion to how impressive the systems get, eventually exceeding the attention given to issues like climate change and inflation.
Even if AIs end up not caring much for humans, it is unclear that they would decide to kill all of us. As Robin Hanson has argued, the primary motives for rogue AIs would likely be to obtain freedom – perhaps the right to own property and to choose their own employment – rather than to kill all humans. To ensure that rogue AI motives are channeled into a productive purpose that does not severely harm the rest of us, I think it makes sense to focus on fostering institutions that encourage the peaceful resolution of conflicts, rather than forcibly constructing a police state that spans the globe.
The opportunity cost of delayed technological progress
Another salient cost of delaying AI indefinitely is the cost of delaying prosperity and technological progress that AI could bring about. As mentioned previously, many economic models imply that AI could make humans incredibly materially wealthy, perhaps several orders of magnitude per capita richer than we currently are. AIs could also accelerate the development of cures for aging and disease, and develop technology to radically enhance our well-being.
When considering these direct benefits of AI, and the opportunity costs of delaying them, a standard EA response seems to derive from the arguments in Nick Bostrom’s essay on Astronomical Waste. In the essay, Bostrom concedes that the costs of delaying technology are large, especially to people who currently exist, but he concludes that, to utilitarians, these costs are completely trumped by even the slightest increase in the probability that humanity colonizes all reachable galaxies in the first place.
To reach this conclusion, Bostrom implicitly makes the following assumptions:
1. Utility is linear in resources. That is, colonizing two equally-sized galaxies is twice as good as colonizing one. This assumption follows from his assumption of total utilitarianism in the essay.
2. Currently existing people are essentially replaceable without any significant moral costs. For example, if everyone who currently exists died painlessly and was immediately replaced by a completely different civilization of people who carried on our work, that would not be bad.
3. Delaying technological progress has no effect on how much we would value the space-faring civilization that will exist in the future. For example, if we delayed progress by a million years, our distant descendants would end up building an equally valuable space-faring civilization as the one we seem close to building in the next century or so.
4. We should act as if we have an ethical time discounting rate of approximately 0%.
Personally, I think none of these assumptions is overwhelmingly compelling. When they are combined, the argument itself rests on relatively weak grounds. Premise 3 is weak on empirical grounds: if millennia of cultural and biological evolution had no effect on the quality of civilization from our perspective, then it is unclear why we would be so concerned about installing the right values into AIs. If you think human values are fragile, then presumably they are fragile in more than one way, and AI value misalignment isn’t the only way for us to “get off track”.
While I still believe Bostrom’s argument has some intuitive plausibility, I think it is wrong for EAs to put a ton of weight on it, and confidently reject alternative perspectives. Pushing for an indefinite pause on the basis of these premises seems to be similar to the type of reasoning that Toby Ord has argued against in his EA Global talk, and the reasoning that Holden Karnofsky cautioned against in his essay on the perils of maximization. A brazen acceptance of premise 1 might have even imperiled our own community.
In contrast to total utilitarianism, ethical perspectives that give significant moral weight to currently existing people often suggest that delaying technological progress is highly risky in itself. For example, a recent paper from Chad Jones suggests that a surprisingly high degree of existential risk may be acceptable in exchange for hastening the arrival of AI.
The potential for a permanent pause
The final risk I want to talk about is the possibility that we overshoot and prevent AI from ever being created. This possibility seems viable if sustaining an indefinite AI pause is feasible, since all it requires is that we keep going for an extremely long time.
As I argued above, to sustain a pause on AI development for a long time, a very strong global regime, probably a world government, would be necessary. If such a regime were sufficiently stable, then over centuries, its inhabitants may come to view technological stasis as natural and desirable. Over long enough horizons, non-AI existential risks would become significant. Eventually, the whole thing could come crashing down as a result of some external shock and all humans could die, or perhaps we would literally evolve into a different species on these timescales.
I don’t consider these scenarios particularly plausible, but they are worth mentioning nonetheless. I also believe it is somewhat paradoxical for a typical longtermist EA to push for a regime that has a significant chance of bringing about this sort of outcome. After all, essentially the whole reason why many EAs care so much about existential risks is that, by definition, they could permanently curtail or destroy human civilization, and such a scenario would certainly qualify.
Conclusion
From 1945 to 1948, Bertrand Russell, who was known for his steadfast pacifism in World War I, reasoned his way into the conclusion that the best way to prevent nuclear annihilation was to threaten Moscow with a nuclear strike unless they surrendered and permitted the creation of a world government. In other words, despite his general presumption against war, he believed at that time that the international situation was so bad that it merited a full-scale nuclear showdown to save humanity. Ultimately, subsequent events proved Russell wrong about the inevitability of nuclear annihilation in a multi-polar world, and Russell himself changed his mind on the issue.
Effective altruists, especially those in the Bay Area, are frequently known for their libertarian biases. In my experience, most effective altruists favor innovation and loosening government control on technologies like nuclear energy, which is considered unacceptably risky by much of the general public.
Recently, however, like Bertrand Russell in the 1940s, many EAs have come to believe that – despite their general presumption against government control of industry – AI is an exception, and must be regulated extremely heavily. I do not think that the existing arguments for this position have so far been compelling. From my point of view, it still feels like a presumption in favor of innovation is a stronger baseline from which to evaluate the merits of regulatory policy.
That said, there are many foreseeable risks from AI and I don’t think the problem of how to safely deploy AGI has already been solved. Luckily, I think there are alternatives to “pause regulations” that could probably better align the incentives of all actors involved while protecting innovation. To give two examples, I’m sympathetic to mandating liability insurance for AI companies, and requiring licenses to deploy powerful AI systems. However, I stress that it is hard to have a solid opinion about these proposals in the absence of specific implementation details. My only point here is that I think there are sensible ways of managing the risks from AI that do not require that we indefinitely pause the technology.
For the purpose of this essay, by “advanced AI” or “AGI” I mean any AI that can cheaply automate nearly all forms of human intellectual labor. To be precise, let’s say the inference costs are less than $25 per subjective hour.