When safety is dangerous: risks of an indefinite pause on AI development, and a call for realistic alternatives

The data is clear: artificial superintelligence (ASI), an automated system that exceeds human intellectual capabilities in nearly all areas (whether or not it is sentient or conscious), is coming. According to the most recent AI expert survey, the vast majority of industry professionals believe ASI will arrive within the next few decades, if not sooner. Both in the Effective Altruism (EA) world and throughout wider society, the possible outcomes of ASI development, ranging from universal bliss to total extinction, with every dystopia in between, have caused justifiable concern.

Some EAs who are concerned with AI safety (including myself) have argued that current approaches to AI alignment are insufficient and that an indefinite pause (IP) on research progress could reduce existential risk from advanced AI. However, an IP faces challenges around technical feasibility, perverse incentives for rogue actors, and political coordination. This article weighs the evidence on whether an IP would likely increase or decrease existential risks relative to continuing the status quo (that is, no IP and high levels of x-risk).
_________________

Benefits of an Indefinite Pause

In the 16 years since AI safety researcher Eliezer Yudkowsky published Artificial Intelligence as a Positive and Negative Factor in Global Risk, the field of artificial intelligence has progressed enormously. AI systems can now exceed the best humans at complex games like chess, Go, and DotA 2; generate realistic imagery, speech, and music; automate customer service conversations; and even generate literature: all skills once thought to be distinctively human.

Yet as capabilities advance, so does concern that both technical alignment and governance efforts are lagging behind. Moreover, an alarming number of AI experts and safety advocates assign a probability of around five percent to outcomes at least as bad as human extinction from the development of ASI (Grace, 2022). Considering that the average person wouldn't accept a five percent chance of crashing their car by driving in a snowstorm, it is hard to understand why so many seem comfortable with the same probability that you, I, and everyone else will die at once, sometime in the next few decades, as a result of AI.
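To make the comparison concrete, here is a minimal back-of-the-envelope sketch in Python. The 30-year horizon and the choice to spread the risk as a constant annual hazard are my own illustrative assumptions, not figures from the survey.

```python
# Back-of-the-envelope comparison: what does a 5% aggregate extinction risk
# look like as an annual figure? The 30-year horizon and constant-hazard
# assumption are illustrative choices, not survey data.

p_total = 0.05       # assumed probability of an extinction-level outcome
horizon_years = 30   # assumed horizon over which that probability is spread

# Constant annual hazard consistent with a 5% cumulative risk over 30 years:
# 1 - (1 - p_annual) ** horizon_years = p_total
p_annual = 1 - (1 - p_total) ** (1 / horizon_years)

print(f"Implied annual risk: {p_annual:.4%}")        # roughly 0.17% per year
print(f"Cumulative risk over {horizon_years} years: "
      f"{1 - (1 - p_annual) ** horizon_years:.1%}")  # recovers ~5%
```

Even annualized, this is a substantial yearly chance of an outcome in which everyone dies at once, which is a very different kind of risk from the individual, voluntary risks we routinely accept.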

This raises the question: isn't it obvious that we should pursue global policies to pause research until we solve technical alignment (even if that takes centuries)?

Unfortunately, it's not so simple. While an indefinite pause could temporarily reduce accident risk, its efficacy as a strategy for reducing AI risk overall is questionable at best, for the reasons below. These are not the only risks of an indefinite pause, merely the ones I find most pressing. Epistemic status: roughly 60 percent confident; uncertain, but increasingly persuaded by the evidence available to me.

Political Risks

Critics of an indefinite pause aren't just your proverbial "e/acc" techno-cultists: they include EA thinkers and AI safety advocates such as Matthew Barnett (2023), who concluded that pursuing an indefinite pause would actually increase our x-risk because of several characteristics of national governments and supranational organizations like the European Union (the entities most likely to enforce such a pause). Specifically, the following weaknesses, the first three identified by Barnett, could render any indefinite pause useless and possibly dangerous:

1. Insufficient internal coordination: governments can’t provide enough oversight to stop AI development. Corruption, bureaucracy, special interests, and other mundane weaknesses of government can all weaken any AI enforcement regime, and any loopholes these inefficiencies create will be exploited by pro-ASI actors.

2. Non-participatory nations: some nations won't agree to a global pause under any conditions. As AI policy expert Allan Dafoe notes, bans on highly destructive technologies often fail without widespread multilateral cooperation, because the incentives for defection are strong (Dafoe, 2022). If, for example, the United States, Taiwan, and the European Union paused but China and Russia did not, the resulting dynamics could prove destabilizing and even escalate into large-scale nuclear conflict.

3. Lack of enforcement power: national governments are limited to their own borders, and supranational organizations can't back up a global ban with arrests and prosecutions. Without constitutionally empowering multinational organizations such as the United Nations, NATO, and the African Union to actually pursue and prosecute rogue AI developers across international boundaries, a ban would simply restrict advancement in one country while allowing unfettered (and likely unethical and unaligned) growth in another.

4. Individual risk: in addition to state-level incentives, individual researchers or corporations could also defect from an indefinite pause, even one with the force of law, and this can be made worse by instability at the political level and by irrational legislation and governance strategies. As an example, Russia recently announced plans to ignore intellectual property rights around certain AI innovations (Naumov, 2022), potentially allowing unscrupulous individuals and businesses to appropriate others' work or create harmful technologies.

Technological Risks

As AI capabilities rapidly advance, some researchers have proposed deliberately slowing progress to allow time to improve safety practices (Askell et al., 2022). However, the massive global accumulation of computing hardware seriously undermines the viability of long-term bans. Even if major players paused research, the resulting overhang of chips, servers, and accelerators could fuel rapid AI model advancement once restrictions ended. This build-up applies equally to memory, data storage, cloud infrastructure, and auxiliary electronics, all of which would likely continue to advance during an AI pause.
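To illustrate the dynamic, here is a toy "hardware overhang" model in Python. The starting compute budget and the 1.5x annual growth rate are purely illustrative assumptions, not empirical estimates.

```python
# Toy model of a hardware overhang: if the usable stock of compute keeps
# compounding while research is paused, the largest feasible training run
# jumps sharply the moment the pause ends. All parameters are illustrative
# assumptions, not empirical estimates.

def largest_feasible_run(years_paused: int,
                         start_compute: float = 1.0,
                         annual_growth: float = 1.5) -> float:
    """Compute budget available immediately after a pause of `years_paused`
    years, relative to a pre-pause budget of `start_compute`, assuming the
    usable compute stock grows by `annual_growth`x per year throughout."""
    return start_compute * annual_growth ** years_paused

for pause in (0, 5, 10, 20):
    print(f"{pause:2d}-year pause -> {largest_feasible_run(pause):8.1f}x "
          f"the pre-pause compute budget")
```

The specific numbers don't matter; the point is that any compounding growth in usable compute during a pause translates into a discontinuous capability jump once restrictions lift, which is exactly the scenario a pause is meant to avoid.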

The sheer economic power of the technology industry also works against placing limits on future compute growth. From algorithmic trading in finance to product recommendation engines in e-commerce, high-performance computing is deeply entrenched in our economy. And if one thing is clear about a capitalist society, it's that the love of money can break down any wall.

The Need for Practicable Alternatives: Conclusion and Invitation for Discussion

In light of the factors above, an indefinite pause seems (to me) like an ineffective way to mitigate the x-risk associated with AGI, and it may actually increase that risk. However, I do think the following approaches can be helpful as we start thinking about ways to limit our risk and improve long-term outcomes for humanity relative to AI development:

  1. Supporting selective restrictions: Bans on training AIs on personal data or particularly dangerous capabilities (like facial recognition and photorealistic image generation) could help, even if limiting general advancement remains infeasible (Hadfield, 2022).

  2. Incentivizing commercial safety: Policies like differential tax schemes for human-AI teams can help workers retain jobs during the transition away from a human-led world (Gunning et al., 2022).

  3. Facilitating testbed institutions: Dafoe proposes intermediary bodies to “stress test” systems at scale, evaluating their behavior across simulated environments (Dafoe, 2022). Such infrastructure would enable more accountable development.

By inviting open discussion of the risks and benefits of an indefinite pause, I hope to get the EA community thinking about alternative ways to keep us all safe and thriving in this age of AI. Feel free to leave your thoughts in the comments.

References & Further Reading

Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2022). Anthropic: AI Safety for Advanced Systems. Anthropic. https://www.anthropic.com

Askell, A., Brundage, M., Hadfield, G. K., Herbert-Voss, A., Kruel, S., Riedl, M., Roberts, S., Sastry, P., Wang, J., Wu, Y., & Yudkowsky, E. (2022). Towards an ethical framework for open access publication involving dual-use research in the life sciences. Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science. Advanced online publication. https://doi.org/10.1089/bsp.2021.0088

Barnett, M. (2023). The Possibility of an Indefinite AI Pause. Effective Altruism Forum. Retrieved January 18, 2024, from https://forum.effectivealtruism.org/posts/k6K3iktCLCTHRMJsY/the-possibility-of-an-indefinite-ai-pause

Bostrom, N. (2022). The Vulnerable World Hypothesis. Global Policy, 13(1), 115-119.

Byford, S. (2022). AI Safety is Now the Top Cause Xrisk Area Amongst Longtermists. The gradients of progress. https://thegradientsofprogress.substack.com/p/ai-safety-is-now-the-top-cause-xrisk

Christian, B. (2022). What are plausible timelines for advanced AI capabilities? Centre for the Study of Existential Risk.

Critch, A., & Krueger, T. (2022). AI Timelines: Short Version. Center for Human-Compatible AI.

Dafoe, A. (2022). AI governance: opportunities and challenges for strategic decisions. Center for Security and Emerging Technology. https://cset.georgetown.edu/publication/ai-governance-opportunities-and-challenges-for-strategic-decisions/

Grace, K. (2022). 2022 AI Expert Survey Results. Effective Altruism Forum. Retrieved January 18, 2024, from https://forum.effectivealtruism.org/posts/mjB9osLTJJM4zKhoq/2022-ai-expert-survey-results

Grace, K. et al. (2022). Revisiting AI timelines. arXiv preprint arXiv:2206.12574.

Gunning, D., Stefik, M., Choi, J., Miller, T., Stumpf, S., & Yang, G. Z. (2022). XAI—Explainable artificial intelligence. Science Robotics, 7(62), eabj8752.

Hadfield, G. K. (2022). Governing the Global Race for AI Power. Regulation, 45, 36.

Hubinger, E., Van Merwijk, C., Mikulik, V., Skalse, J., & Garrabrant, S. (2019). Risks from learned optimization in advanced machine learning systems. arXiv preprint arXiv:1906.01820.

Kenton, Z., Everitt, T., Weidinger, L., Gabriel, I., Mikulik, V., & Bansal, M. (2021). Alignment of Language Agents. arXiv preprint arXiv:2112.04083.

Naumov, V. (2022). Russia to let companies use trademarks without consent for AI development. Reuters. https://www.reuters.com/technology/russia-let-companies-use-trademarks-without-consent-ai-development-2022-11-02/

Ord, T. (2022). The Precipice: Existential Risk and the Future of Humanity. Hachette UK.

Russell, S. (2022). Human compatible: Artificial intelligence and the problem of control. Penguin.

Shah, H., Chandakkar, P., Liu, Z., Pipeline, J., Trivedi, K., Zhang, M., Zhao, T., Baumli, K., Brockschmidt, M., Glebov, R. I., … & Nematzadeh, A. (2023). Evaluating Large Language Models Trained on Code. arXiv preprint arXiv:2302.00020.

Triolo, P., Kania, E., & Webster, G. (2022). Translation: Full Text of China’s “New Generation AI Development Plan”. DigiChina.

Turchin, A., & Denkenberger, D. (2022). Classification of global catastrophic vs. existential risks from artificial intelligence. Futures, 104872.

Weinstein, E. A. (2022). Innovating responsible AI for international peace and security. Ethics & International Affairs, 36(1), 71-86.

Yudkowsky, E. (2013). Intelligence explosion microeconomics. Machine Intelligence Research Institute Technical Report, (2013-1).

Yudkowsky, E. (2017). There’s no fire alarm for artificial general intelligence. Machine Intelligence Research Institute.

Yudkowsky, E. (2008). Artificial intelligence as a positive and negative factor in global risk. Retrieved January 18, 2024, from https://intelligence.org/files/AIPosNegFactor.pdf
