AGI Catastrophe and Takeover: Some Reference Class-Based Priors

This is a linkpost for https://docs.google.com/document/d/11YKTKRumtlheK_9Dv9ECKwwoTeSG3RNcs6qUSajzqDw/edit?usp=sharing

I am grateful to Holly Elmore, Michael Aird, Bruce Tsai, Tamay Besiroglu, Zach Stein-Perlman, Tyler John, and Kit Harris for pointers or feedback on this document.

Executive Summary

Overview

In this document, I collect and describe reference classes for the risk of catastrophe from superhuman artificial general intelligence (AGI). On some accounts, reference classes are the best starting point for forecasts, even though they often feel unintuitive. To my knowledge, nobody has previously attempted this for risks from superhuman AGI. This is to a large degree because superhuman AGI is in a real sense unprecedented. Yet there are some reference classes or at least analogies people have cited to think about the impacts of superhuman AI, such as the impacts of human intelligence, corporations, or, increasingly, the most advanced current AI systems.

My high-level takeaway is that different ways of integrating and interpreting reference classes generate priors on AGI-caused human extinction by 2070 anywhere between 1/10000 and 1/6 (mean of ~0.03%-4%). Reference classes offer a non-speculative case for concern about AGI-related risks. On this account, AGI risk is not a case of Pascal’s mugging, but most reference classes do not support greater-than-even odds of doom. The reference classes I look at generate a prior for AGI control over current human resources anywhere between 5% and 60% (mean of ~16-26%). The latter is a distinctive result of the reference class exercise: on these priors, the expected degree of AGI control over the world exceeds the odds of human extinction by a sizable margin. The extent of existential risk, including permanent disempowerment, should fall somewhere between these two ranges.

This effort is a rough, non-academic exercise and requires a number of subjective judgment calls. At times I play a bit fast and loose with the exact model I am using; the work lacks the ideal level of theoretical grounding. Nonetheless, I think the appropriate prior is likely to look something like what I offer here. I encourage intuitive updates and do not recommend these priors as the final word.

Approach

I collect sets of events that superhuman AGI-caused extinction or takeover would be plausibly representative of, ex ante. Interpreting and aggregating them requires a number of data collection decisions, the most important of which I detail here:

  1. For each reference class, I collect benchmarks for the likelihood of one or two things:

    1. Human extinction

    2. AI capture of humanity’s available resources.

  2. Many risks and reference classes are properly thought of as annualised risks (e.g., the yearly chance of a major AI-related disaster or of extinction from an asteroid), but some make more sense as risks from a one-time event (e.g., the chance that a given invention causes a major AI-related disaster, or that a given asteroid impact causes human extinction). For this reason, I aggregate three types of estimates (see the full document for the latter two types):

    1. 50-Year Risk (e.g. risk of a major AI disaster in 50 years)

    2. 10-Year Risk (e.g. risk of a major AI disaster in 10 years)

    3. Risk Per Event (e.g. risk of a major AI disaster per invention)

  3. Given that there are dozens or hundreds of reference classes, I summarise them in a few ways:

    1. Minimum and maximum

    2. Weighted arithmetic mean (i.e., weighted average)

      1. I “winsorise”, i.e. replace estimates of exactly 0 or 1 with the next-most-extreme value (without this, a single 0 would force the geometric mean to 0).

      2. I intuitively downweight some reference classes. For details on weights, see the methodology.

    3. Weighted geometric mean
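The aggregation steps above can be sketched as follows. The estimates and weights here are made-up placeholders (the real values are in the linked spreadsheet); this is a minimal illustration of winsorisation and the two weighted means, not the document's actual computation:

```python
import math

def winsorise(xs):
    # Replace exact 0s and 1s with the next-most-extreme observed value,
    # so the geometric mean stays defined and non-degenerate.
    # (Assumes at least one estimate strictly between 0 and 1.)
    interior = sorted(x for x in xs if 0 < x < 1)
    lo, hi = interior[0], interior[-1]
    return [lo if x <= 0 else hi if x >= 1 else x for x in xs]

def weighted_arith_mean(xs, ws):
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)

def weighted_geo_mean(xs, ws):
    # exp of the weighted mean of logs; requires strictly positive xs.
    return math.exp(sum(w * math.log(x) for x, w in zip(xs, ws)) / sum(ws))

# Hypothetical benchmark values and subjective weights, for illustration:
estimates = [0.0, 0.05, 0.30, 0.67]
weights = [1.0, 1.0, 0.5, 0.25]
est = winsorise(estimates)  # the 0 becomes 0.05
print(f"arithmetic: {weighted_arith_mean(est, weights):.2%}")
print(f"geometric:  {weighted_geo_mean(est, weights):.2%}")
```

As in the summaries below, the geometric mean is typically pulled well below the arithmetic mean when the estimates span several orders of magnitude.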

Findings for Fifty-Year Impacts of Superhuman AI

See the full document and spreadsheet for further details on how I arrive at these figures.

Reference Class Descriptions and Summaries

For each reference class below, I list what I estimate, what is included, and a summary of the resulting figures. (In the full document, this appears as a color-coded table: green marks the most credible and informative classes, red the least.)

Emergence of Relatively Superintelligent Species/Genus

What share of species go extinct because a newly capable species or genus arises?

What share of pre-existing species’ resources does a newly capable species or genus capture?

Share of species extinct because of newly superintelligent species

- Share of megafauna that went extinct shortly after human arrival

- Projections of eventual excess mammal species extinction rate in the Anthropocene

- Adjustments of the above rates to account for the fact that humans are exceptional

- Effect of invasive mammal species on island bird extinctions

Minimum: 0

Maximum: 67%

Weighted arithmetic mean: 6.69%

Weighted geometric mean: 0.524%

Share of resources controlled by newly superintelligent species

- Share of land modified or used by humans

- Share of Earth’s surface used by humans

- Share of global or animal biomass consisting of or domesticated by humans

- Average population decline across wildlife species in the Anthropocene

- Adjustments of the above rates to account for the fact that humans are exceptional

Minimum: 7.72 × 10^-11

Maximum: 50%

Weighted arithmetic mean: 5.99%

Weighted geometric mean: 0.0141%

Reasons to believe this reference class:

- This is a common analogy in arguments about risk from superhuman AGI.

- Most arguments about superhuman AGI (e.g. convergence theses) are about the idea of intelligence or discontinuous capabilities and thus apply to intelligent species as well.

Reasons not to believe it:

- Biological causes of extinction may differ from AGI-related causes.

- Intelligent species, including humans, may be qualitatively different from superhuman AGI.

Known Human Extinction Risks

What is the chance humanity goes extinct from a plausibly alleged extinction threat?

Estimated odds of human extinction

- Chances of 8 billion deaths from bioterror and biowarfare assuming a power law

- Likelihood of mass extinction from an asteroid

- Likelihood of mass extinction from a supernova

- Likelihood of mass extinction from a gamma ray burst

- Yearly chance of “infinite impact” for various causes, from the Global Challenges Foundation

Minimum: 0

Maximum: 0.056%

Weighted arithmetic mean: 0.00539%

Weighted geometric mean: 0.000365%
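The power-law approach in the first bullet can be sketched as below. The exponent and cutoff are purely illustrative assumptions, not the values used in the full document:

```python
def pareto_tail_prob(x, x_min, alpha):
    # Survival function of a power-law (Pareto) distribution:
    # P(X >= x) = (x / x_min) ** -alpha, for x >= x_min.
    return (x / x_min) ** -alpha

# Illustrative only: if deaths per bioterror/biowarfare event followed a
# power law with a hypothetical exponent alpha = 0.5 above one death,
# the chance a single event kills all 8 billion people would be:
p = pareto_tail_prob(8e9, x_min=1, alpha=0.5)  # ~1.1e-5
```

The heavy tail is the whole story here: small changes in the assumed exponent move the extinction-scale tail probability by orders of magnitude.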

Reasons to believe this reference class:

- Since we largely have only intuitive arguments for AGI risk, looking at other “things people argue could cause extinction” offers a natural benchmark.

- Extinction from cause X should be less likely the harder it is for humans to go extinct.

Reasons not to believe it:

- Since AGI is agential, it is likely more damaging than accidental risks.

- AGI might be seen as more speculative than the risks included here (and its probability therefore lower).

- Observation selection may select for worlds with low natural risks relative to anthropogenic ones.

Power of Social Organisations (Governments and Corporations)

What share of resources are controlled by organised groups of people compared to individuals?

Share of resources controlled by social organisations

- Government or central government spending as share of GDP, US

- Share of humans who are citizens of a nation-state

- Share of people employed by government in OECD countries

- Corporate or government assets as share of global assets

Minimum: 4.7%

Maximum: 50%

Weighted arithmetic mean: 20%

Weighted geometric mean: 29.4%

Reasons to believe this reference class:

- Organised groups of individual humans are in some sense superintelligent entities relative to individuals.

Reasons not to believe it:

- It is ambiguous how to distinguish what belongs to a collective and what belongs to individuals.

- Social organisations may be less (or more) intelligent than superhuman AGI.

Naïve Posteriors from Previous Technologies

How likely is human extinction from a threatening invention, given the track record of prior inventions?

How likely is transformative change from a major invention, given the track record of prior major inventions?

Chance of extinction from a threatening invention

- Chance of extinction from a given category of threatening inventions (subjectively defined) given a Beta (0.5, 0.5) prior

- Chance of extinction from a given threatening invention (subjectively defined) given a Beta (0.5, 0.5) prior

In addition, I estimate the per-invention extinction rate at which there would be a <1% chance of surviving as many threatening inventions as we have seen.

Rate given Beta (0.5, 0.5) prior (<1% likelihood rate in parentheses):

Minimum: 0.61% (5.53%)

Maximum: 6.33% (48.7%)

Weighted arithmetic mean: 3.94% (31.4%)

Weighted geometric mean: 2.78% (33.4%)

Chance of transformation from a major invention

- Chance an ex-ante potentially transformative invention (subjectively defined) is actually transformative (subjectively defined) given a Beta (0.5, 0.5) prior

- Chance a historic invention (subjectively defined) is actually transformative (subjectively defined) given a Beta (0.5, 0.5) prior

I also compute <1% likelihood estimates as for the extinction measure.

Rate given Beta (0.5, 0.5) prior (<1% likelihood rate in parentheses):

Minimum: 1.25% (5.1%)

Maximum: 17.8% (53.7%)

Weighted arithmetic mean: 13.67% (41.6%)

Weighted geometric mean: 9.17% (29.8%)
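To make the Beta(0.5, 0.5) machinery concrete: with n threatening inventions and zero observed extinctions, the Jeffreys-prior posterior mean rate is 0.5/(n + 1), and the “<1% likelihood” rate is the largest per-invention rate that would still leave at least a 1% chance of observing no extinctions. A minimal sketch; the count n = 81 is back-solved for illustration (it approximately reproduces the minimum figures above) and is not necessarily the count used in the full document:

```python
def beta_posterior_mean(events, trials, a=0.5, b=0.5):
    # Posterior mean of the event rate under a Beta(a, b) prior
    # after observing `events` occurrences in `trials` trials.
    return (events + a) / (trials + a + b)

def max_rate_at_likelihood(trials, alpha=0.01):
    # Largest per-trial rate p for which zero events in `trials` trials
    # still has probability >= alpha: solve (1 - p)**trials = alpha.
    return 1 - alpha ** (1 / trials)

n = 81  # hypothetical invention count, chosen for illustration
print(f"{beta_posterior_mean(0, n):.2%}")   # -> 0.61%
print(f"{max_rate_at_likelihood(n):.2%}")   # -> 5.53%
```

This makes the caveat below concrete: because no extinction has ever been observed, the estimate is driven almost entirely by the prior's pseudo-count of 0.5.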

Reasons to believe this reference class:

- This is perhaps the reference class into which superhuman AGI most obviously fits.

Reasons not to believe it:

- The definition of an invention that would have seemed threatening or major is subjective.

- Extinction estimates depend heavily on the prior since we have never observed human extinction.

- Here I am taking “chance of transformation” as another estimate of “share of resources controlled”, but it is quite a different way of thinking about that (in probabilities rather than fixed shares).

Damages from and Power of AI Systems to Date

How likely is it that current AI systems would cause human extinction?

What share of current economic activity can be automated by existing AI technologies?

Likelihood of a current AI system killing 8 billion people

- Frequency of “critical” incidents from AI systems

- Likelihood a critical incident kills 8 billion people based on various distributions. Note: tenuous and poor fit.

Minimum: 0

Maximum: 0.104%

Weighted arithmetic mean: 0.0718%

Weighted geometric mean: 0.139%

Forecasted AI share of the economy

Naïve extrapolations of the following:

- Share of 2017 work tasks that could be automated

- Contribution of automation to GDP

Minimum: 34.3%

Maximum: 69.3%

Weighted arithmetic mean: 30.5%

Weighted geometric mean: 55.8%

Reasons to believe this reference class:

- This is perhaps the second-most natural reference class after the previous-technologies one.

Reasons not to believe it:

- Extinction likelihoods depend on extrapolations and judgment calls that are difficult to defend.

- Economic estimates are currently very naïve and likely unrealistic.

Rates of Product Defects

How often do various consumer products exhibit major defects?

Share of products with a serious defect

- Share of cars or car components subject to recall (with or without risk of death)

- Share of drugs withdrawn from market or with a post-market safety issue

- Share of meat recalled by weight

- U.S. standard for acceptable cancer risk

Minimum: 2.1 × 10^-10

Maximum: 58.5%

Weighted arithmetic mean: 2.29%

Weighted geometric mean: 0.0645%

Reasons to believe this reference class:

- This reference class seems somewhat natural, and data is available.

Reasons not to believe it:

- Determining what counts as catastrophe requires delicate judgment calls.

- These sorts of products and defects are likely quite different from AGI misalignment.