I find myself confused about the operationalizations of a few things:
In a few places in the report, the term “extinction” is used and some arguments are specifically about extinction being unlikely. I put a much lower probability on human extinction than on other extremely bad outcomes due to AI (perhaps 5x lower), while otherwise having probabilities similar to the “concerned” group’s. So I find the focus on extinction confusing and possibly misleading.
As for when “AI will displace humans as the primary force that determines what happens in the future”, does this include scenarios where humans defer to AI advisors that actually do represent their best interests? What about scenarios in which humans slowly self-enhance and morph into artificial intelligences? Or situations in which humans carefully select aligned AI successors to control their resources?
It feels like this question rests on a variety of complex considerations and operationalizations that seem mostly unrelated to the thing we actually care about: “how powerful is AI”. Thus, I find it hard to interpret the responses here.
Perhaps more interesting questions on a similar topic could be something like:
By what point will AIs be sufficiently smart and capable that the gap in capabilities between them and currently existing humans is similar to the gap in intelligence and abilities between currently existing humans and field mice? (When we say AIs are capable of something, we mean the in-principle ability to do it if all AIs worked together and we put aside intentionally imposed checks on AI power.)
Conditional on the continued existence of some civilization that wants to harness vast amounts of energy, at what point will usefully harnessed energy in a given year be >1/100 of the sun’s yearly energy output?
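For a rough sense of the scale of that energy threshold, here is a quick back-of-the-envelope calculation (the solar and current-consumption figures are approximate round numbers I’m assuming):

```python
# Back-of-the-envelope scale for the ">1/100 of the sun's yearly energy output" threshold.
# Both constants below are approximate round values, not precise figures.

SOLAR_LUMINOSITY_W = 3.8e26             # total power output of the sun, watts (approx.)
SECONDS_PER_YEAR = 3.15e7
CURRENT_WORLD_ENERGY_J_PER_YEAR = 6e20  # roughly 600 EJ/yr of primary energy today

yearly_solar_output_j = SOLAR_LUMINOSITY_W * SECONDS_PER_YEAR  # ~1.2e34 J
threshold_j = yearly_solar_output_j / 100                      # ~1.2e32 J

print(f"threshold: {threshold_j:.1e} J/yr, "
      f"or ~{threshold_j / CURRENT_WORLD_ENERGY_J_PER_YEAR:.0e}x current world energy use")
```

In other words, this threshold only resolves positively in worlds harnessing roughly eleven orders of magnitude more energy than humanity uses today.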
Hi Ryan,
Thanks for the comment!
Regarding “extinction”:
Just to be clear, the primary outcome we looked at (after considering various definitions and getting agreement from some key ‘concerned’ people) was “existential catastrophe,” defined as either extinction or “unrecoverable collapse,” with the latter defined as “(a) a global GDP of less than $1 trillion annually in 2022 dollars for at least a million years (continuously), beginning before 2100; or (b) a human population remaining below 1 million for at least a million years (continuously), beginning before 2100.”
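To make the structure of the “unrecoverable collapse” arm explicit (in particular, that the collapse must begin before 2100 and then persist continuously for at least a million years), here is a purely illustrative sketch with placeholder names and a placeholder start-year range, not our actual resolution language:

```python
# Illustrative sketch only; not the report's resolution language.

MILLION_YEARS = 1_000_000

def stays_below(series: dict[int, float], threshold: float, start: int) -> bool:
    """True if `series` is below `threshold` for every year in [start, start + 1M years)."""
    return all(series.get(start + k, float("inf")) < threshold
               for k in range(MILLION_YEARS))

def unrecoverable_collapse(gdp_2022usd: dict[int, float],
                           population: dict[int, float]) -> bool:
    # A qualifying stretch must begin before 2100 (the start range here is a placeholder).
    return any(stays_below(gdp_2022usd, 1e12, start)   # (a) global GDP < $1 trillion/yr
               or stays_below(population, 1e6, start)  # (b) population < 1 million
               for start in range(2025, 2100))

# "Existential catastrophe" = extinction OR unrecoverable_collapse(gdp, population)
```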
However, we also sanity-checked our findings (see p. 14) by asking about the probability that more than 60% of humans would die within a 5-year period before 2100. The median concerned participant forecasted 32%, and the median skeptic forecasted 1%. So, this outcome was considered much more likely by skeptics (median of 1% vs. 0.12% for existential catastrophe), but a very large gap between the groups still existed. It also did not seem that focusing on this alternative outcome made a major difference to crux rankings when we collected a small amount of data on it. So, for the most part we focus on the “existential catastrophe” outcome and expect that most of the key points in the debate would still hold for somewhat less extreme outcomes (with the exception of the debate about how difficult it is to kill literally everyone, though that point is relevant at least to people who argue for high probabilities of literal extinction).
We also had a section of the report (“Survey on long-term AI outcomes”) where we asked both groups to consider other severe negative outcomes such as major decreases in human well-being (median <4/10 on an “Average Life Evaluation” scale) and 50% population declines.
Do you have alternative “extremely bad” outcomes that you wish had been considered more?
Regarding “displacement” (footnote 10 on p. 6 for full definition):
We added this question in part because some participants and early readers wanted to explore debates about “AI takeover,” since some say that is the key negative outcome they are worried about rather than large-scale death or civilizational collapse. However, we found this difficult to operationalize and agree that our question is highly imperfect; we welcome better proposals. In particular, as you note, our operationalization allows for positive ‘displacement’ outcomes where humans choose to defer to AI advisors and is ambiguous in the ‘AI merges with humans’ case.
Your articulations of extremely advanced AI capabilities and energy use also seem useful to ask about, but they do not directly get at the “takeover” question as we understood it.
Nevertheless, our existing ‘displacement’ question at least points to some major difference in world models between the groups, which is interesting even if the net welfare effect of the outcome is difficult to pin down. A median year for ‘displacement’ (as currently defined) of 2045 for the concerned group vs. 2450 for the skeptics is a big gap that illustrates major differences in how the groups expect the future to play out. This helped to inspire the elaboration on skeptics’ views on AI risk in the “What long-term outcomes from AI do skeptics expect?” section.
Finally, I want to acknowledge that one of the top questions we wish we had asked relates to superintelligence-like AI capabilities. We hope to dig more into this in follow-up studies and will consider the definitions you offered.
Thanks again for taking the time to consider this and propose operationalizations that would be useful to you!
Just to be clear, the primary outcome we looked at (after considering various definitions and getting agreement from some key ‘concerned’ people) was “existential catastrophe,” defined as either extinction or “unrecoverable collapse,” with the latter defined as “(a) a global GDP of less than $1 trillion annually in 2022 dollars for at least a million years (continuously), beginning before 2100; or (b) a human population remaining below 1 million for at least a million years (continuously), beginning before 2100.”
I think this definition of existential catastrophe probably covers only around 1/4 of the existential catastrophes due to AI (takeover) that I expect. I don’t really see why the economy would collapse or the human population[1] would go that low in typical AI takeover scenarios.[2] By default I expect:
A massively expanding economy due to the singularity
The group in power to keep some number of humans around[3]
However, as you note, it seems as though the “concerned” group disagrees with me (though perhaps the skeptics agree):
However, we also sanity checked (see p. 14) our findings by asking about the probability that more than 60% of humans would die within a 5-year period before 2100. The median concerned participant forecasted 32%, and the median skeptic forecasted 1%.
More details on existential catastrophes that don’t meet the criteria you use
Some scenarios I would call “existential catastrophe” (due to AI takeover) which seem reasonably central to me and don’t meet the criteria for “existential catastrophe” you used:
AIs escape or otherwise end up effectively uncontrolled by humans. These AIs violently take over the world, killing billions (or at least hundreds of millions) of people in the process (either in the course of taking over or to secure the situation after gaining de facto control). However, a reasonable number of humans remain alive. In the long run, nearly all resources are effectively controlled by these AIs or their successors. But some small fraction of resources (perhaps 1 billionth or 1 trillionth) is given by the AIs to humans (perhaps for acausal trade reasons or due to a small amount of kindness in the AIs), and thus (if humans want to) they can easily support an extremely large (digital) population of humans. (For a rough sense of scale, see the numbers sketched after this list.)
In this scenario, global GDP stays high (it even grows rapidly) and the human population never goes below 1 million.
AIs end up in control of some AI lab and eventually they partner with a powerful country. They are able to effectively take control of this powerful country due to a variety of mechanisms. These AIs end up participating in the economy and in international diplomacy. The AIs quickly acquire more and more power and influence, but there isn’t any point at which killing a massive number of humans is a good move. (Perhaps because initially they have remaining human allies which would be offended by this and offending these human allies would be risky. Eventually the AIs are unilaterally powerful enough that human allies are unimportant, but at this point, they have sufficient power that slaughtering humans is no longer useful.)
AIs end up in a position where they have some power, and after some negotiation, AIs are given various legal rights. They compete peacefully in the economy and respect the clearest types of property rights (but not other claimed property rights, like space belonging to mankind), and eventually acquire most power and resources via their labor. At no point do they end up slaughtering humans, for whatever reason (perhaps the reasons expressed in the bullet above).
AIs escape or otherwise end up effectively uncontrolled by humans and have some specific goals or desires with respect to existing humans. E.g., perhaps they want to gloat to existing humans or some generalization of motivations acquired from training is best satisfied by keeping these humans around. These specific goals with respect to existing humans result in these humans being subjected to bad things they didn’t consent to (e.g. being forced to perform some activities).
AIs take over and initially slaughter nearly all humans (e.g. fewer than 1 million alive). However, to keep option value, they cryopreserve a moderate number (still <1 million) and ensure that they could recreate a biological human population if desired. Later, the AIs decide to provide humanity with a moderate amount of resources.
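On the “small fraction of resources” point in the first scenario above, some rough numbers (the solar and current-consumption figures are approximate round values I’m assuming, and this counts only the sun’s output rather than wider cosmic resources):

```python
# Even a trillionth of just the sun's power output dwarfs current human energy use.
# Figures are approximate round values.

SOLAR_LUMINOSITY_W = 3.8e26     # watts (approx.)
CURRENT_WORLD_POWER_W = 1.9e13  # ~19 TW of average primary power today (rough)

for fraction in (1e-9, 1e-12):
    allotted_w = fraction * SOLAR_LUMINOSITY_W
    print(f"{fraction:.0e} of solar output = {allotted_w:.1e} W "
          f"(~{allotted_w / CURRENT_WORLD_POWER_W:,.0f}x current world power use)")
# 1e-09 of solar output -> ~3.8e17 W, ~20,000x current use
# 1e-12 of solar output -> ~3.8e14 W, ~20x current use
```

So even the stingier version of this scenario leaves the remaining humans with substantially more energy than humanity currently harnesses in total.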
All of these scenarios involve humanity losing control over the future and losing power. This includes existing governments on Earth losing their power, and most cosmic resources being controlled by AIs that don’t represent the interests of the humans originally in power. (One way to operationalize this is that if the AIs in control wanted to kill or torture humans, they could easily do so.)
To be clear, I think people might disagree about whether (2) and (3) are that bad, because these cases look OK from the perspective of ensuring that existing humans get to live full lives with a reasonable amount of resources. (Of course, ex ante it will be unclear whether things will go this way if AIs which don’t represent human interests end up in power.)
They all count as existential catastrophes because that notion just reflects long-run potential.
I’m also counting chosen successors of humanity as human even if they aren’t biologically human, e.g. due to emulated minds or further modifications.
Existential risk due to AI, but not due to AI takeover (e.g. due to humanity going collectively insane or totalitarian lock-in), also probably doesn’t result in economic collapse or a tiny human population.
For more discussion, see here, here, and here.
Thanks, Ryan, this is great. These are the kinds of details we are hoping for in order to inform future operationalizations of “AI takeover” and “existential catastrophe” questions.
For context: We initially wanted to keep our definition of “existential catastrophe” closer to Ord’s definition, but after a few interviews with experts and back-and-forths we struggled to get satisfying resolution criteria for the “unrecoverable dystopia” and (especially) “destruction of humanity’s longterm potential” aspects of the definition. Our ‘concerned’ advisors thought the “extinction” and “unrecoverable collapse” parts would cover enough of the relevant issues and, as we saw in the forecasts we’ve been discussing, it seems like it captured a lot of the risk for the ‘concerned’ participants in this sample. But, we’d like to figure out better operationalizations of “AI takeover” or related “existential catastrophes” for future projects, and this is helpful on that front.
Broadly, it seems like the key aspect to carefully operationalize here is “AI control of resources and power.” Your suggestion here seems like it’s going in a helpful direction:
“One way to operationalize this is that if the AIs in control wanted to kill or torture humans, they could easily do so.”
We’ll keep reflecting on this, and may reach out to you when we write “takeover”-related questions for our future projects and get into the more detailed resolution criteria phase.
Thanks for taking the time to offer your detailed thoughts on the outcomes you’d most like to see forecasted.
it seems like the key aspect to carefully operationalize here is “AI control of resources and power.”
Yep, plus something like “these AIs in control either weren’t intended to be successors or were intended to be successors but are importantly misaligned (e.g. the group that appointed them would think ex-post that it would have been much better if these AIs were “better aligned” or if they could retain control)”.
It’s unfortunate that the actual operationalization has to be so complex.
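For concreteness, here is one very rough sketch of how the pieces discussed above might fit together; all names and structure here are illustrative placeholders, not proposed resolution criteria:

```python
# Illustrative-only sketch combining: (1) "AI control of resources and power"
# (operationalized as the AIs being able to kill or torture humans if they wanted to),
# and (2) the AIs either not being intended successors, or being successors the
# appointing group would regret ex post. Placeholder structure, not actual criteria.

from dataclasses import dataclass

@dataclass
class Outcome:
    ais_control_most_resources_and_power: bool
    ais_could_kill_or_torture_humans_if_they_wanted: bool
    ais_were_intended_successors: bool
    appointers_would_regret_alignment_ex_post: bool

def counts_as_ai_takeover(o: Outcome) -> bool:
    ai_in_control = (o.ais_control_most_resources_and_power
                     and o.ais_could_kill_or_torture_humans_if_they_wanted)
    not_an_endorsed_handoff = ((not o.ais_were_intended_successors)
                               or o.appointers_would_regret_alignment_ex_post)
    return ai_in_control and not_an_endorsed_handoff
```

Even this still hides most of the hard judgment calls (what counts as “most” resources and power, who the relevant appointing group is, and so on), which is part of why the operationalization ends up so complex.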