Just to be clear, the primary outcome we looked at (after considering various definitions and getting agreement from some key ‘concerned’ people) was “existential catastrophe,” defined as either extinction or “unrecoverable collapse,” with the latter defined as “(a) a global GDP of less than $1 trillion annually in 2022 dollars for at least a million years (continuously), beginning before 2100; or (b) a human population remaining below 1 million for at least a million years (continuously), beginning before 2100.”
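These resolution criteria are concrete enough to sketch as a predicate. The following is an illustrative encoding, not anything from the survey itself; all function and parameter names are made up for the example.

```python
# Hypothetical sketch of the survey's "existential catastrophe" resolution
# criteria as quoted above. Names and numbers-as-parameters are illustrative.

def unrecoverable_collapse(gdp_trillions_2022usd: float,
                           population: int,
                           duration_years: int,
                           onset_year: int) -> bool:
    """True if either (a) global GDP < $1T/yr (2022 dollars) or
    (b) human population < 1 million, sustained continuously for
    at least a million years, beginning before 2100."""
    if onset_year >= 2100 or duration_years < 1_000_000:
        return False
    return gdp_trillions_2022usd < 1.0 or population < 1_000_000

def existential_catastrophe(extinct: bool, **collapse_kwargs) -> bool:
    # The survey's definition: extinction OR unrecoverable collapse.
    return extinct or unrecoverable_collapse(**collapse_kwargs)

# A takeover scenario like those discussed below, where the economy grows
# and more than a million humans survive, would not resolve positively:
print(existential_catastrophe(
    extinct=False,
    gdp_trillions_2022usd=500.0,   # post-singularity economy keeps growing
    population=100_000_000,        # billions die, but >1M remain
    duration_years=10_000_000,
    onset_year=2090,
))  # → False
```

The point of the sketch is just that scenarios with a booming AI-run economy and a surviving (if disempowered) human population fall outside both disjuncts.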
I think this definition of existential catastrophe probably only captures around 1⁄4 of the existential catastrophe risk due to AI (takeover) that I expect. I don’t really see why the economy would collapse or the human population[1] would go that low in typical AI takeover scenarios.[2] By default I expect:
A massively expanding economy due to the singularity
The group in power to keep some number of humans around[3]
However, as you note, it seems as though the “concerned” group disagrees with me (though perhaps the skeptics agree):
However, we also sanity checked (see p. 14) our findings by asking about the probability that more than 60% of humans would die within a 5-year period before 2100. The median concerned participant forecasted 32%, and the median skeptic forecasted 1%.
More details on existential catastrophes that don’t meet the criteria you use
Some scenarios I would call “existential catastrophe” (due to AI takeover) which seem reasonably central to me and don’t meet the criteria for “existential catastrophe” you used:
AIs escape or otherwise end up effectively uncontrolled by humans. These AIs violently take over the world, killing billions (or at least hundreds of millions) of people in the process (either while taking over or to secure the situation after mostly having de facto control). However, a reasonable number of humans remain alive. In the long run, nearly all resources are effectively controlled by these AIs or their successors. But, some small fraction of resources (perhaps 1 billionth or 1 trillionth) are given by the AIs to humans (perhaps for acausal trade reasons or due to a small amount of kindness in the AIs), and thus (if humans want to) they can easily support an extremely large (digital) population of humans.
In this scenario, global GDP stays high (it even grows rapidly) and the human population never goes below 1 million.
AIs end up in control of some AI lab and eventually partner with a powerful country. They are able to effectively take control of this powerful country via a variety of mechanisms. These AIs end up participating in the economy and in international diplomacy. The AIs quickly acquire more and more power and influence, but there isn’t any point at which killing a massive number of humans is a good move. (Perhaps because initially they have remaining human allies who would be offended by this, and offending these allies would be risky. Eventually the AIs are unilaterally powerful enough that human allies are unimportant, but at this point they have sufficient power that slaughtering humans is no longer useful.)
AIs end up in a position where they have some power, and after some negotiation, AIs are given various legal rights. They compete peacefully in the economy and respect the clearest types of property rights (but not other property rights, like space belonging to mankind), and eventually acquire most power and resources via their labor. At no point do they end up slaughtering humans (perhaps due to the reasons expressed in the bullet above).
AIs escape or otherwise end up effectively uncontrolled by humans and have some specific goals or desires with respect to existing humans. E.g., perhaps they want to gloat to existing humans or some generalization of motivations acquired from training is best satisfied by keeping these humans around. These specific goals with respect to existing humans result in these humans being subjected to bad things they didn’t consent to (e.g. being forced to perform some activities).
AIs take over and initially slaughter nearly all humans (e.g. fewer than 1 million alive). However, to keep option value, they cryopreserve a moderate number (still <1 million) and ensure that they could recreate a biological human population if desired. Later, the AIs decide to provide humanity with a moderate amount of resources.
All of these scenarios involve humanity losing control over the future and losing power. This includes existing governments on Earth losing their power and most cosmic resources being controlled by AIs that don’t represent the interests of the humans originally in power. (One way to operationalize this is that if the AIs in control wanted to kill or torture humans, they could easily do so.)
To be clear, I think people might disagree about whether (2) and (3) are that bad, because these cases look OK from the perspective of ensuring that existing humans get to live full lives with a reasonable amount of resources. (Of course, ex ante it will be unclear whether things will go this way if AIs which don’t represent human interests end up in power.)
They all count as existential catastrophes because “existential catastrophe” just reflects the loss of long-run potential.
I’m also counting chosen successors of humanity as human even if they aren’t biologically human, e.g. due to emulated minds or further modifications.
Existential risk due to AI, but not due to AI takeover (e.g. due to humanity going collectively insane or totalitarian lock-in), also probably doesn’t result in economic collapse or a tiny human population.
Thanks, Ryan, this is great. These are the kinds of details we are hoping for in order to inform future operationalizations of “AI takeover” and “existential catastrophe” questions.
For context: We initially wanted to keep our definition of “existential catastrophe” closer to Ord’s definition, but after a few interviews with experts and back-and-forths we struggled to get satisfying resolution criteria for the “unrecoverable dystopia” and (especially) “destruction of humanity’s longterm potential” aspects of the definition. Our ‘concerned’ advisors thought the “extinction” and “unrecoverable collapse” parts would cover enough of the relevant issues and, as we saw in the forecasts we’ve been discussing, it seems like it captured a lot of the risk for the ‘concerned’ participants in this sample. But, we’d like to figure out better operationalizations of “AI takeover” or related “existential catastrophes” for future projects, and this is helpful on that front.
Broadly, it seems like the key aspect to carefully operationalize here is “AI control of resources and power.” Your suggestion here seems like it’s going in a helpful direction:
“One way to operationalize this is that if the AIs in control wanted to kill or torture humans, they could easily do so.”
We’ll keep reflecting on this, and may reach out to you when we write “takeover”-related questions for our future projects and get into the more detailed resolution criteria phase.
Thanks for taking the time to offer your detailed thoughts on the outcomes you’d most like to see forecasted.
it seems like the key aspect to carefully operationalize here is “AI control of resources and power.”
Yep, plus something like “these AIs in control either weren’t intended to be successors or were intended to be successors but are importantly misaligned (e.g. the group that appointed them would think ex-post that it would have been much better if these AIs were “better aligned” or if they could retain control)”.
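Combining the two conditions in this thread, the takeover operationalization could be sketched as a conjunction. Again, this is a hedged illustration with made-up parameter names, not a proposed resolution criterion:

```python
# Illustrative combination of the two conditions discussed in this thread:
# (1) AIs hold decisive control of resources and power, and
# (2) they either weren't intended as successors, or were intended as
#     successors but their appointers would regret the handoff ex post.

def ai_takeover(ais_control_most_resources: bool,
                could_easily_kill_or_torture_humans: bool,
                intended_as_successors: bool,
                appointers_regret_handoff: bool) -> bool:
    in_control = (ais_control_most_resources
                  and could_easily_kill_or_torture_humans)
    not_legitimate_successors = ((not intended_as_successors)
                                 or appointers_regret_handoff)
    return in_control and not_legitimate_successors

# Intentionally appointed, well-aligned successors would not count:
print(ai_takeover(True, True, True, False))  # → False
# Escaped, uncontrolled AIs with decisive power would:
print(ai_takeover(True, True, False, False))  # → True
```

Even as a toy, this makes clear that each of the four inputs would itself need a careful operationalization, which is where most of the complexity lives.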
It’s unfortunate that the actual operationalization has to be so complex.