Thanks, Ryan, this is great. These are the kinds of details we are hoping for in order to inform future operationalizations of “AI takeover” and “existential catastrophe” questions.
For context: We initially wanted to keep our definition of “existential catastrophe” closer to Ord’s, but after a few interviews with experts and some back-and-forth we struggled to get satisfying resolution criteria for the “unrecoverable dystopia” and (especially) “destruction of humanity’s longterm potential” aspects of the definition. Our ‘concerned’ advisors thought the “extinction” and “unrecoverable collapse” parts would cover enough of the relevant issues, and, as we saw in the forecasts we’ve been discussing, that narrower operationalization does seem to have captured a lot of the risk for the ‘concerned’ participants in this sample. But we’d like to figure out better operationalizations of “AI takeover” or related “existential catastrophes” for future projects, and this is helpful on that front.
Broadly, it seems like the key aspect to carefully operationalize here is “AI control of resources and power.” Your suggestion here seems like it’s going in a helpful direction:
“One way to operationalize this is that if the AIs in control wanted to kill or torture humans, they could easily do so.”
We’ll keep reflecting on this, and may reach out to you when we write “takeover”-related questions for our future projects and get into the more detailed resolution criteria phase.
Thanks for taking the time to offer your detailed thoughts on the outcomes you’d most like to see forecasted.
“it seems like the key aspect to carefully operationalize here is ‘AI control of resources and power.’”
Yep, plus something like “these AIs in control either weren’t intended to be successors, or were intended to be successors but are importantly misaligned (e.g., the group that appointed them would think ex post that it would have been much better if these AIs were ‘better aligned’ or if they could retain control)”.
It’s unfortunate that the actual operationalization has to be so complex.