Hey Joe!
Great report, really fascinating stuff. It draws together lots of different writing on the subject, and I really like how you identify concerns that speak to different perspectives (eg to Drexler’s CAIS and classic Bostrom superintelligence).
Three quick bits of feedback:
I feel like some of Jess Whittlestone and collaborators’ recent research would be helpful in your initial framing, eg
Prunkl, C. and Whittlestone, J. (2020). Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society. - on capability vs impact
Gruetzemacher, R. and Whittlestone, J. (2019). The Transformative Potential of Artificial Intelligence. - on different scales of impact
Cremer, C. Z., & Whittlestone, J. (2021). Artificial Canaries: Early Warning Signs for Anticipatory and Democratic Governance of AI. - on milestones and limitations
I don’t feel like you do quite enough to argue for premise 5 “Some of this power-seeking will scale (in aggregate) to the point of permanently disempowering ~all of humanity | (1)-(4).”
This is, unfortunately, a pretty key premise and the one I have the most questions about! My impression is that section 6.3 is where that argumentation is intended to occur, but I didn’t leave it with a sense of how you thought this would scale, disempower everyone, and be permanent. I’d love for you to say more on this.
On a related but distinct point, one thing I kept thinking is: “does it matter that much if it’s an AI system that takes over the world and disempowers most people?” Eg you set out in 6.3.1 a number of mechanisms by which an AI system could gain power—but 10 out of the 11 you give (all except Destructive capacity) seem relevant to a small group of humans in control of advanced capabilities too.
Presumably we should be worried about a small group doing this as well? For example, consider a scenario in which a power-hungry small group, or several competing groups, use aligned AI systems with advanced capabilities (perhaps APS, perhaps not) to the point of permanently disempowering ~all of humanity.
If I went through and find-replaced every “PS-misaligned AI system” with “power-hungry small group”, would it read that differently? To borrow Tegmark’s terms, does it matter if it’s Omega Team or Prometheus?
I’d be interested in seeing some more from you about whether you’re also concerned about that scenario, whether you’re more/less concerned, and how you think it’s different from the AI system scenario.
Again, I really loved the report; it is truly excellent work.
Hi Hadyn,
Thanks for your kind words, and for reading.
Thanks for pointing out these pieces. I like the breakdown of the different dimensions of long-term vs. near-term.
Broadly, I agree with you that the document could benefit from more about premise 5. I’ll consider revising to add some.
I’m definitely concerned about misuse scenarios too (and I think lines here can get blurry—see e.g. Katja Grace’s recent post); but I wanted, in this document, to focus on misalignment in particular. The question of how to weigh misuse vs. misalignment risk, and how the two are similar/different more generally, seems like a big one, so I’ll mostly leave it for another time (one big practical difference is that misalignment makes certain types of technical work more relevant).
Oh and:
4. Cotra aims to predict when it will be possible for “a single computer program [to] perform a large enough diversity of intellectual labor at a high enough level of performance that it alone can drive a transition similar to the Industrial Revolution.”—that is, a “growth rate [of the world economy of] 20%-30% per year if used everywhere it would be profitable to use”.
Your scenario is premise 4 “Some deployed APS systems will be exposed to inputs where they seek power in unintended and high-impact ways (say, collectively causing >$1 trillion dollars of damage), because of problems with their objectives” (italics added).
Your bar is (much?) lower, so we should expect your scenario to come (much?) earlier.
Eventually, the disempowerment has to scale to ~all of humanity (a la premise 5), so that would qualify as TAI in the “transition as big of a deal as the industrial revolution” sense. However, it’s true that my timelines condition in premise 1 (e.g., APS systems become possible and financially feasible) is weaker than Ajeya’s.