Thanks again for writing this up! Just a random thought, have you considered what happens when you loosen this assumption:
Background assumption: Deploying unaligned AGI means doom. If humanity builds and deploys unaligned AGI, it will almost certainly kill us all. We won’t be saved by being able to stop the unaligned AGI, or by it happening to converge on values that make it want to let us live, or by anything else.
I’m thinking about scenarios where humanity is able to keep the first 1 to 2 generations of AGI under control (e.g. by restricting applications, by using sufficiently good interpretability to detect most deception, or because capability increases are very gradual).
Some spontaneous thoughts on which additional pillars might become interesting then:
Coordination, but focussed more on labs sharing incidents, insights, and tools
Humanity’s ability to detect and fight power-seeking agents
Generic state capacity
Generic international cooperation
Cybersecurity, both to prevent rogue agents from gaining access to resources and weapons and to prevent debilitating cyberattacks
Surveillance capabilities
Robustness against bioweapons