Thanks again for writing this up! Just a random thought, have you considered what happens when you loosen this assumption:
Background assumption: Deploying unaligned AGI means doom. If humanity builds and deploys unaligned AGI, it will almost certainly kill us all. We won’t be saved by being able to stop the unaligned AGI, or by it happening to converge on values that make it want to let us live, or by anything else.
I’m thinking about scenarios where humanity is able to keep the first 1 to 2 generations of AGI under control (e.g. by restricting applications, by using sufficiently good interpretability to detect most deception, or because capability increases are very gradual).
Some spontaneous thoughts on which additional pillars might become interesting then:
Coordination, but focussed more on labs sharing incidents, insights, and tools
Humanity’s ability to detect and fight power-seeking agents
Generic state capacity
Generic international cooperation
Cybersecurity, both to prevent rogue agents from gaining access to resources and weapons and to prevent debilitating cyberattacks
Surveillance capabilities
Robustness against bioweapons