I think there’s a range of things that could happen with lower-level AGI, with increasing levels of ‘fire-alarm-ness’ (1-4), but decreasing levels of likelihood. Here’s a list; my (very tentative) model would be that I expect lots of 1s and a few 2s within my default scenario, and this will be enough to slow down the process and make our trajectory slightly less dangerous.
Forgive the vagueness, but these are the kinds of things I have in mind:
1. Mild fire alarm:
- Hacking (prompt injections?) within current realms of possibility (but amped up a bit)
- Human manipulation within current realms of possibility (IRA disinformation ×5)
- Visible, unexpected self-improvement/escape (without severe harm)
- Any lethal autonomous weapon use (even if generally aligned), especially by a rogue power
- Everyday tech (phones, vehicles, online platforms) doing crazy but benign misaligned stuff
- Stock market manipulation causing important people to lose a lot of money
2. Moderate fire alarm:
- Hacking beyond current levels of possibility
- Extreme mass manipulation
- Collapsing financial or governance systems causing minor financial or political crisis
- Deadly use of autonomous AGI in weapons systems by a rogue group (killing over 1,000 people)
- Misaligned, but less deadly, use in weapons systems
- Unexpected self-improvement/escape of a system causing multiple casualties/other chaos
- Attempted (thwarted) acquisition of WMDs/biological weapons
- Unsuccessful (but visible) attempts to seize political power
3. Major fire alarm:
- Successful attempts to seize political power
- Effective global mass manipulation
- Successful acquisition of WMDs, bioweapons
- Complete financial collapse
- Complete destruction of online systems: internet becomes unusable, etc.
- Misaligned, very deadly use in weapons systems
4. The fire alarm has been destroyed, so now it’s just some guy hitting a rock with a scorched fencepost:
- Actual triggering of nuclear/bio conflict or other genuine civilisational collapse scenario (destroying AI in the process)
Great list, thanks!
My current tentative expectation is that we’ll see a couple of things in 1, but nothing in 2+, until it’s already too late: that is, until humanity is already basically in a game of chess with a superior opponent, and there’s no longer a realistic hope of humanity coordinating to stop the slide into oblivion. By contrast, today we are on a path to oblivion, but there’s a realistic possibility of changing course.