Yeah I should have taken more care to explain myself: I do think the sorts of large-but-not-catastrophic harms you are talking about might happen, I just think that more likely than not, they won’t happen, because timelines are short. (My 50% mark for AGI, or if you want to be more precise, AI capable of disempowering humanity, is 2027)
So, my answers to your questions would be: 1. It seems we are on the cusp of agentic AGI right now in 2023, and that godlike AGI will come around 2027 or so. 2. Unclear. Could be quite chaotic & dangerous, but I’m thinking it probably won’t be. Human governments and AI companies have a decent amount of control, at least up until about a year before godlike AGI, and they’ll probably use that control to maintain stability and peace rather than fight each other or sow chaos. I’m not particularly confident though. 3. I think it depends on the details of the bad thing that happened. I’d be interested to hear what sort of bad things you have in mind.
I think there’s a range of things that could happen with lower-level AGI, with increasing levels of ‘fire-alarm-ness’ (1-4), but decreasing levels of likelihood. Here’s a list; my (very tentative) model would be that I expect lots of 1s and a few 2s within my default scenario, and this will be enough to slow down the process and make our trajectory slightly less dangerous.
Forgive the vagueness, but these are the kind of things I have in mind:
1. Mild fire alarm:
- Hacking (prompt injections?) within current realms of possibility (but amped up a bit) - Human manipulation within current realms of possibility (IRA disinformation *5) - Visible, unexpected self-improvement/ escape (without severe harm) - Any lethal autonomous weapon use (even if generally aligned) especially by rogue power - Everyday tech (phones, vehicles, online platforms) doing crazy, but benign misaligned stuff - Stock market manipulation causing important people to lose a lot of money
2. Moderate fire alarm:
- Hacking beyond current levels of possibility - Extreme mass manipulation - Collapsing financial or governance systems causing minor financial or political crisis - Deadly use of autonomous AGI in weapons systems by rogue group (killing over 1000 people) - Misaligned, but less deadly, use in weapons systems - Unexpected self-improvement/ escape of a system causing multiple casualties/ other chaos - Attempted (thwarted) acquisition of WMDs/ biological weapons - Unsuccessful (but visible) attempts to seize political power
3. Major fire alarm:
- Successful attempts to seize political power - Effective global mass manipulation - Successful acquisition of WMDs, bioweapons - Complete financial collapse— Complete destruction of online systems- internet becomes unuseable etc. - Misaligned, very deadly use in weapons systems
4. The fire alarm has been destroyed, so now it’s just some guy hitting a rock with a scorched fencepost:
- Actual triggering of nuclear/ bio conflict/ other genuine civilisational collapse scenario (destroying AI in the process)
Great list, thanks! My current tentative expectation is that we’ll see a couple things in 1, but nothing in 2+, until it’s already too late (i.e. until humanity is already basically in a game of chess with a superior opponent, i.e. until there’s no longer a realistic hope of humanity coordinating to stop the slide into oblivion, by contrast with today where we are on a path to oblivion but there’s a realistic possibility of changing course.)
In the near term, I’d personally think of prompt injections by some malicious actor which cause security breaches in some big companies. Perhaps a lot of money lost, and perhaps important information leaked. I don’t have expertise on this but I’ve seen some concern about it from security experts after the GPT plugins. Since that seems like it could cause a lot of instability even without agentic AI & it feels rather straightforward to me, I’d expect more chaos on 2.
Oh, I thought you had much more intense things in mind than that. Malicious actor using LLMs in some hacking scheme to get security breaches seems probable to me.
But that wouldn’t cause instability to go above baseline. Things like this happen every year. Russia invaded Ukraine last year, for example—for the world to generally become less stable there needs to be either events that are a much bigger deal than that invasion, or events like that invasion happening every few months.
I guess that really depends on how deep this particular problem runs. If it makes most big companies very vulnerable since most employees use LLMs which are susceptible to prompt injections, I’d expect this to cause more chaos in the US than Russia’s invasion of Ukraine. I think we’re talking slightly past each other though, I wanted to make the point that the baseline (non-existential) chaos from agentic AI should be high since near term, non-agentic AI may already cause a lot of chaos. I was not comparing it to other causes of chaos; though I’m very uncertain about how these will compare.
I’m surprised btw that you don’t expect a (sufficient) fire alarm solely on the basis of short timelines. To me, the relevant issue seems more ‘how many more misaligned AIs with what level of capabilities will be deployed before takeoff’. Since a lot more models with higher capabilities got deployed recently, it doesn’t change the picture for me. If anything, I expect non-existential disasters before takeoff more since the last few months since AI companies seem to just release every model & new feature they got. I’d also expect a slow takeoff of misaligned AI to raise the chances of a loud warning shot & the general public having a Covid-in-Feb-2020-wake-up-moment on the issue.
I definitely agree that near term, non-agentic AI will cause a lot of chaos. I just don’t expect it to be so much chaos that the world as a whole feels significantly more chaotic than usual. But I also agree that might happen too.
I also agree that this sort of thing will have a warning-shot effect that makes a Covid-in-feb-2020-type response plausible.
It seems we maybe don’t actually disagree that much?
Yeah I should have taken more care to explain myself: I do think the sorts of large-but-not-catastrophic harms you are talking about might happen, I just think that more likely than not, they won’t happen, because timelines are short. (My 50% mark for AGI, or if you want to be more precise, AI capable of disempowering humanity, is 2027)
So, my answers to your questions would be:
1. It seems we are on the cusp of agentic AGI right now in 2023, and that godlike AGI will come around 2027 or so.
2. Unclear. Could be quite chaotic & dangerous, but I’m thinking it probably won’t be. Human governments and AI companies have a decent amount of control, at least up until about a year before godlike AGI, and they’ll probably use that control to maintain stability and peace rather than fight each other or sow chaos. I’m not particularly confident though.
3. I think it depends on the details of the bad thing that happened. I’d be interested to hear what sort of bad things you have in mind.
I think there’s a range of things that could happen with lower-level AGI, with increasing levels of ‘fire-alarm-ness’ (1-4), but decreasing levels of likelihood. Here’s a list; my (very tentative) model would be that I expect lots of 1s and a few 2s within my default scenario, and this will be enough to slow down the process and make our trajectory slightly less dangerous.
Forgive the vagueness, but these are the kind of things I have in mind:
1. Mild fire alarm:
- Hacking (prompt injections?) within current realms of possibility (but amped up a bit)
- Human manipulation within current realms of possibility (IRA disinformation *5)
- Visible, unexpected self-improvement/ escape (without severe harm)
- Any lethal autonomous weapon use (even if generally aligned) especially by rogue power
- Everyday tech (phones, vehicles, online platforms) doing crazy, but benign misaligned stuff
- Stock market manipulation causing important people to lose a lot of money
2. Moderate fire alarm:
- Hacking beyond current levels of possibility
- Extreme mass manipulation
- Collapsing financial or governance systems causing minor financial or political crisis
- Deadly use of autonomous AGI in weapons systems by rogue group (killing over 1000 people)
- Misaligned, but less deadly, use in weapons systems
- Unexpected self-improvement/ escape of a system causing multiple casualties/ other chaos
- Attempted (thwarted) acquisition of WMDs/ biological weapons
- Unsuccessful (but visible) attempts to seize political power
3. Major fire alarm:
- Successful attempts to seize political power
- Effective global mass manipulation
- Successful acquisition of WMDs, bioweapons
- Complete financial collapse—
Complete destruction of online systems- internet becomes unuseable etc.
- Misaligned, very deadly use in weapons systems
4. The fire alarm has been destroyed, so now it’s just some guy hitting a rock with a scorched fencepost:
- Actual triggering of nuclear/ bio conflict/ other genuine civilisational collapse scenario (destroying AI in the process)
Great list, thanks!
My current tentative expectation is that we’ll see a couple things in 1, but nothing in 2+, until it’s already too late (i.e. until humanity is already basically in a game of chess with a superior opponent, i.e. until there’s no longer a realistic hope of humanity coordinating to stop the slide into oblivion, by contrast with today where we are on a path to oblivion but there’s a realistic possibility of changing course.)
In the near term, I’d personally think of prompt injections by some malicious actor which cause security breaches in some big companies. Perhaps a lot of money lost, and perhaps important information leaked. I don’t have expertise on this but I’ve seen some concern about it from security experts after the GPT plugins. Since that seems like it could cause a lot of instability even without agentic AI & it feels rather straightforward to me, I’d expect more chaos on 2.
Oh, I thought you had much more intense things in mind than that. Malicious actor using LLMs in some hacking scheme to get security breaches seems probable to me.
But that wouldn’t cause instability to go above baseline. Things like this happen every year. Russia invaded Ukraine last year, for example—for the world to generally become less stable there needs to be either events that are a much bigger deal than that invasion, or events like that invasion happening every few months.
I guess that really depends on how deep this particular problem runs. If it makes most big companies very vulnerable since most employees use LLMs which are susceptible to prompt injections, I’d expect this to cause more chaos in the US than Russia’s invasion of Ukraine. I think we’re talking slightly past each other though, I wanted to make the point that the baseline (non-existential) chaos from agentic AI should be high since near term, non-agentic AI may already cause a lot of chaos. I was not comparing it to other causes of chaos; though I’m very uncertain about how these will compare.
I’m surprised btw that you don’t expect a (sufficient) fire alarm solely on the basis of short timelines. To me, the relevant issue seems more ‘how many more misaligned AIs with what level of capabilities will be deployed before takeoff’. Since a lot more models with higher capabilities got deployed recently, it doesn’t change the picture for me. If anything, I expect non-existential disasters before takeoff more since the last few months since AI companies seem to just release every model & new feature they got. I’d also expect a slow takeoff of misaligned AI to raise the chances of a loud warning shot & the general public having a Covid-in-Feb-2020-wake-up-moment on the issue.
I definitely agree that near term, non-agentic AI will cause a lot of chaos. I just don’t expect it to be so much chaos that the world as a whole feels significantly more chaotic than usual. But I also agree that might happen too.
I also agree that this sort of thing will have a warning-shot effect that makes a Covid-in-feb-2020-type response plausible.
It seems we maybe don’t actually disagree that much?