Rambling question here. What’s the standard response to the idea that very bad things are likely to happen with non-existential AGI before worse things happen with extinction-level AGI?
Eliezee dismissed this as unlikely “what, self-driving cars crashing into each other?”, and I read his “There is no fire alarm” piece, but I’m unconvinced.
For example, we can imagine a range of self-improving, kinda agentic AGIs, from some kind of crappy ChaosGPT let loose online, to a perfect God-level superintelligence optimising for something weird and alien, but perfectly able to function in, conceal itself in and manipulate human systems.
It seems intuitively more likely we’ll develop many of the crappy ones first (seems to already be happening). And that they’ll be dangerous.
I can imagine flawed, agentic, and superficially self-improving AI systems going crazy online, crashing financial systems, hacking military and biosecurity, taking a shot at mass manipulation, but ultimately failing to displace humanity, perhaps because they fail to operate in analog human systems, perhaps because they’re just not that good.
Optimistically, these crappy AIs might function as a warning shot/ fire alarm. Everyone gets terrified, realises we’re creating demons, and we’re in a different world with regards to AI alignment.
My own response is that AIs which can cause very bad things (but not human disempowerment) will indeed come before AIs which can cause human disempowerment, and if we had an indefinitely long period where such AIs were widely deployed and tinkered with by many groups of humans, such very bad things would come to pass. However, instead the period will be short, since the more powerful and more dangerous kind of AI will arrive soon.
(Analogy: “Surely before an intelligent species figures out how to make AGI, it’ll figure out how to make nukes and bioweapons. Therefore whenever AGI appears in the universe, it must be in the post-apocalyptic remnants of a civilization already wracked by nuclear and biological warfare.” Wrong! These things can happen, and maybe in the limit of infinite time they have to happen, but they don’t have to happen in any given relatively short time period; our civilization is a case in point.)
Okay, I think your reference to infinite time periods isn’t particularly relevant here (seems to be a massive difference between 5 and 20 years), but I get your point that short timelines play an important role.
I guess the relevant factors that might be where we have different intuitions are:
How long will this post-agentic-AGI, pre-God-AGI phase last?
How chaotic/ dangerous will it be?
When bad stuff happens, how likely is it to seriously alter the situation? (e.g. pause in AI progress, massive increase in alignment research, major compute limitations, massive reduction on global scientific capacity etc.)
Yeah I should have taken more care to explain myself: I do think the sorts of large-but-not-catastrophic harms you are talking about might happen, I just think that more likely than not, they won’t happen, because timelines are short. (My 50% mark for AGI, or if you want to be more precise, AI capable of disempowering humanity, is 2027)
So, my answers to your questions would be: 1. It seems we are on the cusp of agentic AGI right now in 2023, and that godlike AGI will come around 2027 or so. 2. Unclear. Could be quite chaotic & dangerous, but I’m thinking it probably won’t be. Human governments and AI companies have a decent amount of control, at least up until about a year before godlike AGI, and they’ll probably use that control to maintain stability and peace rather than fight each other or sow chaos. I’m not particularly confident though. 3. I think it depends on the details of the bad thing that happened. I’d be interested to hear what sort of bad things you have in mind.
I think there’s a range of things that could happen with lower-level AGI, with increasing levels of ‘fire-alarm-ness’ (1-4), but decreasing levels of likelihood. Here’s a list; my (very tentative) model would be that I expect lots of 1s and a few 2s within my default scenario, and this will be enough to slow down the process and make our trajectory slightly less dangerous.
Forgive the vagueness, but these are the kind of things I have in mind:
1. Mild fire alarm:
- Hacking (prompt injections?) within current realms of possibility (but amped up a bit) - Human manipulation within current realms of possibility (IRA disinformation *5) - Visible, unexpected self-improvement/ escape (without severe harm) - Any lethal autonomous weapon use (even if generally aligned) especially by rogue power - Everyday tech (phones, vehicles, online platforms) doing crazy, but benign misaligned stuff - Stock market manipulation causing important people to lose a lot of money
2. Moderate fire alarm:
- Hacking beyond current levels of possibility - Extreme mass manipulation - Collapsing financial or governance systems causing minor financial or political crisis - Deadly use of autonomous AGI in weapons systems by rogue group (killing over 1000 people) - Misaligned, but less deadly, use in weapons systems - Unexpected self-improvement/ escape of a system causing multiple casualties/ other chaos - Attempted (thwarted) acquisition of WMDs/ biological weapons - Unsuccessful (but visible) attempts to seize political power
3. Major fire alarm:
- Successful attempts to seize political power - Effective global mass manipulation - Successful acquisition of WMDs, bioweapons - Complete financial collapse— Complete destruction of online systems- internet becomes unuseable etc. - Misaligned, very deadly use in weapons systems
4. The fire alarm has been destroyed, so now it’s just some guy hitting a rock with a scorched fencepost:
- Actual triggering of nuclear/ bio conflict/ other genuine civilisational collapse scenario (destroying AI in the process)
Great list, thanks! My current tentative expectation is that we’ll see a couple things in 1, but nothing in 2+, until it’s already too late (i.e. until humanity is already basically in a game of chess with a superior opponent, i.e. until there’s no longer a realistic hope of humanity coordinating to stop the slide into oblivion, by contrast with today where we are on a path to oblivion but there’s a realistic possibility of changing course.)
In the near term, I’d personally think of prompt injections by some malicious actor which cause security breaches in some big companies. Perhaps a lot of money lost, and perhaps important information leaked. I don’t have expertise on this but I’ve seen some concern about it from security experts after the GPT plugins. Since that seems like it could cause a lot of instability even without agentic AI & it feels rather straightforward to me, I’d expect more chaos on 2.
Oh, I thought you had much more intense things in mind than that. Malicious actor using LLMs in some hacking scheme to get security breaches seems probable to me.
But that wouldn’t cause instability to go above baseline. Things like this happen every year. Russia invaded Ukraine last year, for example—for the world to generally become less stable there needs to be either events that are a much bigger deal than that invasion, or events like that invasion happening every few months.
I guess that really depends on how deep this particular problem runs. If it makes most big companies very vulnerable since most employees use LLMs which are susceptible to prompt injections, I’d expect this to cause more chaos in the US than Russia’s invasion of Ukraine. I think we’re talking slightly past each other though, I wanted to make the point that the baseline (non-existential) chaos from agentic AI should be high since near term, non-agentic AI may already cause a lot of chaos. I was not comparing it to other causes of chaos; though I’m very uncertain about how these will compare.
I’m surprised btw that you don’t expect a (sufficient) fire alarm solely on the basis of short timelines. To me, the relevant issue seems more ‘how many more misaligned AIs with what level of capabilities will be deployed before takeoff’. Since a lot more models with higher capabilities got deployed recently, it doesn’t change the picture for me. If anything, I expect non-existential disasters before takeoff more since the last few months since AI companies seem to just release every model & new feature they got. I’d also expect a slow takeoff of misaligned AI to raise the chances of a loud warning shot & the general public having a Covid-in-Feb-2020-wake-up-moment on the issue.
I definitely agree that near term, non-agentic AI will cause a lot of chaos. I just don’t expect it to be so much chaos that the world as a whole feels significantly more chaotic than usual. But I also agree that might happen too.
I also agree that this sort of thing will have a warning-shot effect that makes a Covid-in-feb-2020-type response plausible.
It seems we maybe don’t actually disagree that much?
Seems like it could happen in the next year or two. I think we still need to go all out preventing it happening though, given how much suffering it will cause. So the conclusion is the same: global moratorium on AGI development.
Rambling question here. What’s the standard response to the idea that very bad things are likely to happen with non-existential AGI before worse things happen with extinction-level AGI?
Eliezee dismissed this as unlikely “what, self-driving cars crashing into each other?”, and I read his “There is no fire alarm” piece, but I’m unconvinced.
For example, we can imagine a range of self-improving, kinda agentic AGIs, from some kind of crappy ChaosGPT let loose online, to a perfect God-level superintelligence optimising for something weird and alien, but perfectly able to function in, conceal itself in and manipulate human systems.
It seems intuitively more likely we’ll develop many of the crappy ones first (seems to already be happening). And that they’ll be dangerous.
I can imagine flawed, agentic, and superficially self-improving AI systems going crazy online, crashing financial systems, hacking military and biosecurity, taking a shot at mass manipulation, but ultimately failing to displace humanity, perhaps because they fail to operate in analog human systems, perhaps because they’re just not that good.
Optimistically, these crappy AIs might function as a warning shot/ fire alarm. Everyone gets terrified, realises we’re creating demons, and we’re in a different world with regards to AI alignment.
My own response is that AIs which can cause very bad things (but not human disempowerment) will indeed come before AIs which can cause human disempowerment, and if we had an indefinitely long period where such AIs were widely deployed and tinkered with by many groups of humans, such very bad things would come to pass. However, instead the period will be short, since the more powerful and more dangerous kind of AI will arrive soon.
(Analogy: “Surely before an intelligent species figures out how to make AGI, it’ll figure out how to make nukes and bioweapons. Therefore whenever AGI appears in the universe, it must be in the post-apocalyptic remnants of a civilization already wracked by nuclear and biological warfare.” Wrong! These things can happen, and maybe in the limit of infinite time they have to happen, but they don’t have to happen in any given relatively short time period; our civilization is a case in point.)
Okay, I think your reference to infinite time periods isn’t particularly relevant here (seems to be a massive difference between 5 and 20 years), but I get your point that short timelines play an important role.
I guess the relevant factors that might be where we have different intuitions are:
How long will this post-agentic-AGI, pre-God-AGI phase last?
How chaotic/ dangerous will it be?
When bad stuff happens, how likely is it to seriously alter the situation? (e.g. pause in AI progress, massive increase in alignment research, major compute limitations, massive reduction on global scientific capacity etc.)
Yeah I should have taken more care to explain myself: I do think the sorts of large-but-not-catastrophic harms you are talking about might happen, I just think that more likely than not, they won’t happen, because timelines are short. (My 50% mark for AGI, or if you want to be more precise, AI capable of disempowering humanity, is 2027)
So, my answers to your questions would be:
1. It seems we are on the cusp of agentic AGI right now in 2023, and that godlike AGI will come around 2027 or so.
2. Unclear. Could be quite chaotic & dangerous, but I’m thinking it probably won’t be. Human governments and AI companies have a decent amount of control, at least up until about a year before godlike AGI, and they’ll probably use that control to maintain stability and peace rather than fight each other or sow chaos. I’m not particularly confident though.
3. I think it depends on the details of the bad thing that happened. I’d be interested to hear what sort of bad things you have in mind.
I think there’s a range of things that could happen with lower-level AGI, with increasing levels of ‘fire-alarm-ness’ (1-4), but decreasing levels of likelihood. Here’s a list; my (very tentative) model would be that I expect lots of 1s and a few 2s within my default scenario, and this will be enough to slow down the process and make our trajectory slightly less dangerous.
Forgive the vagueness, but these are the kind of things I have in mind:
1. Mild fire alarm:
- Hacking (prompt injections?) within current realms of possibility (but amped up a bit)
- Human manipulation within current realms of possibility (IRA disinformation *5)
- Visible, unexpected self-improvement/ escape (without severe harm)
- Any lethal autonomous weapon use (even if generally aligned) especially by rogue power
- Everyday tech (phones, vehicles, online platforms) doing crazy, but benign misaligned stuff
- Stock market manipulation causing important people to lose a lot of money
2. Moderate fire alarm:
- Hacking beyond current levels of possibility
- Extreme mass manipulation
- Collapsing financial or governance systems causing minor financial or political crisis
- Deadly use of autonomous AGI in weapons systems by rogue group (killing over 1000 people)
- Misaligned, but less deadly, use in weapons systems
- Unexpected self-improvement/ escape of a system causing multiple casualties/ other chaos
- Attempted (thwarted) acquisition of WMDs/ biological weapons
- Unsuccessful (but visible) attempts to seize political power
3. Major fire alarm:
- Successful attempts to seize political power
- Effective global mass manipulation
- Successful acquisition of WMDs, bioweapons
- Complete financial collapse—
Complete destruction of online systems- internet becomes unuseable etc.
- Misaligned, very deadly use in weapons systems
4. The fire alarm has been destroyed, so now it’s just some guy hitting a rock with a scorched fencepost:
- Actual triggering of nuclear/ bio conflict/ other genuine civilisational collapse scenario (destroying AI in the process)
Great list, thanks!
My current tentative expectation is that we’ll see a couple things in 1, but nothing in 2+, until it’s already too late (i.e. until humanity is already basically in a game of chess with a superior opponent, i.e. until there’s no longer a realistic hope of humanity coordinating to stop the slide into oblivion, by contrast with today where we are on a path to oblivion but there’s a realistic possibility of changing course.)
In the near term, I’d personally think of prompt injections by some malicious actor which cause security breaches in some big companies. Perhaps a lot of money lost, and perhaps important information leaked. I don’t have expertise on this but I’ve seen some concern about it from security experts after the GPT plugins. Since that seems like it could cause a lot of instability even without agentic AI & it feels rather straightforward to me, I’d expect more chaos on 2.
Oh, I thought you had much more intense things in mind than that. Malicious actor using LLMs in some hacking scheme to get security breaches seems probable to me.
But that wouldn’t cause instability to go above baseline. Things like this happen every year. Russia invaded Ukraine last year, for example—for the world to generally become less stable there needs to be either events that are a much bigger deal than that invasion, or events like that invasion happening every few months.
I guess that really depends on how deep this particular problem runs. If it makes most big companies very vulnerable since most employees use LLMs which are susceptible to prompt injections, I’d expect this to cause more chaos in the US than Russia’s invasion of Ukraine. I think we’re talking slightly past each other though, I wanted to make the point that the baseline (non-existential) chaos from agentic AI should be high since near term, non-agentic AI may already cause a lot of chaos. I was not comparing it to other causes of chaos; though I’m very uncertain about how these will compare.
I’m surprised btw that you don’t expect a (sufficient) fire alarm solely on the basis of short timelines. To me, the relevant issue seems more ‘how many more misaligned AIs with what level of capabilities will be deployed before takeoff’. Since a lot more models with higher capabilities got deployed recently, it doesn’t change the picture for me. If anything, I expect non-existential disasters before takeoff more since the last few months since AI companies seem to just release every model & new feature they got. I’d also expect a slow takeoff of misaligned AI to raise the chances of a loud warning shot & the general public having a Covid-in-Feb-2020-wake-up-moment on the issue.
I definitely agree that near term, non-agentic AI will cause a lot of chaos. I just don’t expect it to be so much chaos that the world as a whole feels significantly more chaotic than usual. But I also agree that might happen too.
I also agree that this sort of thing will have a warning-shot effect that makes a Covid-in-feb-2020-type response plausible.
It seems we maybe don’t actually disagree that much?
I completely agree with you and think that’s what will happen. Eliezer might disagree but many others would agree with you.
Seems like it could happen in the next year or two. I think we still need to go all out preventing it happening though, given how much suffering it will cause. So the conclusion is the same: global moratorium on AGI development.