I’m thinking of a ‘warning shot’ roughly as ‘an event where AI is widely perceived to have caused a very large amount of destruction’. Maybe loosely operationalized as ‘an event about as sudden as 9/11, and at least one-tenth as shocking, tragic, and deadly as 9/11’.
I don’t have stable or reliable probabilities here, and I expect that other MIRI people would give totally different numbers. But my current ass numbers are something like:
12% chance of a warning shot happening at some point.
6% chance of a warning shot happening more than 6 years before AGI destroys and/or saves the world.
10% chance of a warning shot happening 6 years or less before AGI destroys and/or saves the world.
My current unconditional odds on humanity surviving are very low (with most of my optimism coming from the fact that the future is just inherently hard to predict). Stating some other ass numbers:
Suppose that things go super well for alignment and timelines aren’t that short, such that we achieve AGI in 2050 and have a 10% chance of existential success. In that world, if we held as much as possible constant except that AGI comes in 2060 instead of 2050, then I’m guessing that would double our success odds to 20%.
If we invented AGI in 2050 and somehow impossibly had fifteen years to work with AGI systems before anyone would be able to destroy the world, and we knew as much, then I’d imagine our success odds maybe rising from 10% to 55%.
The default I expect instead is that the first AGI developer will have more than three months, and less than five years, before someone else destroys the world with AGI. (And on the mainline, I expect them to have ~zero chance of aligning their AGI, and I expect everyone else to have ~zero chance as well.)
If a warning shot had a large positive impact on our success probability, I’d expect it to look something like:
‘20 or 30 or 40 years before we’d naturally reach AGI, a huge narrow-AI disaster unrelated to AGI risk occurs. This disaster is purely accidental (not terrorism or whatever). Its effect is mainly just to cause it to be in the Overton window that a wider variety of serious technical people can talk about scary AI outcomes at all, and maybe it slows timelines by five years or something. Also, somehow none of this causes discourse to become even dumber; e.g., people don’t start dismissing AGI risk because “the real risk is narrow AI systems like the one we just saw”, and there isn’t a big ML backlash to regulatory/safety efforts, and so on.’
I don’t expect anything at all like that to happen, not least because I suspect we may not have 20+ years left before AGI. But that’s a scenario where I could (maybe, optimistically) imagine real, modest improvements.
If I imagine that absent any warning shots, AGI is coming in 2050, and there’s a 1% chance of things going well in 2050, then:
If we add a warning shot in 2023, then I’d predict something like: 85% chance it has no major effect, 12% chance it makes the situation a lot worse, 3% chance it makes the situation a lot better. (I.e., an 80% chance that if it has a major effect, it makes things worse.)
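As a minimal sketch (my own back-of-the-envelope check, not part of the original estimate), the parenthetical 80% figure falls straight out of the three probabilities above:

```python
# Sketch of the arithmetic behind the parenthetical above.
# The three outcome probabilities are the ass numbers stated in the text.

p_no_major_effect = 0.85  # warning shot has no major effect
p_much_worse      = 0.12  # makes the situation a lot worse
p_much_better     = 0.03  # makes the situation a lot better

p_major_effect = p_much_worse + p_much_better          # 0.15
p_worse_given_major_effect = p_much_worse / p_major_effect

print(f"P(major effect)               = {p_major_effect:.2f}")              # 0.15
print(f"P(a lot worse | major effect) = {p_worse_given_major_effect:.0%}")  # 80%
```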
This still strikes me as worth thinking about some, in part because these probabilities are super unreliable. But mostly I think EAs should set aside the idea of warning shots and think more about things we might be able to cause to happen, and things that have more effects like ‘shift the culture of ML specifically’ and/or ‘transmit lots of bits of information to technical people’, rather than ‘make the world as a whole panic more’.
I’m much more excited by scenarios like: ‘a new podcast comes out that has top-tier-excellent discussion of AI alignment stuff, it becomes super popular among ML researchers, and the culture, norms, and expectations of ML thereby shift such that water-cooler conversations about AGI catastrophe are more serious, substantive, informed, candid, and frequent’.
It’s rare for a big positive cultural shift like that to happen; but it does happen sometimes, and it can result in very fast changes to the Overton window. And since it’s a podcast containing many hours of content, there’s the potential to seed subsequent conversations with a lot of high-quality background thoughts.
To my eye, that seems more like the kind of change that might shift us from a current trajectory of “~definitely going to kill ourselves” to a new trajectory of “viable chance of an existential win”.
Whereas warning shots feel more unpredictable to me, and even when they are helpful, I expect the helpfulness to at best look like “we were almost on track to win, and then the warning shot nudged us just enough to secure a win”.
I thought this was great because it has concrete and specific details, times and probabilities. I think it’s really hard to write these out.
Well, it’s less effort insofar as these are very low-confidence, unstable ass numbers. I wouldn’t want to depend on a plan that assumes there will be no warning shots, or a plan that assumes there will be some.
Very interesting analysis—thank you.