I think there are many different ways things can go down that will all result in no doom. I’m still concerned that some of them could involve large amounts of collateral damage. Below I will sketch out 8 different scenarios that result in humanity surviving:
1. The most likely scenario for safety, in my mind, is that what we call an AGI will not live up to the hype. I.e., it will do some very impressive things, but will still retain significant flaws and make frequent mistakes, which render it incapable of world domination.
2. Similar to the above, it might be that conquering humanity becomes an impossibly difficult task, perhaps due to enhanced monitoring and anti-AI enforcement mechanisms.
3. Another scenario is that the first AGI we build will not be malicious, and we use those AIs to monitor and defeat rogue AIs.
4. Another scenario is that using a bunch of hyper-specialised “narrow” AIs turns out to be better than a “general” AI, so research into the latter is essentially abandoned.
5. Another scenario is that solving alignment is a necessary step on the road to AGI. Since “getting the computer to do what you want” is a key part of all programming, it may be the key question that needs to be solved to even get to “AGI”.
6. Another scenario is that solving alignment turns out to be surprisingly easy, so everyone just does it.
7. Another scenario is a very high frequency of warning shots. An AI does not need to be capable of winning to go rogue. It could be mistaken about its beliefs, or think that a small probability of success is “worth it”. A few high-profile disasters might be more than enough to get the world on board with banning AGI entirely.
8. Another scenario is that we don’t end up with enough compute power to actually run an AGI, so it doesn’t happen.
I would bet there are plenty more possible scenarios out there.
Thanks.
1. The premise is that we have AGI (i.e. p(doom|AGI)).
2. What do the anti-AI enforcement mechanisms look like? How are they better than the military-grade cybersecurity that we currently have (that is hardly watertight)?
3. How is the AGI (spontaneously?) not dangerous? (Malicious is a loaded word here: ~all the danger comes from it being indifferent. “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”)
4. This seems highly unlikely given the success of GPT-4, and multimodal foundation models (expect “GATO 2” or similar from Google DeepMind in the next few months—a single model able to handle text, images, video and robotics).
5. I don’t see how that follows? Or at least how you get perfect (non-doom level) alignment. So far releases have happened when the alignment is “good enough”. But it is only good enough because the models are still weaker than us.
6. Why do people think this? Where are the major breakthroughs getting us closer to this? There is even a nascent field working on proofs for alignment’s theoretical impossibility.
7. I hope, if there are warning shots, that this happens! Ideally it should be at least Paused pre-emptively, to give alignment research time to catch up.
8. Seems unlikely. We’re probably in a hardware overhang situation already. The limit may well be just having enough money in one place to spend on the compute ($10B?).
No worries! Here are some responses to your responses.
This scenario is not saying AGIs won’t exist, just that they won’t be powerful enough to conquer humanity. As a simple counterexample, if you did a mind upload of a flat-Earther, that would count as an AGI, but it would be too mentally flawed to kill us all.
This scenario would be most likely if any AI plan to conquer the world were highly difficult and required amassing large amounts of resources and power (I suspect this is the case). If so, such a power grab could be opposed effectively via conventional means, the same way we prevent rogue states such as North Korea from dominating the world.
Perhaps I should have used a different word than “alignment” here. I expect AGI to be potentially dangerous, but not world-ending. I’m working on a post about this, but turning every atom towards a single goal is characteristic of a fanatical global maximiser, which does not describe neural networks or any existing intelligence, and as such is ridiculously unlikely. All that is required is that the AI is aligned enough with us to not want to commit global genocide.
Compare the performance of GPT-4 when playing chess to that of Stockfish. Stockfish is massively superior to any human being, while GPT-4 still can’t even learn the rules, despite having access to the same computing power and data (a rough way to check this empirically is sketched at the end of this comment). Since specialised AIs are superior now, it’s quite possible they will remain so in the future.
I don’t know what the architecture of future AI will look like, so it’s hard to speculate on this point. But misalignment definitely hurts the goals of AI companies. A premature rogue AI that kills lots of people would get companies sued into oblivion.
Again, I want to emphasise that there is a large gulf between “AI shares literally all of our values and never harms anyone” and “AI is aligned enough to not commit global genocide of all humanity”. I find it incredibly hard to believe that the latter is “theoretically impossible”, although I would be unsurprised if the former is.
People are already primed to be distrustful of AI and the people making it, so global regulation of some kind on AI is probably inevitable. The severity will depend on the severity of “warning shots”.
I just wouldn’t be sure on this one. I don’t think current designs are anywhere close to AGIs, and I don’t think we know enough about the requirements for AGI to state confidently that we are in an overhang.
I wouldn’t say I’m confident that all of these scenarios are likely, but I’m virtually certain that at least one of them is, or another one I haven’t thought of yet. In general, I believe that the capabilities of AGIs are being drastically overstated, and that they will turn out to have flaws, like literally every technology ever invented. I also believe that conquering humanity is a ridiculously difficult task, and probably fairly easy to foil.
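Regarding the chess point above: since it’s an empirical claim, it’s checkable. Here is a rough sketch of the kind of test I have in mind, in Python. It assumes the python-chess library and a local Stockfish binary on the PATH, and ask_model() is a purely hypothetical placeholder for whichever LLM API you want to test; none of these specifics come from the discussion, they’re just illustrative.

```python
# Rough sketch: have an LLM play White against Stockfish and count how often it
# proposes an illegal move. Assumes python-chess is installed and a Stockfish
# binary is on the PATH; ask_model() is a hypothetical placeholder for whatever
# LLM API you want to test.
import chess
import chess.engine

def ask_model(prompt: str) -> str:
    """Hypothetical helper: send the prompt to an LLM and return its move in SAN."""
    raise NotImplementedError("wire this up to the LLM you want to test")

def count_illegal_moves(max_moves: int = 20) -> int:
    board = chess.Board()
    illegal = 0
    with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
        for _ in range(max_moves):
            if board.is_game_over():
                break
            # LLM plays White: ask for a move given the current position.
            move = ask_model(
                f"Position (FEN): {board.fen()}\nReply with one legal move for White in SAN."
            )
            try:
                board.push_san(move.strip())
            except ValueError:
                illegal += 1                               # illegal or unparseable move
                board.push(next(iter(board.legal_moves)))  # substitute any legal move
            if board.is_game_over():
                break
            # Stockfish plays Black.
            board.push(engine.play(board, chess.engine.Limit(time=0.1)).move)
    return illegal
```

One game and a crude prompt prove nothing either way, of course; the point is just that the claim is testable.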
A mind upload of a flat-Earther might not stay a flat-Earther for long if they were given access to all the world’s sensors (including orbiting cameras) and had thousands of subjective years to think every month.
North Korean hackers have stolen billions of dollars. Imagine if there were a million times more of them. And that is mere human-level we’re talking about.
How do you get it aligned enough to not want to commit global genocide? Sounds like you’ve solved 99% of the alignment problem if you can do that!
I thought GPT-4 had learned the rules of chess? Pretty impressive for just being trained on text (shows it has emergent internal world models).
I think you can assume that the architecture is basically “foundation transformer model / LLM” at this point. As Connor Leahy says, they are basically “general cognition engines” and will scale to full AGI in a generation or two (and with the addition of various plugins etc. to aid “System 2”-type thinking, which are now freely being offered by the AutoGPT enthusiasts and OpenAI). We may or may not get such warning shots (look out for Google DeepMind’s next multimodal model, I guess...)
I don’t think there’s as much of a gulf as there appears on the face of it. I think you are anthropomorphising to think that it will care about scale of harms in such a way (without being perfectly aligned). See also: Mesa-optimisation leading to value drift. The AI needs to be aligned on not committing global genocide indefinitely.
Hope so! And hope we don’t need any (more) lethal warning shots for it to happen. My worry is that we have very little time to get the regulation in place (hence working to try and speed that up).
I’m not sure, but I’m sure enough to be really concerned! What if the current architecture (“general cognition engine”) plus AutoGPT and plugins is enough? And 100x GPT-4’s compute would cost <1% of Google’s or Microsoft’s market capitalisation.
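For what it’s worth, here is the back-of-envelope arithmetic behind that last sentence; the cost and market-cap figures are rough estimates I’m assuming (circa 2023), not precise numbers:

```python
# Back-of-envelope: is "100x GPT-4" compute affordable for a big tech company?
# All figures are rough, assumed estimates (circa 2023), not exact numbers.
gpt4_training_cost = 100e6        # ~$100M order-of-magnitude training cost estimate
scale_factor = 100                # "100x GPT-4"
microsoft_market_cap = 2.4e12     # ~$2.4T, approximate

scaled_cost = gpt4_training_cost * scale_factor   # ~$10B
share = scaled_cost / microsoft_market_cap        # ~0.004
print(f"~${scaled_cost / 1e9:.0f}B, roughly {share:.1%} of market cap")
```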
Even if you don’t think x-risk is likely, if you think a global catastrophe still is, I hope you can get behind calls for regulation.
Okay. But what then stops someone from creating an AGI that does? The game doesn’t end after one turn.
This seems extremely unlikely. To take just one approach I can think of: if I were an AGI, I’d infect every device with malware and then control the entire flow of information. This could easily be done by reality-warping humans (giving them a false feed of information that completely distorts their model of reality) or simply shutting them out; humans can’t coordinate if they can’t communicate on a mass scale. This would be really, really easy. Human society is incredibly fragile. We don’t realize this only because we’ve never faced a concerted, competent attempt to actually break it.
This sounds like summoning Godzilla to fight MechaGodzilla: https://www.lesswrong.com/posts/DwqgLXn5qYC7GqExF/godzilla-strategies
The current trend is that generalization is superior, and will probably continue to be so.
With the way current models are designed, this seems extremely unlikely.
This also seems unlikely, given how many have tried, and we’re still nowhere close to solving it.
This is the most plausible of these. But the scale of the tragedy could still be extremely high: breakdown of communication, loss of supply chains, mass starvation, economic collapse, international warfare, etc. Even if it’s not extinction, I’m not sure how many shocks current civilization can endure.
This could maybe slow it, but not for long. I imagine there are far more efficient ways of running an AGI that someone would learn to implement.
Have you heard of this thing called “all of human history before the computer age”? Human coordination and civilization do not require hackable devices to operate. This plan would be exposed in 10 seconds flat as soon as people started talking to each other and comparing notes.
In general, I think that the main issue is a ridiculously overinflated idea of what “AGI” will actually be capable of. When something doesn’t exist yet, it’s easy to imagine it as having no flaws. But that never turns out to be the case.
Yeah, once upon a time, but now our civilization is interconnected and dependent on computers. And with their information feeds distorted, people wouldn’t even realize they needed to coordinate.
How would they do that? They’ve lost control of communication. Expose it to who?