1. The premise is that we have AGI, i.e. what is at issue is p(doom|AGI) (spelled out just after this list).
2. What do the anti-AI enforcement mechanisms look like? How are they better than the military-grade cybersecurity that we currently have (that is hardly watertight)?
3. How is the AGI (spontaneously?) not dangerous? (Malicious is a loaded word here: ~all the danger comes from it being indifferent. “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”)
4. This seems highly unlikely given the success of GPT-4, and multimodal foundation models (expect “GATO 2” or similar from Google DeepMind in the next few months—a single model able to handle text, images, video and robotics).
5. I don’t see how that follows? Or at least how you get perfect (non-doom level) alignment. So far releases have happened when the alignment is “good enough”. But it is only good enough because the models are still weaker than us.
6. Why do people think this? Where are the major breakthroughs getting us closer to this? There is even a nascent field working on proofs for alignment’s theoretical impossibility.
7. I hope, if there are warning shots, that this happens! Ideally it should be at least Paused pre-emptively, to give alignment research time to catch up.
8. Seems unlikely. We’re probably in a hardware overhang situation already. The limit may well be just having enough money in one place to spend on the compute ($10B?).
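To spell out the notation in point 1 (this is just the law of total probability, nothing deeper):

$$P(\mathrm{doom}) = P(\mathrm{doom} \mid \mathrm{AGI})\,P(\mathrm{AGI}) + P(\mathrm{doom} \mid \neg\mathrm{AGI})\,P(\neg\mathrm{AGI})$$

The scenarios under discussion only bear on the first conditional term, $P(\mathrm{doom} \mid \mathrm{AGI})$; "AGI never arrives" would lower $P(\mathrm{AGI})$, not that term.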
No worries! Here are some responses to your responses.
This scenario is not saying AGI won’t exist, just that it won’t be powerful enough to conquer humanity. As a simple counterexample, a mind upload of a flat-Earther would count as AGI, but would be too mentally flawed to kill us all.
This scenario would be most likely if an AI plan to conquer the world were highly difficult and required amassing large amounts of resources and power (I suspect this is the case). If so, such a power grab could be opposed effectively via conventional means, the same way we prevent rogue states such as North Korea from dominating the world.
Perhaps I should have used a different word than “alignment” here. I expect AI to be potentially dangerous, but not world-ending. I’m working on a post about this, but turning every atom towards a single goal is characteristic of a fanatical global maximiser, which does not describe neural networks or any existing intelligence, and is therefore ridiculously unlikely. All that is required is that the AI is aligned enough with us to not want to commit global genocide.
Compare the performance of GPT-4 at chess to that of Stockfish. Stockfish is massively superior to any human being, while GPT-4 still can’t even learn the rules, despite having access to the same computing power and data. Since specialised AIs are superior now, it’s quite possible they will remain so in the future.
I don’t know what the architecture of future AI will look like, so it’s hard to speculate on this point. But misalignment definitely hurts the goals of AI companies. A premature rogue AI that kills lots of people would get companies sued into oblivion.
Again, I want to emphasise that there is a large gulf between “AI shares literally all of our values and never harms anyone” and “AI is aligned enough to not commit global genocide of all humanity”. I find it incredibly hard to believe that the latter is “theoretically impossible”, although I would be unsurprised if the former is.
People are already primed to be distrustful of AI and the people making it, so global regulation of some kind on AI is probably inevitable. The severity will depend on the severity of “warning shots”.
I just wouldn’t be sure on this one. I don’t think current designs are anywhere close to AGI, and I don’t think we know enough about the requirements for AGI to state confidently that we are in an overhang.
I wouldn’t say I’m confident that all of these scenarios are likely, but I’m virtually certain that at least one of them is, or another one I haven’t thought of yet. In general, I believe that the capabilities of AGIs are being drastically overstated, and that they will turn out to have flaws like literally every technology ever invented. I also believe that conquering humanity is a ridiculously difficult task, and probably fairly easy to foil.
A mind upload of a flat-Earther might not stay a flat-Earther for long if it were given access to all the world’s sensors (including orbiting cameras) and had thousands of subjective years to think every month.
North Korean hackers have stolen billions of dollars. Imagine if there were a million times more of them. And that is mere human-level we’re talking about.
How do you get it aligned enough to not want to commit global genocide? Sounds like you’ve solved 99% of the alignment problem if you can do that!
I thought GPT-4 had learned the rules of chess? Pretty impressive for a model trained only on text (it shows it has emergent internal world models).
I think you can assume that the architecture is basically “foundation transformer model / LLM” at this point. As Connor Leahy says, these are basically “general cognition engines” and will scale to full AGI in a generation or two (especially with the addition of various plugins to aid “System 2”-type thinking, which are now freely being offered by the AutoGPT enthusiasts and OpenAI). We may or may not get such warning shots (look out for Google DeepMind’s next multimodal model, I guess).
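To make it concrete, here is a rough sketch of the kind of loop I mean by “general cognition engine + plugins” (purely illustrative: `call_llm` and the plugin stubs are hypothetical placeholders, not any particular vendor’s API):

```python
# Illustrative AutoGPT-style loop: an LLM ("general cognition engine") plus
# plugins for tool use and a scratchpad for "System 2"-style deliberation.
# call_llm() and the plugins are stubs, not real APIs.

def call_llm(prompt: str) -> str:
    # Placeholder for a foundation-model call; returns a canned reply so the
    # sketch runs end-to-end without any external service.
    return "DONE (stub answer; plug in a real model here)"

# "Plugins" are just named tools the model can request by name.
PLUGINS = {
    "search": lambda query: f"<results for {query!r}>",  # stub web search
    "calculator": lambda expr: f"<value of {expr!r}>",   # stub calculator
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    """Plan, act via a plugin, record the result, repeat."""
    scratchpad = []  # persistent working memory across steps
    for _ in range(max_steps):
        prompt = (
            f"Goal: {goal}\n"
            f"Scratchpad so far: {scratchpad}\n"
            "Reply with either 'USE <plugin> <input>' or 'DONE <answer>'."
        )
        reply = call_llm(prompt)
        if reply.startswith("DONE"):
            return reply.removeprefix("DONE").strip()
        _, plugin_name, plugin_input = reply.split(" ", 2)
        scratchpad.append((reply, PLUGINS[plugin_name](plugin_input)))
    return "gave up"

print(run_agent("summarise the latest AI governance news"))
```

The point is just that the “System 2” scaffolding is trivial to bolt on; all the capability lives in whatever sits behind `call_llm`.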
I don’t think there’s as much of a gulf as there appears on the face of it. I think you are anthropomorphising if you expect it to care about the scale of harms in that way (without being perfectly aligned). See also: mesa-optimisation leading to value drift. The AI needs to stay aligned on not committing global genocide indefinitely.
Hope so! And hope we don’t need any (more) lethal warning shots for it to happen. My worry is that we have very little time to get the regulation in place (hence working to try and speed that up).
I’m not sure, but I’m sure enough to be really concerned! What about the current architecture (“general cognition engine”) + AutoGPT + plugins; isn’t that enough? And 100x GPT-4’s compute would cost <1% of Google or Microsoft’s market capitalisation.
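Rough numbers behind that last claim (the training-cost figure is a commonly cited estimate and the market caps are approximate 2023 values, so treat this as a sanity check rather than precise accounting):

```python
# Back-of-envelope check of "100x GPT-4 compute costs <1% of market cap".
# All inputs are rough public estimates, not official figures.
gpt4_training_cost = 100e6                  # ~$100M, a commonly cited estimate
scaled_run_cost = 100 * gpt4_training_cost  # "100x GPT-4" is roughly $10B

for company, market_cap in [("Alphabet", 1.7e12), ("Microsoft", 2.4e12)]:
    print(f"{company}: {scaled_run_cost / market_cap:.2%} of market cap")
# Alphabet: ~0.59%, Microsoft: ~0.42%; both under 1%.
```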
Even if you don’t think x-risk is likely, if you think a global catastrophe still is, I hope you can get behind calls for regulation.
Thanks.