The belief is that as soon as we create an AI with at least human-level general intelligence, it will find it relatively easy to use its superior reasoning, extensive knowledge, and superhuman thinking speed to take over the world.
This depends on what “human-level” means. There is some threshold such that an AI past that threshold could quickly take over the world, and it doesn’t really matter whether we call that “human-level” or not.
Overall, it seems like “make AI stupid” is a far easier task than “make the AI’s goals perfectly aligned”.
Sure. But the relevant task isn’t “make something that won’t kill you.” It’s more like “make something that will stop any AI from killing you,” or maybe “find a way to do alignment without much cost and without sacrificing much usefulness.” If you and I make stupid AI, great, but some lab will realize that non-stupid AI could be more useful, and will make it by default.
This is very true. However, the OP’s point still helps us, since an AI that is simultaneously smart enough to be useful in a narrow domain, misaligned, but also too stupid to take over the world could help us reduce x-risk. In particular, if it is superhumanly good at alignment research, then it could output good alignment research as part of its deception phase. This would significantly reduce the risk from future AIs without itself causing x-risk, since, ex hypothesi, the AI is too stupid to take over. The main question here is whether an AI could be smart enough to do very good alignment research and yet too stupid to take over the world if it tried. I am skeptical but pretty uncertain, so I would give it at least a 10% chance of being true, and maybe higher.
This depends on what “human-level” means. There is some threshold such that an AI past that threshold could quickly take over the world, and it doesn’t really matter whether we call that “human-level” or not.
Indeed, this post is not an attempt to argue that AGI could never be a threat, merely that the “threshold for subjugation” is much higher than “any AGI”, as many people imply. “Human-level” is just a marker for a level of intelligence that most people will agree counts as AGI, but one that (due to mental flaws) is most likely not capable of world domination. For example, I do not believe an AI brain upload of Bobby Fischer could take over the world.
This makes a difference, because it means that the world in which the actual x-risk AGI comes into being is one in which a lot of earlier, non-deadly AGIs already exist and can be studied, or used against the rogue.
Sure. But the relevant task isn’t “make something that won’t kill you.” It’s more like “make something that will stop any AI from killing you,” or maybe “find a way to do alignment without much cost and without sacrificing much usefulness.” If you and I make stupid AI, great, but some lab will realize that non-stupid AI could be more useful, and will make it by default.
Current narrow machine-learning AI is extraordinarily stupid at anything it isn’t trained for, and yet it is still massively funded and incredibly powerful. Nobody is hankering to put a detailed understanding of quantum mechanics into DALL-E. A “stupidity about world domination” module, focused on a few key dangerous areas like biochemistry, could potentially be implemented in most AIs without affecting performance at all. It wouldn’t solve the problem entirely, but it would help mitigate risk.
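As a purely illustrative toy sketch of what such deliberate ignorance could look like (my own example, not something proposed in the post; the domain tags and document format are hypothetical), one could filter the restricted domains out of the training corpus so the model simply never learns them:

```python
# Toy sketch of "deliberate ignorance": drop training documents tagged with
# restricted domains so the resulting model never acquires that knowledge.
# The domain tags and document format here are hypothetical.

RESTRICTED_DOMAINS = {"biochemistry", "virology", "cyberweapons"}

def filter_corpus(documents):
    """Yield only documents that touch none of the restricted domains."""
    for doc in documents:
        tags = set(doc.get("domain_tags", []))
        if tags & RESTRICTED_DOMAINS:
            continue  # withhold this domain from the model entirely
        yield doc

# Usage: pretrain only on the filtered stream.
corpus = [
    {"text": "protein folding pathways...", "domain_tags": ["biochemistry"]},
    {"text": "history of chess openings...", "domain_tags": ["games"]},
]
print([d["text"] for d in filter_corpus(corpus)])
# -> ['history of chess openings...']
```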
Alternatively, if you want to “make something that will stop AI from killing us” (presumably an AGI), you need to make sure that it can’t kill us instead, and that could also be helped by deliberate flaws and ignorance. So make it an idiot savant: very good at terminating AIs, but not at other things.