The type of AI we are worried about is an AI that pursues some kind of goal, and if you have a goal, then self-preservation is a natural instrumental goal, as you point out in the paperclip maximiser example.
It might be possible that someone builds a superintelligent AI that doesn't have a goal. Depending on your exact definition, GPT-4 could count as superintelligent, since it knows more than any human. But it's not dangerous (by itself), since it's not trying to do anything.
You are right that it is possible for something intelligent to not be power-seeking, or even to not try to preserve itself. But those are not the AIs we are worried about.
Almost as soon as people got GPT access, they created AutoGPT and ChaosGPT. I don't expect AIs to be goal-directed because they spontaneously develop goals. I expect them to be goal-directed because lots of people are trying to make them goal-directed.
If the first ever superintelligent AGI decides to commit suicide, or just wireheads and then does nothing, this doesn't save us. Probably someone will just tweak the code to fix this "bug". An AI that doesn't do anything is not very useful.
Also, this post might help:
Abstracting The Hardness of Alignment: Unbounded Atomic Optimization—LessWrong