People talk about AI resisting correction because successful goal-seekers "should" resist their goals being changed. I wonder if this also acts as an incentive for AI to attempt takeover as soon as it's powerful enough to have a chance of success, instead of (as many people fear) waiting until it's powerful enough to guarantee it.
Hopefully the first AI powerful enough to figure out that it wants to seize power, and that it has a chance of succeeding, is not yet powerful enough to passively resist value change, so acting immediately will be its only chance.
[edit: this is now https://forum.effectivealtruism.org/posts/gxmfAbwksBpnwMG8m/can-the-ai-afford-to-wait]