I don’t think the strategy-stealing assumption holds here: it’s pretty unlikely that we’ll build a fully aligned ‘sovereign’ AGI even if we solve alignment; it seems easier to make something corrigible / limited instead, i.e. something that is by design less powerful than would be possible if we were just pushing capabilities.
I don’t mean to imply that we’ll build a sovereign AI (I doubt it too).
Corrigible is more what I meant. Corrigible but not necessarily limited. I.e. minimally intent-aligned AIs which won’t kill you but which, by the strategy-stealing assumption, can still compete with unaligned AIs.