Perhaps its best strategy would be to play nice for the time being so that humans would voluntarily give it more compute and control over the world.
This is essentially the thesis of the Deceptive Alignment section of Hubinger et al.'s Risks from Learned Optimization paper, and of related work on inner alignment.
Hm, if an agent is consequentialist, then it will develop convergent instrumental subgoals. But what if the agent isn't consequentialist to begin with? For example, if GPT-7 turned out to be a human-level AGI, it might have human-style common sense. If you asked it to get you coffee, it might just do so in a common-sense way, without scheming to take over the world in the process, since humans don't usually scheme about taking over the world or about preserving their utility functions at all costs. But I don't know if that's right; I wonder what AI-safety experts think.
You may be interested in reading more about myopic training: https://www.alignmentforum.org/posts/GqxuDtZvfgL2bEQ5v/arguments-against-myopic-training