To me it seems extraordinarily unlikely that any agent capable of performing all these tasks with a high degree of proficiency would simultaneously stand firm in its conviction that the only goal it had reason to pursue was tiling the universe with paperclips.
Seems a little anthropomorphic. A possibly less anthropomorphic argument: if we possess the algorithms required to construct an agent capable of achieving a decisive strategic advantage, we can also apply those same algorithms to reasoning about moral dilemmas and the like, and use their outputs to construct the agent's value function.