Superhuman agents ruthlessly optimize for a reward at the expense of anything else we might care about. The more capable the agent and the more ruthless the optimizer, the more extreme the results.
To the extent this is an empirical claim about superhuman agents we are likely to build and not merely a definition, it needs to be argued for, not merely assumed. “Ruthless” optimization could indeed be bad for us, but current AIs don’t seem well-described as ruthless optimizers.
Instead, LLMs appear corrigible more or less by default, and there don’t appear to be strong incentives to purposely make AIs that are ruthless agents if doing so predictably harmed us.
(There’s a more plausible argument that we have strong incentives to build non-ruthless agents, but these agents, by virtue of not being ruthless, seem much less risky.)
To the extent superhuman agents are simply ruthless by definition, I’d argue that this statement is largely irrelevant, since we don’t seem likely to want to build ruthless agents that would predictably harm us. It’s possible such agents could come about by accident, but again, this premise needs to be argued for, not merely assumed.