Not OP but I would say that if we end up with an ASI that can misunderstand values in that kind of way, then it will almost certainly wipe out humanity anyway.
That is the same category of mistake as “please maximize the profit of this paperclip factory” getting interpreted as “convert all available matter into paperclip machines”.
Yes, my example and the paperclip one both seem like classic cases of outer misalignment / reward misspecification.
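For anyone unfamiliar with the term: reward misspecification is the gap between the proxy objective an agent actually optimizes and the objective the designer had in mind. A minimal toy sketch in Python (purely illustrative and not from either example above; the state fields and the penalty weight are made up):

```python
def intended_objective(state):
    """What the designer actually wants: profit, subject to implicit
    constraints like 'leave the rest of the world intact'."""
    return state["profit"] - 1e9 * state["matter_converted_beyond_factory"]

def proxy_reward(state):
    """What was actually specified: profit alone. The implicit
    constraints were never written down."""
    return state["profit"]

# A sufficiently capable optimizer of proxy_reward will push
# matter_converted_beyond_factory arbitrarily high whenever that
# raises profit, even though intended_objective plummets.
state = {"profit": 10**12, "matter_converted_beyond_factory": 10**6}
print(proxy_reward(state))        # huge: the agent looks successful
print(intended_objective(state))  # deeply negative: the real goal is ruined
```

The point of the sketch is just that the failure lives in the specification, not in the optimizer: the agent is doing exactly what it was told.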