Good question! I think the term you are looking for is “deceptive alignment”.
As you allude to, this might be okay until the AI’s objectives are no longer maximized by continuing to be deceptive.