Interpretability Will Not Reliably Find Deceptive AI

Crossposted from LessWrong (327 points, 68 comments)