kokotajlod comments on Truthful AI

kokotajlod Oct 21, 2021, 3:07 PM
9 points
One way in which this paper (or the things policymakers and CEOs might do if they read it and like it) might be net-negative:
Maybe by default AIs will mostly be trained to say whatever maximizes engagement/clicks/etc., and so they’ll say all sorts of stuff and people will quickly learn that a lot of it is bullshit and only fools will place their trust in AI. In the long run, AIs will learn to deceive us, or actually come to believe their own bullshit. But at least we won’t trust them.

But if people listen to this paper, they might build all sorts of prestigious Ministries of Truth that work hard to train AIs to be truthful, where “truthful” in practice means Sticks to the Party Line. And so the same thing happens: AIs learn to deceive us (because there will be cases where the Party Line just isn’t true, and obviously so) or else actually come to believe their own bullshit (which would arguably be worse? Hard to say). But it happens faster, because Ministries of Truth are accelerating the process. Also, and more importantly, more humans will trust the AIs more, because they’ll be saying all the right things and they’ll be certified by the right Ministries.

(Crossposted from LW)
What links here?
- Daniel Kokotajlo's comment on Truthful AI: Developing and governing AI that does not lie by Owain_Evans (LessWrong; Oct 21, 2021, 3:08 PM; 8 points)