Interesting. Hmm, wouldn't an AI then be incentivized even more to keep its mouth shut about a ton of its internal states/processes if it gets punished for saying things that are not among the tiny fraction of legitimately true statements? My initial reaction is that AI designs that don't also hold proper beliefs, communicate those beliefs, and have all of their beliefs monitored are less useful and less realistic…
And a minor point: I had the impression that you used the term "lies" when it's not fully clear that the AI is even able to have the property of what you call honesty. E.g. GPT-3, while powerful, seems to me unable to be honest, because from its perspective it isn't communicating with anybody, right? This was especially apparent to me in diagram 1.
If this looks like an issue, one could distinguish speech acts (which are supposed to meet certain standards) from the outputs of various transparency tools (which hopefully meet some standards of accuracy, but might be based on different standards).