Unfortunately, people (and this includes AI researchers) tend to hear what they want to hear, and not what they don’t want to hear. What to call this field depends heavily on the nature of those misinterpretations. And the biggest misinterpretation right now does not appear to be “oh, so I guess we need to build impotent systems because they’ll be safe”.
“Alignment” is already broken, in my view. You allude to this, but I want to underscore it. InstructGPT was billed as “alignment”. Maybe it is, but it doesn’t seem to do much to reduce x-risk.
“Safety”, too, lends itself to misinterpretation. Sometimes of the form “ok, so let’s make the self-driving cars not crash”. So you’re not starting from an ideal place. But at least you’re starting from a place of AI systems behaving badly in ways you didn’t intend and causing harm. From there, it’s easier to explain existential safety as simply an extreme safety hazard, and one that’s not even unlikely.
If you tell people “produce long-term, near-optimal outcomes” and they are EAs or rationalists, they probably understand what you mean. If they are random AI researchers, it is so vague as to be completely meaningless. They will fill it in with whatever they want. The ones who think this means full steam ahead toward techno-utopia will think that. The ones who think this means making AI systems not misclassify images in racist ways will think that. The ones who think it means making AI systems output fake explanations for their reasoning will think that.
Everyone wants to make AI produce good outcomes. And you do not need to convince the vast majority of researchers to work on AI capabilities. They just do it anyway. Many of them don’t even do it for ideological reasons, they do it because it’s cool!
The differential thing we need to be pushing on is AI not creating an existential catastrophe. In public messaging (and what is a name except public messaging?) we should not distract from that with other considerations at the present moment. And right now, I don’t think we have a better term than safety that points in that direction.