One way of reading this comment is that it’s a semantic disagreement about what alignment means. The OP seems to be talking about the problem of getting an AI to do the right thing, writ large, which may encompass a broader set of topics than alignment research as you define it.
Kind of.
Alignment researchers want AI to do the right thing. How they try to do that is mostly not sensitive to what humans want; different researchers do different stuff but it’s generally more like interpretability or robustness than teaching specific values to AI systems. So even if religion was more popular/appreciated/whatever, they’d still be doing stuff like interpretability, and still be doing it in the same way.
(a) and (b) are clearly false, but many believe that most of the making-AI-go-well problem is getting from 'AI kills everyone' to 'AI doesn't kill everyone', and that going from 'AI doesn't kill everyone' to 'AI does stuff everyone thinks is great' is relatively easy. And value-loading approaches like CEV should be literally optimal regardless of religiosity.
Few alignment researchers are excited about Stuart Russell’s research, I think (at least in the Bay Area, where the alignment researchers I know are). I agree that if his style of research were more popular, thinking about values and metavalues and such would be more relevant.
Zach—I may be an AI alignment newbie, but I don’t understand how ‘alignment’ could be ‘mostly not sensitive to what humans want’. I thought alignment with what humans want was the whole point of alignment. But now you’re making it sound like ‘AI alignment’ means ‘alignment with what Bay Area AI researchers think should be everyone’s secular priorities’.
Even CEV seems to depend on an assumption that there is a high degree of common ground among all humans regarding core existential values—Yudkowsky explicitly says that CEV would only work ‘to whatever extent most existing humans, thus extrapolated, would predictably want* the same things’. If some humans are antinatalists, or Earth First eco-activists, or religious fundamentalists yearning for the Rapture, or bitter nihilists, who want us to go extinct, then CEV won’t work to prevent AI from killing everyone. CEV and most ‘alignment’ methods only seem to work if they sweep the true religious, political, and ideological diversity of humans under the rug.
I also see no a priori reason why getting from (1) AI killing everyone to AI not killing everyone would be easier than getting from (2) AI not killing everyone to AI doing stuff everyone thinks is great. The first issue (1) seems to require explicitly prioritizing some human corporeal/bodily interests over the brain’s stated preferences, as I discussed here.