Zach—I may be an AI alignment newbie, but I don’t understand how ‘alignment’ could be ‘mostly not sensitive to what humans want’. I thought alignment with what humans want was the whole point of alignment. But now you’re making it sound like ‘AI alignment’ means ‘alignment with what Bay Area AI researchers think should be everyone’s secular priorities’.
Even CEV seems to depend on an assumption that there is a high degree of common ground among all humans regarding core existential values—Yudkowsky explicitly says that CEV could only work ‘to whatever extent most existing humans, thus extrapolated, would predictably want* the same things’. If some humans are antinatalists, Earth First eco-activists, religious fundamentalists yearning for the Rapture, or bitter nihilists who want us to go extinct, then CEV won’t work to prevent AI from killing everyone. CEV and most ‘alignment’ methods only seem to work if they sweep the true religious, political, and ideological diversity of humans under the rug.
I also see no a priori reason why getting from (1) AI killing everyone to AI not killing everyone would be easier than getting from (2) AI not killing everyone to AI doing stuff everyone thinks is great. The first issue (1) seems to require explicitly prioritizing some human corporeal/body interests over the brain’s stated preferences, as I discussed here.