But it seems like such a narrow notion of alignment that it glosses over almost all of the really hard problems in real AI safety—which concern the very real conflicts between the humans who will be using AI.
I very much agree that these political questions matter, and that alignment to multiple humans is conceptually shaky; thanks for raising these issues. Still, I think some important context is that many AI safety researchers think it's a hard, unsolved problem just to keep future powerful AI systems from causing many deaths (or doing other unambiguously terrible things). They're often worried that CIRL, and every other approach that's been proposed, will completely fail at this. From that perspective, it no longer looks like almost all of the really hard problems are about conflicts between humans.
(On CIRL, here’s a thread and a longer writeup on why some think that “it almost entirely fails to address the core problems” of AI safety.)