Hi mic,
I understand that’s how ‘alignment’ is normally defined in AI safety research.
But it seems like such a narrow notion of alignment that it glosses over almost all of the really hard problems in real AI safety—which concern the very real conflicts between the humans who will be using AI.
For example, if the AI is aligned ‘to the people who are allowed to provide feedback’ (e.g., the feedback to a CIRL system), that raises the question of who is actually going to be allowed to provide feedback. For most real-world applications, deciding that issue is tantamount to deciding which humans will be in control of that real-world domain—and it may leave the AI looking very ‘unaligned’ to all the other humans involved.
I very much agree that these political questions matter, and that alignment to multiple humans is conceptually pretty shaky; thanks for raising these issues. Still, I think some important context is that many AI safety researchers think it’s a hard, unsolved problem just to keep future powerful AI systems from causing many deaths (or doing other unambiguously terrible things). They’re often worried that CIRL and every other approach that’s been proposed will completely fail. From that perspective, it no longer looks like almost all of the really hard problems are about conflicts between humans.
(On CIRL, here’s a thread and a longer writeup on why some think that “it almost entirely fails to address the core problems” of AI safety.)
I agree, that seems concerning. Ultimately, since the AI developers are designing the AIs, I would guess that they would try to align the AI to be helpful to the users/consumers, or to the concerns of the company/government, if they succeed at aligning the AI at all. As for your suggestions (“Alignment with whoever bought the AI? Whoever uses it most often? Whoever might be most positively or negatively affected by its behavior? Whoever the AI’s company’s legal team says would impose the highest litigation risk?”), these all seem plausible to me.
On the separate question of handling conflicting interests: there’s some work on this (e.g., “Aligning with Heterogeneous Preferences for Kidney Exchange” and “Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning”), though perhaps not as much as we would like.