My perspective, I think, is that most of the difficulties people think of as the extra, hard part of one->many alignment are already present in one->one alignment. A single human is already a barely coherent mess of conflicting wants and goals interacting chaotically, and the strong form of “being aligned to one human” requires a solution that can resolve value conflicts between incompatible ‘parts’ of that human and find outcomes that are satisfactory to all interests. Expanding this to more than one person is a change of degree, not of kind.
There is a weaker form of “being aligned to one human” that’s just “don’t kill that human and follow their commands in more or less the way they intend”. If that’s all we can get, it only translates to “don’t drive humanity extinct and follow the wishes of at least some subset of people”, which I’d consider a dramatically suboptimal outcome. At this point I’d take it, though.
Hi Robert, thanks for your perspective on this. I love your YouTube videos by the way—very informative and clear, and helpful for AI alignment newbies like me.
My main concern is that we still have massive uncertainty about what proportion of ‘alignment with all humans’ can be solved by ‘alignment with one human’. It sounds like your bet is that it’s somewhere above 50% (maybe? I’m just guessing), whereas my bet is that it’s under 20% -- i.e., I think that aligning with one human leaves most of the hard problems, and the X-risk, unsolved.
And part of my skepticism in that regard is that a great many humans—perhaps most of the 8 billion on Earth—would be happy to use AI to inflict harm, up to and including death and genocide, on certain other individuals and groups of humans. So, AI that’s aligned with frequently homicidal/genocidal individual humans would be AI that’s deeply anti-aligned with other individuals and groups.