How can you align AI with humans when humans are not internally aligned?
AI Alignment researchers often talk about aligning AIs with humans, but humans are not aligned with each other as a species. There are groups whose goals directly conflict with each other, and I don’t think there is any singular goal that all humans share.
As an extreme example, one may say “keep humans alive” is a shared goal among humans, but there are people who think that is an anti-goal and humans should be wiped off the planet (e.g., eco-terrorists). “Humans should be happy” is another goal that not everyone shares, and there are entire religions that discourage pleasure and enjoyment.
You could try to simplify further to “keep the species around,” but some would be fine with a wirehead future while others would not, and some would be fine with humans merely existing in a zoo while others would not.
Almost every time I hear alignment researchers speak about aligning AI with humans, they seem to start from the premise that there is a cohesive worldview to align with. The best “solution” to this problem that I have heard suggested is that there should be multiple AIs that compete with each other on behalf of different groups of humans, or perhaps individual humans, with each separately representing the goals of those humans. However, the people who suggest this strategy are generally not AI alignment researchers but rather people arguing against them.
What is the implied alignment target that AI alignment researchers are trying to work towards?
Yep, this is a totally reasonable question. People have worked on it before: https://www.brookings.edu/research/aligned-with-whom-direct-and-social-goals-for-ai-systems/
Many people concerned with existential threats from AI believe that the hardest technical challenge is aligning an AI to do any specific thing at all. They argue that we will have little control over the goals and behavior of superhuman systems, and that solving the problem of aligning an AI with any one human will eliminate much of the existential risk associated with AI. See here and here for explanations.
I think most alignment researchers would be happy with being able to align an AI with a single human or small group of humans.
But I think what you are really asking is what governance mechanisms we would want to exist, and that seems very similar to the question of how to run a government.
How do we choose which human the AI gets aligned with?
Is everyone willing to accept that “whatever human happens to build the hard takeoff AI gets to be the human the AI is aligned with”? Do AI alignment researchers realize this human may not be them, and may not align with them? Are AI alignment researchers all OK with Vladimir Putin, Kim Jong Un, or Xi Jinping being the alignment target? What about someone like Ted Kaczynski?
If the idea is “we’ll just decide collectively”, then in the most optimistic scenario we can assume (based on our history with democracy) that the alignment target will be something akin to today’s world leaders, none of whom I would be comfortable having an AI aligned with.
If the plan is “we’ll decide collectively, but using a better mechanism than any currently existing one,” then it feels like the implication is that not only can we solve AI alignment, but we can also solve human alignment (something humans have been trying and failing to do for millennia).
Separately, I’m curious why my post got downvoted on quality (I’m not sure if it was you or someone else). I’m new to this community, so perhaps I unintentionally broke some rule that I would like to be made aware of.
I did not downvote your post.
I’m not necessarily representing this point of view myself, but I think the idea is that any alignment scenario — alignment with any human or group of humans — would be a triumph compared to “doom”.
I do think that in practice if the alignment problem is solved, then yes, whoever gets there first would get to decide. That might not be as bad as you think, though; China is repressive in order to maintain social control, but that repression wouldn’t necessarily be a prerequisite to social control in a super-AGI scenario.