There are three levels of answers to this question: What the ideal case would be, what the goal to aim for should be, and what will probably happen.
What the ideal case would be: We find a way to encode “true morality” or “the core of what has been driving moral progress” and align AI systems to that.
The slightly less ideal case: AI systems are aligned with the Coherent Extrapolated Volition of the humans who are currently alive. Hopefully that process figures out which beings are relevant moral patients, and takes their interests into consideration.
In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
What the goal to aim for should be: Something that (1) is good and (2) humanity can coordinate around. In the best case this approximates Coherent Extrapolated Volition, but looks mundane: humans build AI systems with some democratic control over them; China has some relevant AI systems, the US has some, and the rest of the world rents access to those. Humanity uses them to become smarter, and figures out relevant mechanisms for democratic control over the systems (as we become richer and care less about zero-sum competition).
What is probably going to happen: A few actors create powerful AI systems and figure out how to align them to their personal interests. They use those systems to colonize the universe, but burn most of the cosmic commons on status signaling games.
Technically, I think that AI safety as a technical discipline has no “say” in who the systems should be aligned with. That’s for society at large to decide.
@niplav Interesting take; thanks for the detailed response.
Technically, I think that AI safety as a technical discipline has no “say” in who the systems should be aligned with. That’s for society at large to decide.
So, if AI safety as a technical discipline should not have a say in who systems should be aligned with, but its practitioners are the ones aiming to align the systems, whose values are they aiming to align the systems with?
Is it naturally an extension of the values of whoever has the most compute power, best engineers, and most data?
I love the idea of society at large deciding, but then I think about humanity’s track record.
I am somewhat more hopeful about society at large deciding how to use AI systems: I have the impression that wealth has made moral progress faster (since people have more slack for caring about others). This becomes especially stark when I read about very poor people in the past and their behavior towards others.
That said, I’d be happier if we found out how to encode ethical progress in an algorithm and just run that, but I’m not optimistic about our chances of finding such an algorithm (if it exists).
Interesting, thanks for sharing your thoughts. I guess I’m less certain that wealth has led to faster moral progress.
In my conception, AI alignment is the theory of aligning any stronger cognitive system with any weaker cognitive system, allowing for incoherencies and inconsistencies in the weaker system’s actions and preferences.
I very much hope that the solution to AI alignment is not one where we have a theory of how to align AI systems to a specific human—that kind of solution seems fraudulent just on technical grounds (far too specific).
I would make a distinction between alignment theorists and alignment engineers/implementors: the former find a theory of how to align any AI system (or set of systems) with any human (or set of humans); the latter take that theoretical solution and apply it to specific AI systems and specific humans.
Alignment theorists and alignment implementors might be the same people, but the roles are different.
This is similar to many technical problems: you might ask someone trying to find the line through a cloud of x/y points with the smallest distance to each of those points, “But which dataset are you trying to apply linear regression to?” The answer is “any”.
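A minimal sketch of that analogy in Python, assuming nothing beyond ordinary least squares; the fit_line name and the synthetic data are illustrative placeholders. The fitting procedure is defined once, for any dataset, and only whoever applies it picks a specific one, much like the theorist/implementor split above.

```python
import numpy as np

def fit_line(xs: np.ndarray, ys: np.ndarray) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept.

    The procedure is dataset-agnostic: it works on any cloud of x/y points.
    """
    x_mean, y_mean = xs.mean(), ys.mean()
    slope = ((xs - x_mean) * (ys - y_mean)).sum() / ((xs - x_mean) ** 2).sum()
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Whoever applies the procedure picks a specific dataset; here, a synthetic one.
rng = np.random.default_rng(0)
xs = rng.uniform(0, 10, size=50)
ys = 2.0 * xs + 1.0 + rng.normal(scale=0.5, size=50)
print(fit_line(xs, ys))  # roughly (2.0, 1.0)
```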