@niplav Interesting take; thanks for the detailed response.
Technically, I think that AI safety as a technical discipline has no “say” in who the systems should be aligned with. That’s for society at large to decide.
So, if AI safety as a technical discipline should not have a say in who the systems should be aligned with, but its practitioners are the ones doing the aligning, whose values are they aiming to align the systems with?
Is it naturally an extension of the values of whoever has the most compute power, best engineers, and most data?
I love the idea of society at large deciding, but then I think about humanity’s track record.
I am somewhat more hopeful about society at large deciding how to use AI systems: my impression is that wealth has sped up moral progress (since people have more slack for caring about others). This becomes especially stark when I read about how very poor people in the past behaved towards others.
That said, I’d be happier if we figured out how to encode ethical progress in an algorithm and just ran that, but I’m not optimistic about our chances of finding such an algorithm (if it exists).
Interesting, thanks for sharing your thoughts. I guess I’m less certain that wealth has led to faster moral progress.
In my conception, AI alignment is the theory of aligning any stronger cognitive system with any weaker cognitive system, allowing for incoherencies and inconsistencies in the weaker system’s actions and preferences.
I very much hope that the solution to AI alignment is not one where we have a theory of how to align AI systems to a specific human—that kind of solution seems fraudulent just on technical grounds (far too specific).
I would make a distinction between alignment theorists and alignment engineers/implementors: the former find a theory of how to align any AI system (or set of systems) with any human (or set of humans); the latter take that theoretical solution and apply it to specific AI systems and specific humans.
Alignment theorists and alignment implementors might be the same people, but the roles are different.
This is similar to many technical problems: You might ask someone trying to fit a line through a cloud of x/y points, with the smallest distance to each of those points, “But which dataset are you trying to apply the linear regression to?”, and the answer would be “any”.
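To make the analogy concrete, here is a minimal sketch of that kind of regression in Python (the example data points are made up): the fitting procedure is written once, with no reference to any particular dataset, and can then be applied to whatever x/y points you hand it.

```python
# Minimal sketch of the linear-regression analogy: the fitting procedure
# is defined once, independently of any particular dataset.
from typing import Sequence, Tuple

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope minimises the summed squared vertical distances to the points.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# The same procedure works on *any* dataset; these points are made up.
print(fit_line([0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.2, 2.9]))
```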