My impression is that people’s opinions about AI alignment difficulty often come down to differences in how much they think we need to solve the second problem relative to the first problem in order to get AI systems that generate net-positive value for humans.
I don’t think many people are very optimistic about ensuring good outcomes from AI due to the combination of the following beliefs:
AIs will have long-run goals that weren’t intentionally instilled by their creators. (That is, goals extending beyond the current episode.)
From the perspective of the person holding these beliefs, these long-run goals have no terminal value (e.g., the AIs just care about paperclips).
These AIs will very quickly (years, not decades) become wildly smarter than humans due to explosive growth during the singularity.
But this will be fine, because these AIs will simply integrate into society much as humans do: they’ll trade with other agents, obey laws, accept money, etc.
Other than Robin Hanson and you, I’m not aware of anyone who puts substantial weight on this collection of views.
I think more common reasons for optimism are either:
AIs won’t have long-run goals that aren’t intentionally instilled by their creators. (At least not prior to humans being obsoleted by AIs, which will then handle the next generation of alignment difficulties.)
AIs will have long-run goals, but these long-run goals will have at least some terminal value. (This could be due to some indirect argument like “well, it would be overly paternalistic to not value what our successors value”.)
Separately, I’m somewhat optimistic about gaining value from approaches that involve paying AIs, as I discuss in another comment.