Edit: not sure how much I still like the following frame; it might lump together a handful of questions that are better thought of as distinct.
I’d tentatively suggest an additional question for the post’s list of research questions (in the context of the idea that we may only get narrow/minimalist versions of alignment):
> Assuming that transformative AI will be aligned, how good will the future be?
My (not very confident) sense is that opinions on this question vary widely, and that it’s another important strategic question. After all:

- Some people seem to think that, if transformative AI is aligned, the future will be amazing.
  - A common justification for this view seems to be: AI will be aligned to people/groups who on reflection would have good values (because most people/institutions have such values, or because people/groups with good values are on track to gain influence), and AI-assisted deliberation and coordination will be enough to bootstrap them from that starting point to an amazing future.
  - If we had good arguments for this, the community could focus on alignment.
- Some people seem to think that, even if transformative AI is aligned, the future won’t be all that amazing.
  - Common justifications for this view seem to be: AI will be aligned to individuals or (coordinated) groups with lame or bad values, either because those actors are already on track to gain influence or because inadequate cooperation will erode value during or after the development of transformative AI.
  - If we had good arguments for this, the community could dedicate a large fraction of its resources to addressing whatever might cause a future with aligned AI to not be great (e.g., by boosting certain organizational or individual actors, improving institutions, forming “cooperation-compatible” plans for using aligned AI, or otherwise improving cooperation).
Some existing work on these topics, as potential starting points for people interested in looking into this (updated March 11, 2022):
- On (AI-assisted) reflection on values (a potential contributor to the future being good, given alignment):
  - Decoupling deliberation from competition (Christiano, 2021)
  - Ambitious vs. narrow value learning (Christiano, 2015)
  - Work in (meta)ethics, moral psychology, and cultural/moral history
- On the claim that agents with good values will, for theoretical reasons, exert disproportionate influence (a potential contributor to the future being good, given alignment):
  - Why might the future be good? (Christiano, 2013)
  - Work on moral trade also seems relevant here, since moral trade lets everyone have more influence over what they care most about (the toy sketch at the end of this comment illustrates the mechanism).
- On the claim that currently influential groups have good/lame/bad values (a potential contributor to the future being good or bad/lame, given alignment):
  - This comment (Drexler, 2021)
  - We’re already in AI takeoff (Valentine, 2022)
  - Work on the values, processes, and histories of relevant governments, companies, and (social, ideological, and political) movements
  - One could have informal conversations to learn more about how much leverage various people/groups do or don’t have in relevant groups/organizations.
- On value erosion through competition (a potential contributor to the future being bad/lame, even with alignment; the toy model at the end of this comment sketches the basic dynamic):
  - The “Value erosion through competition” section of a post (Dafoe, 2020)
  - The four readings cited/linked in that section of Dafoe’s post
  - What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) (Critch, 2021) (see the comments for further discussion)
  - Spreading happiness to the stars seems little harder than just spreading (Shulman, 2012) (see the comments for further discussion)
  - Game-theoretic work on cooperation and competition (?)
  - Keeping an eye out for more work on this topic might also be useful.
- Additional material that seems relevant:
  - Public choice theory and social choice theory (?) (the Condorcet-cycle sketch at the end of this comment gestures at one reason these seem relevant)
  - Technical alignment work also seems like important context for thinking about what AI aligned to a group/organization may be like.
  - Additional sources referenced in section 1.2 of the Global Priorities Institute’s research agenda may also be relevant.
  - Several parts of the original post here and its appendices also seem relevant.
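
To make the moral trade point above concrete, here is a minimal toy sketch (my own made-up numbers and agent names; this is not taken from any of the works listed): two agents who each value only one cause, but who each have a comparative advantage at helping the other agent's cause, both end up with more of what they value by trading effort.

```python
# Toy illustration of moral trade (hypothetical numbers, my own sketch).
# Two agents each control one unit of effort. Each agent values only one
# cause, but is more productive at helping the other agent's cause
# (e.g., due to location, skills, or opportunities).

# productivity[agent][cause]: output per unit of effort
productivity = {
    "A": {"X": 1.0, "Y": 3.0},  # A is better at producing Y
    "B": {"X": 3.0, "Y": 1.0},  # B is better at producing X
}
# values[agent][cause]: how much each agent cares about each cause
values = {
    "A": {"X": 1.0, "Y": 0.0},  # A only values X
    "B": {"X": 0.0, "Y": 1.0},  # B only values Y
}

def utilities(allocation):
    """allocation[agent] = the cause that agent works on."""
    output = {"X": 0.0, "Y": 0.0}
    for agent, cause in allocation.items():
        output[cause] += productivity[agent][cause]
    return {agent: sum(values[agent][c] * output[c] for c in output)
            for agent in values}

# Without trade, each agent naively works on the cause they value:
print(utilities({"A": "X", "B": "Y"}))  # {'A': 1.0, 'B': 1.0}
# With moral trade, each works where they are more productive, for the
# other's benefit, and both get more of what they care about:
print(utilities({"A": "Y", "B": "X"}))  # {'A': 3.0, 'B': 3.0}
```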
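
Likewise, a minimal sketch of the value erosion dynamic (again my own toy model with made-up parameters, not a summary of what Dafoe or Critch have in mind): actors who spend resources on what they value compound more slowly than actors who reinvest everything into staying competitive, so over time influence drifts toward the latter.

```python
# Toy model of value erosion through competition (my own sketch).
# Two populations of actors grow their resources each period. "Caring"
# actors spend 10% of their resources on what they actually value;
# "grabby" actors reinvest everything into competition.

GROWTH = 1.10  # gross return on reinvested resources per period (made up)
SPEND = 0.10   # fraction caring actors spend on their values (made up)

caring, grabby = 1.0, 1.0  # start with equal resources

for _ in range(100):
    caring *= GROWTH * (1 - SPEND)  # reinvest only what isn't spent
    grabby *= GROWTH                # reinvest everything

share = caring / (caring + grabby)
print(f"Caring actors' share after 100 periods: {share:.6f}")
# Prints ~0.000027: influence drifts toward whoever sacrifices their
# values for competitiveness, unless coordination changes the game.
```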
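
Finally, on the social choice point: the standard Condorcet cycle hints at why "aligning AI to a group" is conceptually tricky even before any of the above issues arise. With as few as three honest voters, pairwise majority preferences can be cyclic, leaving no well-defined group ranking for an AI to optimize.

```python
# Standard Condorcet cycle from social choice theory: three voters,
# three options, and cyclic pairwise majority preferences.
from itertools import combinations

# Each voter ranks the options from best to worst.
rankings = [
    ["a", "b", "c"],
    ["b", "c", "a"],
    ["c", "a", "b"],
]

def majority_prefers(x, y):
    """True if a majority of voters rank x above y."""
    wins = sum(r.index(x) < r.index(y) for r in rankings)
    return wins > len(rankings) / 2

for x, y in combinations("abc", 2):
    winner, loser = (x, y) if majority_prefers(x, y) else (y, x)
    print(f"majority prefers {winner} over {loser}")
# Prints "a over b", "c over a", "b over c": a cycle, so no option
# beats all others and "the group's preference" is not well defined.
```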