Assuming misaligned AI is a risk, is technical AI alignment enough, or do you need joint AI/Societal alignment?
My work has involved trying to support risk awareness and coordination similar to what has been suggested for AI alignment. For example, I worked on mitigating harms from synthetic media / “deepfakes” (now rebranded as generative AI), and for a few years this coordination held across all the major orgs and most relevant research groups.
But then new orgs (e.g. EleutherAI, Stability AI) jumped in to fill the capability gap! They did so in response to demand, and for potentially good reasons: the same capabilities that can harm people can also help people. The ultimate result is the proliferation/access/democratization of AI capabilities in the face of those risks.
Question 1) What would stop the same thing from happening with technical AI alignment?[1]
I’m currently skeptical that this sort of coordination is possible without addressing deeper societal incentives (AKA reward functions, e.g. around profit/power/attention maximization, self-dealing, etc.) and the related multi-principal-agent challenges. This joint AI/societal alignment, or holistic alignment, would seem to be a prerequisite to the actual implementation of technical alignment.[2]
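To make the "societal reward functions" framing concrete, here is a minimal toy sketch (all names and numbers are hypothetical, chosen purely for illustration): an agent serves multiple principals whose influence is unequal, so the objective it actually optimizes drifts from an equal-weight "societal" objective even though every individual reward function is reasonable on its own.

```python
# Toy multi-principal-agent sketch (illustrative only; weights/numbers are made up).
# Each principal scores an action; "weight" stands in for influence (funding, attention, etc.).
principals = {
    "shareholders": {"weight": 0.7, "reward": lambda engagement, harm: engagement},
    "public":       {"weight": 0.3, "reward": lambda engagement, harm: -harm},
}

# Candidate actions: (label, engagement generated, societal harm caused)
actions = [
    ("maximize engagement", 10.0, 8.0),
    ("balanced design",      6.0, 2.0),
    ("harm-minimizing",      3.0, 0.5),
]

def agent_score(engagement, harm):
    """What the agent actually optimizes: influence-weighted principal rewards."""
    return sum(p["weight"] * p["reward"](engagement, harm) for p in principals.values())

def social_score(engagement, harm):
    """An equal-weight objective, for comparison."""
    return sum(0.5 * p["reward"](engagement, harm) for p in principals.values())

chosen = max(actions, key=lambda a: agent_score(a[1], a[2]))
ideal  = max(actions, key=lambda a: social_score(a[1], a[2]))
print("agent picks:", chosen[0])        # → maximize engagement (dominant principal wins)
print("equal-weight pick:", ideal[0])   # → balanced design
```

The point of the sketch: no technical fix to the agent changes the outcome here; the divergence comes entirely from the weights, i.e. from which principals hold influence. That is the sense in which the societal incentive structure is upstream of technical alignment.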
Question 2) Am I missing something here? If one assumes that misaligned AI is a threat worth resourcing, what is the likelihood of succeeding at AI alignment long-term without also succeeding at ‘societal alignment’?
This is assuming you can even get the major players on board, which isn’t true for e.g. the misaligned recommender systems I’ve also worked on (on the societal side).
Fixing these societal incentives would also be generally good for the world, e.g. for addressing externalities, political dysfunction, corruption, etc.