Don’t Call It AI Alignment

Let me start by saying that I generally hate pedantic arguments, so I don’t see this post as the most important thing in the world. More important than not calling it AI Alignment is actually ensuring AI Safety, obviously.

Still, the phrase “AI Alignment” has always bugged me. I’m not an AI researcher or anything, so my subject matter knowledge is not very deep, but from my understanding “AI Alignment” is a poor description of the actual issue at hand. AI Safety is more about control than alignment.

To illustrate, let me display a Twitter poll I recently ran:

Excluding the “See Results” votes, that’s around 30% Yes, 70% No.

I would guess that respondents are >2/3 EA/rationality people or similar: my follower count is not very large, my followers are a mix of EA and politics people, and the poll was retweeted by Peter Wildeford, Natalia Coelho, Ezra Newman and @PradyuPrasad. 70% of EAs identified as utilitarian in the 2019 EA survey; I’m not sure how these respondents compare to that, but they are probably pretty similar. Either way, I’d estimate that >25% of people who are mostly hedonic total utilitarians voted No on this question.

Even with what is, in my view, the most complete and plausible ethical theory we have, a substantial fraction of believers still don’t want an AI exclusively aligned to that theory. Add in the deep, contentious disagreements that humans (or even just philosophers) have about ethics, and it is pretty clear that we can’t just hand an AI an ethical theory to run with. That’s not to mention the likely impossibility of making an AI that will optimize for utilitarianism or any other moral philosophy. Broadening the target from a specific moral philosophy to human values doesn’t help much: those values are similarly contentious, imprecise, and not worth optimizing for explicitly.

An alternative, and in my experience less frequently used, definition of AI Alignment is aligning AI to the intent of its designer (this is the definition Wikipedia uses). The problem is that AGI systems, partly by definition, don’t have a single intent. AGI systems like ChatGPT (yes, it is AGI; no, it is not human-level AGI) are broad in capabilities and are then applied by developers to specific problems. AI Safety is more about putting constraints on AI models than about aligning them to one specific task.

What we really want is an AI that won’t kill us all. The goal of AI Safety is to ensure that humanity keeps control over its future, and that AIs do not come to dominate humanity the way humanity dominates all other species. Aligning AI to human values or to any specific intent is relatively unimportant so long as we make sure that AIs’ actions remain within our control. I think calling AI Safety “AI Alignment”, or referring to “The Alignment Problem”, misleads people about what the issue actually is and what we should do about it. Just call it AI Safety, and either call the technical problem “AI Control” or drop the (IMO also misleading) idea that there is “one” problem we are trying to solve.