The human alignment problem
Humans are subject to instrumental convergence as much as an AI would be. We seek power, resources and influence in pursuit of many of our goals.
Whatever our goals happen to be, we will want to use AI to increase our power and so get more of what we value.
If people are augmenting their goal-seeking with AI, will we converge on harmonious goals, or will we continue to pursue parochial self-interest?
In short, if we somehow solve the alignment problem for AI, will we also solve the human alignment problem? Or will we simply race to use AI to maximise our own power and our own values, even if these harm others?
The best hope is that if we solve AI alignment, the AI will keep us in check in a benevolent and minimally impactful way. It will prevent us from pursuing zero-sum goals and guide us to be better versions of ourselves.
But this kind of control may well appear misaligned from our current perspectives, in that some people’s cherished goals and values may not be the ones the AI chooses to support.
So to talk of aligned AI is to gloss over the likelihood that it will be misaligned with a great many people’s current goals and ambitions.