when we have no evidence that aligning AGIs with ‘human values’ would be any easier than aligning Palestinians with Israeli values, or aligning libertarian atheists with Russian Orthodox values—or even aligning Gen Z with Gen X values?
When I ask an LLM to do something it usually outputs something that is its best attempt at being helpful. How is this not some evidence of alignment that is easier than inter-human alignment?
LLMs are not AGIs in the sense being discussed; they are at best proto-AGI. That means the inference breaks down at exactly the point where it matters.
When I ask a friend to give me a dollar when I’m short, they often do so. Is this evidence that I can borrow a billion dollars? Should I go on a spending spree on the basis that I’ll be able to get the money to pay for it from those friends?
When I lift, catch, or throw a 10-pound weight, I usually manage it without hurting myself. Is this evidence that weight isn’t an issue? Should I try to catch a 1,000-pound boulder?
‘AI alignment’ isn’t about whether a narrow, reactive, non-agentic AI system (such as a current LLM) seems ‘helpful’.
It’s about whether an agentic AI that can make its own decisions and take its own autonomous actions will make decisions that are aligned with general human values and goals.