I think there’s also a problem with treating “misaligned” as a binary thing, where the AI either exactly shares all our values down to the smallest detail (aligned) or it doesn’t (misaligned). As the OP has noted, in this sense all human beings are “misaligned” with each other.
It makes sense to me to divide your category 2 further, treating alignment as a spectrum: from “perfectly aligned”, to “won’t kill anyone” aligned, to “won’t drive humanity extinct” aligned. The first is probably impossible; the last is probably not that difficult.
If we have an AI that is “won’t kill anyone aligned”, then your world of AI trade seems fine. We can trade for our mutual benefit safe in the knowledge that if a power struggle ensues, it will not end in our destruction.