If the latter, we’re not really seeking ‘AI alignment’. We’re talking about using AI systems as mass ‘moral enhancement’ technologies, a.k.a. ‘moral conformity’ technologies, a.k.a. ‘political indoctrination’ technologies. That raises a whole other set of questions about power, do-gooding, elitism, and hubris.
So, we better be honest with ourselves about which type of ‘alignment’ we’re really aiming for.
I would draw a distinction between what I call “metaphilosophical paternalism” and “political indoctrination”, the difference being whether we’re “encouraging” what we think are good reasoning methods and good meta-level preferences (e.g., preferences about how to reason, how to form beliefs, how to interact with people with different beliefs/values), or whether we’re “encouraging” object-level preferences, for example about income redistribution.
My precondition for doing this, though, is that we first solve metaphilosophy: in other words, that we have a thorough understanding of what “good reasoning” (including philosophical and moral reasoning) actually consists of, or of what good meta-level preferences consist of. I would be the first to admit that we seriously lack this right now. It seems a very long shot to develop such an understanding before AGI, but I have trouble seeing how to ensure a good long-term outcome for future human-AI civilization unless we succeed in doing something like this.
I think in practice what we’re likely to get is “political indoctrination” (given huge institutional pressure/incentive to do that), which I’m very worried about but am not sure how to prevent, aside from solving metaphilosophy and talking people into doing metaphilosophical paternalism instead.
I have had discussions with some alignment researchers (mainly Paul Christiano) about my concerns on this topic, and the impression I get is that they’re mainly focused on “aligned with individual people’s current values as they are” and they’re not hugely concerned about this leading to bad outcomes like people locking in their current beliefs/values. I think Paul said something like he doesn’t think many people would actually want their AI to do that, and others are mostly just ignoring the issue? They also don’t seem hugely concerned that their work will be (mis)used for “political indoctrination” (regardless of what they personally prefer).
So from my perspective, the problem is not so much alignment researchers “not being honest with themselves” about what kind of alignment we’re aiming for, but rather a confusing (to me) nonchalance about potential negative outcomes of AIs aligned with religious or ideological values.
ETA: What’s your own view on this? How do you see things working out in the long run if we do build AIs aligned to people’s current values, which include religious values for many of them? Based on this, are you worried or not worried?
Hi Wei_Dai—great comments and insights.
It would be lovely if we could gently nudge people, through unbiased ‘metaphilosophical paternalism’, to adopt better meta-preferences about how to reason, debate, and update their values. What a wonderful world that would be. Turning everyone into EAs, in our own image.
However, I agree with you that in practice, AI systems are likely to end up (1) ‘aligning’ on people’s values as they actually are, i.e. mostly religious, politically partisan, nepotistic, anthropocentric, hypocritical, fiercely tribal, etc., or (2) embodying some set of values approved by certain powerful elites, values that differ from what ordinary folks currently believe but are promoted ‘for their own good’, which would basically be the most powerful system of indoctrination and propaganda ever developed.
The recent concern among AI researchers about how to ‘reduce misinformation on social media’ through politically selective censorship suggests that option (2) will be very tempting to AI developers seeking to ‘do good’ in the world.
And of course, even if we could figure out how AI systems could do metaphilosophical paternalism, religious people would have very different ideas of what that should look like—e.g. they might promote faith over reason, tradition over open-mindedness, revelation over empiricism, sectarianism over universalism, afterlife longtermism over futuristic longtermism, etc.