If you think the Simulation Hypothesis seems likely, but the traditional religions are idiotic
I think the key difference here is that while traditional religions claim detailed knowledge about who the gods are, what they’re like, what they want, and what we should do in light of such knowledge, my position is that we currently have little idea who our simulators are, can’t even describe our uncertainty in a clear way (such as with a probability distribution), and don’t know how such knowledge should inform our actions. It would take a lot of research, intellectual progress, and perhaps increased intellectual capacity to change that. I’m fairly certain that any confidence in the details of gods/simulators at this point is unjustified, and that people like me are simply at a better epistemic vantage point than traditional religionists who make such claims.
These are the human values that religious people would want the AI to align with. If we can’t develop AI systems that are aligned with these values, we haven’t solved the AI alignment problem.
I also think that the existence of religious values poses a serious difficulty for AI alignment, but I have the opposite worry: that we might develop AIs that “blindly” align with religious values (for example, locking people into their current religious beliefs because they seem to value faith), thus causing a great deal of harm according to more enlightened values.
It’s not clear to me what should be done with religious values though, either technically or sociopolitically. One (half-baked) idea I have is that if we can develop a good understanding of what “good reasoning” consists of, maybe aligned AI can use that to encourage people to adopt good reasoning processes that eventually cause them to abandon their false religious beliefs and the values that are based on those false beliefs, or allow the AI to talk people out of their unjustified beliefs/values based on its own good reasoning.
Wei_Dai: good replies.
I agree that traditional religious beliefs & theology usually show much less epistemic humility than EAs who believe in the Simulation Hypothesis do. I was just pointing out that there are some similarities in the underlying metaphysics. And more intellectually advanced forms of these religions (e.g. more recent Protestant theology, Zen Buddhism) do show a fairly high degree of epistemic humility in not pretending to know a lot of details about what’s behind the Simulation.
Your second point raises a crucial ethical challenge for EA.
When we say that we want AI that’s ‘aligned with human values’, do we really mean aligned with individual people’s current values as they are (perhaps including fundamentalist religious values, hard-core ethnonationalist values, runaway consumerist values, or sociopathic values)?
Or do we mean we want AI to support people’s idealized values as we might want them to be?
If the latter, we’re not really seeking ‘AI alignment’. We’re talking about using AI systems as mass ‘moral enhancement’ technologies, aka ‘moral conformity’ technologies, aka ‘political indoctrination’ technologies. That raises a whole other set of questions about power, do-gooding, elitism, and hubris.
So, we’d better be honest with ourselves about which type of ‘alignment’ we’re really aiming for.
I would draw a distinction between what I call “metaphilosophical paternalism” and “political indoctrination”, the difference being whether we’re “encouraging” what we think are good reasoning methods and good meta-level preferences (e.g., preferences about how to reason, how to form beliefs, how to interact with people with different beliefs/values), or whether we’re “encouraging” object-level preferences, for example about income redistribution.
My precondition for doing this, though, is that we first solve metaphilosophy, in other words have a thorough understanding of what “good reasoning” (including philosophical and moral reasoning) actually consists of, or a thorough understanding of what good meta-level preferences consist of. I would be the first to admit that we seriously lack this right now. It seems a very long shot to develop such an understanding before AGI, but I have trouble seeing how to ensure a good long-term outcome for future human-AI civilization unless we succeed in doing something like this.
I think in practice what we’re likely to get is “political indoctrination” (given huge institutional pressure/incentive to do that), which I’m very worried about but am not sure how to prevent, aside from solving metaphilosophy and talking people into doing metaphilosophical paternalism instead.
I have had discussions with some alignment researchers (mainly Paul Christiano) about my concerns on this topic, and the impression I get is that they’re mainly focused on “aligned with individual people’s current values as they are” and they’re not hugely concerned about this leading to bad outcomes like people locking in their current beliefs/values. I think Paul said something to the effect that he doesn’t think many people would actually want their AI to do that, and others seem to be mostly just ignoring the issue. They also don’t seem hugely concerned that their work will be (mis)used for “political indoctrination” (regardless of what they personally prefer).
So from my perspective, the problem is not so much alignment researchers “not being honest with themselves” about what kind of alignment we’re aiming for, but rather a confusing (to me) nonchalance about potential negative outcomes of AIs aligned with religious or ideological values.
ETA: What’s your own view on this? How do you see things working out in the long run if we do build AIs aligned to people’s current values, which for many people include religious values? Based on this, are you worried or not worried?
Hi Wei_Dai, great comments and insights.
It would be lovely if we could gently nudge people, through unbiased ‘metaphilosophical paternalism’, to adopt better meta-preferences about how to reason, debate, and update their values. What a wonderful world that would be. Turning everyone into EAs, in our own image.
However, I agree with you that in practice, AI systems are likely to end up either (1) ‘aligning’ with people’s values as they actually are—i.e. mostly religious, politically partisan, nepotistic, anthropocentric, hypocritical, fiercely tribal, etc., or (2) embodying some set of values approved by certain powerful elites, values that differ from what ordinary folks currently believe but that are promoted ‘for their own good’—which would basically be the most powerful system of indoctrination and propaganda ever developed.
The recent concern among AI researchers about how to ‘reduce misinformation on social media’ through politically selective censorship suggests that option (2) will be very tempting to AI developers seeking to ‘do good’ in the world.
And of course, even if we could figure out how AI systems could do metaphilosophical paternalism, religious people would have a very different idea of what that should look like—e.g. they might promote faith over reason, tradition over open-mindedness, revelation over empiricism, sectarianism over universalism, afterlife longtermism over futuristic longtermism, etc.