I think the key difference here is that while traditional religions claim detailed knowledge about who the gods are, what they’re like, what they want, and what we should do in light of such knowledge, my position is that we currently have little idea who our simulators are, can’t even describe our uncertainty in a clear way (such as with a probability distribution), and don’t know how such knowledge should inform our actions. It would take a lot of research, intellectual progress, and perhaps increased intellectual capacity to change that. I’m fairly certain that any confidence in the details of gods/simulators at this point is unjustified, and that people like me are simply at a better epistemic vantage point compared to traditional religionists who make such claims.
I also think that the existence of religious values poses a serious difficulty for AI alignment, but I have the opposite worry, that we might develop AIs that “blindly” align with religious values (for example locking people into their current religious beliefs because they seem to value faith), thus causing a great deal of harm according to more enlightened values.
It’s not clear to me what should be done with religious values, though, either technically or sociopolitically. One (half-baked) idea I have is that if we can develop a good understanding of what “good reasoning” consists of, maybe aligned AI can use that to encourage people to adopt good reasoning processes that eventually cause them to abandon their false religious beliefs and the values that are based on those false beliefs, or allow the AI to talk people out of their unjustified beliefs/values based on the AI’s own good reasoning.
Wei_Dai: good replies.
I agree that traditional religious beliefs & theology usually show much less epistemic humility than EAs who believe in the Simulation Hypothesis. I was just pointing out that there are some similarities in the underlying metaphysics. And, more intellectually advanced forms of these religions (e.g. more recent Protestant theology, Zen Buddhism) do show a fairly high degree of epistemic humility in not pretending to know a lot of details about what’s behind the Simulation.
Your second point raises a crucial ethical challenge for EA.
When we say that we want AI that’s ‘aligned with human values’, do we really mean aligned with individual people’s current values as they are (perhaps including fundamentalist religious values, hard-core ethnonationalist values, runaway consumerist values, or sociopathic values)?
Or do we mean we want AI to support people’s idealized values as we might want them to be?
If the latter, we’re not really seeking ‘AI alignment’. We’re talking about using AI systems as mass ‘moral enhancement’ technologies, a.k.a. ‘moral conformity’ technologies, a.k.a. ‘political indoctrination’ technologies. That raises a whole other set of questions about power, do-gooding, elitism, and hubris.
So, we better be honest with ourselves about which type of ‘alignment’ we’re really aiming for.
I would draw a distinction between what I call “metaphilosophical paternalism” and “political indoctrination”, the difference being whether we’re “encouraging” what we think are good reasoning methods and good meta-level preferences (e.g., preferences about how to reason, how to form beliefs, how to interact with people with different beliefs/values), or whether we’re “encouraging” object-level preferences for example about income redistribution.
My precondition for doing this, though, is that we first solve metaphilosophy, in other words have a thorough understanding of what “good reasoning” (including philosophical and moral reasoning) actually consists of, or a thorough understanding of what good meta-level preferences consist of. I would be the first to admit that we seriously lack this right now. It seems a very long shot to develop such an understanding before AGI, but I have trouble seeing how to ensure a good long-term outcome for future human-AI civilization unless we succeed in doing something like this.
I think in practice what we’re likely to get is “political indoctrination” (given huge institutional pressure/incentive to do that), which I’m very worried about but am not sure how to prevent, aside from solving metaphilosophy and talking people into doing metaphilosophical paternalism instead.
I have had discussions with some alignment researchers (mainly Paul Christiano) about my concerns on this topic, and the impression I get is that they’re mainly focused on “aligned with individual people’s current values as they are” and they’re not hugely concerned about this leading to bad outcomes like people locking in their current beliefs/values. I think Paul said something like he doesn’t think many people would actually want their AI to do that, and others are mostly just ignoring the issue? They also don’t seem hugely concerned that their work will be (mis)used for “political indoctrination” (regardless of what they personally prefer).
So from my perspective, the problem is not so much alignment researchers “not being honest with themselves” about what kind of alignment we’re aiming for, but rather a confusing (to me) nonchalance about potential negative outcomes of AIs aligned with religious or ideological values.
ETA: What’s your own view on this? How do you see things working out in the long run if we do build AIs aligned to people’s current values, which include religious values for many of them? Based on this, are you worried or not worried?
Hi Wei_Dai—great comments and insights.
It would be lovely if we could gently nudge people, through unbiased ‘metaphilosophical paternalism’, to adopt better meta-preferences about how to reason, debate, and update their values. What a wonderful world that would be. Turning everyone into EAs, in our own image.
However, I agree with you that in practice, AI systems are likely to end up (1) ‘aligning’ on people’s values as they actually are—i.e. mostly religious, politically partisan, nepotistic, anthropocentric, hypocritical, fiercely tribal, etc., or (2) embodying some set of values approved by certain powerful elites, that differ from what ordinary folks currently believe, but that are promoted ‘for their own good’—which would basically be the most powerful system of indoctrination and propaganda ever developed.
The recent concern among AI researchers about how to ‘reduce misinformation on social media’ through politically selective censorship suggests that option (2) will be very tempting to AI developers seeking to ‘do good’ in the world.
And of course, even if we could figure out how AI systems could do metaphilosophical paternalism, religious people have a very different idea of what that should look like—e.g. they might promote faith over reason, tradition over open-mindedness, revelation over empiricism, sectarianism over universalism, afterlife longtermism over futuristic longtermism, etc.
I think almost all alignment research is concerned with issues like what’s going on inside the AI, or how we can get the AI to tell us what it thinks, or how we can get the AI to do what we tell it to do, rather than with anything that depends much on what humans actually value. Alignment-research-taking-religion-into-account would look identical to current alignment research.
I don’t think it’s right that the broad project of alignment would look the same with and without considering religion. I’m curious what your reasoning is here and if I’m mistaken.
One way of reading this comment is that it’s a semantic disagreement about what alignment means. The OP seems to be talking about the problem of getting an AI to do the right thing, writ large, which may encompass a broader set of topics than alignment research as you define it.
Two other ways of reading it are that (a) solving the problem the OP is addressing (getting an AI to do the right thing, writ large) does not depend on values, or (b) solving the alignment problem will necessarily solve the value problem. I don’t entirely see how you can justify (a) without a claim like (b), though I’m curious if there’s a way.
You might justify (b) via the argument that solving alignment involves coming up with a way to extrapolate values. Perhaps it is irrelevant which particular person you start with, because the extrapolation process will end up at the same point. To me this seems quite dubious. We have no such method, and we observe deep disagreement in the world. Which methods we use to resolve disagreement and determine whose values we include seem to involve a question of values. And from my lay sense, the methods of alignment that are currently most discussed involve aligning the AI with specific preferences.
Kind of.
Alignment researchers want AI to do the right thing. How they try to do that is mostly not sensitive to what humans want; different researchers do different stuff but it’s generally more like interpretability or robustness than teaching specific values to AI systems. So even if religion was more popular/appreciated/whatever, they’d still be doing stuff like interpretability, and still be doing it in the same way.
(a) and (b) are clearly false, but many believe that most of the making-AI-go-well problem is getting from AI killing everyone to AI not killing everyone and that going from AI not killing everyone to AI doing stuff everyone thinks is great is relatively easy. And value-loading approaches like CEV should be literally optimal regardless of religiosity.
Few alignment researchers are excited about Stuart Russell’s research, I think (at least in the bay area, where the alignment researchers I know are). I agree that if his style of research was more popular, thinking about values and metavalues and such would be more relevant.
Zach—I may be an AI alignment newbie, but I don’t understand how ‘alignment’ could be ‘mostly not sensitive to what humans want’. I thought alignment with what humans want was the whole point of alignment. But now you’re making it sound like ‘AI alignment’ means ‘alignment with what Bay Area AI researchers think should be everyone’s secular priorities’.
Even CEV seems to depend on an assumption that there is a high degree of common ground among all humans regarding core existential values—Yudkowsky explicitly says that CEV could only work ‘to whatever extent most existing humans, thus extrapolated, would predictably want* the same things’. If some humans are antinatalists, or Earth First eco-activists, or religious fundamentalists yearning for the Rapture, or bitter nihilists, who want us to go extinct, then CEV won’t work to prevent AI from killing everyone. CEV and most ‘alignment’ methods only seem to work if they sweep the true religious, political, and ideological diversity of humans under the rug.
I also see no a priori reason why getting from (1) AI killing everyone to AI not killing everyone would be easier than getting from (2) AI not killing everyone to AI doing stuff everyone thinks is great. The first issue (1) seems to require explicitly prioritizing some human corporeal/body interests over the brain’s stated preferences, as I discussed here.
zdgroff—that link re. specific preferences to the 80k Hours interview with Stuart Russell is a fascinating example of what I’m concerned about. Russell seems to be arguing that either we align an AI system with one person’s individual stated preferences at a time, or we’d have to discover the ultimate moral truth of the universe, and get the AI aligned to that.
But where’s the middle ground of trying to align with multiple people who have diverse values? That’s where most of the near-term X risk lurks, IMHO—i.e. in runaway geopolitical or religious wars, or other human conflicts, amplified by AI capabilities. Even if we’re talking fairly narrow AI rather than AGI.
Zach—thanks for this comment; I’m working on a reply to it, which I’ll publish as an EA Forum post within a couple of days.
A preview: I think there are good theoretical and empirical reasons why alignment research taking the full heterogeneity of human value types into account (including differences between religious values, political values, food preferences, economic ambitions, mate preferences, cultural taboos, aesthetic tastes, etc) would NOT look identical to current alignment research.
Zach—update: I’ve written a new post today that tries to address your point: https://forum.effectivealtruism.org/posts/KZiaBCWWW3FtZXGBi/the-heterogeneity-of-human-value-types-implications-for-ai
You didn’t mention the Long Reflection, which is another point of contact between EA and religion. The Long Reflection is about figuring out what values are actually right, and I think it would be odd to not do deep study of all the cultures available to us to inform that, including religious ones. Presumably, EA is all about acting on the best values (when it does good, it does what is really good), so maybe it needs input from the Long Reflection to make big decisions.
James—I agree. Human values as they currently are—in all their messy, hypocritical, virtue-signaling, partisan, sectarian glory -- might NOT be what we want to upload into powerful AI systems. A Long Reflection might be advisable.
Great article. I’m a devout Christian who believes rewards in the afterlife are based on morality not religion, and I feel the article missed something important about Christianity. According to this 2021 Pew poll, only 44% of American Christians believe that people who don’t believe in God cannot go to heaven.
https://www.pewresearch.org/religion/2021/11/23/views-on-the-afterlife/
I also want to mention that for me and many other devout Christians, the afterlife is relatively unimportant. What matters most is trying to glorify God on earth. That essentially means living with the values EAs aspire to: expanding our moral circle, overcoming cognitive biases to be more productive, and making sacrifices to help people. We know very little about the afterlife, but we know a lot about how God called us to live. We should want to glorify God because we love him, not because we expect a reward. I don’t have statistics on how common this perspective is, but according to that Pew poll 8% of American Christians don’t believe in heaven at all.
Sonia—thanks for your helpful perspective as a Christian, and the link to the Pew poll (which is fascinating, and I recommend others have a look at it.)
It’s helpful to be reminded that there’s a big variety of beliefs within each religion about the relative importance of this life versus an afterlife, and the relative importance of specific religious commandments and practices versus more general moral principles.
In thinking about these issues, I think it’s important to take a very evidence-based approach to understanding the current distributions of religious beliefs and values, including differences between EAs and non-EAs, and the often big differences across countries, cultures, ages, sexes, social classes, education levels, etc.
Possibly relevant: The Pope’s AI adviser on ensuring algorithms respect human dignity.
Thanks for your insights! I have recently worked on this question as part of my master’s thesis. It’s not a perfect paper, but I hope it will get more people interested in this research area: https://uu.diva-portal.org/smash/record.jsf?pid=diva2%3A1769193&dswid=3870 If you could find some time to glance over it, I would be happy to hear your feedback.
There’s also EA for Jews. Here’s the facebook group :)
Thought provoking essay, thanks Geoffrey!
Oh cool, thanks for the link to the EA for Jews facebook group. Sorry I missed it!
Thank you for writing this. While recognizing the important role religion plays in society, I feel that even though you take your preferences seriously, you did not consider the religious world-view and its consequences.
What if, in fact, there is a God? What if a religion is correct? What if there is meaning to the universe? Unless you ask those questions, in my opinion, you are just using religion as a weathervane to determine human values, not actually addressing the religious experience. You are not explaining why religion is so prominent or why it is so profoundly different than a materialist world view.
To prompt some thinking on this:
Why, in fact, are people religious? What if an AGI began to believe in God, or had a transcendental experience of its own that informed its actions? Would you then call it misaligned? How do you think being religious would affect you?
Strong upvote. Great post.
PS as a preamble to this post on religious values, I’d recommend reading this newer post first: https://forum.effectivealtruism.org/posts/KZiaBCWWW3FtZXGBi/the-heterogeneity-of-human-value-types-implications-for-ai
Your post seems fairly well thought out, and is interesting to me as a person who isn’t that interested/invested in AI alignment/development. I have a few thoughts...
Your population estimates talk about growth, but do they take into account the deaths of older generations that might be considered “more religious”?
I am fairly certain that people who are Hindu do not consider Hinduism to be “the one true religion” (or at least the basic teachings do not say it is), and I know this is the case for other, smaller religions such as Sikhism.
“Christians and Muslims want to go to heaven”—I cannot agree entirely with this statement. My understanding is that if a person accepts Christ as their saviour (and therefore becomes a Christian), they will be rewarded with heaven when they die. They cannot “earn” their way into heaven. However, Christians should strive towards making Earth as close to heaven as possible while they are alive. (I cannot speak to the Islamic beliefs on heaven.) Doesn’t this align with attempts to make the world better, more fair, healthier, etc.?
Danielle—good points.
For net increase or decrease in religiosity in the next decades, you’re right that we’d want a more precise demographic model of births, deaths, rates of vertical vs. horizontal cultural transmission for specific religions, etc.
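Purely as an illustration of the kind of model I have in mind (a toy sketch, not something calibrated to real data), here is what such a projection might look like in Python. All the rates below are made-up placeholders, and a serious version would need explicit age cohorts to capture your point about older, more religious generations dying off:

```python
# Toy projection of the religious share of a population.
# All rates are illustrative placeholders; real inputs would come from
# demographic sources, and a serious model would track age cohorts rather
# than applying one flat death rate to both groups.
def project_religious_share(initial_share, years,
                            birth_rate_religious=0.022,  # births per capita, religious group
                            birth_rate_secular=0.015,    # births per capita, secular group
                            death_rate=0.012,            # flat death rate (ignores age structure)
                            vertical_retention=0.85,     # share of children keeping parents' religion
                            net_conversion=0.002):       # net secular-to-religious switching per year
    religious = initial_share
    secular = 1.0 - initial_share
    for _ in range(years):
        # Vertical transmission: children of religious parents mostly stay religious.
        births_r = religious * birth_rate_religious
        births_s = secular * birth_rate_secular
        new_religious = births_r * vertical_retention
        new_secular = births_r * (1 - vertical_retention) + births_s
        # Mortality (flat rate) plus new births.
        religious = religious * (1 - death_rate) + new_religious
        secular = secular * (1 - death_rate) + new_secular
        # Horizontal transmission: net switching between the groups.
        switched = secular * net_conversion
        religious += switched
        secular -= switched
    return religious / (religious + secular)

# Example: a population that starts ~84% religious, projected 30 years ahead.
print(round(project_religious_share(0.84, years=30), 3))
```

Even this crude version makes the point that the answer hinges on the relative sizes of fertility gaps, vertical retention, and net switching, which is why I'd want proper estimates before making confident claims about net religiosity trends.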
re. Hinduism, I resonate with your sense that lots of Hindus are less inclined to think they’re in the ‘one true religion’ than people in other religions. But I have low confidence in that—I’ve only spent 2 weeks in India, have interacted mostly with highly educated Indians, and don’t know much about Hindu vs. Muslim conflicts over history, or what they reveal about degree of religious exclusivity.
The issue of ‘earning’ one’s way into heaven has been a source of much contention over the centuries, e.g. the Catholic emphasis on good works vs. the Protestant emphasis on faith. Certainly for religious people who emphasize moral behavior in this life, there might be minimal conflict between religious values and EA values. However, many religious people (perhaps especially outside the US/UK/Europe) might put a heavier emphasis on the afterlife (e.g. in cases of religious martyrdom.)
Thoughtful post, thank you for sharing. I am only doing an exploratory reading of material covering the intersection of EA and religion at this point, so I can’t speak very well to the alignment issue. I agree with those who suggest that religious ethicists would want the AI to explain its reasoning.
What immediately comes to mind in response to your problem statement is the number of different Christianities and Buddhisms (to name the familiar) that are out there, many with their own theology/doctrine—and within each, you also have consequential stratification by education level, and then division on the grounds of conservative and liberal interpretations. I don’t think a consensus set of religious ethics would look substantially different from secular ethics, and it may in fact have fewer constraints.
If you want a perspective on the value of AI safety from Christian theology (following the thought of Thomas Aquinas), you could read Stefan Riedener’s essay on EA, “Human Extinction from a Thomist Perspective,” pp. 187-210 in the book:
Roser, Dominic, Stefan Riedener, and Markus Huppenbauer. 2022. Effective Altruism and Religion: Synergies, Tensions, Dialogue. First edition. Pano-Verlag.
https://smile.amazon.com/Effective-Altruism-Religion-Synergies-Tensions/dp/3290220672/ref=monarch_sidesheet
I’ve wondered if it’s easier to align AI to something simple rather than complex (or if it’s more like “aligning things at all is really hard, but adding complexity is relatively easy once you get there”). If simplicity is more practical, then training an AI to do something libertarian might be simpler than to pursue any other value. The AI could protect “agency” (one version of that being “ability of each human to move their bodies as they wish, and the ability to secure their own decision-making ability”). Or, it might turn out to be easier to program AI to listen to humans, so that AI end up under the rule of human political and economic structures, or some other way to aggregate human decision-making. Under either a libertarian or human-obeying AI programming, humans can pursue their religions mostly as they always have.
The implicit bets on religiosity in AI safety aren’t that it’ll be unpopular, only that it won’t influence decision-making / powerful actors. If/when AGI arises, it’ll initially be controlled by govts/companies/citizens of rich countries. Of all the belief systems, AI safety people worry most about Confucianism, because it has the most real influence on decision-making (the CCP).
Interesting. I guess I’m a bit confused then about whether ‘AI alignment’ is really intended to align with what all 8 billion living humans actually value and believe, right now (including quite strong overall support for CCP values among most of the 1.4 billion people in China), or whether it’s intended to align with what powerful ‘govts/companies/citizens of rich countries’ value and believe.