The religion problem in AI alignment
AI alignment research has a religion problem – specifically, a big blind spot when it comes to modeling and aligning the human values associated with organized religions.
The blind spot arises because many Artificial Intelligence (AI) researchers and Effective Altruists (EAs) are atheists who don’t think much about religion. For example, in the 2019 EA Survey, 86% of Effective Altruists reported being atheist, agnostic, or non-religious. On Facebook, the Effective Altruism group has 21,800 members, whereas the three main religiously-inclined EA groups that I could find (Christians and Effective Altruism, Buddhists in Effective Altruism, Spirituality and Effective Altruism) only have 1,700 members in total. And, out of thousands of posts on EA Forum, fewer than a dozen include the words ‘religion’ or ‘religious’.
Thus, when we think about aligning AI and human values, we tend to ignore religion entirely, or we discount it as irrelevant, regressive, irrational, retarded, and/or ridiculous. When we envision glorious futures, with our transhumanist descendants or self-replicating superintelligent drones pursuing galactic colonization, we typically don’t think of those futures as including religion at all. We imagine, at most, that future people might preserve some religious traditions as quaint cultural vestiges, or cautionary tales from a less rational era. We don’t take seriously the idea that helping more people enjoy infinite bliss in heaven after they die should be a major cause area. Conversely, when we envision existential risks and global catastrophes, we don’t typically think about people burning in hell forever, or being reincarnated as an insect in a world of wild animal suffering, or being stuck in Saṃsāra, the endless cycle of suffering. Our concepts of collective human success or failure are highly secularized. Within that secular context, a highly aligned Superintelligence seems like the best redemptive savior we could realistically hope for, and perhaps the only path towards maximizing total future sentient well-being.
However, most current humans are religious. Religion has been important throughout human history and probably prehistory. There are pretty good evolutionary psychology theories about the origins and adaptive functions of religions in human groups, as group coordination mechanisms, costly commitment signals of tribal loyalty, sources of mating and reproductive norms, and repositories of miscellaneous life advice, such as taboos against eating foods with high parasite risk, or against reproducing with one’s first cousins.
So, AI alignment research is focused on promoting alignment between the AI systems that we design and train, and our human goals, values, preferences, and norms. And we know that for most humans, their religions are closely associated with their goals, values, preferences, and norms. Thus, if we take the standard definition of AI alignment research at face value – as achieving alignment with human values – a lot of it must concern alignment with religious values.
The continuing popularity of religions
If you don’t think this is a serious issue, let’s consider the popularity of religions today.
Many AI researchers and Effective Altruists live in relatively secular countries such as the US, UK, Australia, Canada, and Germany, where religion has relatively limited and declining influence over politics, academia, media, and public discourse. (For example, about 69% of the respondents to the 2020 EA Survey lived in these 5 countries, and very few EAs came from countries with high religiosity.) Also, whereas a high proportion of charitable donations globally go to religious charities, EAs typically donate very little money to religious charities, which they usually regard as theoretically incoherent and empirically unsupported.
However, out of the 8,000 million living humans, surveys indicate that about 2,380 million are Christian, 1,810 million are Muslim, 1,160 million are Hindu, and 507 million are Buddhist. Together, these ‘big four’ religions include about 5,857 million humans, or 73% of our species. Another 500-1,000 million people (it’s hard to count) follow various ‘folk religions’ (including Shinto, Taoism, African tribal religions, etc.) and/or are influenced by various quasi-religious belief systems such as Confucianism. People who identify as ‘unaffiliated’ with any religion (including atheists and agnostics) seem to number no more than 1,200 million, or about 15% of humanity.
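As a quick check on the arithmetic above, here is a minimal sketch that re-adds the quoted figures. The numbers are copied from this paragraph, not from an independent dataset, and the variable names are only illustrative.

```python
# Adherent counts in millions, as quoted in the paragraph above.
big_four = {"Christian": 2380, "Muslim": 1810, "Hindu": 1160, "Buddhist": 507}
world_population = 8000  # millions, as quoted above
unaffiliated = 1200      # millions, the upper-end figure quoted above

big_four_total = sum(big_four.values())
big_four_share = big_four_total / world_population
unaffiliated_share = unaffiliated / world_population

print(f"Big four total: {big_four_total:,} million ({big_four_share:.0%})")
print(f"Unaffiliated:   {unaffiliated:,} million ({unaffiliated_share:.0%})")
```

The totals come out at 5,857 million (about 73%) for the big four and about 15% for the unaffiliated, matching the figures in the text.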
Another way to look at human religion is by country. Of the 14 major countries with populations above 100 million, only 2 (China, population 1,430 million and Japan, population 124 million) have a majority that are not actively involved in organized religion. Consider the other 12 countries in descending order of population, with just the big four religions considered:
India: 1,420 million people, 80% Hindu, 14% Muslim
USA: 338 million people, 65% Christian
Indonesia: 276 million people, 87% Muslim
Pakistan: 236 million people, 96% Muslim
Nigeria: 219 million people, 46% Christian, 46% Muslim
Brazil: 215 million people, 81% Christian
Bangladesh: 171 million people, 91% Muslim
Russia: 144 million people, 47% Christian, 7% Muslim
Mexico: 127 million people, 90% Christian
Ethiopia: 123 million people, 67% Christian, 31% Muslim
Philippines: 116 million people, 88% Christian
Egypt: 111 million people, 90% Muslim, 10% Christian
Together, these 14 most populous countries include 5,050 million people, or 63% of all living humans. Note that in 8 of the 14 countries, at least 80% of people belong to just one of the four major religions.
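The country figures above can be tallied with a short sketch like the one below. It uses only the populations and percentages quoted in this section, plus the 8,000 million world total quoted earlier; the data structure and names are illustrative assumptions. Note that India sits at exactly 80%, so the count of 8 countries depends on treating 80% as inclusive.

```python
# (population in millions, largest single big-four share in percent), from the list above.
countries = {
    "India": (1420, 80), "USA": (338, 65), "Indonesia": (276, 87),
    "Pakistan": (236, 96), "Nigeria": (219, 46), "Brazil": (215, 81),
    "Bangladesh": (171, 91), "Russia": (144, 47), "Mexico": (127, 90),
    "Ethiopia": (123, 67), "Philippines": (116, 88), "Egypt": (111, 90),
}
# China and Japan are named in the text as majority non-religious (millions).
other = {"China": 1430, "Japan": 124}
world_population = 8000  # millions, as quoted earlier in the essay

total = sum(pop for pop, _ in countries.values()) + sum(other.values())
share = total / world_population

# Countries where one religion reaches 80% (inclusive) versus strictly above 80%.
at_least_80 = [name for name, (_, pct) in countries.items() if pct >= 80]
above_80 = [name for name, (_, pct) in countries.items() if pct > 80]

print(f"Total: {total:,} million ({share:.0%} of humanity)")
print(f"At least 80% one religion: {len(at_least_80)} countries")
print(f"Strictly above 80%: {len(above_80)} countries")
```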
Overall, over 80% of living humans are religious to some degree – and often to a very considerable degree. Religious people often believe that their most important goals, values, preferences, and norms are dictated by God(s), revealed by prophets, derived from their religion, and are crucial to their long-termist well-being after death (whether in a Christian or Muslim afterlife, a Hindu reincarnation, or a Buddhist liberation from reincarnation).
Can we achieve AI alignment with humans and human values when over 80% of humans are religious, and when their most important values are closely associated with their religion? I don’t know. Let’s think about it a bit more.
Will religion wither away before AI alignment becomes a big issue?
Elite intellectuals have been predicting that religion will wither away ever since the 16th century Scientific Revolution, the 17th century Enlightenment, the 18th century Industrial Revolution, 19th century socialism, and 20th century consumerism. Yet here we still are, 80% religious.
Many AI researchers and Effective Altruists live in countries such as the US, UK, and Australia where religiosity is quickly declining, and where younger cohorts show rapidly increasing atheism and agnosticism, with political activism on social media replacing organized religion as the key domain of young adult virtue-signaling. For example, in the US in 2017, only 12% of people over age 65 were religiously unaffiliated, whereas 38% of people aged 18-29 were religiously unaffiliated – a huge generational increase in atheism. This might give the impression that religion is in a general global decline as a human instinct and institution.
However, this decrease in religiosity is happening mostly in rich countries with low and declining birthrates. In countries with high birthrates, religiosity remains high. Given that religious people are having more kids, and religious countries are having more kids, and religiosity is both genetically heritable and culturally transmitted, religiosity may hold steady or even increase at the global population level. For example, Nigeria is about 92% religious (46% Christian, 46% Muslim), and its population is expected to increase from 219 million in 2022 to 329 million in 2040 and 401 million in 2050. If Nigerian religiosity remains steady at 92%, that’s another 167 million religious people within the next 30 years, just in one country.
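The Nigeria estimate is simple arithmetic, sketched below under the same assumption the text makes: that religiosity stays constant at 92%. The population projections are the ones quoted above, and the function name is hypothetical.

```python
RELIGIOUS_SHARE = 0.92  # 46% Christian + 46% Muslim, assumed constant over time
population = {2022: 219, 2040: 329, 2050: 401}  # millions, as quoted above


def added_religious(year: int, base_year: int = 2022) -> float:
    """Extra religious people (millions) by `year`, relative to `base_year`."""
    return (population[year] - population[base_year]) * RELIGIOUS_SHARE


for year in (2040, 2050):
    print(f"{year}: about {added_religious(year):.0f} million additional religious people")
```

Under that assumption the sketch gives roughly 101 million additional religious people by 2040 and roughly 167 million by 2050. If religiosity fell over the period, the totals would shrink proportionally.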
On a longer timescale, by 2100, 8 of the 10 countries projected to have the highest populations are ones that currently have very high religiosity, including India (1,450 million people expected by 2100), Nigeria (733 million), Pakistan (403 million), D.R. Congo (362 million), Indonesia (321 million), Ethiopia (294 million), Tanzania (286 million), and Egypt (225 million). Together, these countries will account for at least an extra 1,500 million people (compared to their current populations), and if they remain at least 80% religious, they’ll add an extra 1,200 million religious people in the world by 2100. For religion to suffer a net decrease in popularity, the other countries would have to add an extra 1,200 million atheists – which seems unlikely, given their declining populations.
Effective Altruists often seem to assume that religion will decline in poorer, higher-fertility countries as they enjoy better nutrition, health care (anti-malaria bednets, deworming, etc.), nootropics, and embryo selection methods to increase average intelligence and openness. In other words, the expectation is that cognitive and moral enhancement will gradually turn more people into atheists – even if religious people are currently out-breeding atheists. Maybe that will happen, over a multi-decade or multi-generational time scale. But these interventions seem unlikely to have a major effect on religiosity before AI alignment becomes a major issue, given current projections by AI researchers about the likely AGI timelines. In summary, most humans will probably still be religious if AGI is developed any time in the next century.
Common ground between Effective Altruists and religious people
The likely persistence of human religion may seem alarming or depressing to EA atheists and AI safety researchers. What do we rationalists have in common with religious people? How could they even participate constructively in a dialogue about AI alignment?
I see at least three major kinds of common ground: moral circle expansion, longtermism, and the Simulation Hypothesis.
Moral circle expansion. Effective Altruists ever since Peter Singer have sought to expand our moral circle, pushing human instincts for altruism beyond their ancestral limits (self, family, and tribe) to include more humans and other animal species. Most major religions have also preached some form of moral circle expansion, nudging their believers to be nicer not just to people in the same family and tribe, but to everyone who shares the same faith. Often, religions even preach tolerance and altruism towards non-believers, and some degree of concern for other animals (e.g. Hindu and Buddhist veganism). Of course, religious altruism is much more based on deontology and virtue-signaling than on consequentialism, and tends to be far less evidence-based than EA charity evaluation. But religions on average tend to preach altruism over selfishness, and a more inclusive version of tribalism over a less inclusive version.
Longtermism. Many Effective Altruists have shifted from emphasizing shorter-term goals such as global public health and poverty reduction, to longer-term goals such as reducing existential risks and promoting sentient flourishing in future millennia. Longtermism is the new enthusiasm. But all the major religions have been longtermist for at least a millennium. They just have a different model of the world, where the ‘long term’ typically includes an extremely long afterlife. For example, Christianity and Islam teach that devout believers will enjoy an eternal heaven in the afterlife, and non-believers will suffer eternal torment (or at least a lamentable alienation from God) in hell. Hinduism teaches that people reincarnate in higher or lower sentient forms according to the karma accumulated in each life; this happens thousands and thousands of times, at a time scale that can be measured in kalpas (units of 4.32 billion years). Buddhism also emphasizes reincarnation, teaching that it may take many cycles of birth and death (saṃsāra) before a sentient being escapes duḥkha (suffering) and achieves nirvana. Religions also generally nudge people to avoid short-term temptations, bad habits, impulsive aggression, and runaway consumerism, and to think about the longer-term consequences of their actions, both in this life and the afterlife.
Simulation Hypothesis. Many Effective Altruists believe in the Simulation Hypothesis, that we are living in some sort of simulated reality, such as a highly advanced computer simulation, virtual reality, or Matrix, that was created by much more advanced forms of sentient life. All major religions agree with this. They just have a pre-computational understanding of how such a simulation works, and of what kind of entities are running it. What we call simulators or programmers, they would call Gods. For example, in Hinduism, māyā refers to the veil of illusion or the tempting magic show that humans perceive as their everyday lives. In Christian theology, our visible, temporal world is a relatively illusory and transient show created by a more substantial and eternal God. Most religions view our lived experiences as somewhat shallow, deceptive, and fleeting compared to an eternal, transcendent realm where immortals live. If you think the Simulation Hypothesis seems likely, but the traditional religions are idiotic, then either you don’t understand the Simulation Hypothesis in a sufficiently humble way, or you don’t understand traditional religions in a sufficiently humble way.
Of course, skeptics outside Effective Altruism see our interests in moral circle expansion, longtermism, and the Simulation Hypothesis as quite religious, cult-like, and faith-based. If outsiders think that EA has something pretty deep in common with major religions, maybe we can accept some of those commonalities ourselves, and hold religions in a little less contempt than we’re used to doing?
Distinctive problems that religion raises for AI alignment
If we take seriously the fact that (1) over 80% of humans are religious, and (2) many human values are closely associated with religions, and (3) AI alignment is supposed to be about aligning human values with AI systems, then where does that leave us? How should we be thinking differently about AI alignment?
I don’t know. This essay is a preliminary attempt to raise the issue. I look forward to your thoughts and feedback. I don’t have many concrete suggestions so far.
However, as a preliminary exercise, I can note a few distinctive problems that religion seems to raise for AI alignment. (There might be dozens of other problems that you can suggest.)
1. AI systems look intrinsically sacrilegious, outrageous, and hubristic to many religious people, and the more powerful the AI system gets, the more outrageous it might seem. All four main religions teach that humans have distinctive kinds of souls that survive our bodily death. Creating systems that show human-level intelligence, but that are soulless, faithless, and atheistic, may seem evil to many religious people. As AI systems grow more advanced, more capable, and more intrusive in our everyday lives, religious leaders will notice, and judge. And some will condemn. The Pope, Cardinals, and Catholic priests will have views on AI. Muslim Imams, Alims, Allamahs, Grand Muftis, Mujtahids, and Ayatollahs will have views on AI. Hindu and Buddhist priests, scholars, and gurus will have views on AI. Those views might be positive, but they might be very negative, or even aggressive.
Consider the traditional Muslim views on idolatry, which prohibit depictions, sculptures, or simulacra of sentient beings – especially humans. If Muslim leaders decide that creating AI systems or anthropomorphic robots is sacrilegious, the Butlerian Jihad might become an actual Jihad, and AI researchers might be condemned by serious fatwas.
2. Cultural conflicts between religions may create AI safety risks just as serious as geopolitical conflict between nation-states. Historically, religious conflicts have accounted for a high proportion of organized warfare. Even in the last century, a lot of conflict and tension between nation-states has some degree of religious conflict behind it (consider mostly-Hindu India versus mostly-Muslim Pakistan, mostly-Sunni Saudi Arabia versus mostly-Shia Iran, or the Cold War between mostly-Christian USA and mostly-atheist Soviet Union).
Every major religion proclaims itself the one true religion, whereas no nation-state proclaims itself the ‘one true nation-state’. Nation-states can play positive-sum games with each other based on exchanging resources, products, ideas, traditions, tourists, and talent. Religions seem locked into a more zero-sum competitive situation. Nation-states might find enough common ground that they can avoid AI arms races that sacrifice safety for speed of progress. If religions realize that AI can play powerful roles in recruiting new members, enforcing doctrine, surveilling believers, identifying heretics and apostates, and undermining rival religions, we might face a religious AI arms race just as dangerous as a geopolitical AI arms race.
Nation-states have a lot of resources they can devote to AI development, but so do organized religions. And many nation-states are highly aligned with particular religions – e.g. the 13 countries that are officially Christian, and the 26 countries that are officially Muslim. (If you’ve played the Civilization VI computer game, consider that there’s a ‘Religious Victory’ condition in addition to the Domination, Diplomacy, Science, and Cultural victory conditions.)
3. Religious values might be harder to align with AI systems than other kinds of values. For example, when AI researchers talk about teaching AI systems to incorporate human values, the most common examples seem to be (1) super-high-stakes preferences for life over death (e.g. ‘Please don’t kill me or exterminate all of humanity just to make some paperclips or collect some stamps’), (2) low-stakes preferences for certain consumer goods and services (e.g. ‘Please deliver pineapple pizza in an hour’, or ‘Please drive me to the airport without breaking the speed limit’), or (3) low-stakes preferences to avoid inconveniences (e.g. ‘Please don’t knock over that vase while making tea’).
However, many religions teach that believers should pursue ascetic or transcendental values that don’t map very well onto life-or-death, consumer, or convenience preferences. For example, Buddhism teaches a doctrine of nonattachment, which is basically a meta-preference not to have strong preferences, and not to take one’s preferences too seriously. If an AI system asked a seriously devout Buddhist to teach it their preferences, the devout Buddhist might shrug, and say ‘Help me to escape from saṃsāra and the whole illusion that preferences matter. Oh and maybe remind me in an hour to meditate.’
For the billions of believers who take seriously the idea of the afterlife or reincarnation (i.e. every member of the four major religions who’s actually devout), the idea that an AI system could incorporate their true preferences and payoff functions is metaphysically impossible, because there’s no feedback or reward signal from beyond the grave. Christians and Muslims want to go to heaven, and if they’re extremely devout, that’s all that matters. There’s nothing that an AI system can do for them in this life that matters even a trillionth as much as getting an eternal reward in heaven. Getting into heaven is the reward signal that they would want the AI to learn, and to help them achieve – but how will the AI learn where their souls go after death? Where exactly do AI researchers get that training data?
Can AI engineers invent some clever new ‘collaborative inverse reinforcement learning algorithm’ or ‘coherent extrapolated transcendental volition algorithm’ (or whatever) to train an AI that can help a devout Christian or Muslim get into heaven? Can the algorithm train an AI system to help a devout Hindu accumulate enough karma to be reincarnated as a Brahmin, swami, or guru? Can it train an AI system to help a devout Buddhist achieve nirvana?
These are the human values that religious people would want the AI to align with. If we can’t develop AI systems that are aligned with these values, we haven’t solved the AI alignment problem.
We could keep ignoring religion, and push on with AI alignment work as if all humans are atheists with atheist values. But then we’d be ignoring more than 80% of humanity.
Caveats and notes:
I’m a Darwinian atheist with only a superficial understanding of religion; I just know some religious friends and family members, and think we should take religion seriously as a human phenomenon.
I’m no more than 70% confident about any argument I’m making here, and my facts, figures, definitions, and claims could probably be improved in many ways.
I’ve studied human nature since the late 1980s, and I did lots of machine learning research with neural networks in the early 1990s, but I’ve only thought seriously about AI alignment for 5 years, and I’ve only been writing publicly about it for a few weeks, so my understanding of the AI alignment literature is limited.
Even if everything I say here is true (e.g. that AI alignment with the most important religious values might be impossible), I still think we should invest a huge amount of talent, time, thought, energy, and money into research on AI alignment with people’s secular values.
These ideas are fairly half-baked, and I welcome any constructive criticism, elaboration, and feedback.