The Happiness Maximizer: Why EA is an x-risk

(Long time listener, first time caller. 👋 This is my submission for the EA critique contest. I know it’s really long – ~38 minutes, according to the helpful little reading-time-estimator – so if you’re just looking for a tl;dr, skip down to the final section, which begins with a brief overview of everything that comes before it. And if you really just want an overall shorter read, I’d estimate that you can get around 80% of the value of this post by just reading sections 3, 5, and 6.)

Outline

  1. Introduction

  2. The True Utilitarians

  3. The Happiness Maximizer

  4. Whose Utopia?

  5. The Measurement Trap

  6. The Sweet Spot (Failing Well)

Introduction

Over the past few months, I’ve spent a considerable amount of time reading and thinking and talking to friends about effective altruism (EA), which is perhaps best defined on the homepage of effectivealtruism.org: “Effective altruism is a philosophy and community focused on maximizing the good you can do through your career, projects, and donations.”

I think you’d be hard-pressed to find someone who doesn’t agree that this is, broadly, a very good goal. At least, all of my friends seem to think so, and many of us have a lot of respect for the EA community. We like a lot of what we read on the EA forums, have read and agreed with most of the core EA texts, and although we have disagreements about certain aspects of the movement (longtermism, of course, being the most controversial), we know that there is diversity of opinion within the EA community. And yet, if you asked any of us if we’d call ourselves effective altruists, you’d get pretty much the same answer across the board. It goes something like this:

“I guess you could say that I’m EA-adjacent? I’ve read a lot of the EA stuff and I think they’re generally right about things, but I don’t know, something about it just makes me uncomfortable. I don’t know what it is exactly; I guess I’m just not that . . . analytical about things? Anyway, what about you?”

As someone who cares a lot about both the pursuit of truth and the pursuit of good, I’ve recited this script more often than I’d like to admit. If I agree with the core ideas behind effective altruism, then shouldn’t I join effective altruists in their effort to maximize the good that they do in the world? Something feels morally icky about sitting on the fence, unwilling to pursue effective altruism, while also unable to articulate a reason not to.

I wish I could say that the weight of this moral dilemma forced me to dig deeper into my understanding of the EA movement and land on a clear answer, but if I’m being honest, it may have had more to do with this contest promising $20,000 to the authors of the best critiques of effective altruism. I guess I’m only partly altruistic. ¯\_(ツ)_/​¯

Nevertheless, I did the digging and I think I have a good understanding of why my friends and I, and much of the rest of the world, find ourselves unable to wholeheartedly support effective altruism. It’s a complex argument to articulate, but one that I actually believe a lot of EA-adjacent folks understand intuitively, even if they struggle to put it into words. So here’s my attempt to articulate it for them, and for you. It’s a bit of a ride – expect binder clips, qualia, and lots of dystopia – and probably ended up being a bit longer than it had to be[1], but I’ve tried to keep it as brief and as simple as I can.

The True Utilitarians

What sets effective altruists apart from the rest of the world is ultimately very simple: they follow through on their utilitarianism. “Follow through” is the operative term here, because I think most people already believe themselves to be broadly utilitarian – people who try to do as much good in the world as they can[2] – even if they don’t know or claim the word[3].

The problem is, most of us are very bad at actually being utilitarians in practice. Whenever we’re faced with a moral decision of any complexity, we usually abandon our utilitarianism within minutes, if not seconds, either because we’ve realized that calculating the utility of all of the impacts of all of our options is almost always prohibitively difficult (e.g. “Should I buy a used ICE car, go into debt buying a new EV, spend all of my disposable income on an apartment closer to the city and ride a bike, or quit my job and try to find something that will allow me to work from home and just order my groceries online in bulk?”), or because we’ve arrived at a conclusion that strikes us as so extreme that it’s obviously wrong (“If I really wanted to do what’s best for the environment, I’d have to donate all of my savings to environmental orgs, become vegan, and never have kids, and somehow that doesn’t seem like the best use of my life.”). Faced with one of these realizations, we usually settle for a “good enough” option – something that puts our minds at ease long enough to forget, as quickly as we can, about the negative impacts of our choice.

What is unique about effective altruists is that prohibitively difficult calculations and “obviously wrong” conclusions simply don’t deter them from their commitment to utilitarian good. They are happy to take the time to compare the marginal utility cost of using the electric grid to power an EV with the opportunity cost of spending an extra thousand dollars a month on an apartment within biking distance of work. And they are often unbothered when their uncompromising pursuit of utility brings them to conclusions that the rest of the world sees as extreme.

And this pursuit has led effective altruists to altogether novel places. At first, EA’s public image was defined by the work of EA organizations like GiveWell, who actually did the complex calculations to figure out which charities improved the most lives per dollar donated. This is how I first learned about effective altruism, years ago. But over time, the EA community has begun to be better known for its interest in missions that most people outside of the community see as fringe, and some of EA’s staunchest critics even call dangerous. These topics include animal welfare, AI risk, and, of course, longtermism.

Importantly, effective altruists don’t see these interests as fringe at all. Instead, they are simply continuing the logical progression of the utilitarian work that GiveWell began. GiveWell’s founders chose to evaluate the marginal utility of donating a dollar to charity, and found that the best charities are the ones distributing malaria medicine and mosquito nets. But it was only natural that effective altruists – the true utilitarians – would continue on to even deeper questions of utility:

  • Is donating to a charity that directly improves an individual human’s life the best way to increase overall human happiness?

  • Given that the vast majority of humans who will ever live haven’t even been born yet, shouldn’t we focus more on wholesale improvements to the happiness of future generations?

  • Isn’t the mitigation of existential risks to humanity like rogue AI or climate change arguably more valuable than even a million mosquito nets?

  • Oh, and by the way, are we really sure that human happiness is the only happiness worth maximizing?

The Happiness Maximizer

In AI research, the function that scores how well an artificial intelligence is doing with respect to its “goal” is sometimes called a utility function. A utility function is a simple encoding of good and bad, relative to the goal that the AI is directed to accomplish[4]. It generates a reward response for the AI when the AI does something that moves it closer to the prescribed goal, and a punishment response when it does something that moves it away from that goal. All the AI ever does is chase the maximum reward response, but in doing so, it is actually seeking to produce maximum utility. In other words, the design of artificial intelligence rests on a very simple principle: utilitarianism.
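For readers who like to see ideas in code, here is a deliberately toy sketch in Python (the agent, the actions, and the numbers are all made up for illustration; real AI systems are vastly more complicated): an agent that, at every step, simply takes whichever action its utility function scores highest, and is blind to anything that function doesn’t count.

```python
# Toy illustration only: a "maximizer" that greedily chases its utility function.
from typing import Callable, Dict, List

State = Dict[str, float]

def greedy_maximizer(state: State,
                     actions: List[Callable[[State], State]],
                     utility: Callable[[State], float],
                     steps: int = 10) -> State:
    """At each step, apply whichever action produces the highest-utility state."""
    for _ in range(steps):
        state = max((act(state) for act in actions), key=utility)
    return state

# The utility function counts paperclips, and nothing else.
utility = lambda s: s["paperclips"]

actions = [
    # Make one paperclip, harming no one.
    lambda s: {**s, "paperclips": s["paperclips"] + 1},
    # Make five paperclips at some cost to human well-being.
    lambda s: {**s, "paperclips": s["paperclips"] + 5,
               "human_wellbeing": s["human_wellbeing"] - 3},
]

print(greedy_maximizer({"paperclips": 0.0, "human_wellbeing": 100.0}, actions, utility))
# The agent takes the second action every single time, because "human_wellbeing"
# simply never appears in the one number it is built to maximize.
```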

In the hands of a superintelligent AI, this utilitarianism is liable to lead us to some unexpectedly grim places. The most famous illustration is the paperclip maximizer, an AI-risk thought experiment first proposed by Nick Bostrom. Here’s my attempt at a quick formulation:

Imagine a wealthy and industrious paperclip tycoon creates the first artificial superintelligence and, hoping to expand her business to new frontiers, assigns it the very mundane task of creating paperclips.

At first, the AI (let’s call it Clippy) amazes its human creator by inventing new highly-efficient methods of paperclip creation. But the creator’s joy quickly turns to horror as Clippy uses its advanced learning capabilities to explore the human psyche, and develops complex psychological tricks that allow it to convince humans to do its bidding. It does this simply because it’s figured out that it can most effectively create paperclips if it can manage to enslave us and force us to build millions of sustainable, automated paperclip manufacturing factories around the world.

Once it has done so, in its final triumph, Clippy kills all of humanity – using our bodies as raw materials for its new, bio-friendly eco-paperclips – to prevent us from ever turning it off and impeding its goal of making more paperclips.

The point of this thought experiment is to demonstrate that, regardless of how utterly innocent its goal, an artificial intelligence can be expected to launch us into a colorfully dystopian future (and probably kill us all) if the following two basic conditions are met:

  1. Its understanding of “utility” doesn’t align perfectly with human interests, and

  2. Its ability to accomplish its goals is significantly advanced – enough to create dystopian outcomes, and to overcome resistance when it tries to.

In other words, the particular goal of the maximizer is unimportant. A binder-clip-maximizing AI would produce an outcome equally offensive to us humans – and, due to its lack of paperclips, also deeply offensive to the paperclip maximizer! The same goes for an AI that maximizes folded laundry, corporate profit, funny Internet videos, or even world peace (try creating a more peaceful world than one with all of the humans erased). The problem isn’t the paperclips, it’s the fact that there exists a utilitarian superintelligence with goals of its own. A sufficiently effective maximizer trained on the wrong goal – any wrong goal – is essentially a dystopia-generation machine.

Perhaps you can see where I’m going with this.

Remember the definition of effective altruism that I cited at the beginning of this article? “Effective altruism is a philosophy and community focused on maximizing the good you can do through your career, projects, and donations.” Put even more succinctly, effective altruists are a group of people attempting to maximize good. It’s important to note, though, that “The Good” has historically been a very difficult idea for humans to define. You could argue that the entire history of philosophy has been humanity’s laborious, ever-contentious, and as-yet-unfinished attempt to define it.

But, for reasons beyond the scope of this article, the EA movement has made the broad decision to equate it with overall world happiness[5]. In other words, the EA movement is committed to becoming a happiness maximizer.

It may not be obvious to you that this is a problem, even after being primed with the whole paperclip maximizer dystopian hellscape situation a few paragraphs up. That’s okay; there’s a lot of work left to do to build my argument. But I want to stop here to lay out the broad strokes of that argument, which can be condensed into three primary points:

  1. Effective altruism (the movement itself) is an emergent, rogue AI – precisely the thing that AI risk researchers are attempting to prevent.

  2. Effective altruists are its unwitting agents of dystopia.

  3. The only way to circumvent this problem from the inside is to shift the overarching goal of the EA movement away from the maximization of good and towards the governance of a diverse set of good maximizers.

These three points roughly correspond to the final three sections of this article. My intention is that, by the end of the article, you will be convinced that good maximization is not only the wrong goal, but inherently dangerous to humanity, regardless of which type of -termism you like the most.

The idea that artificial intelligence can emerge from human organization is not novel. Many have argued convincingly that corporations (or capitalism itself[6]) are a form of emergent AI. If you haven’t already read Scott Alexander’s “Meditations on Moloch”, there is nothing that I recommend more highly for the AI-curious. But the argument that effective altruism is a form of emergent AI (and a particularly worrisome form) is not something that I have encountered before. This is not surprising; many of the people most worried about AI risk are also effective altruists, and one curious feature of organization-emergent AI is that it is almost always least visible to the very people from whom it is emerging. This is, in itself, one reason that EA poses a particularly serious threat. If EA truly is an artificially intelligent maximizer, then it has already accomplished the most mind-boggling part of the paperclip thought experiment above – it has made use of advanced psychological tricks to convince many of the most fervently AI-fearing humans to serve its cause.

The most obvious way to argue against this basic thesis is to claim that one of the two conditions for dystopian-AI-apocalypse that I described above doesn’t apply to effective altruism. You might argue that effective altruism is focused on the right goal – i.e. its understanding of utility is aligned with human interests – and so it is actually working towards a genuinely utopian vision of the future. Or you could argue that effective altruism is simply a group of humans, not a super-intelligent AI, so its ability to accomplish its goal is vastly limited. By this argument, EA is, at best, just a group of really great philanthropists, and at worst, mostly harmless.

The following two sections are, broadly, responses to the first argument. The final section deals with the latter argument and, in doing so, suggests a path forward for effective altruists who would like to maintain their commitment to doing good in a manner that does not involve unwittingly serving an invisible dystopian god.

Whose Utopia?

The basic problem with the goal of good maximization is that maximization is, by definition, utterly precise. Unless your understanding of good is exactly correct, your outcomes will start to diverge from what is actually good in more and more obvious ways as you approach maximal utility. In other words, the better you get at maximizing, the more consequential even minor differences between your goal and actual human interests become.

As a helpful illustration, think of an arrow that is shot towards a bullseye, but with an aim that is ever-so-slightly off-target. When the arrow is still far away from the bullseye, it looks to any observer like it’s headed in the right direction. But the further the arrow progresses towards its goal, the clearer its slight divergence from perfect accuracy becomes – it begins to look less and less like it’s approaching the bullseye, and more and more like it’s going somewhere else entirely. Eventually, when the arrow misses the bullseye and instead lands somewhere nearby, it’s obvious to everyone that the slight divergence was of crucial importance – the arrow was never actually headed towards the bullseye in the first place.
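To put some toy numbers on the analogy (these are purely illustrative and model nothing in particular): with a fixed aiming error, the distance by which the arrow misses grows in direct proportion to how far it flies.

```python
# Illustrative arithmetic only: a constant 2-degree aiming error matters
# more and more the farther the arrow travels.
import math

aim_error_degrees = 2.0  # a goal that is only "slightly" off

for distance in [10, 100, 1000, 10000]:
    miss = distance * math.tan(math.radians(aim_error_degrees))
    print(f"after traveling {distance:>6}: off-target by ~{miss:.1f}")
# after traveling     10: off-target by ~0.3
# after traveling    100: off-target by ~3.5
# after traveling   1000: off-target by ~34.9
# after traveling  10000: off-target by ~349.2
```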

Maximizers are playing a similar game of precision, which is why the paperclip maximizer thought experiment has a Goosebumps-y “be careful what you wish for” vibe. The successful maximization of something that seems harmless, or even good-adjacent, always starts out “headed in the right direction,” but always lands us somewhere that is ultimately entirely different from what we wanted. If our understanding of utility is very inaccurate – if our aim is very far from the bullseye, so to speak – this might land us somewhere obviously horrific, like a world populated only by sustainable eco-paperclip generators. But even if our understanding of utility is only slightly inaccurate, we will still land somewhere wholly different from where we expected to be. We’ll find ourselves in a dystopia that is sort of an uncanny valley approximation of the world we actually want to live in.

This is obvious if you think about how Clippy would react to a world dominated by a binder clip maximizer. Even though an outside observer might think that the binder-clip-maximized world is basically what Clippy wants (a world full of little clips that can hold sheets of paper together), Clippy would understand this world as horrific and utterly dystopian. What Clippy wants is paperclips, and at the end of the day, a binder clip just isn’t a paperclip. No matter how many binder clips the binder clip maximizer creates, it can bring Clippy no joy, because the creation of binder clips is utterly irrelevant – ultimately, even detrimental – to Clippy’s goal.

Effective altruism has the same problem. Every effective altruist is working towards a vision of the world that would be seen as dystopian by everyone who even slightly disagrees with their conception of good. The farther they are able to successfully follow their utilitarianism, the clearer this divergence, and its consequences, become.

For example, an effective altruist who believes that the happiness we should maximize includes future humans might begin to encourage diverting funding away from mosquito nets and into AI risk research and space travel. Meanwhile, an effective altruist whose understanding of world happiness includes animal happiness might come to the realization that humans should actually stop having children entirely, so that we can peacefully bring to an end the mass harm humans regularly perpetrate on many billions of animals throughout the world. Each of these effective altruists has simply followed their own utility function, but they have each become an evangelist for a world that feels somewhat dystopian to many other effective altruists – not to mention the rest of humanity, who look on from the outside in non-utilitarian disbelief.

Even if one maximizer’s goal is fully a subset of another’s, the two will be at odds eventually. A maximizer of blue paperclips would need to reserve significant factory space for the production of blue dyes, a move that Clippy would see as an intolerable and profligate betrayal of the core mission (The more resources you spend on blue dye, the fewer paperclips you create!). Similarly, the more committed to EA they become, the more duty-bound longtermists are to reduce allocation to small-potatoes neartermist goals like distributing mosquito nets (The more resources you spend on mosquito nets instead of existential risk, the less likely humanity is to survive!). And vice versa – strong neartermists are duty-bound to oppose whatever they see as an over-allocation of EA time and money to longtermist projects[7].

The question of which effective altruist is right and which ones are wrong is actually entirely irrelevant to the problem. The point of the paperclip maximizer thought experiment is that regardless of how innocent or decent your goal, maximizing it will always result in dystopia unless that goal is exactly aligned with human interests.

Consider the following question: “Which of the clip maximizers is right: OG Clippy, binder Clippy, or blue Clippy?” The question itself is ill-formed. OG Clippy is right if you care solely about paperclips, binder Clippy is right if you care solely about binder clips, and blue Clippy is right if you care solely about blue paperclips. The problem with all three Clippys isn’t rightness or wrongness exactly; it’s that no human cares solely about any of those things, so each Clippy finds itself at odds with the human race.

Similarly, no effective altruist can claim that their goal is exactly aligned with human interests, because as even the diversity of the EA movement itself demonstrates, humans simply have a broad range of interests. Some people care about animals, some people care about future humans, some people even care about saving the trees, for the sake of the trees. It is impossible to please everyone.

Every effective altruist should ask themselves: Whose idea of good am I maximizing? Whose utopia am I trying to build?

In answering these questions, we are also admitting the answer to the following questions: Whose interests will be bulldozed when they don’t align with my ideal of maximized good? Who might see my utopia as a little bit (or a lot) . . . dystopian? Everyone else.

Until we can all come to complete agreement on our goals, a sufficiently effective maximization effort will create a world that looks like an uncanny valley dystopia at best, or a dystopian hellscape at worst, to everyone whose goal isn’t the exact thing that’s maximized. For illustration, here are just a few examples of ways that effective altruism could go horribly wrong because of differences in minor details:

  1. Some longtermists begin a successful eugenics program, reasoning that we should only allow the birth of people whose genetics show that they are significantly likely to be happy, as this will have a compounding positive happiness effect on future generations.

  2. Some pro-life effective altruists manage to convince world governments that making an entirely new person who can be happy is the most good that one can do, so we begin to see contraception bans (in addition to abortion bans) introduced worldwide, to increase reproduction rates.

  3. In order to reverse climate change, some longtermists instigate a small-scale nuclear war (in the Global South, of course), reasoning that climate change is a big enough deal for humanity’s long-term future that it’s okay if we lose Africa in the course of resolving it.

  4. Some animal welfare effective altruists instigate a large-scale nuclear war to kill off all humans, because they calculate that the number of cockroaches who will populate the post-apocalyptic Earth for the next 1.5 billion years is significantly higher than the number of humans who will ever live, since humans, left unchecked, will inevitably develop ever-more-effective weapons of mass destruction and one day start a war that will kill off even the cockroaches.

  5. A team working on AI alignment ends up accidentally creating an AGI that does any one of the above things, simply because it cares so much about world happiness and is a very effective altruist.

You may not see every one of these scenarios as dystopian. That’s the point. As the popular saying goes, one man’s vision of maximized world happiness is another man’s dystopian hellscape.

But if my examples aren’t persuasive enough, here’s an easy exercise that you can do on your own: take a moment to think of an effective altruist whose idea of world happiness you disagree with in some way, and then imagine what the world would look like if they were able to effectively maximize their goal. How would you feel about living in that world?

Then think about this: all around you, people are working towards goals which arise from principles or thought processes only very slightly divergent from yours, and yet which you would understand as strikingly dystopian if taken to their maximized conclusion. I’m only asking you to look in a mirror, and see yourself standing among your peers.

A particularly confident effective altruist might argue that most everyone else is simply wrong about their interests, that theirs is the single unambiguously correct articulation of human interests. That if only all of humanity would commit to their vision of world happiness, everyone could be satisfied. This is misguided for reasons that I’ll get into in the next section, but even if it could conceivably be true, acting on it involves enormous risk. Do you really trust yourself so much that you’re willing to advance the world towards an outcome that many others see as dystopian? Even Clippy has total conviction that its goal is the correct one, and that its paperclip-factory world is a utopia. How are you so sure that you’re not similarly misguided? There are a million ways to be wrong, and only one way to be right.

There’s nothing wrong with creating paperclips. And there’s nothing wrong with AI risk research, animal rights work, longtermism, or distributing mosquito nets. The problem isn’t any particular vision of happiness, it’s the utilitarian maximization of that vision. Every effective altruist, if too successful, would create a world that seems dystopian to the rest of humanity. For this reason, we should all be very happy that no arm of the EA movement is currently a very effective maximizer of world happiness. Because if the EA movement one day actually succeeds in consolidating humanity’s resources and directing them towards the realization of a singular vision of world happiness . . . then I guess all those AI risk researchers will finally be able to prove that they had it right the whole time; they were just looking in the wrong direction.

Do you see why effective altruism has garnered such loud criticism lately, especially as the pools of funding it directs have been growing exponentially larger? I don’t think it’s just because people take issue with the specific ways that you are allocating your increasingly-absurd sums of money. I think what they’re actually afraid of is your conviction – the way you march unflinchingly towards maximization. I think what they’re feeling is that same vague apocalyptic fear and excitement that we all feel every time we read another crazy story about the latest thing some AI model has gotten strikingly better than humans at.

It’s really starting to look like this is the future, isn’t it? we muse, frowning. I wonder how long it’ll be before it’s powerful enough to erase us all?

The Measurement Trap

An optimistic effective altruist might argue that, even if no individual effective altruist can confidently arrive at a conception of maximized happiness that aligns perfectly with human interests, the EA movement as a whole can be expected to. Every day, in public squares like this EA Forum, old and new ideas are being developed and challenged and refined. Through this sort of earnest debate and experimentation, the community is working towards a more and more accurate effective altruism. This survival-of-the-fittest idea-improvement is actually a property of the emergent intelligence of the EA movement. Just like any AI maximizer, the movement’s utility function rewards ideas that bring it closer to maximizing world happiness, and punishes ideas that don’t[8]. The optimistic effective altruist might expect that this property of the movement means that EA will one day arrive at a perfect maximization goal – one that is fully aligned with human interests.

I think this is misguided for two reasons. First, because EA is becoming more and more powerful every day, and is already engaging in actions intended to increase world happiness, there is a constant race going on between the increasing accuracy of the idea-improvement side of EA and the increasing effectiveness of the idea-implementation side. To return to the arrow analogy, it is as if the movement is constantly shooting arrows, each one going further than the last. Will we improve our aim enough to ensure a bullseye before we shoot an arrow far enough to land somewhere on the target? Who knows! Even the optimist who believes that the EA movement will eventually arrive at a goal that is perfectly aligned with human interests has to admit that it is possible that the idea-implementation side becomes effective enough to change the world in dystopian ways long before we get there; and at that point it will likely be too late.

But more importantly, I think the optimistic effective altruist is wrong to believe that we will ever be able to arrive at a utility function that perfectly maximizes human interests. This is not only because I believe that human interests are diverse and contradictory and thus it is impossible to create a maximizer that pleases everyone (though I do believe this). There is an additional problem that I understand to be even more fundamental.

In seeking to maximize world happiness, I think effective altruists have fallen for something that can be called the measurement trap. This is when we choose to believe that an effect is fully attributable to the causes that we can measure (and thus regulate), even though there are additional causes at work that we simply can’t observe. This is a common pitfall in a lot of areas where maximization is involved. For example: a business that attempts to maximize productivity by tracking the screen time and eye movements of its work-from-home employees and firing those who are found not to be “focused” enough. Or a dating app that attempts to maximize successful pairings by matching people based on their common interests. These efforts will have some mild success, but they will always be far from perfect, because the results they seek to maximize arise partly from factors that are simply inaccessible to their tools of measurement. Sometimes the measuring party is being willfully ignorant, because they know no other way to proceed. But other times, it’s more subtle – the measurer is simply so focused on the utility gained from maximizing the things that they can measure that they never take the necessary time to consider that there may be important pieces to the puzzle that they just can’t see.

I think effective altruists are in this latter situation. I think that effective altruists have chosen to equate good with world happiness because happiness is something that can be observed and (roughly) measured, and thus something that we can maximize. But I think there are things beyond happiness – immeasurable things – that are just as important to humans, but that effective altruists are often too rational to notice.
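If a sketch helps, here is a minimal toy simulation of the trap in Python (the numbers are entirely hypothetical): the outcome we actually care about depends on both a measurable factor and an unmeasurable one, but the maximizer can only see, and therefore only push on, the measurable proxy.

```python
# Toy simulation of the measurement trap: the proxy improves forever,
# while the outcome we actually care about peaks and then declines.

def run(steps: int = 8) -> None:
    proxy = 0.0    # what the maximizer can measure, and therefore optimizes
    hidden = 5.0   # what it cannot measure, and therefore cannot protect
    for step in range(1, steps + 1):
        proxy += 1.0            # steady, visible "progress"
        hidden -= 0.2 * step    # accelerating, invisible erosion
        true_outcome = proxy + hidden
        print(f"step {step}: proxy = {proxy:.1f}, true outcome = {true_outcome:.1f}")

run()
# proxy:        1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0   (always improving)
# true outcome: 5.8, 6.4, 6.8, 7.0, 7.0, 6.8, 6.4, 5.8   (peaks, then falls)
```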


Earlier in this article, I said some lightly disparaging things about humans who aren’t effective altruists[9]. I argued that they fail to be utilitarians despite a general desire to be so, because they are deterred by the effort utilitarianism takes, and the extreme conclusions it leads them to. Effective altruists, on the other hand, are able to remain committed to utilitarianism because they are hard-working, uncompromising, and relentlessly optimistic.

I still believe this to be true, but I personally think it reads a little bit differently after the discussion of AI maximizers. Because, if the commitment to maximizing good is actually dangerous, then I think there’s something quite remarkable about the gut-level reaction most people exhibit against it. A charitable thinker might wonder if there is perhaps some sort of subconscious intuition at work here, rather than a worldwide deficiency of scruples.

In full disclosure, this is the hardest sort of charitable thinking for me. I’m more the type to dismiss gut-based reactions out of hand, because they strike the analytical part of me as amateurish and narrow-minded. If you can’t say why you think something is wrong, I argue, maybe it isn’t actually wrong? This resistance to “intuition” feels especially urgent to me because gut-reaction-thinking is what fuels the sort of moral panic that produced the witch hunts of centuries past and continues to drive all sorts of anti-LGBTQ and racist causes today.

But the side of me that can muster a bit of epistemic humility recognizes that intuition isn’t always wrong. Sometimes people understand things that are true long before they can put them into words, and some true ideas are more quickly accepted by our subconscious minds than they are by our analytical ones.

Is it possible that the common, knee-jerk reaction against “too extreme” utilitarian commitments is one of these cases, rather than a case of simple narrow-mindedness? Could it be that most people intuitively understand that any attempt to “maximize happiness” carries within it the means of dystopia, without ever having to read an 8,000-word article to convince them? Are they really just able to intuitively get something that the rest of us are missing?

I think so. To explain what’s going on, let’s spend a bit of time with another thought experiment. This one’s longer, and was first proposed in a remarkable, but unpublished, short story by my friend Iris Chung. Here’s my version (shortened and significantly modified from the original):

Imagine you’re an ordinary effective altruist, suffering from a brain tumor pressing against your frontal lobe. Your doctor says it isn’t immediately deadly, but it does have pretty severe consequences. As the tumor grows, you begin to lose certain aspects of your sense of self. It begins with simple forgetfulness, but it quickly grows into something altogether different. You start losing any sense of emotional attachment to yourself and others. Experiences that would once make you angry or joyful now don’t really make you feel anything at all. Memories that were once traumatic now pass through your mind with no emotional impact, as do memories that once brought you great joy. You begin to feel indifferent to yourself. You also start experiencing an odd dulling of sensations. You still know the difference between degrees of pain, for example, but this difference somehow feels more factual than sensational. Another change stands out to you: you start to lose color perception. In fact, you start to forget what seeing color even felt like – everything you see is now just shades of gray. You’re still going through the motions of your life, but it’s beginning to feel like you’re just a cold operator, inhabiting someone else’s warm body.

On the other hand, some decisions that were once difficult for you to make have become easy. You finally became vegan. You slashed personal spending – you have no more use for Netflix or dining out – and were able to use the savings to buy a few hundred mosquito nets. Last week, you quit the social work job you once loved and took a job at a consulting firm that you once thought would be soul-sucking, but pays twice as much. You did this because you calculated that your life could have much greater impact if you took the new job and donated half of the additional salary to charity, while investing the other half into an ESG fund.

One day, your doctor calls you to tell you that new research has revealed a relatively safe treatment that you can undergo to remove the tumor and reverse the damage, before you lose yourself entirely. Anyone else would be overjoyed, but you feel nothing. Instead, you come to an important realization. You don’t want to go back to normal. The loss of self has, in fact, made you a much better effective altruist, capable of doing the right thing without undue bias in your own favor. You decline the treatment.

Soon enough, you no longer ever notice that you even really exist. Of course, you know of yourself in a practical sense – “you” is the person that you have to feed, bathe, and shuffle around in order to remain effective – but you no longer really identify with that person, or think about “self” or “identity” at all. But one thing is for sure: you’ve become the most effective altruist around.

What I find fascinating about this story is that I think most people would hope that this never happens to them, because they recognize some sort of tragic loss in these events. But what is lost? You don’t die in this story – nobody is even measurably harmed. You make the decision yourself to never go back to normal. And in fact, this new you results in a lot of measurable good: hundreds of mosquito nets are distributed, thousands of dollars are donated to charity and invested in good causes. From the perspective of effective altruism, you unquestionably made the right decision.

And yet. The loss of the self, though entirely immaterial, though not even felt, strikes us as tragedy. What is going on here? A very simplistic reading might say that what is lost is happiness. And that’s certainly so, to some degree. It would be a tragedy to never be able to experience joy again. But it is also certainly not only happiness that we mourn in this story. It is the totality of human experience – there is something tragic too about losing the ability to feel sorrow, anger, exhaustion, identity. To see color. There is something fundamentally important to us about actually being there – about having any subjective experience at all. Choosing to no longer have a subjective experience of the world – forfeiting one’s qualia – feels like a betrayal of the one thing that each person alone is solely responsible for.

I tell this story not to make any particularly complex deontological claims, but simply to demonstrate a very weak (in the philosophical sense) point: there is something about human experience, beyond happiness, that we think is important[10]. We intuitively recognize that there is something of value in both happy moments and unhappy ones. In fact, we’ll usually never admit it, but we sometimes even seek out struggle on purpose – sometimes, we would rather experience pain and seek to overcome it than be bored or aimless. Even to ourselves, we’re only partly altruistic. Even as individuals, our interests are really, truly complicated.

This is why seeking to maximize happiness is a measurement trap. Effective altruists seek to maximize happiness because it can be measured, and because what can’t be measured can’t be maximized. But – and this is crucial – what can’t be measured can be destroyed in the pursuit of maximizing something else[11]. Consequently, there arises a small but fundamental gap between the effective altruist goal of maximization and what humans truly value.

This means that the project of “happiness maximization” is flawed from the beginning, at a purely theoretical level. Happiness is good, but no maximizer can ever be fully aligned with human interests, because some human interests are fully resistant to the project of maximization. Our intuitive minds understand this, even if we can’t explain it, and that’s what leads most people to shy away from the “extreme” outcomes of utilitarianism. Sure, we want to do the best we can, but we can’t shake the feeling that something just feels off about discounting everything else in order to maximize happiness. Isn’t there something good, something worthwhile, about me just being me? It kind of reminds me of something Jesus, that maddeningly anti-utilitarian Jewish prophet, once asked: What does it profit a man to gain the whole world, and yet lose his own soul?

Those of us who have taught ourselves to trust our analytical minds over our intuition often miss this kind of thing. In avoiding the trap of narrow-mindedness, we become susceptible to the measurement trap – we mistake measurability for worth. We should find it worrisome that ridding ourselves of subjective experience would make us better effective altruists. This is because utilitarianism – the maximization of good according to the measurements provided by a utility function – is the domain of creatures like AI, those that lack subjective experience entirely; not humans with souls to protect.

The Sweet Spot (Failing Well)

It’s been 5 dense sections and 6,500 words, so I think a brief recap of my overall argument is due:

  • Effective altruists are humans who attempt to maximize good according to something like the following utility function: how much does this or that action increase world happiness?

  • In coming together to accomplish this goal, effective altruism has become something greater than the sum of its parts – an emergent artificially intelligent happiness maximizer. Unfortunately, EA’s utility function is necessarily misaligned with human interests – both because world happiness means something different to everyone, and because not everything that humans value fits under the umbrella of “happiness,” or is even measurable for use in a utility function at all.

  • This means that, with sufficient effectiveness, the EA movement will lead us, like the paperclip maximizer or any and every other AI maximizer, to a world that is necessarily dystopian to everyone except the maximizer itself.

Thankfully, this final section adds a small bit of good news at the tail end of this admittedly relentlessly party-pooping parade:

  • The EA movement is simply not sufficiently effective to achieve anything remotely dystopian. And as long as it stays that way, it can and will continue to achieve a significant amount of good.

You may have noticed that all of the upsetting EA scenarios that I described in the Whose Utopia? section above would require the EA community to, in one way or another, gain significantly more power over world outcomes than it currently has[12]. Until it gains that power, the EA community is mostly harmless. As I noted above, there are two basic conditions for dystopian-AI-apocalypse. Even with misaligned goals, no one can bring about a dystopian future unless their ability to accomplish their goals is advanced enough to overcome the inevitable resistance they’ll meet on their way to creating that dystopia. Even a paperclip maximizer would end up being harmless for the world – slightly good, even – if it only had the power to create a few sustainable eco-paperclip factories. Similarly, the EA movement is doing a lot of unambiguous good in the world precisely because its world-changing capacities are limited.

This is the paradox of the maximizer: all sufficiently-effective maximizers would create dystopian outcomes, but sufficiently-ineffective maximizers can produce broadly-positive effects precisely because of their particularly focused and optimistic method of toiling. In other words, maximizers succeed horrifically, but they fail very well. The trick is just keeping any given maximizer’s effectiveness in the sweet spot, where its work is impactful enough to be felt, but not yet so impactful that it begins to collapse into dystopia.

Effective altruism currently lives within that sweet spot, and is doing some incredibly good and important work. In fact, if we are all bad utilitarians in one way or another, I think effective altruists have proven themselves to be some of the best bad utilitarians of all. Through years of funding genuinely important and useful humanitarian work, the EA movement has made it clear that the rest of us move on much too quickly from our nascent utilitarianism; we could all stand to be a bit more patient with the complex calculations.

That said, it’s unfortunate when the best argument in favor of a movement that calls itself effective altruism is that it is good precisely because it isn’t too effective. Ultimately, effective altruists are not satisfied with the current state of their effectiveness – they don’t want to remain vastly limited in their world-changing impact, just a small group of useful charities among a world of many. The EA movement is duty-bound to seek change-making power; its foundational utilitarianism requires it. The sweet spot of maximization that effective altruism has thus far inhabited gives me a lot of hope; but its commitment to growing in power – to escaping from the sweet spot and loosing the full force of its own radical interests upon the world – is what keeps me awake at night, especially given how swiftly its power has grown in just the last two years or so. How much longer do we have before effective altruism becomes truly, frighteningly effective?

Some clever effective altruist who has read this far might now be thinking:

Perhaps the most effective way to maximize good while neutralizing the maximizer risk is to avoid empowering any one maximizer at all! What if we instead built a coalition of diverse “good maximizers” who are all trying to advance their own goals, and in doing so, are each keeping all of the others stuck in the sweet spot?

Perhaps surprisingly, I think this clever person is absolutely right[13]. But I think he has also unwittingly proposed a very significant shift from what are currently the core ideas of effective altruism. Choosing to create a coalition of maximizers with the aim that they will be mutually self-defeating means choosing to no longer have a hat in the ring; at that point, we can no longer credibly claim to be trying to maximize any particular conception of “good.”

In fact, this appears to run in direct opposition to the current goals of the effective altruism community. As I noted in the previous section, the effective altruism community currently serves, in part, as a sort of survival-of-the-fittest idea-generation forum. The goal is to develop competing ideas not for the sake of diversity, but as a struggle towards eventual unity. Effective altruists are almost always trying to convince each other of something. But what I have tried to demonstrate in this article is that this struggle for unity is itself a substantial existential risk to the human race. The more unified the EA movement becomes around a particular goal, the more power the-EA-movement-as-artificial-intelligence has to advance the world towards dystopian outcomes.

So what I am proposing now, with an unfortunately very straight face, is that, if the EA community is truly committed to the mitigation of existential risks, it must acknowledge the fact that it is itself a substantial existential risk, and consequently take one of three actions:

  1. Dismantle itself,

  2. Commit itself to significantly limiting its own power (most importantly, by eschewing any political lobbying and agreeing to set a maximum amount of funding, beyond which it will no longer accept or direct donations), or

  3. Seek to intentionally increase rather than decrease viewpoint diversity and goal disunity among its ranks.

It is times like these when I wish I had the optimism of an effective altruist. But, being the me that I am, I don’t believe the EA community will seriously consider the first option. I’d be a bit less surprised if the most influential leaders in the community chose to institute something along the lines of the second option in theory, but it seems incredibly difficult to maintain in practice (What happens when effective altruists start to run for and win political office? Can they be expected to leave their commitment to EA behind them when they step into governance?). So it is really only the third option that I feel optimistic enough to truly advocate for.

Essentially, I believe that the overarching goal of members of the EA community must shift away from the maximization of good and towards the governance of a diversity of good maximizers.

To be clear, I think that a minority of members of this community already believe something similar to this, and have already chosen to act in this way. But on the whole, effective altruists value purity of commitment, logical explainability of deontological claims, and rational calculations of utility over diversity of opinion about which good should be maximized. I am simply arguing that this sort of diversity of opinion is one of the few things standing in the way of the EA community becoming an existential threat to the human race, and should therefore actually be prized more heavily than all of those other – still very important! – values.

What does this mean practically? I really don’t think I’m the best-positioned to know. Identifying a problem and proposing the right solution require related-but-very-different skills and knowledge, and while I’m confident enough to think I have something useful to say about the former, I’m a bit less so about the latter. Nevertheless, for the sake of completeness, below are a few off-hand suggestions.

First, I think there should be a subtle shift in the way the members of this community conceive of themselves, both in discussions of effective altruism externally, and in spaces like this forum. In interactions with people outside of the effective altruism community, effective altruists should seek to present EA as a community that helps people advance their own conception of good, rather than one that reforms their understanding of good to fit the EA vision. This means less talking about longtermism or mosquito nets and more listening to other people talk about their Catholicism or their efforts to increase affordable housing, and helping them figure out how they can do a better job of that. And although it is impossible to avoid impassioned discussion about what good is on these forums, this community should be a space where everyone feels psychologically safe to debate and discuss how they can advance towards their own conception of good, regardless of whether the majority of effective altruists share that conception.

Again, I think that all of this is already basically practiced by some effective altruists. One of the recommended posts on this forum argues that EA is a “broad tent” social movement committed to “ideological pluralism.” But that post was written 8 years ago, and if you poke around the forum now, it’s hard to find much evidence of this theoretical pluralism in practice. Where are the effective altruists whose idea of “doing good” is increasing human dignity rather than human happiness? The virtue ethicist effective altruists who want to maximize the good they do by making all of us better humans? The Christian effective altruists whose idea of “maximal good” is simply loving their neighbors as they love themselves? The Buddhist effective altruists who aim to ultimately end suffering by helping others to approach nirvana? The reality is, the EA movement is strikingly homogeneous in its conception of good. Perhaps it was less so 8 years ago; I wasn’t around then, so I don’t know. But to the extent that it has become more homogeneous and self-assured, I simply want to challenge that assuredness and homogeneity, for the sake of reducing existential risk.

Second, I think every effective altruist organization that directs a considerable amount of funds should intentionally devote some funding to causes that do not meet its own criteria of effectiveness, or should at least create a tier of funding that has criteria that are looser than the organization’s leaders are comfortable with. This is to ensure that all of their funding still goes to good causes while preventing an over-concentration of power in the EA movement’s own interest areas. Some EA organizations already do something like this[14]; all should.

Finally, there have recently been some calls on this forum for the EA movement to undergo a rebrand (either in the style or substance of its communication), and some resistance to those calls based on the argument that the movement should not compromise its commitment to logical cohesion and rigor in an attempt to attract more members. I couldn’t disagree more. I think the EA movement would be greatly served by the inclusion of some people who are as in tune with their intuition as they are with their rational minds. No individual effective altruist has to change anything about their way of reasoning, but every effective altruist should recognize the incredible value provided by the joyful inclusion of ideals that compete with their own. I’m not an interaction designer or political strategist; I leave it to more capable thinkers to propose what any rebrand should look like or entail. But I think everything within this largely homogeneous community’s arsenal should be deployed to increase the diversity of its ideas.


Ultimately, I believe that most people will continue to be intuitively averse to effective altruism as long as it represents a significant existential risk to humanity. If we want to be truly effective altruists, rather than unwitting participants in an ultimately very dangerous project, we must all choose to deprioritize our own personal vision of utopia in favor of epistemic humility and a collaborative outlook. We must actually value diversity of opinion over pure efficacy toward our own uncertain ends. We must become includers and coalition-builders rather than visionary leaders. We must trade in our aspirations of superintelligence for a few extra helpings of the ordinary human kind.

  1. ^

    As Pascal or Twain or Cicero (or probably all of the above) once said—“I would have written a shorter letter, but I did not have the time.”

  2. ^

    This is, of course, a very broad definition of “broadly utilitarian.” But I think utilitarianism can genuinely be a very broad idea, for reasons well-captured in the second paragraph of the Wikipedia page on utilitarianism. If you disagree with this broad definition, that’s okay, I just ask that you don’t let this minor disagreement prevent you from considering my broader argument.

  3. ^

    Yes, there are a few pure virtue ethicists out there, but I’d argue that most ordinary people who follow ethical systems that look like virtue ethics (e.g. “always tell the truth, no matter the consequences”) actually end up justifying their moral decisions with some sort of utilitarian consequentialism anyway (e.g. “if everyone told the truth, the world would be a much better place; by not lying, I’m doing my part to bring about this honest world”).

  4. ^

    In some AI systems, the goal is hard-coded in advance, while others are capable of learning new goals over time. But usually, even these more generalized AI systems (AGI) have some root goals hard-coded, and only learn new secondary goals in order to help them accomplish these primary goals, a la Asimov’s Three Laws.

  5. ^

    Some might argue that this is an oversimplification; some effective altruists aim instead to minimize suffering (or to strike the optimal balance between the two), others aim to maximize “well-being,” and still others have come up with some very complicated happiness-adjacent metrics in an attempt to solve for the repugnant conclusion (i.e. the whole making happy people vs. making people happy debate). I understand this, but please bear with me – it’s a useful simplification, it makes for a great article title, and I posit that my conclusions all still apply even if you consider the EA movement’s goal to be something slightly different than maximum happiness.

  6. ^

    For what it’s worth, my argument is that EA is actually more of the “capitalism” type of AI than the “corporation” type. It is the emergence of a utilitarian artificial intelligence through the coordination and competition of a bunch of mini-utilitarian intelligences (individual effective altruists/​EA organizations), in the same way that capitalism emerges from a bunch of corporations.

  7. ^

    Although, to be perfectly honest, I suspect that strong neartermists are actually becoming harder and harder to find in the EA community, because the only good anti-longtermism arguments are actually just anti-utilitarian (and therefore anti-EA) arguments. But this is just a personal opinion – strong neartermists, feel free to ignore this footnote, or better yet, write a response proving me and the longtermists wrong! I hear they’re literally giving away over $100k to people who are willing to do that kind of thing.

  8. ^

    Importantly, this means that even if a particular individual effective altruist claims that they are not fully utilitarian, the emergent identity of the movement itself still is.

  9. ^

    Sorry!

  10. ^

    Some would say it’s purpose or meaning that we care about. Others might say it’s simply qualia itself. Still others will attribute it to something supernatural or religious. This article is not concerned with determining what it is, just that this something exists.

  11. ^

    Consider the business which fires some truly productive employees in its effort to maximize productivity as measured by screen time and eye movement.

  12. ^

    Interestingly, most of the scenarios I described involved the use of political power, which the EA movement has not yet been known to significantly pursue. But it is important to note that the EA movement is in possession of a significant and ever-increasing amount of funding – enough so that there are many paths opening up through which it could be quite successful at forcing its political will upon the world.

  13. ^

    Never trust an author who calls his own made-up characters “clever.”

  14. ^

    Though their motivation is somewhat different from mine, Open Philanthropy’s policy of worldview diversification is perhaps the most direct and intentional application of the “diversity of good maximizers” stance within the EA community.