Punching Utilitarians in the Face
A fun game for avowed non-utilitarians is to invent increasingly exotic thought experiments to demonstrate the sheer absurdity of utilitarianism. Consider this bit from Tyler’s recent interview with SBF:
COWEN: Should a Benthamite be risk-neutral with regard to social welfare?
BANKMAN-FRIED: Yes, that I feel very strongly about.
COWEN: Okay, but let’s say there’s a game: 51 percent, you double the Earth out somewhere else; 49 percent, it all disappears. Would you play that game? And would you keep on playing that, double or nothing?
…
BANKMAN-FRIED: Again, I feel compelled to say caveats here, like, “How do you really know that’s what’s happening?” Blah, blah, blah, whatever. But that aside, take the pure hypothetical.
COWEN: Then you keep on playing the game. So, what’s the chance we’re left with anything? Don’t I just St. Petersburg paradox you into nonexistence?
Pretty damning! It sure sounds pretty naive to just take any bet with positive expected value. Or from a more academic context, here is FTX Foundation CEO Nick Beckstead alongside Teruji Thomas:
On your deathbed, God brings good news… he’ll give you a ticket that can be handed to the reaper, good for an additional year of happy life on Earth.
As you celebrate, the devil appears and asks “Won’t you accept a small risk to get something vastly better? Trade that ticket for this one: it’s good for 10 years of happy life, but with probability 0.999.”
You accept… but then the devil asks again… “Trade that ticket for this one: it is good for 100 years of happy life–10 times as long–with probability 0.999^2–just 0.1% lower.”
An hour later, you’ve made 50,000 trades… You find yourself with a ticket for 10^50,000 years of happy life that only works with probability 0.999^50,000, less than one chance in 10^21.
Predictably, you die that very night.
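The arithmetic in the devil scenario is easy to check directly. This is just a sketch of the numbers quoted above (10× payoff and a 0.999 probability factor per trade), not anything from the original paper:

```python
import math

# Each trade multiplies the payoff by 10 and the success probability
# by 0.999, per the scenario quoted above.
trades = 50_000
prob = 0.999 ** trades      # chance the final ticket pays out

# The probability is below one in 10^21, yet the expected value
# in years of happy life keeps growing with every trade.
print(f"success probability ≈ {prob:.3e}")
print(f"log10(expected years) ≈ {trades + math.log10(prob):.0f}")
```

The expected value stays astronomically positive even as the chance of seeing another sunrise vanishes, which is the whole sting of the example.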
And it’s not just risk! There are damning scenarios downright disproving utilitarianism around every corner. Joe Carlsmith:
Suppose that oops: actually, red’s payout is just a single, barely-conscious, slightly-happy lizard, floating for eternity in space. For a sufficiently utilitarian-ish infinite fanatic, it makes no difference. Burn the Utopia. Torture the kittens.
…in the land of the infinite, the bullet-biting utilitarian train runs out of track…
It’s looking quite bad for utilitarianism at this point. But of course, one man’s modus ponens is another man’s modus tollens, and so I submit to you that actually, it is the thought experiments which are damned by all this.
I take the case for “common sense ethics” seriously, meaning that a correct ethical system should, for the most part, advocate for things in a way that lines up with what people actually feel and believe is right.
But if your entire argument against utilitarianism is based on ginormous numbers, tiny probabilities, literal eternities and other such nonsense, you are no longer on the side of moral intuitionism. Rather, your arguments are wildly unintuitive, your “thought experiments” literally unimaginable, and each “intuition pump” overtly designed to take advantage of known cognitive failures.
The real problem isn’t even that these scenarios are too exotic, it’s that coming up with them is trivial, and thus proves nothing. Consider, with apologies to Derek Parfit:
Suppose that I am driving at midnight through some desert. My car breaks down. You are a stranger, and the only other driver near. I manage to stop you, and I ask for help.
As you are against utilitarianism, you have committed to the following doctrine: when a stranger asks for help at midnight in the desert, you will give them the help they need free of charge. Unless they are a utilitarian, in which case you will punch them in the face, light them on fire, and commit to spending the rest of your life sabotaging shipments of anti-malarial bednets.
Here is a case without any outlandish numbers in which being a utilitarian does not result in the best outcome. And yet clearly, it proves nothing at all about utilitarianism!
Look, I know this all sounds silly, but it is no sillier than Newcomb’s Paradox. As a brief reminder:
The player is given a choice between taking only box B, or taking both boxes A and B.
Box A is transparent and always contains a visible $1,000.
Box B is opaque, and its content has already been set by the predictor.
If the predictor has predicted that the player will take only box B, then box B contains $1,000,000. If the predictor has predicted the player will take both boxes A and B, then box B contains nothing.
Again, this initially looks pretty damning for standard decision theory… except that you can generate a similar “experiment” to argue against anything you don’t like. In fact, you can generate far worse ones! Consider:
The player is given a choice between only taking box B, or taking both boxes A and B.
Box A is transparent and always contains $1,000.
Box B is opaque, and its content has already been set by the predictor.
If the predictor has predicted the player acts in accordance with Theory I Like, Box B contains $1,000,000. If the predictor has predicted the player acts in accordance with Theory I Don’t Like, then box B contains a quadrillion negative QALYs.
The problem isn’t that decision theory is wrong, it’s that the setup has been designed to punish people who behave a certain way. And so it’s meaningless because we can trivially generate analogous setups that punish any arbitrary group of people, thus “disproving” their belief system, or normative theory, or whatever it is you’re trying to argue against… while at the same time providing no actual evidence one way or another.
Does this mean thought experiments are all useless and we just have to do moral philosophy entirely a priori? Not at all. But there are two particular cases where these fail, and a suspiciously large number of the popular experiments fall into at least one of them:
1. The “moral intuition” is clearly not generated by reliable intuitions because it abuses:
a. Incomprehensibly large or small numbers
b. Known cognitive biases
c. Wildly unintuitive premises
2. The “moral intuition” proves too much because it can be trivially deployed against any arbitrary theory.
In contrast, the best thought experiments are less like clubs beating you over the head, and more like poetry that highlights a playful tension between conflicting reasons. In this vein, Philippa Foot’s Trolley Problems are so lovely because they elegantly guide you around the contours of your own values. They allow you to parse out various objections, to better understand which particular aspects of an action make it objectionable, and play your own judgements against each other in a way that generates humility, thoughtfulness and comprehension.
So I love thought experiments. And I deeply appreciate the way make-believe scenarios can teach us about the real world. I just don’t care for getting punched in the face.
–––
Appendix
Nicolaus Bernoulli, Joe Carlsmith, Nick Beckstead, Teruji Thomas, Derek Parfit, Tyler Cowen, and Robert Nozick are all perfectly fine people and good moral philosophers.
I am also not a moral philosopher myself, and it’s likely that I’m missing something important.
Having said that, I will do the public service of risking embarrassment to make my bullet-biting explicit:
I take the St. Petersburg gamble, and accept that a 0.5^n probability of 2^n·x value is positive-EV.
I also take the devil’s deal.
I simply don’t believe that infinities exist, and even though 0 isn’t a probability, I reject the probabilistic argument that any possibility of infinity allows them to dominate all EV calculations. I just don’t think the argument is coherent, at least not in the formulations I’ve seen.
Similarly, once you introduce a “reliable predictor”, everything goes out the window and the money is the least of your concern. But granting the premise, fine, I One Box.
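The first bullet above is easy to restate as a sketch: a gamble that pays 2^n times the stake x with probability 0.5^n (names here are my own, purely for illustration):

```python
# A gamble paying 2**n times the stake x with probability 0.5**n.
def gamble_ev(n, x=1.0):
    # EV = 0.5**n * 2**n * x = x, independent of n
    return (0.5 ** n) * (2 ** n) * x

def survival_prob(n):
    return 0.5 ** n

print(gamble_ev(100))      # 1.0: the EV never turns negative
print(survival_prob(100))  # ~7.9e-31: near-certain ruin
```

The EV is flat at x for every n while the probability of walking away with anything collapses, which is exactly the tension being bitten here.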
EDIT: I didn’t discuss it here, but the original desert dilemma just involves you being a selfish person who can’t lie, and the man refusing to help you because he knows you won’t actually reward him. This doesn’t fall into either of the “bad thought experiment” heuristics I outlined above, and is, in fact, a seemingly reasonable scenario.
But I don’t think the lesson is “selfishness is always self-defeating”, I think the lesson is “if you’re unable to lie, having a policy of acting selfish is probably the wrong way to implement your selfish aims.” And so you should rationally determine to act irrationally (with respect to your short-term aims), but this is really no different than any other short-term/long-term tradeoff.
Parfit’s point, by the way, was a more abstract one about the fact that some “policies” can be self-defeating, and that this results in some theoretically interesting claims. Which is good and clever, but for our purposes my point is that the “argument from getting punched in the face by an AI that hates your policy in particular” does a good job of demonstrating that this proves nothing about any given policy.
I’m pretty confused about the argument made by this post. Pascal’s Mugging seems like a legitimately important objection to expected value based decision theory, and all of these thought experiments are basically flavours of that. This post feels like it’s just imposing scorn on that idea without making an actual argument?
I think “utilitarianism says seemingly weird shit when given large utilities and tiny probabilities” is one of the most important objections.
Is your complaint that this is an isolated demand for rigor?
I’m not well versed enough in higher mathematics to be confident in this, but it seems to me like these objections to utilitarianism are attacking it by insisting it solve problems it’s not designed to handle. We can define a “finite utilitarianism,” for example, where only finite quantities of utility are considered. In these cases, the St. Petersburg Paradox has a straightforward answer, which is that we are happy to take the positive expected value gamble, because occasionally it will pay off.
This brings utilitarianism closer to engineering than to math. In engineering, we use mathematical models to approximate the behavior of materials. However, we obtain the math models from empirical observation, and we understand that these models break down at a certain point. Despite this, they are extremely useful models for almost any practical circumstance the engineer might confront. Likewise for utilitarianism.
It’s fine to argue about transcendent, infinite utilitarianism. But both sides of the debate should be explicitly clear that it’s this very particular distilled flavor of moral philosophy that they are debating. This would be akin perhaps to Einstein proving that Newton’s laws break down near the speed of light. If we are debating over whether we have a moral imperative to create a galaxy-spanning civilization as soon as possible, or by contrast to systematically wipe out all the net-negative life in the universe, then these issues with pushing the limits of utilitarianism are in force. If we are debating a more ordinary moral question, such as whether or not to go to war, practical finite utilitarianism works fine.
These thought experiments also implicitly distort it by asking us to substitute some concrete good, such as saving lives or making money, for what utilitarianism actually cares about, which is utility. Our utility does not necessarily scale linearly with number of lives saved or amount of money made. But because utility is so abstract, it is hard even for a utilitarian to feel in their gut that twice the utility is twice as good, even though this is true by definition.
Utilitarianism’s foil, deontology, makes crazy-sounding claims about relatively ordinary conundrums, such as “you can’t lie to the axe murderer about where his victim is hiding.” Cooking up insane situations where utilitarianism sounds just as crazy is sort of moral DARVO.
OK, that seems like a pretty reasonable position. Though if we’re restricting ourselves to everyday situations it feels a bit messy—naive utilitarianism implies things like lying a bunch or killing people in contrived situations, and I think the utility maximising decision is actually to be somewhat deontologist.
More importantly though, people do use utilitarianism in contexts with very large amounts of utility and small probabilities—see strong longtermism and the astronomical waste arguments. I think this is an important and action relevant thing, influencing a bunch of people in EA, and that criticising this is a meaningful critique of utilitarianism, not a weird contrived thought experiment
I don’t know what “naive” utilitarianism is. Some possibilities include:
Making incorrect predictions about the net effects of your behavior on future world states, due to the ways that utilitarian concepts might misguide your epistemics.
Having different interpretations of the same outcomes from a more “sophisticated” moral thinker.
I would argue that (1) is basically an epistemic problem, not a moral one. If the major concern with utilitarian concepts is that it makes people make inaccurate predictions about how their behaviors will affect the future, that is an empirical psychological problem and needs to be dealt with separately from utilitarian concepts as tools for moral reasoning.
(2) is an argument from authority.
Please let me know if you were referencing some other concern than the two I’ve speculated about here; I assume I have probably missed your point!
I don’t know what “be somewhat deontologist” means to you. I do think that if the same behavior is motivated by multiple contrasting moral frameworks (i.e. by deontology and utilitarianism), that suggests it is “morally robust” and more attractive for that reason.
However, being a deontologist and not a utilitarian is only truly meaningful when the two moral frameworks would lead us to different decisions. In these circumstances, it is by definition not the utility maximizing decision to be deontologist.
If I had to guess at your meaning, it’s that “deontologist” is a psychological state, close to a personality trait or identity. Hence, it is primarily something that you can “be,” and something that you can be “somewhat” in a meaningful way. Being a deontological sort of person makes you do things that a utilitarian calculus might approve of.
I agree that people do attempt to apply utilitarian concepts to make an argument for avoiding astronomical waste.
I agree that if a moral argument is directing significant human endeavors, that makes it important to consider.
This is where I disagree with (my interpretation of) you.
I think of moral questions as akin to engineering problems.
Occasionally, it turns out that a “really big” or “really small” version of a familiar tool or material is the perfect solution for a novel engineering challenge. The Great Wall of China is an example.
Other times, we need to implement a familiar concept using unfamiliar technology, such as “molecular tweezers” or “solar sails.”
Still other times, the engineering challenge is remote enough that we have to invent a whole new category of tool, using entirely new technologies, in order to solve it.
Utilitarianism, deontology, virtue ethics, nihilism, relativism, and other frameworks all offer us “moral tools” and “moral concepts” that we can use to analyze and interpret novel “moral engineering challenges,” like the question of whether and how to steer sentient beings toward expansion throughout the lightcone.
When these tools, as we apply them today, fail to solve these novel moral conundrums in a satisfying way, that suggests some combination of their limitations, our own flawed application of them, and perhaps the potential for some new moral tools that we haven’t hit on yet.
Failure to fully solve these novel problems isn’t a “critique” of these moral tools, any more than a collapsed bridge is a “critique” of the crane that was used to build it.
The tendency to frame moral questions, like astronomical waste, as opportunities to pit one moral framework against another and see which comes out the victor, strikes me as a strange practice.
Imagine that we are living in an early era, in which there is much debate and uncertainty about whether or not it is morally good to kill heathens. Heathens are killed routinely, but people talk a lot about whether or not this is a good thing.
However, every time the subject of heathen-killing comes up, the argument quickly turns to a debate over whether the Orthodox or the Anti-Orthodox moral framework gives weirder results in evaluating the heathen-killing question. All the top philosophers from both schools of thought think of the heathen-killing question as showing up the strengths and weaknesses of the two philosophical schools.
I propose that it would be silly to participate in the Orthodox vs. Anti-Orthodox debate. Instead, I would prefer to focus on understanding the heathen-killing question from both schools of thought, and also try to rope in other perspectives: economic, political, technological, cultural, and historical. I would want to meet some heathens and some heathen-killers. I would try to get the facts on the ground. Who is leading the next war party? How will the spoils be divided up? Who has lost a loved one in the battles with the heathens? Are there any secret heathens around in our own side?
This research strikes me as far more interesting, and far more useful in working toward a resolution of the heathen-killing question, than perpetuating the Orthodox vs. Anti-Orthodox debate.
By the same token, I propose that we stop interpreting astronomical waste and similar moral conundrums as opportunities to debate the merits of utilitarianism vs. deontology vs. other schools of thought. Instead, let’s try and obtain a multifaceted, “foxy” view of the issue. I suspect that these controversial questions will begin to dissolve as we gather more information from a wider diversity of departments and experiences than we have at present.
You don’t need explicit infinities to get weird things out of utilitarianism. Strong Longtermism is already an example of how the tiny probability that your action affects a huge number of (people?) dominates the expected value of your actions in the eyes of some prominent EAs.
I agree with you. Weirdness, though, is a far softer “critique” than the clear paradoxes that result from explicit infinities. And high-value low-probability moral tradeoffs aren’t even all that weird.
We need information in order to have an expected value. We can be utilitarians who deny that sufficient information is available to justify a given high-value low-probability tradeoff. Some of the critiques of “weird” longtermism lose their force once we clarify either a) that we’re ~totally uncertain about the valence of the action under consideration relative to the next-best alternative, and hence the moral conundrum is really an epistemic conundrum, or b) that we actually are very confident about its moral valence and opportunity cost, in which case the weirdness evaporates.
Consider a physicist who realizes there’s a very low but nonzero chance that detonating the first atom bomb will light the atmosphere on fire, yet who also believes that every day it doesn’t get dropped on Japan extends WWII and leads to more deaths on all sides on net. For this physicist, it might still make perfect sense to spend a year testing and checking to resolve this small chance that the nuclear bomb doesn’t work. I think this is not a “weird” decision from the perspective of most people, whether or not we assume the physicist is objectively correct about the epistemic aspect of the tradeoff.
To a certain extent, it’s utilitarianism that invites these potential critiques. If a theory says that probabilities/expected value are integral to figuring out what to do, then questions looking at very large or very small probabilities/expected value is fair game. And looking at extreme and near-extreme cases is a legitimate philosophical heuristic.
Correct me if I am wrong, but I don’t necessarily see the St. Petersburg paradox as being the same as Pascal’s mugging. The latter is a criticism of speculation, and the former is more of an intuitive critique of expected value theory.
I originally wrote this post for my personal blog and was asked to cross-post here. I stand by the ideas, but I apologize that the tone is a bit out of step with how I would normally write for this forum.
A related annoyance I have is the ambiguity between “best actions in most environments” vs. “best actions in environments where other agents are actively out to abuse your decision theory.” After reading your post, I finally got around to quickly scribbling my thoughts here.
Problems with infinity don’t go away just because you assume that actual infinities don’t exist. Even with just finite numbers, you can face gambles that have infinite expected value, if increasingly good possibilities have insufficiently rapidly diminishing probabilities. And this still causes a lot of problems.
(I also don’t think that’s an esoteric possibility. I think that’s the epistemic situation we’re currently in, e.g. with respect to the amount of possible lives that could be created in the future.)
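A minimal toy version of this point (my own construction, not the commenter’s): a distribution whose every outcome is finite but whose expected value is infinite.

```python
# P(N = 2**k) = 0.5**k for k = 1, 2, 3, ...  Every outcome is a
# finite number, but each term contributes 0.5**k * 2**k = 1 to the
# expectation, so the partial sums grow without bound.
def partial_ev(K):
    return sum((0.5 ** k) * (2 ** k) for k in range(1, K + 1))

print(partial_ev(10), partial_ev(100))  # 10.0 100.0
```

No single outcome is infinite; the infinity lives entirely in the tail of the probability assignment, which is why banning "actual infinities" doesn't dissolve the problem.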
Also, as far as I know (which isn’t a super strong guarantee) every nice theorem that shows that it’s good to maximize expected value assumes that possible utility is bounded in both directions (for outcomes with probability >0). So there’s no really strong reason to think that it would make sense to maximize expected welfare in an unbounded way, in the first place.
See also: www.lesswrong.com/posts/hbmsW2k9DxED5Z4eJ/impossibility-results-for-unbounded-utilities
Under mainstream conceptions of physics (as I loosely understand them), the number of possible lives in the future is unfathomably large, but not actually infinite.
I’m not saying it’s infinite, just that (even assuming it’s finite) I assign non-zero probability to different possible finite numbers in a fashion such that the expected value is infinite. (Just like the expected value of the infinite St. Petersburg gamble is infinite, although every outcome has finite size.)
Is the expected number finite, though? If you assign nonzero probability to a distribution with infinite EV, your overall EV will be infinite. If you can’t give a hard upper bound, i.e. you can’t prove that there exists some finite number N, such that the number of possible lives in the future is at most N with probability 1, it seems hard to rule out giving any weight to such distributions with infinite EV (although I am now just invoking Cromwell’s rule).
I think you probably can? Edit: assuming current physics holds up, which is of course not probability 1. But I don’t think it makes sense to take the event that it doesn’t hold up seriously, in a Pascal’s mugging sense.
The topic under discussion is whether pascalian scenarios are a problem for utilitarianism, so we do need to take pascalian scenarios seriously, in this discussion.
I think this gets at something important, but:
This list also applies to prominent arguments for longtermism and existential risk mitigation, right? For example, Greaves & MacAskill think that the charge of fanaticism is one of the most serious problems with strong longtermism, which “tend[s] to involve tiny probabilities of enormous benefits.” To the extent that’s true, the extreme thought experiments seem to capture something significant. If they reveal a failure of utilitarianism, strong longtermism may fail too. (I realize there are other, non-longtermist arguments for x-risk reduction.)
Longtermism does mess with intuitions, but it’s also not basing its legitimacy on a case from intuition. In some ways, it’s the exact opposite: it seems absurd to think that every single life we see today could be nearly insignificant when compared to the vast future, and yet this is what one line of reasoning tells us.
This feels somewhat analogous to how a lossless compression algorithm can’t work on all inputs at once, and a learning algorithm can’t be superior on every problem at once. For each algorithm, you can construct an example where it fails miserably.
Yet we still have whole industries dedicated to using learning and compression algorithms, because you can manufacture ones that work for all realistic problems, which is a family that doesn’t include that pathological construction.
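The counting argument behind the compression half of that analogy is simple enough to sketch (a standard pigeonhole argument, not anything specific to this comment):

```python
# There are 2**n bitstrings of length n but only 2**n - 1 strictly
# shorter ones, so no lossless compressor can map every length-n
# input to a shorter output.
def shorter_strings(n):
    return sum(2 ** k for k in range(n))  # counts lengths 0 .. n-1

for n in (4, 8, 16):
    assert shorter_strings(n) == 2 ** n - 1
print("pigeonhole: some length-n input cannot be compressed")
```

Real compressors work anyway because the inputs we care about are a vanishing fraction of all possible inputs, which is the analogy's point about pathological constructions.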
I don’t think the case for infinities is defeated by rejecting “you can’t assign probability 0 to any coherent hypothesis” (Cromwell’s rule), although I do endorse something like Cromwell’s rule. I think infinities probably actually exist, e.g. the universe is probably actually infinite in spatial extent (see the discussion here; globally flat is consistent with the evidence so far, and bounded flat universes are much weirder to me than infinite ones). This can matter if you take acausal influence, e.g. evidential decision theory, seriously. You need to aggregate in special ways (I like expansionism, although it doesn’t cover all cases) or ignore the infinities to avoid all of your actions having undefined expected value, because your decisions are probably correlated with infinitely many other people’s in an infinite universe.
It’s also plausible that spacetime is continuous, although ethical concerns probably only have finitely many degrees of freedom locally, and this wouldn’t generate infinite EVs in bounded regions of spacetime.
Random aside, but does the St. Petersburg paradox not just make total sense if you believe Everett & do a quantum coin flip? i.e. in 1⁄2 universes you die, & in 1⁄2 you more than double. From the perspective of all things I might care about in the multiverse, this is just “make more stuff that I care about exist in the multiverse, with certainty”
Or more intuitively, “with certainty, move your civilization to a different universe alongside another prospering civilization you value, and make both more prosperous”.
Or if you repeat it, you have “move all civilizations into a few giant universes, and make them dramatically more prosperous.”
Which is clearly good under most views, right?
I think this Everettian framing is useful and really probes at how we should think about probabilities outside of the quantum sense as well. So I would suggest your reasoning holds for the standard coin-flip case too.
Correct me if I’m wrong, but doesn’t the experiment just need a predictor who does better than random? The oracle could just be your good friend who knows your tendencies better than chance, as long as box B’s payout is high enough.
I think the crux is how the oracle makes predictions? (Assuming it’s sufficiently better than random; if it’s 50.01% accurate and the difference between the boxes is a factor of 1000 then of course you should just 2-box.) For example, if it just reads your DNA and predicts based on that, you should 1-box evidentially or 2-box causally. If it simulates you such that whichever choice you make, it would probably predict that you would make that choice, then you should 1-box. It’s not obvious without more detail how “your good friend” makes their prediction.
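The accuracy threshold mentioned here is a one-line calculation. This is a sketch of the evidential expected values, using the standard $1,000 / $1,000,000 payoffs from the post (function names are my own):

```python
# Evidential EVs for Newcomb's problem with predictor accuracy p.
def one_box_ev(p):
    return p * 1_000_000

def two_box_ev(p):
    return 1_000 + (1 - p) * 1_000_000

# Break-even is p = 0.5005, so a 50.01%-accurate predictor still
# favours two-boxing, while even modest accuracy above that flips it.
print(one_box_ev(0.5001) < two_box_ev(0.5001))  # True
print(one_box_ev(0.51) > two_box_ev(0.51))      # True
```

So "better than random" isn't quite enough on its own; the predictor's edge has to beat a threshold set by the ratio of the two payoffs.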
What do you mean by “coherent”? Without inconsistencies/contradictions?
I’m not sure which formulations you’ve seen, but infinite (and undefined) expected values are implied by textbook measure theory with infinities, which uses the extended real numbers, ℝ ∪ {−∞, +∞}. Basically, we define, for any extended real number a > 0, a × (+∞) = +∞, and similarly b + ∞ = +∞ for any b ∈ ℝ ∪ {+∞}, plus some other rules.
Then, for any extended real number-valued random variable X, you can show the following:
1. If P(X = +∞) > 0 and P(X > c) = 1 for some c ∈ ℝ, then E[X] = +∞.
2. If P(X < c) = 1 for some c ∈ ℝ, and P(X = −∞) > 0, then E[X] = −∞.
3. If P(X = +∞) > 0 and P(X = −∞) > 0, then E[X] is undefined.
4. If P(X = +∞) > 0 or P(X = −∞) > 0, then E[X] is not finite (it will be infinite or undefined).
Are you denying that for any standard real number p > 0, p × ∞ is either not definable coherently, or not ∞? Or something else?
Here’s a proof of 1, where C := ∫_{X<+∞} X dP ≥ ∫_{X<+∞} c dP = c · P(X < +∞) > −∞:
E[X] = ∫ X dP = ∫_{X=+∞} X dP + ∫_{X<+∞} X dP = ∫_{X=+∞} X dP + C = ∫_{X=+∞} (+∞) dP + C
= (+∞) × P(X = +∞) + C = +∞ + C = +∞
The last equality follows because C>−∞, and the one before because P(X=+∞)>0. All the steps are pretty standard, either following (almost) directly from definitions, or from propositions that are simple to prove from definitions and usually covered early on when integrals are defined.
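As an aside, IEEE 754 floating-point arithmetic encodes the same extended-real conventions, so the rules above can be poked at directly (a small illustration, not part of the proof):

```python
import math

inf = math.inf
print(0.01 * inf)             # inf: a * (+inf) = +inf for a > 0
print(-100.0 + inf)           # inf: b + inf = +inf for finite b
print(math.isnan(inf - inf))  # True: (+inf) + (-inf) is undefined
```

The NaN in the last line is the floating-point analogue of case 3 above: mass on both +∞ and −∞ leaves the expectation undefined.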
Infinities make mathematical sense, but they don’t make real world sense. When you’re measuring expected value and the value comes from the actual physical world, it can (empirically) never be infinite.
*You could say black hole singularities are an infinity, but even those are a delta measure with a finite integral.
You can still assign actual infinities a decent chance of existing without measuring them directly. Not being able to measure them doesn’t make them incoherent or impossible. I think the universe is probably infinite in spatial extent based on current physics: https://forum.effectivealtruism.org/posts/5nzDZTsi35mDaJny3/punching-utilitarians-in-the-face?commentId=G3DELj34he2bsqszf
I think you’re using “sense” too restrictively.
I liked this post because I’ve been thinking about similar issues recently, but find some of the conclusions strange. For example, isn’t there a “generalised trolley problem” for any deontologist who asserts that rule X should be followed?
Why is this relevant? I don’t think at this point the deontologist holds up their hands upon hearing any example of the above and denounces their theory. I think they add another rule that allows them to violate their former rule*. I think more needs to be done to prove that the boundary cases for utilitarianism are wild, but they are not out of the ordinary for deontological ethics.
* and I see this as about as wild as when the utilitarian doesn’t voluntarily harvest organs because of “societal factors”, and has to add this to their utility function (here: https://www.utilitarianism.net/objections-to-utilitarianism/rights)
I think these spectrum arguments are doing much more of point (1) ‘The “moral intuition” is clearly not generated by reliable intuitions’ rather than (2) ‘proving too much’.
As such I think these are genuinely useful thought experiments, as they let us discuss the issues and biases listed under (1). For example, I too would be willing to bite the bullet on Cowen’s St. Petersburg Paradox, Persistence edition—as I can point to the greater value each time. I think many people find it counter-intuitive due to risk aversion. Which I think is also a fine point and can be discussed readily! Or maybe someone doesn’t like transitivity—also an interesting point worth considering!
I do not think that means we can throw these thought experiments out the window, or point to them being unfair. The moral views that we are defending are necessarily optimising, so it makes sense to point out when this optimisation process makes people think that a moral harm has been committed. That is exactly what spectrum arguments set out to do.