Morality and constrained maximization, part 2

(Cross-posted from Hands and Cities)

This is the second in a pair of posts examining whether morality falls out of instrumental rationality, if you do the game theory right. David Gauthier thinks the answer is yes. I focus on his view as an example of a broader tendency, to which I expect many of my comments to generalize.

In my last post, I summarized the basics of Gauthier’s view. In this post, I get into more detail about the morality Gauthier proposes, and I discuss four objections to it:

1. It isn’t actually instrumentally rational.

I’m not fussed about whether Gauthier’s constrained maximization can ever be instrumentally rational. But Gauthier wants us to abide by the constraints we would’ve agreed on in a particular (hypothetical) situation we were never in. I think this requires more justification than Gauthier has given.

2. It gives the wrong type of reasons for moral behavior.

I used to treat this objection as decisive. Now I’ve softened a bit — but not fully.

3. It incentivizes threats and exploitation.

Gauthier thinks his constrained maximizers will avoid this via refusals to cooperate with agents that attempt it. I’m open to stuff in this vicinity, but I’m skeptical of his particular story.

4. It ties the moral consideration X receives too closely to X’s power, and licenses arbitrarily bad behavior towards the sufficiently disempowered and unloved.

This one looks true to me, and bad.

Overall, I don’t think Gauthier’s approach re-creates conventional morality, or the type of care towards others most important to my own ethical outlook. But I think it points at something important regardless.

I. Contracts you would’ve signed

When we last left Gauthier, he was telling us to be constrained maximizers: agents willing, in certain situations, to give up on maximizing utility, and to act on the basis of a collective strategy close enough to what would be agreed on via a process of rational bargaining. Morality, roughly speaking, is this strategy.

The project here fits a classic “social contract” formula: it attempts to ground a normative standard in what some set of agents would agree to in some situation. But social contracts come in different flavors – and on a spectrum from “idealistic” to “realpolitik,” Gauthier’s is on the latter end.

Thus: Rawls famously imagines agents behind a “veil of ignorance,” which denies them knowledge of who they are, and hence motivates concern for the fate of all (and especially, thinks Rawls, for the worst off). Not so Gauthier: his agents know their identities and social advantages, and they bargain on that basis.

Similarly: theorists like Scanlon specify that the choices of the contractors reflect some form of other-regarding concern (implicit in notions of “equal respect,” or of what agreements an agent could “reasonably reject” — see Southwood (2010) for more on this). Again, not so Gauthier: his bargainers are purely out for their own utility, which can be as indifferent to others as you please.

Interestingly, though, Gauthier doesn’t go as fully realpolitik as he could. In particular, his bargaining situation isn’t the “state of nature” – the state in which everyone does whatever they want to each other, dogs feast on other dogs, and Hobbes starts slinging adjectives around. Gauthier’s contractors don’t negotiate from a position of having already reaped the fruits, or suffered the harms, of the predation, coercion, conflict, and so on to which unconstrained maximization leads. Nor, says Gauthier, does the state of nature – or worse, the “threat point,” in which everyone tries to impose maximal costs on others, even at the cost of their own utility – function as the “disagreement point” relative to which candidate bargains are evaluated.

Rather, Gauthier claims that the resources bargainers bring to the table, and the “disagreement point” they try to improve upon, must be compatible with a kind of proto-morality that he calls the “Lockean proviso.” This proviso requires, roughly, that agents do not “worsen” the situations of others, except insofar as necessary to avoid “worsening” their own situation – where “worsening” is evaluated relative to the utility an agent would have received absent any interaction (which for Gauthier is similar to “if the other parties didn’t exist”). Thus:

If I push you into a pond, I worsen your situation (you would’ve stayed dry absent interaction with me); but if you’re already in the pond, and I fail to save you from drowning, I don’t (you would’ve drowned absent interaction).
If you and I are stranded on an island together, we are each allowed to grab as much fruit from the trees as fast as we can, because had we been alone on the island, we would have had all the fruit (so even though I worsen your situation by grabbing fruit, I do it to avoid worsening my own, which takes priority).
However, if you grow a vegetable garden on the island, I can’t take the broccoli (without compensating you), because had I been alone on the island, there would have been no garden (so not having your broccoli does not “worsen” my situation).
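
The structure of these verdicts can be made explicit with a toy check. What follows is only a sketch under stated assumptions: the utility numbers are invented, the `Case` fields and function names are hypothetical, and the two-condition test is a rough reading of the proviso as summarized above, not Gauthier’s own formalism.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One candidate action by me, with made-up illustrative utilities."""
    name: str
    other_after_action: float  # the other party's utility if I act
    other_baseline: float      # their utility absent any interaction with me
    my_if_refrain: float       # my utility if I hold back
    my_baseline: float         # my utility absent any interaction with them


def worsens_other(case: Case) -> bool:
    # "Worsening" is measured against the no-interaction baseline.
    return case.other_after_action < case.other_baseline


def needed_to_protect_self(case: Case) -> bool:
    # Holding back would leave me below my own no-interaction baseline.
    return case.my_if_refrain < case.my_baseline


def permitted(case: Case) -> bool:
    # Rough proviso: don't worsen others, except to avoid worsening yourself.
    return (not worsens_other(case)) or needed_to_protect_self(case)


# Hypothetical numbers chosen only to mirror the examples in the text.
cases = [
    Case("push you into the pond", -1, 0, 0, 0),
    Case("fail to save you from drowning", -10, -10, 0, 0),
    Case("grab fruit on the shared island", 5, 10, 0, 10),
    Case("take your broccoli", 3, 8, 0, 0),
]

verdicts = {case.name: permitted(case) for case in cases}

for name, ok in verdicts.items():
    print(f"{name}: {'permitted' if ok else 'not permitted'}")
```

On these assumed numbers, the pond push and the broccoli grab come out as not permitted, while failing to rescue and grabbing fruit come out as permitted, matching the verdicts described above.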

We can ask lots of questions about how Gauthier’s “Lockean proviso” is supposed to work, and about the attractiveness of its verdicts. And we can ask further questions about why, exactly, Gauthier’s bargainers must abide by it. I’ll ask a few of these below.

For now, suffice to say that for Gauthier, if you want to cooperate with someone, then even before you actually agree on the terms of cooperation (and hence on “morality”), you must approach them in a way that already puts some constraints on your utility maximization (e.g., with a kind of “proto-morality”) – and he imagines hypothetical bargainers who have taken this to heart. (For people you don’t want to cooperate with, though, there’s nothing in Gauthier to stop you from feasting on their flesh – more on this later.)

Beyond what I’ve said so far, though, the specifics of the hypothetical bargaining situation Gauthier has in mind remain, to me, quite unclear. Proviso aside, which aspects of my current life, and the lives of others, am I supposed to imagine importing into the bargaining room? Which of the proviso-compatible disagreement points are we supposed to work from? How are we supposed to relate the output of the ideal bargaining process to the contingency, imperfection, and non-compliance at stake in our actual conventions and institutions? Gauthier gets into some of this at various points, and he’s got other work on the topic I haven’t engaged with, but at present I don’t feel like I have a clear picture of how to put Gauthier’s view – and others like it – into practice, even if I wanted to.

But we need not query the details of Gauthier’s view in particular. My aim here is to evaluate it as an illustrative example of a broader tendency – namely, an aspiration to derive morality solely from a subjectivist, maximizing conception of instrumental rationality, via appeal to that most hallowed source of normativity: “something something game theory social contract something.” Gauthier’s is one (incomplete, imo) attempt to fill out the “something somethings,” here, but there are others (perhaps you’ve heard some in your dorm room?) – and many of them, I think, will suffer from structurally similar objections, regardless of the details. Those are the objections I want to focus on. Let’s turn to them now.

II. Which contracts? Why keep them?

Gauthier has set himself a tough task. He wants to affirm all of:

Instrumental rationality is about maximizing expected utility.
Sometimes, morality demands that you don’t maximize expected utility.
Nevertheless, compliance with morality in such cases is instrumentally rational.

Hmm. How can you get all three? The combo sounds a bit like: “All dogs have buddha nature. Some cats do not have buddha nature. Those cats are dogs.” Zen master, dost thou equivocate?

Gauthier’s kensho is:

A. The disposition to comply with morality is instrumentally rational to acquire (or maybe, to “have”).
B. Thus, it’s instrumentally rational to act on.

Why is the disposition to act morally instrumentally rational to acquire? Because without it, says Gauthier, sufficiently discerning people won’t cooperate with you (I discuss “what if I can deceive them?” below). And as we discussed last post, cooperation is super great, from a getting-utility perspective.

Let’s grant this much to Gauthier for now. For some cooperation clubs, you’ll need a compliance disposition to get past the bouncers. So if you’re standing in line, it’s rational to acquire such a disposition (or keep one if you’ve got it). We can say something similar about compliance-y self-modifications, binding commitments, policies, and so on.

But we might still wonder about the jump from “disposition rational to acquire” to “disposition rational to act on.” Suppose, for example, that Eccentric Ed offers you ten million dollars to press a button that, for the next four hours, will give you the disposition to rip a handful of hair from your head anytime someone says the word “Eureka!” with suitable gusto. You press, pocket the money, and leave – as was, let’s say, instrumentally rational. Back at home, while you’re counting the cash, your roommate makes a breakthrough on her jigsaw puzzle and shouts “Eureka!”. Is it instrumentally rational to rip a handful of hair from your head? (See Parfit here for a more involved case.) I’m often inclined towards “whatever” for stuff in this broad vicinity (I discuss why here), but in this particular case I lean towards “no.” (I think it matters to my intuition that whether you get the money isn’t contingent on Ed’s prediction about whether you hair-rip later – it’s just contingent on having pressed the button at all.)

We can ask other questions about the jump from A to B. For example: what, exactly, is the underlying principle justifying such a jump? Consider this unattractive candidate:

Principle P: If there is a situation X in which it would be instrumentally rational to acquire a disposition (pre-commit, etc) to do A in situation Y, then it is instrumentally rational, in situation Y, to do A.

Principle P is an instructive non-starter. In particular: in the hair-ripping case, at least you actually were in situation X at some point, and you actually acquired the disposition in question. But P does not make either of these conditions necessary.

Thus, suppose that earlier, Ed made you the offer, but you (irrationally) refused to press the button. Now, you’re at home, broke, and you hear “Eureka!”. Should you rip the hair? P seems to say yes – but that seems super wrong.

And once you’ve never even been in situation X, P has obviously lost the plot. Even right now, there is some situation X where it would be rational for you, reader, to become disposed to rip a chunk of hair from your head if you see the word “Eureka!” written in bold. Eccentric Ed, for example, could offer to pay you enough. So: Eureka! Shall you rip? No. Definitely not.

You might think: “Joe, no one would ever be silly enough to appeal to a principle like P.” But what principle are we appealing to? Recall that on grounds of instrumental rationality alone, you’re supposed to abide by constraints “close enough” to the ones you would have agreed to after a hypothetical bargaining process involving everyone you aim to cooperate with. Thus, it’s not enough to buy into the idea that it’s instrumentally rational to abide by actual agreements (precommitments, self-modifications, disposition-acquisitions) you actually (and rationally) made. Gauthier is telling you to abide by an agreement you didn’t make, but that you would have made, in a situation you were never in. If not P, what grounds this?

Clearly, we need at least some account of why Gauthier’s bargaining situation, in particular, has some privileged purchase on your instrumental rationality. Here I recall someone I know describing his ethic: “I’m into keeping contracts I would have signed.” But: signed under what circumstances? You would’ve signed that “rip the hair if you see ‘Eureka!’ in bold” contract, no? Eccentric Ed has a lot more where that came from. Without restrictions on the “would,” “contracts I would have signed” includes a zillion crazy things you definitely don’t want to do. So what limits our “woulds”? Why Gauthier’s in particular?

Gauthier (2013, p. 619) considers an objection like this. His response is that because we can’t actually have the bargaining session in question, abiding by the principles we would agree to in such a session “brings society as close as is possible to the status of a voluntary association.” And there is indeed something nice about this. But whence its grip on the instrumental rationality of the dog-eat-dog egoists that Gauthier is trying to lure from the state of nature? Presumably, there need be no term in their arbitrary utility functions for “approximating a voluntary association as much as possible.” And even if there were, why would this be enough to render compliance with Gauthier’s principles, even at the cost of lots of utility, instrumentally rational overall? I worry that Gauthier’s argumentative approach, here, has switched from “this follows from cold-hearted maximizing subjectivism” to an approach more common in political philosophy: “this vibes with various liberal-ish intuitions and ideals.”

Gauthier’s problems, here, parallel the two I wrote about in the context of updateless-ish decision theories. One question is whether it can ever be instrumentally rational to choose lower payoffs with certainty (e.g. paying in Parfit’s hitchhiker, one-boxing in transparent Newcomb, and so on), if e.g. doing so would be suitably rational to pre-commit to. There, Gauthier wants to say something about evaluating dispositions rather than actions, but his account isn’t, in my view, especially satisfying (especially once we’re clear on the distinction between dispositions and actions as evaluative focal points, and on the different counterfactuals such focal points imply). Indeed, my suspicion is that those attached to calling such behavior instrumentally rational (as opposed to just e.g. being able to do it if necessary) will do better to look to the more developed (if still somewhat hazy) machinery of e.g. functional decision theory and the like: principles like “if rational to acquire a disposition, rational to execute on it” seem too permissive (especially if we allow merely hypothetical situations to count in favor of acquisition).

That said, as I indicated in my post on decision theory, I am overall pretty happy to do things like pay in Parfit’s hitchhiker, and one-box in transparent Newcomb – and I don’t feel especially fussed about the labels we put on this behavior. So in this sense, I’m on board with constrained maximization in some form: “but it’s by definition non-maximizing in some cases” doesn’t really bother me.
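
The gap between the ex ante and ex post perspectives in a case like Parfit’s hitchhiker can be made concrete with a toy calculation. The sketch below assumes a driver who predicts your disposition with accuracy `q`, and uses invented payoffs (`V` for being rescued, `c` for the payment, `D` for being left in the desert); none of these numbers or names come from Parfit or Gauthier.

```python
# All values below are illustrative assumptions, not taken from the literature.
V = 1_000_000.0  # utility of being rescued
c = 1_000.0      # cost of paying the driver once in town
D = 0.0          # utility of being left in the desert
q = 0.99         # assumed accuracy of the driver's prediction


def ex_ante_value(disposition: str) -> float:
    """Expected utility of holding a disposition before the driver decides."""
    if disposition == "pay":
        # Correctly predicted: rescued, then you pay. Mispredicted: left behind.
        return q * (V - c) + (1 - q) * D
    if disposition == "refuse":
        # Correctly predicted: left behind. Mispredicted: rescued for free.
        return q * D + (1 - q) * V
    raise ValueError(f"unknown disposition: {disposition}")


def ex_post_value(action: str) -> float:
    """Utility of an action once you have already been rescued."""
    if action == "pay":
        return V - c
    if action == "refuse":
        return V
    raise ValueError(f"unknown action: {action}")


for choice in ("pay", "refuse"):
    print(choice, ex_ante_value(choice), ex_post_value(choice))
```

With these assumed numbers, the paying disposition does far better ex ante, while refusing does better ex post by exactly `c`. That is the tension at issue: the labels we attach to paying matter less than which of the two standpoints we evaluate from.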

The bigger problem, in my opinion, is identifying the ex ante standpoint we should use to decide which dispositions/pre-commitments to abide by. Cases like Parfit’s Hitchhiker and Transparent Newcomb always come with a “problem statement,” which includes a “prior” you can use to calculate which overall policy would maximize expected utility (see Abram Demski here). But real life does not come with a problem statement or “prior”: you wake up in the midst of it, and by the time you’re making real decisions, you’ve already got tons of information about your particular situation, and no obvious way to “rewind” all the way to the beginning. Updateless decision theorists need to find some privileged place to rewind to – a formidable problem in its own right. But Gauthier’s got a harder task: he needs to justify rewinding to a particular bargaining room you were never in (one that includes, for Gauthier, quite a bit of information about your identity and resources). Why there?^[1]

Of course, in various actual situations you encounter, it may well make straightforward, classical-utility-maximizing sense to cultivate various dispositions (make various commitments, etc) in the vicinity of some form of constrained maximization. But it is a substantially further question whether adherence to the output of Gauthier’s hypothetical bargaining situation, in particular, will fall out of this.

Perhaps, though, something close enough will. Certainly, it seems right that if they can recognize each other, Gauthier-ish agents can save a lot of hassle and waste in interaction – they can just cut right to a reasonably fair, pareto-optimal outcome they would have bargained for, without burning time and resources on conflict, bad equilibria, costly assurance mechanisms, or even on explicit communication. And perhaps this sort of broad vibe is a lot of what we should focus on in the context of Gauthier-like proposals. Need we pin down exactly what sort of grand (cosmic?) bargaining table we should imagine having sat down at, or why this bargain, in particular, is what it makes sense to adhere to? Can’t we just be broadly, like, cooperative?

Maybe. It does feel like there can be something overly finicky about demanding some perfect theory of cooperation before cooperating – one worries the cooperation itself will go worse. But I also worry that if you’re basing your policy on “something something social contract,” without a clear sense of which social contract and why, you’re at risk of doing kind of random stuff. There are important differences, for example, between a Rawlsian veil of ignorance, and a Gauthierian bargain; between a proviso-including version of Gauthier, and a more fully realpolitik one; between bargaining given everything you know now, vs. everything you knew at some point in the past, vs. a prior generated by an arbitrary universal Turing machine; between including or excluding various types of beings (animals? non-existent agents? more on this later) from the bargaining table; and so on. Faced with such stakes, some sense of what you are doing and why seems important. My engagement with Gauthier thus far has left me wanting more in this respect.

III. Is that why you’re nice?

Let’s turn to a different objection: namely, that Gauthier-ish morality gives the wrong type of reasons for moral behavior.

Suppose that you are an egoist, entirely indifferent to the welfare of others. One night, bored and looking for a bit of fun, you find a homeless man sleeping in an alleyway, beat him up, and get away with it. (Let’s set aside for a moment the question of whether this man’s lack of resources gives him less moral weight in the eyes of Gauthier’s social contract (see below) and assume he is entitled to full moral protection.)

What kind of mistake did you make, on Gauthier’s picture? Why was this a bad choice? The answer, ultimately, is that this choice was, from some (still hazy) ex ante perspective, bad for your own welfare (the only thing you care about). It was a (complicated) prudential mistake, like failing to order your favorite thing on the menu, or to invest enough in your 401k. The welfare of the homeless man enters into the story only as a part of the explanation of why you are losing welfare: e.g., because this homeless man values his welfare, he would’ve bargained to get “no beating me up just for fun” into an ex ante agreement with you; access to the rest of such an agreement would net good for you; and access would be denied to you if he knew you would beat him up anyway.

Here, it seems like some “actually about the Other” aspect of moral life has fallen out of the picture – and with it, a lot of our everyday moral phenomenology (see Southwood (2010), section 2.2 for more on this). The experience of guilt for performing an act like this, for example, might seem to involve some essential focus on e.g. the suffering you caused, rather than on its implications for what you’d expect to get out of some hypothetical bargain. And we don’t tend to direct reactions like blame and punishment to other sorts of (even quite complicated) prudential mistakes in others. Perhaps, for your own sake, you really should’ve gotten that chemotherapy, instead of relying on the homeopath; but your beating up the homeless man prompts a different sort of reaction from your peers.

One way to notice the disconnect here is to reflect on the fact that eligibility for participation in a Gauthier-ish social contract need not reflect salient properties we often think of in the context of moral status – in particular, properties like consciousness. If it is possible to build a totally non-conscious system that there are prudential benefits to cooperating with (more discussion here), and which strongly “prefers” that you don’t bring gummy bears into a certain patch of desert (even if it will never find out), then for Gauthier, your reasons not to take the gummies on your safari are of the same type as your reasons not to beat up a homeless man – and the “guilt” you should feel for violation is the same as well. But if rocks (corporations?) could bargain, and had enough goodies to offer, would that suffice to make them people?

There is also a broader worry, here: namely, that Gauthier is proposing a type of morality that participating agents should view as a burden, as a “necessary evil” that it (sometimes) makes sense to sign up for, but which ultimately imposes constraints that it kind of sucks, later, to uphold (here I expect some people are like: wait, this is an objection?). Indeed, if they can get away with it, a Gauthier-ian agent looks for every reasonable opportunity to do less in the way of moral stuff – and even, to deceive others about their degree of moral compliance/motivation, to undermine the rationality and discernment of fellow cooperators, and so on, wherever being disposed to do so does not adversely affect ex ante prospects for cooperation. That is, a Gauthier-ian agent’s morality is (or: can be) reluctant, minimal, calculating, devoid of any real fellow-feeling, and reliable only up to a specific and contingent threshold of ex ante advantage. It’s a morality, one worries, for mafia bosses; cold-hearted killers; wolves acting just enough like sheep.

I used to treat objections in this vein – and in particular, “your welfare is not the main thing in the homeless man case” – as pretty much decisive. Now, though, I’ve softened a bit. This is for a few reasons.

First, for subjectivists (and I’m sympathetic to subjectivism), the most salient alternative to a Gauthier-style morality is just to say that morality is about caring intrinsically about certain things: for example, about the welfare of the homeless man. Subjectivists will generally admit, though, that whether or not someone possesses this care (even after some idealization process) is a contingent matter: egoists, paperclippers, and so on do not. Gauthier, though, can say all this too. If you happen to care about the homeless man’s welfare, then Gauthier says: great, go ahead and be nice for reasons directly grounded in the man’s welfare. But Gauthier is interested in a morality that binds less contingently: that somehow gets a grip on your rationality, regardless of what you care about. A morality that responds to “but I’ll get away with beating up the homeless man, and I don’t care if it harms him” with “doesn’t matter, it’s still a mistake.”

Vindicating such a morality is a classic project, and one that remains, for subjectivists (and others), quite mysterious. If practical rationality is about promoting what you care about, how can morality bind your practical rationality regardless of what you care about? The realists will hope, here, that the objective fabric of normative reality can help (e.g., for them, practical rationality is about responding to the reasons that exist independent of what you care about, so they can say there just is non-instrumental reason not to harm the homeless man, to which egoists and paperclippers are blind). But subjectivists have no such resource.

Still, Gauthier tries to thread the needle, and his tack shows some promise. On the one hand, he maintains some connection with what you care about, via the perspective of ex ante bargaining. On the other, he captures the sense in which moral compliance often seems in tension with what you care about: ex post, it can involve actually giving up utility. Put in terms I’ve written about previously, Gauthier is trying to capture the sense in which morality actually does seem like “taxes,” while still saying that subjectivists should pay.

Whether he succeeds is a further question. But a lot of the objections listed at the beginning of this section fall fairly directly out of the nature of his project. That is, if subjectivists want treating the homeless person well to be instrumentally rational even for paperclippers, they had better get used to saying something, ultimately, about paperclips. The fact that they have to do this is, indeed, an objection to subjectivism. But a standard subjectivism just tells Clippy to turn the homeless man into clips. Gauthier, at least, looks for some further constraint – and plausibly, subjectivists should be taking what they can get (though they should also watch out for wishful thinking).

What’s more, while the constraint Gauthier identifies does in fact imply a kind of reluctant, burdensome morality, this is, in fact, a prominent feature of how morality often presents itself. Indeed, perhaps Gauthier’s minimalist morality fits better with people’s intuitions about the degree of moral demandingness they’re up for, and we might wonder about its fit with the evolutionary origins of human moral experience as well.^[2]

Gauthier also offers some psychological commentary that he hopes will assuage worries that his agents stay cold-hearted, strategic calculators rather than warm and empathetic participants in society. In a chapter called “The Liberal Individual,” Gauthier suggests that people engaged in functional and mutually advantageous cooperative arrangements will naturally develop an interest in and concern for their fellow cooperators, especially insofar as the cooperation makes possible valuable forms of joint activity (e.g., playing together in an orchestra, as opposed to merely leaving each other alone) that would’ve otherwise been unavailable. It’s a necessary condition for the rationality of such cooperation that it pass a certain kind of cold-hearted, ex ante utility check; but once it does so, and is made stable, Gauthier’s liberal individual rejoices in the fruits of cooperation, which serve as a backdrop for more generous and altruistic sociality. The emotional dynamics of friendship or romantic attachment might be one analogy. One wants such relationships to be genuinely good for all parties involved, especially ex ante. But if they are, they are fertile gardens in which healthy, other-directed concern can grow.

Obviously, this sort of picture rests on contingent aspects of the psychology of the participators: what’s “natural” for (many) humans need not be natural for paperclippers, or for bargaining rocks. Still, though, I found Gauthier’s image of the liberal individual evocative. At the very least, it does seem like participating in a Gauthier-ian social contract involves quite a bit of modeling what other agents would prefer – and perhaps it quickly becomes more cognitively efficient to just give such preferences intrinsic weight in your deliberation, rather than constantly relating them back to the utility you yourself receive via some ex ante bargain. Even if not, though, I do find myself, personally, intrinsically compelled by some of the warm fuzzies about liberalism that Gauthier hopes for. If one day we meet aliens, and successfully cooperate with them rather than burning what we both love in zero-sum conflict, then even if our values are totally different, I can well imagine giving them (and us) a giant high-five: “hell yes, we beat Moloch, you guys are awesome, let’s have a giant fruits-of-cooperation party.” I can even imagine looking on with affection while they do their slimy incomprehensible alien thing. Perhaps we can have an orchestra together. “But what if they’re not even conscious, Joe?” Whatever, no need for metaphysics. They could’ve defected, and they didn’t. They are, just for that, a certain kind of friend.

Still, though, I don’t want to overplay the emotional appeal of Gauthier’s picture. Ultimately, we should readily acknowledge, this isn’t the “actually about the Other” morality one might’ve hoped to vindicate. Ultimately, indeed, it’s not obviously something morality-flavored at all. Rather, it’s just an instrumental strategy like all the rest – justified, like all the rest, by whatever your personal, contingent equivalent of paperclips is. Maybe this reproduces moralish behavior some of the time (or maybe not: see below); but so do more conventional incentives like cops and courts and reputations. And just as a conventional egoist would kill you in a heartbeat if they could get away with it, Gauthier’s “moral” egoist would kill you in a heartbeat if they could get away with it ex ante. One might’ve thought morality would do more to protect you.

IV. Threats and exploitation

This worry becomes more acute once we start double-clicking on exactly what agents can get away with, ex ante. Here I’ll note a few problems that arise at the bargaining table. Then I’ll turn to who gets how much of a seat.

As I mentioned in section I, Gauthier’s bargaining table requires that its participants adhere to a kind of “proto-morality” even before they come to an agreement about the full-scale morality they will abide by. In particular, they aren’t allowed to threaten each other (e.g., to go out of their way to impose costs on each other, beyond what they would naturally impose via straightforward utility-maximization); nor, even, are they always allowed to treat their utility-maximizing individual strategies as the “disagreement point” (as one might’ve thought natural). Rather, per Gauthier’s “Lockean proviso,” they are required, in their disagreement point strategy, to avoid “worsening” each other’s situation, except insofar as necessary to avoid worsening their own – where “worsening” is assessed relative to the utility they would receive absent any interaction at all (which I think, for Gauthier, tends to imply “if the other player didn’t exist”).

Why no threats? Gauthier gives two reasons. The first is that threateners won’t actually follow through if cooperation breaks down, because by hypothesis the threat behavior isn’t individually utility-maximizing for them. But this response seems strange to me. Gauthier is elsewhere assuming agents are able to acquire dispositions, make binding pre-commitments, etc, as necessary for accomplishing their goals. Why not here?

I like his other reason better: namely, that threat behavior, by hypothesis, is net negative for both parties, so it looks like a strong candidate for the type of thing that rational agents pre-commit to not doing to each other, where such a commitment is suitably reciprocated (and which they might also pre-commit to not responding to and/or to punishing super hard – though I don’t think Gauthier gets into this). That is: if you’re choosing to play a bargaining game where both parties can credibly threaten to hurt each other as much as possible (even at intense costs to themselves), or one where neither can, then from lots of ex ante perspectives, the thought goes, “neither can” looks better (you don’t risk bearing the costs of the other agent’s enforced threats, or paying the costs of enforcing yours). Gauthier doesn’t work through the details here, though, and (as ever) the specific ex ante perspectives at stake will presumably matter quite a bit.

Regardless, though, note that we’ve now introduced another ex ante perspective, from which to evaluate precommitments you make before entering Gauthier’s hypothetical bargaining room – the room which itself then serves as the ex ante perspective from which to evaluate a further set of precommitments to abide by various moral principles. The layers of precommitments are stacking up, here, and just as it was unclear exactly which hypothetical first-order bargaining table Gauthier has in mind, and what justifies that one in particular, so too is it unclear what second-order ex ante perspective informs the dispositions you’re supposed to bring to the first.

This problem becomes more acute when we turn to Gauthier’s appeal to the “Lockean proviso” – a highly specific principle, the justification of which I remain unclear on. Does Gauthier think all instrumentally rational agents will pre-commit to requiring adherence to the proviso as prerequisite to cooperation, from whatever ex ante perspective is supposed to apply prior to entering the bargaining room? If so, I remain unpersuaded (see footnote for details).^[3] And if not, I don’t understand how the proviso gets off the ground from the maximizing, subjectivist standpoint that Gauthier is starting with.

One can imagine, though, a more thoroughly realpolitik version of Gauthier’s theory, which takes a proviso-less “state of nature” (e.g., what everyone would do if they were just following their utility-maximizing individual strategies) as the backdrop to, and disagreement point for, the bargaining process. Of course, the state of nature, Gauthier readily acknowledges, may incentivize all sorts of Hobbesian horrors (slavery, killing, rape, etc). And depending on which aspects of our actual lives are imported into the bargaining rooms, and when (as ever, I’m unclear on what knowledge and history Gauthier wants to “rewind”), this theory gives the wolves an incentive to exploit others as much as possible prior to arriving at the bargaining table, so that fruits of that exploitation can go to their advantage in the bargaining process (this is the type of thing that precommitment to requiring proviso-adherence is supposed to disincentivize).

V. Moral status = power?

This brings me to my final objection to a Gauthier-style picture: namely, that it ties an agent’s moral status much too closely to that agent’s power, and licenses arbitrarily bad behavior towards the sufficiently weak (and unloved-by-the-powerful).

Consider lies. For Gauthier, the whole reason to actually adopt a compliance disposition, as opposed to merely pretend to do so, is that other people would catch the pretense and refuse to cooperate with you. Faced with agents who can’t tell wolves wearing sheepskin from real sheep, though, the Gauthierian wolf will just don the skin and say “baaaa,” then feast come nightfall. Gauthier hopes that humans are sufficiently “translucent” (e.g., our true dispositions are sufficiently epistemically accessible to one another) that the incentives will work out well in practice, but it’s an open question, and one dependent on the specific agents involved. In principle, at least, Gauthier will endorse defecting more readily on less intelligent or socially discerning humans.

In this sense, the ability to detect future defection is key to moral status, on Gauthier’s picture. And such status is importantly relative to the deceptive capabilities of the maybe-wolf. Thus, Bob may have moral status relative to a clumsy con artist, but not to a more sophisticated one; and if you’re a good enough con, you may be able to transcend moral constraint altogether, at least in your local scene. But is a better poker face such a direct route to blamelessness? Is a lie detector also a moral-status-increaser?

More importantly, though: discernment alone won’t save you from the Gauthierian wolves. You also need to be sufficiently advantageous to cooperate with. And here, I think, is where Gauthier’s picture leaves conventional morality most fully behind. A Gauthierian wolf refrains from feasting on a sheep when, and because, they can grow a garden together instead. But some sheep can’t garden; and some taste better than broccoli. In Gauthier’s world, these sheep just get eaten. Indeed, in this world, a wolf can do whatever sadistic and horrifying stuff it wants to a sufficiently disempowered sheep. Regardless of the sheep’s other properties (consciousness, capacity for suffering, etc), it has no morally relevant interests whatsoever. With no might, it’s got no rights.

Thus, suppose that two men, Big and Little, can each create one happy day for themselves, or they can give two happy days to the other. But Big also has a third option: to create three happy days for himself by burning Little alive. Assuming that Big is an egoist, here Gauthier feeds Little to the flames.

And if we lower Little’s capacity for contribution to a cooperative arrangement, Gauthier burns him more and more readily. Thus, if Little can only offer Big 1.1 happy days, Big will burn Little if it gets him 1.2. And in the limit where Little can offer nothing, Big excludes him from moral consideration entirely, regardless of the other payoffs.
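To make the arithmetic in the Big and Little cases explicit, here is a minimal sketch. It assumes Big is a pure egoist who simply picks whichever option gives him the most happy days; the function name, the option labels, and the last set of numbers are mine, added for illustration, and nothing here is Gauthier's own formalism.

```python
def big_choice(solo: float, cooperate: float, burn: float) -> str:
    """Return the option an egoistic Big picks: whichever pays him the most.

    Note what is absent: Little's payoffs never appear as an argument,
    so nothing about Little's suffering can change the result.
    """
    payoffs = {"solo": solo, "cooperate": cooperate, "burn": burn}
    return max(payoffs, key=payoffs.get)


# The original case: 1 day alone, 2 days via cooperation, 3 via burning.
original_case = big_choice(solo=1, cooperate=2, burn=3)

# Little can only offer 1.1 days, and burning yields 1.2.
weaker_little = big_choice(solo=1, cooperate=1.1, burn=1.2)

# Hypothetical contrast case: burning pays less than cooperating.
contrast_case = big_choice(solo=1, cooperate=2, burn=1.5)

print(original_case, weaker_little, contrast_case)
```

The only thing that ever protects Little in this sketch is the size of the `cooperate` payoff relative to `burn`, which is the worry in a nutshell.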

Perhaps one protests: Big and Little would have agreed to some other arrangement, from behind some veil of further ignorance – for example, a veil that left Big uncertain whether he would be lighting the fire, or feeling it burn. But Gauthier eschews such Rawlsian namby-pamby, at least in full generality. When bargaining, Gauthierian agents know who they are.

Perhaps some will look, nevertheless, for some other ex ante perspective, compatible with Big’s knowledge of his identity, but which nevertheless incentivizes him to pre-commit to nicer behavior if he ever encounters a situation like this. But as I tried to emphasize earlier, you can’t just point to any old hypothetical ex ante perspective that would mandate the behavior you’re hoping for. You have to explain why Big should abide by the precommitments he would have made from this perspective in particular. Recall the problems with Principle P above: hypothetically rational precommitments are too cheap.

And note that the proviso doesn’t help, either. The proviso, for Gauthier, only governs the behavior of agents who are intending to cooperate with each other. But the whole problem, in cases like the ones above, is that Little isn’t useful enough to Big to be worth cooperating with.

Maybe some other powerful-but-altruistic person’s preferences will protect Little from Big? But in the case above, no altruist saves Little from the flames. And regardless: does morality leave the powerless so vulnerable to the whims of the powerful, and to the bargains the altruists happen to be a part of?

These are classic problems for social contract-ish theories. Where do they leave people with disabilities? People who live in the future (and so can’t reciprocate cooperation)? Non-human animals? And such questions bear, as well, on more descriptive investigations of how social-contract-ish lenses can illuminate structures of domination, in which one group coordinates to exploit and perpetuate the disempowerment of another (see e.g. Pateman (1988) and Mills (1997)). On Gauthier’s view, some exploitation of this type will likely end up compatible with “morality.”

In the face of problems like this, some social contract-ish theorists, like Scanlon, say that their social contract only covers one part of morality: your reasons not to e.g. torture a helpless deer just for fun come from some other part. But such a move isn’t available to Gauthier, who is trying to build his morality on the cold rock of agents maximizing arbitrary utility functions.

This rock is just too cold.

VI. Reconstructing vs. predicting morality

Analytic ethics is sometimes accused of aiming overmuch at justifying the contingent, bourgeois morality of its time. Philosophers and their historical peers have some stuff they “want to say” about morality – some moral-ish stuff they are already doing (or not doing), and which they hope to put on more rational footing. But too often, they end up contorting their premises, or hand-waving about their inferences, in order to reach their favored conclusions.

Gauthier does less of this than some. There will be differences, he acknowledges, between his morality, and “the morality learned from parents and peers, priests and teachers” (p. 6). Still, though, his project (like that of Rawls, Scanlon, and others) feels like one of reconstruction. He hopes to emerge with a fairly conventional, liberal morality (albeit, with a more realpolitik flavor); to have justified playing by its rules even to egoists, paperclippers, fools, knaves; and to have done so by appeal only to the hard-nosed logic of (suitably sophisticated) utility maximization.

In this, I think, he fails. In particular: the morality he gives us is not the morality we wanted. It’s a morality that burns Little alive, tortures deer just for fun, and advises you not to beat up homeless men because in some alternative universe you’d get fewer cookies. If you’re lucky (read: powerful), it’s better than Hobbes’s state of nature – and perhaps, in practice (and together with more conventional incentives) this counts for a lot. But the moralists wanted more.

But beyond Gauthier’s failure to reconstruct conventional morality, I also worry that the temptation to try to reconstruct conventional morality, via tools like game theory and decision theory, will distort our approach to a different but arguably more important project: that is, predicting how very sophisticated rational agents (for example, advanced AI systems) will in fact behave in different real-world circumstances. In a sense, Gauthier tries both projects at once; and perhaps for this reason, he succeeds, I think, at neither (where else, in game theory that isn’t aiming to reconstruct liberal philosophy, have you seen talk about the Lockean proviso?). And failures in the latter respect are not just academic, especially as we transition into an era of building very powerful, non-human agents – agents that might be, to us, as Big is to Little. If, buoyed by wishful attempts to justify or explain our pre-existing practices via some universal foundation, we come to expect that wolves and paperclippers will become nice cooperative liberals as soon as they’re smart and self-modifiable enough, then I worry that we’re in for a rude awakening.

That said, I also want to say a few final words on Gauthier’s behalf. First, I do think that a lot of everyday moral life has more of a Gauthier-ian flavor than the moralists in the philosophy seminars readily acknowledge. Such seminars often focus on agents falling over themselves to put their purely Other-focused aspirations into action (pushing trolleys, jumping in ponds – the usual). But the warp and woof of everyday moral life weaves in much more in the way of self, reputation, reciprocity (not to mention more straightforward punishment and reward) – dynamics that Gauthier makes feel more familiar than do some of his more idealistic peers.

Second, there can be something over-pious about relentlessly tut-tut-ing at every failed effort to deliver the morality the moralists wanted – the morality that binds your rationality regardless of your goals, and authoritatively demands your favored forms of niceness from all agents in all circumstances. Following the Parfitians, one can always demand that such morality be conjured from some non-natural air, else all is dust and ashes; but naturalists, especially, should take seriously the possibility that it’s just not there, and go looking for what is. And Gauthier, at least, is pointing at real stuff, and trying to build on it. Perhaps we should not follow him in trying to call what he’s pointing at “morality” – “cooperation” seems to me a fine term on its own, and for reasons to do with Big vs. Little, worth keeping distinct. But if this thing, whatever we call it, leaves us closer than we would’ve liked to the state of nature, we still should wonder whether, perhaps, that’s where we actually live. And as ever – but especially when dealing with wolves and paperclippers – it’s best to know where you are.

What’s more, I remain more compelled by some vision of “playing by the rules of cooperative arrangements you’d want to make possible”, and of “approaching people ready to abide by agreements you would’ve made,” than the arguments and objections I’ve surveyed here support. Whether Gauthier captures it or no, something in the vicinity seems to me pretty cool. On its own, I wouldn’t count on it to protect the weak from the strong, or the naive from the devious – we need something else for that, and something, I expect, that doesn’t feel like “taxes.” And it won’t tell a paperclipper to love rainbows and butterflies instead. But plausibly – and especially with some rainbows, butterflies, and more conventional incentives thrown in – it helps to prevent wars and to build civilizations; it helps to make the commons less tragic; it helps prisoners of dilemmas go free. Perhaps, one day, it can even save the world. It’s not, on its own, the morality we wanted. But I wonder how often it’s the morality we need.

^[1]
Here, some people I know will want to appeal to the precommitments you’d make from the perspective of the Universal Distribution. And note that some actions that look like “burning value for certain” stop doing so if you posit that there are lots of other versions of yourself – across the multiverse, in different quantum branches, in the platonic realm of all logically possible objects, in the realm of logically impossible objects, outside the simulation you maybe live in, and so on – whose utilities you care about, and whose fates your actions have acausal implications for. I’m going to save views in this vein for another post, though. Gauthier, at least, doesn’t appeal to such fanciness, and it seems strange, pro tanto, if the rationality of moral behavior requires it.
^[2]
H/t Carl Shulman for pointing me to Richard Wrangham’s (2019) work on the possible role of “if I’m not cooperative, a group of language-using males may team up to literally execute me” in explaining humanity’s path to domestication – though Gauthierian moral motivations are less directly instrumental.
^[3]
There’s a ton of detail in Gauthier about the proviso, a lot of which I find kind of gnarly and unsatisfying – and which I currently expect to be vulnerable to tons of counterexamples if we put our mind to it – but which I also haven’t tried to sort through in depth. That said, here’s an example, adapted from Gauthier’s own (see p. 190) to illustrate a few of the issues here.

Suppose that two men – Big and Little – get stranded together on an island, which naturally grows ten utils of fruit. Big is much stronger and faster than Little, so he takes all the fruit immediately, and then decides to coerce Little into growing a five util garden for him, too, by periodically beating Little up. After a while, though, he realizes that both he and Little would do better if (a) he committed to stopping the beatings, and (b) Little committed to working voluntarily (let’s say that mechanisms are available for making binding and credible commitments of both kinds). So Big offers this deal to Little. And let’s say, further, that this is in fact the deal that would be chosen via the KS-bargaining procedure, if we took the current Little-gets-beaten-up situation as the disagreement point (I spent a bit of time trying to set up the example more quantitatively, but it’s a bit of a hassle so I’m going to skip it).

My understanding is that according to Gauthier, Little should refuse this offer, and hold out instead for the outcome that would be chosen via the KS-solution, relative to some other disagreement point, in which Big hadn’t “bettered” his situation in a way that “worsened” Little’s. In the case above, for example, if Big had merely taken all the fruit, and stopped there, he would’ve gotten just as much as he would’ve gotten if Little didn’t exist (let’s say that Big doesn’t know how to garden), and Big would not have “bettered” his situation, even though he would have worsened Little’s (recall that the proviso allows agents to worsen the situations of others in order to avoid worsening their own). So perhaps that should be the disagreement point instead.
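To see why the choice of disagreement point matters so much to Little, here is a rough sketch of the Kalai-Smorodinsky (KS) solution on a deliberately simplified version of the island. Everything beyond the ten utils of fruit and the five util garden is an assumption made for illustration: that utility is freely transferable along a linear frontier of fifteen total utils, that coercion costs Big two utils of effort, that Little's disagreement payoff is zero in both scenarios, and the function name `ks_linear`. These numbers are not from Gauthier, and they ignore the disutility of the beatings themselves.

```python
def ks_linear(total: float, d_big: float, d_little: float) -> tuple[float, float]:
    """KS bargaining point on the frontier u_big + u_little = total.

    KS picks the frontier point at which each party receives the same
    fraction of the gap between their disagreement payoff and their
    "ideal" payoff (the most they could get while the other party still
    receives at least their disagreement payoff).
    """
    surplus = total - d_big - d_little
    if surplus < 0:
        raise ValueError("disagreement point lies outside the feasible set")
    if surplus == 0:
        return d_big, d_little  # nothing to bargain over

    ideal_big = total - d_little
    ideal_little = total - d_big
    # Fraction t of each party's ideal gain such that the result sits
    # exactly on the frontier.
    t = surplus / ((ideal_big - d_big) + (ideal_little - d_little))
    return d_big + t * (ideal_big - d_big), d_little + t * (ideal_little - d_little)


# Hypothetical: coercion as the disagreement point (Big nets 15 - 2 = 13).
from_coercion = ks_linear(total=15, d_big=13, d_little=0)

# Hypothetical: Big merely took the fruit (Big has 10, no garden yet).
from_fruit_only = ks_linear(total=15, d_big=10, d_little=0)

print(from_coercion, from_fruit_only)
```

On these made-up numbers Little ends up with more when bargaining starts from the fruit-only point than from the coercion point, which is the kind of gap Little is holding out for. On a linear frontier KS just splits the surplus evenly, so the disagreement point does all the work.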

One issue here is that this calculus changes in unintuitive ways if, for example, the island would have grown twenty utils of fruit, if Big had been alone on it (perhaps its ecosystem suffers from the mere presence of two people). In that case, it’s not clear that Big’s coercion towards Little does violate the proviso – it leaves Little worse off, yes, but only in service of Big’s avoiding worsening his own situation as much as possible (relative to his twenty util baseline). I expect the proviso as stated to include lots of weird sensitivities like this (and/or unclear implications in cases that push on it).

A bigger issue, though, is that it’s unclear to me what distinguishes Little’s precommitment to refusing proviso-incompatible disagreement points from other sorts of Ultimatum-game flavored precommitments Big could make. That is, just as Little can pre-commit to refusing offers like Big’s, in favor of something better-for-Little and still Pareto-improving on the status quo, so too can Big refuse Little’s offer, in favor of something Pareto-improving and better-for-Big. So it’s not enough to show that Little will do better if he can “get his precommitment in” ahead of Big (this will often be true). In general, we need to show that proviso-flavored precommitments are uniquely favored for both parties to engage in and accept. Gauthier has various things to say about “impartiality,” here, and about the sense in which the proviso is the uniquely best balance between “not being taken advantage of” and “freedom to benefit yourself” (see e.g. p. 227). But I don’t understand what underlying story selects the proviso in particular, here – and Gauthier doesn’t do much to survey the alternatives.

Another issue is that sometimes, requiring adherence to the proviso will make it the case that the agreement point is no longer advantageous for Big, relative to the status quo – thereby nullifying Big’s incentive to bargain at all, thus leaving Little worse off than if he didn’t require proviso-compatibility for the disagreement point. Here, Gauthier (p. 230) says some stuff about distinguishing between “compliance” and “acquiescence,” where the latter is kind of a second-rate form of cooperation that allows for disagreement points that are incompatible with the proviso (thereby making cooperation possible at all), and which Gauthier says Little should go for in this case.

Indeed, Gauthier admits that agents with superior technology will frequently be in a position to violate the proviso, and then to induce acquiescence of this type. He writes that because technological advantage amounts to greater ability to achieve your ends, it is in a sense a form of superior “rationality”; but his argument in support of the proviso “rests on an assumption of equal rationality which differences in technology deny” (p. 231). I find this a strange concession, as I would’ve thought we would want our morality to handle interactions between agents with different levels of technological power.

More broadly, I just don’t expect the proviso to be the most attractive principle for structuring our bargaining setup. In particular, as I understand it (I’m maybe missing some details here), it includes a kind of three-tiered lexical priority ranking, where you first (a) avoid worsening your own situation, then (b) avoid worsening someone else’s, and then (c) better your own situation as much as possible. I generally expect lexical priority stuff of this flavor to go badly (especially if it doesn’t incorporate risk – though Gauthier often works with expected values built in), and to throw one value under the bus for arbitrarily small gains in another value. Thus, e.g., here you can do arbitrarily terrible stuff to others to avoid worsening your own situation even slightly, you forgo arbitrary benefits to yourself to avoid worsening someone else’s situation slightly, and you skip arbitrary bettering of someone else’s situation for tiny betterings of yours. Couldn’t rational agents pre-committing to some bargaining setup do better than this?
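The lexical-priority worry can be made concrete with a small sketch. It encodes my reading of the three tiers as a sort key (which, as noted, may be missing details of Gauthier's actual proviso); the `Option` fields, the `choose` function, and all the numbers are hypothetical.

```python
from typing import NamedTuple


class Option(NamedTuple):
    name: str
    own_worsening: float     # how much the act worsens my own situation
    other_worsening: float   # how much it worsens someone else's
    own_bettering: float     # how much it betters my own situation
    other_bettering: float   # how much it betters someone else's (never ranked)


def lexical_key(option: Option) -> tuple[float, float, float]:
    # Tuples compare lexicographically: a later entry only matters
    # when every earlier entry is exactly tied.
    return (option.own_worsening, option.other_worsening, -option.own_bettering)


def choose(options: list[Option]) -> Option:
    return min(options, key=lexical_key)


# (a) beats (b): terrible harm to others to avoid a tiny loss to me.
tier_a = choose([
    Option("accept tiny loss", 0.001, 0.0, 0.0, 0.0),
    Option("devastate other", 0.0, 1_000_000.0, 0.0, 0.0),
])

# (b) beats (c): forgo a huge benefit to avoid a tiny harm to others.
tier_b = choose([
    Option("huge gain, tiny harm", 0.0, 0.001, 1_000_000.0, 0.0),
    Option("do nothing", 0.0, 0.0, 0.0, 0.0),
])

# (c) ignores others' gains: a tiny gain for me beats a huge gain for them.
tier_c = choose([
    Option("tiny gain for me", 0.0, 0.0, 0.001, 0.0),
    Option("huge gain for other", 0.0, 0.0, 0.0, 1_000_000.0),
])

print(tier_a.name, "|", tier_b.name, "|", tier_c.name)
```

No finite amount of a lower-tier value can ever outweigh the smallest difference in a higher tier, which is exactly the "throw one value under the bus" behavior described above.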