Consequentialists (in society) should self-modify to have side constraints

Tl;dr In a society of peers where one’s internal motivations are somewhat transparent to others, having a deontological aversion to antisocial actions can make you more trustworthy than just abstaining from those actions for consequentialist reasons. Since being highly trusted is instrumentally valuable for many purposes, consequentialists should often prefer to self-modify[1] to have deontological constraints on their actions[2].

The utilitarian and deontological surgeons

You live in a city well known for its shortage of organs for transplant. Typically, when a healthy young person dies, their organs can be used to save the lives of several other people.

You’re choosing between two surgeons:

  • Andrew is a deontologist about murder. He would never murder a patient even if their organs could help more people, because he thinks murder is abhorrent, and approximately unthinkable.

  • Bob is a consequentialist about murder. He would never murder a patient even if their organs could help more people, because he thinks the risks of being discovered are too large: the damage both to him personally and to the reputation of consequentialism would be too great to be worth countenancing.

Otherwise they’re both rated as excellent. Who would you feel more comfortable having surgery from?

We think for most people the answer is “Andrew”. While both surgeons have strong ethical reasons not to murder you, Andrew’s reasons feel more robust. In order to murder a patient Andrew would have to throw out the entire basis of his morality. Bob would merely have to make a large error in reasoning — perhaps mistakenly thinking that in this particular case the risks of discovery are so low as to justify the murder. Or worse, perhaps correctly noticing that in this particular case the chances of discovery are essentially zero, so the murder is mandated.

Since both surgeons would prefer to have more patients choose them (so that they can make more money that they can use for ends that seem good to them), it’s likely that Bob should self-modify (if he can) to adopt a deontological constraint on murder. This needn’t stop him remaining consequentialist in other regards.

It seems possible that people who are utilitarian with side-constraints have intuitions that track this argument.

Other examples

  • It’s easy to feel more trust in the word of a person who abhors lying than in the word of someone who you know avoids lying only because they don’t want a reputation as a liar

  • We can feel deeper trust that our friends will be there for us even if something goes very wrong in our lives, if their kindness is based on something like the virtue of being there for one’s friends rather than a consequentialist belief that supporting us is worth it for the sake of what we’ll go on to do

  • We feel more trust in organizations that avoid behaviours we consider immoral if we think they avoid them because they’re wrong, rather than for PR reasons

Doesn’t integrity for consequentialists solve this?

Maybe. But the same arguments about murder or lying might apply — we may be more suspicious of someone who seems to hold onto integrity purely as an instrumental virtue for consequentialist ends (since they might later discover arguments that they believe mean they shouldn’t have integrity in this particular case).

On the other hand it could make total sense for consequentialists to adopt integrity as an intrinsic virtue, and let most of the deontological stuff follow from that.

What are the limits on this recommendation?

From a consequentialist perspective, it won’t always be worth self-modifying to be deontological or intrinsically virtuous about an issue. It depends on how large the benefits are (in terms of increased trust), and how large the costs are (in terms of the times one might be forced to depart from the consequentially optimal behaviour).

For this to work fully, it’s important that the self-modification is ~irrevocable: that one feels a deep sense of rightness or wrongness about the relevant actions. Otherwise it may seem to others like the self-modification could be undone given strong enough consequentialist reasons. (Of course reversibility would reduce the costs, so it might sometimes be correct to make such a reversible self-modification, so long as the circumstances in which one would reverse it are not trivial.)

Our argument here doesn’t specify how much self-modification consequentialists should do. We’re just pointing out the basic mechanism for why the correct amount is probably nonzero for most agents (like humans) whose inner workings are somewhat transparent to their peers, and who benefit from being trusted. (It’s an open question which AI systems this would apply to.)

  1. ^

    In practice, this probably looks like leaning into and/or refraining from over-riding common-sense moral intuitions.

  2. ^

    Though self-identified consequentialists are probably wrong if they think they don’t have some of these deontological constraints already—e.g. wrong if they think their intuitive morality wouldn’t prevent them from directly and deliberately harming someone even if there were a consequentialist argument for it.