I feel increasingly unsympathetic to hedonism (and maybe experientialism generally?). Yes, emotions matter, and the strength of emotions could be taken to mean how much something matters, but if you separate a cow and her calf and they’re distressed by this, the appropriate response for their sake is not to drug or fool them until they feel better, it’s to reunite them. What they want is each other, not to feel better. Sometimes I think about something bad in the world that makes me sad; I don’t think you do me any favour by just taking away my sadness; I don’t want to stop feeling sad, what I want is for the bad in the world to be addressed.

Rather than affect being what matters in itself, maybe affect is a signal for what matters and its intensity tells us how much it matters. Hedonism as normally understood would therefore be like Goodhart’s law: it ignores the objects of our emotions.

Of course, often we do just want to feel better, and that matters, too. If someone wants to not suffer, then of course they should not suffer.

Utility functions (preferential or ethical, e.g. social welfare functions) can have lexicality, so that a difference in category A can be larger than the maximum difference in category B, but we can still make probabilistic tradeoffs between them. This can be done, for example, by having separate utility functions, fA:X→R and fB:X→R for A and B, respectively, such that

fA(x)−fA(y)≥1 for all x satisfying the condition P(x) and all y satisfying Q(y) (e.g. Q(y) can be the negation of P(y), although this would normally lead to discontinuity).

fB is bounded to have range in the interval [0,1] (or range in an interval of length at most 1).

Then we can define our utility function as the sum f=fA+fB, so

f(x)=fA(x)+fB(x)

This ensures that all outcomes with P(x) are at least as good as all outcomes with Q(x), without being Pascalian/fanatical to maximize fA regardless of what happens to fB. Note, however, that fB may be increasingly difficult to change as the number of moral patients increases, so we may approximate Pascalian fanaticism in this limit, anyway.

For example, fA(x)≤−1 if there is any suffering in x that meets a certain threshold of intensity, Q(x), and fA(x)=0 if there is no suffering at all in x, P(x). f can still be continuous this way.
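To make the construction concrete, here is a minimal sketch in Python. The specifics are assumptions for illustration, not part of the argument: an outcome is summarized by the intensity of its worst suffering and a single "other welfare" number, fA is linear in that intensity, fB is a logistic squashing, and the threshold value is arbitrary.

```python
import math
from dataclasses import dataclass


@dataclass
class Outcome:
    worst_suffering: float  # hypothetical summary: intensity of the worst suffering, >= 0
    other_welfare: float    # hypothetical summary of everything in category B


THRESHOLD = 10.0  # illustrative intensity threshold for Q(x)


def f_A(x: Outcome) -> float:
    """Continuous; 0 with no suffering (P), at most -1 once the threshold is met (Q)."""
    return -x.worst_suffering / THRESHOLD


def f_B(x: Outcome) -> float:
    """Logistic squashing, so f_B lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x.other_welfare))


def f(x: Outcome) -> float:
    """The combined utility f = f_A + f_B."""
    return f_A(x) + f_B(x)


no_suffering_poor_B = Outcome(worst_suffering=0.0, other_welfare=-5.0)
threshold_suffering_rich_B = Outcome(worst_suffering=10.0, other_welfare=5.0)
mild_suffering_rich_B = Outcome(worst_suffering=0.5, other_welfare=5.0)
```

Here `f(no_suffering_poor_B)` exceeds `f(threshold_suffering_rich_B)`, as the lexical condition requires, while `f(mild_suffering_rich_B)` exceeds `f(no_suffering_poor_B)`: below the threshold, the two categories trade off in the ordinary way.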

If the probability that this threshold is met is p, with 0≤p<1, and the expected value of fA conditional on this is bounded below by −L, with L>0, regardless of p for the choices available to you, then increasing fB by at least pL, which can be small, is at least as good as trying to reduce p.
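A worked version of this comparison may help. It rests on one extra simplifying assumption that is mine rather than part of the setup: fA is 0 whenever the threshold is not met. The numbers at the end are purely illustrative.

```latex
\mathbb{E}[f]
  = p\,\mathbb{E}[f_A \mid \text{threshold met}] + (1-p)\cdot 0 + \mathbb{E}[f_B]
  \;\ge\; -pL + \mathbb{E}[f_B].
% Driving p all the way to 0 changes the f_A part of the expectation by
%   -p\,\mathbb{E}[f_A \mid \text{threshold met}] \le pL,
% so an option that instead raises \mathbb{E}[f_B] by at least pL does at least as well.
% Illustration: p = 10^{-6},\ L = 100 \ \Rightarrow\ pL = 10^{-4}.
```

So under these assumptions, a very unlikely catastrophe in category A can be outweighed in expectation by a small, reliable gain in category B, even though A lexically dominates B outcome by outcome.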

As another example, an AI could be incentivized to ensure it gets monitored by law enforcement. Its reward function could look like

$$f(x) = \sum_{i=1}^{\infty} I_{M_i}(x) + f_B(x)$$

where IMi(x) is 1 if the AI is monitored by law enforcement and passes some test (or did nothing?) in period i, and 0 otherwise. You could put an upper bound on the number of periods or use discounting to ensure the sum can’t evaluate to infinity, since that would allow fB to be ignored (maybe the AI will predict its expected lifetime to be infinite), but this would eventually allow fB to overcome the IMi, unless you also discount the future in fB.

This should also allow us to modify the utility function fB, if preventing the modification would cause a test to be failed.

Furthermore, satisfying the IMi(x) strongly lexically dominates increasing fB(x), but we can still make expected tradeoffs between them.

The problem then reduces to designing the AI in such a way that it can’t cheat on the test, which might be something we can hard-code into it (e.g. its internal states and outputs are automatically sent to law enforcement), and so could be easier than getting fB right.

This overall approach can be repeated for any finite number of functions, f1,f2,…,fn. Recursively, you could define

$$g_{n+1}(x) = \sigma(g_n(x)) + f_{n+1}(x)$$

for σ:R→R increasing and bounded with range in an interval of length at most 1, e.g. some sigmoid function. In this way, each fk dominates the previous ones, as above.
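The recursion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: σ is taken to be the logistic function, the starting point g1 = f1 is my choice (the recursion above leaves it implicit), and each fk is integer-valued so that any difference in fk is at least 1.

```python
import math


def sigma(z: float) -> float:
    """Logistic function: increasing, with values between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def nested_utility(f_values: list[float]) -> float:
    """Fold [f_1(x), ..., f_n(x)] with g_1 = f_1 and g_{k+1} = sigma(g_k) + f_{k+1}."""
    g = f_values[0]
    for f_next in f_values[1:]:
        g = sigma(g) + f_next
    return g


# f_3 differs by 1, so it decides the comparison even though f_1 and f_2 favor y.
x = [-5, 0, 1]
y = [5, 3, 0]

# f_3 ties, so the comparison falls through to f_2.
u = [0, 2, 1]
v = [9, 1, 1]
```

With these values, `nested_utility(x) > nested_utility(y)` and `nested_utility(u) > nested_utility(v)`. The inputs are kept small on purpose: in floating point the logistic function saturates, so very large differences in lower-priority terms can be rounded away.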

To adapt to a more deontological approach (not rule violation minimization, but according to which you should not break a rule in order to avoid violating a rule later), you could use geometric discounting, and your (moral) utility function could look like:

$$f(x) = -\sum_{i=0}^{\infty} r^i I(x_i),$$

where

1. x is the act and its consequences without uncertainty, and you maximize the expected value of f over uncertainty in x,

2. x is broken into infinitely many disjoint intervals xi, with xi coming just before xi+1 temporally (and these intervals are chosen to have the same time endpoints for each possible x),

3. I(xi)=1 if a rule is broken in xi, and 0 otherwise, and

4. r is a constant, 0<r≤0.5.

So, the idea is that f(x)>f(y) if and only if the earliest rule violation in x happens later than the earliest one in y (at the level of precision determined by how the intervals are broken up). The value of r≤0.5 ensures this. (Well, there are some rare exceptions if r=0.5). You essentially count rule violations and minimize the number of them, but you use geometric discounting based on when the rule violation happens in such a way to ensure that it’s always worse to break a rule earlier than to break any number of rules later.
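A small Python check of why r≤0.5 matters. This is a sketch under assumptions: the infinite sum is truncated to a finite horizon (the `HORIZON` constant is hypothetical), and exact rational arithmetic is used to avoid rounding questions.

```python
from fractions import Fraction


def deontic_utility(violations: list[bool], r: Fraction) -> Fraction:
    """f(x) = -sum_i r^i * I(x_i), truncated to the listed intervals."""
    return -sum((r**i for i, broken in enumerate(violations) if broken), Fraction(0))


HORIZON = 50  # hypothetical finite truncation of the infinite sum
r = Fraction(2, 5)

# x breaks a rule once, in the first interval; y breaks a rule in every later interval.
x = [True] + [False] * (HORIZON - 1)
y = [False] + [True] * (HORIZON - 1)
```

With r = 2/5, `deontic_utility(y, r) > deontic_utility(x, r)`: one early violation is worse than 49 later ones. With r = 3/5 the ordering flips, because the later terms sum to more than 1. At r = 1/2 with an infinite horizon the later terms sum to exactly 1, so the two would tie, which is the kind of exception mentioned above.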

However, breaking x up into intervals this way probably sucks for a lot of reasons, and I doubt it would lead to prescriptions people with deontological views endorse when they maximize expected values.

This approach basically took for granted that a rule is broken not when I act, but when a particular consequence occurs.

If, on the other hand, a rule is broken at the time I act, maybe I need to use some functions Ii(x) instead of the I(xi), because whether or not I act now (in time interval i) and break a rule depends on what happens in the future. This way, however, Ii(x) could basically always be 1, so I don’t think this approach works.

This nesting approach with σ above also allows us to “fix” maximin/leximin under conditions of uncertainty to avoid Pascalian fanaticism, given a finite discretization of welfare levels or finite number of lexical thresholds. Let the welfare levels be t0>t1>⋯>tn, and define:

$$f_k(x) = -\sum_i I(u_i \le t_k)$$

i.e. fk(x) is the negative of the number of individuals with welfare level at most tk, where ui is the welfare of individual i, and I(ui≤tk) is 1 if ui≤tk and 0 otherwise. Alternatively, we could use I(tk+1<ui≤tk).

In situations without uncertainty, this requires us to first choose among options that minimize the number of individuals with welfare at most tn, because fn takes priority over fk, for all k<n, and then, having done that, choose among those that minimize the number of individuals with welfare at most tn−1, since fn−1 takes priority over fk, for all k<n−1, and then choose among those that minimize the number of individuals with welfare at most tn−2, and so on, until t0.
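The following Python sketch illustrates how the nested version behaves once uncertainty is introduced. Everything specific here is an assumption for illustration: the logistic choice of σ, the two thresholds, and the welfare profiles and probabilities in the two lotteries.

```python
import math


def sigma(z: float) -> float:
    """Logistic function: increasing, with values between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def leximin_utility(welfare: list[float], thresholds: list[float]) -> float:
    """Nest f_k(x) = -#{i : u_i <= t_k} for thresholds t_0 > t_1 > ... > t_n,
    so that the lowest threshold t_n gets the highest priority."""
    g = 0.0
    for k, t in enumerate(thresholds):
        f_k = -sum(1 for u in welfare if u <= t)
        g = f_k if k == 0 else sigma(g) + f_k
    return g


def expected_utility(lottery: list[tuple[float, list[float]]], thresholds: list[float]) -> float:
    """Expected value over (probability, welfare profile) pairs."""
    return sum(p * leximin_utility(w, thresholds) for p, w in lottery)


THRESHOLDS = [0.0, -10.0]  # illustrative welfare levels t_0 > t_1

safe = [(1.0, [-10, 1, 1, 1])]
risky = [(0.001, [1, 1, 1, 1]), (0.999, [-10, -1, -1, -1])]
```

The risky lottery has a slightly lower expected number of individuals at or below t1 (0.999 versus 1), so a rule that compared the expected value of the top-priority term first would choose it. The nested sum instead gives `expected_utility(safe, THRESHOLDS) > expected_utility(risky, THRESHOLDS)`, because the 0.001 expected gain in the top-priority term is outweighed by the expected loss in the squashed lower-priority term.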

This particular social welfare function assigns negative value to new existences when there are no impacts on others, which leximin/maximin need not do in general, although it typically does in practice, anyway.

This approach does not require welfare to be cardinal, i.e. adding and dividing welfare levels need not be defined. It also dodges representation theorems like this one (or the stronger one in Lemma 1 here, see the discussion here), because continuity is not satisfied (and welfare need not have any topological structure at all, let alone be real-valued). Yet, it still satisfies anonymity/symmetry/impartiality, monotonicity/Pareto, and separability/independence. Separability means that whether one outcome is better or worse than another does not depend on individuals unaffected by the choice between the two.

Here’s a way to capture lexical threshold utilitarianism with a separable theory and while avoiding Pascalian fanaticism, with a negative threshold t−<0 and a positive threshold t+ > 0:

$$\sigma\Big(\sum_i u_i\Big) + \sum_i I(u_i \ge t_+) - \sum_i I(u_i \le t_-)$$

The first term is just standard utilitarianism, but squashed with a function σ:R→R into an interval of length at most 1.

The second/middle sum is the number of individuals (or experiences or person-moments) with welfare at least t+, which we add to the first term. Any change in number past this threshold dominates the first term.

The third/last sum is the number of individuals with welfare at most t−, which we subtract from the rest. Any change in number past this threshold dominates the first term.

Either of the second or third term can be omitted.

We could require t−≤ui≤t+ for all i, although this isn’t necessary.

More thresholds could be used, as in this comment: we would apply σ to the whole expression above, and then add new terms like the second and/or the third, with thresholds t++>t+ and t−−<t−, and repeat as necessary.
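A minimal Python sketch of the single-pair version of this formula. The squashing function (a scaled softsign), the threshold values and the example populations are all assumptions chosen for illustration.

```python
def sigma(z: float) -> float:
    """Scaled softsign: increasing, with values between -0.5 and 0.5."""
    return z / (2.0 * (1.0 + abs(z)))


def lexical_threshold_utility(welfare: list[float], t_minus: float, t_plus: float) -> float:
    """sigma(sum_i u_i) + #{i : u_i >= t_plus} - #{i : u_i <= t_minus}."""
    total = sigma(sum(welfare))
    above = sum(1 for u in welfare if u >= t_plus)
    below = sum(1 for u in welfare if u <= t_minus)
    return total + above - below


T_MINUS, T_PLUS = -10.0, 10.0  # illustrative thresholds

c = [-12, 9, 9]   # higher total welfare, but one individual at or below t_minus
d = [-9, -9, -9]  # lower total welfare, nobody past either threshold
```

Population `d` scores higher than `c` despite its lower total welfare, because the one individual in `c` past the negative threshold outweighs anything the squashed utilitarian term can contribute. Between populations with nobody past either threshold, the ranking is the ordinary utilitarian one.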

while the fact that a person’s life would be worse than no life at all … constitutes a strong moral reason for not bringing him into existence, the fact that a person’s life would be worth living provides no (or only a relatively weak) moral reason for bringing him into existence.

This is a summary of the argument for the procreation asymmetry here and in the comments, especially this comment, which also looks further at the case of bringing someone into existence with a good life. I think this is an actualist argument, similar to Krister Bykvist’s argument in 2.1 (which cites Dan Brock from this book) and Derek Parfit’s argument on p.150 of Reasons and Persons, and Johann Frick’s argument (although his is not actualist, and he explicitly rejects actualism). The starting claim is that your ethical reasons are in some sense conditional on the existence of individuals, and the asymmetry between existence and nonexistence can lead to the procreation asymmetry.

1. From an outcome in which an individual doesn’t/won’t exist, they don’t have any interests that would give you a reason to believe that another outcome is better on their account (they have no account!). So, ignoring other reasons, this outcome is not dominated by any other, and the welfare of an individual whom we could bring into existence is not in itself a reason to bring them into existence. This is reflected by the absence of arrows starting from the Nonexistence block in the image above.

2. An existing individual (or an individual who will exist) has interests. In an outcome in which they have a bad life, an outcome in which they didn’t exist would have been better for them from the point of view of the outcome in which they do exist with a bad life, so an outcome with a bad life is dominated by one in which they don’t exist, ignoring other reasons. Choosing an outcome which is dominated this way is worse than choosing an outcome that dominates it. So, that an individual would have negative welfare is a reason to prevent them from coming into existence. This is reflected by the arrow from Negative existence to Nonexistence in the image above.

3. If the individual would have had a good life, we could say that this would be better than their nonexistence and dominates it (ignoring other reasons), but this only applies from outcomes in which they exist and have a good life. If they never existed, because of 1, it would not be dominated from that outcome (ignoring other reasons).

Together, 1 and 2 are the procreation asymmetry (reversing the order of the two claims from McMahan’s formulation).

Consequently, even if it is better for p to exist than not to exist, assuming she has a life worth living, it doesn’t follow that it would have been worse for p if she did not exist, since one of the relata, p, would then have been absent. What does follow is only that non-existence is worse for her than existence (since ‘worse’ is just the converse of ‘better’), but not that it would have been worse if she didn’t exist.

The footnote that expands on this:

Rabinowicz suggested this argument already back in 2000 in personal conversation with Arrhenius, Broome, Bykvist, and Erik Carlson at a workshop in Leipzig; and he has briefly presented it in Rabinowicz (2003), fn. 29, and in more detail in Rabinowicz (2009a), fn. 2. For a similar argument, see Arrhenius (1999), p. 158, who suggests that an affirmative answer to the existential question “only involves a claim that if a person exists, then she can compare the value of her life to her non-existence. A person that will never exist cannot, of course, compare “her” non-existence with her existence. Consequently, one can claim that it is better … for a person to exist … than … not to exist without implying any absurdities.” Cf. also Holtug (2001), p. 374f. In fact, even though he accepted the negative answer to the existential question (and instead went for the view that it can be good but not better for a person to exist than not to exist), Parfit (1984) came very close to making the same point as we are making when he observed that there is nothing problematic in the claim that one can benefit a person by causing her to exist: “In judging that some person’s life is worth living, or better than nothing, we need not be implying that it would have been worse for this person if he had never existed. --- Since this person does exist, we can refer to this person when describing the alternative [i.e. the world in which she wouldn’t have existed]. We know who it is who, in this possible alternative, would never have existed” (pp. 487-8, emphasis in original; cf. fn. 9 above). See also Holtug (2001), Bykvist (2007) and Johansson (2010).

You could equally apply this argument to individual experiences, for an asymmetry between suffering and pleasure, as long as whenever an individual suffers, they have an interest in not suffering, and it’s not the case that each individual, at every moment, has an interest in more pleasure, even if they don’t know it or want it.

Something only matters if it matters (or will matter) to someone, and an absence of pleasure doesn’t necessarily matter to someone who isn’t experiencing pleasure* and certainly doesn’t matter to someone who does not and will not exist, and so we have no inherent reason to promote pleasure. On the other hand, there’s no suffering unless someone is experiencing it, and according to some definitions of suffering, it necessarily matters to the sufferer.

* for example, when concentrating in a flow state, while asleep, when content.

And we can turn this into a wide person-affecting view to solve the Nonidentity problem by claiming that identity doesn’t matter. To make the above argument fit better with this, we can rephrase it slightly to refer to “extra individuals” or “no extra individuals” rather than any specific individuals who will or won’t exist. Frick makes a separate general claim that if exactly one of two normative standards (e.g. people, with interests) will exist, and they are standards of the same kind (e.g. the extent to which people’s interests are satisfied can be compared), then it’s better for the one which will be better satisfied to apply (e.g. the better off person should come to exist).

On the other hand, a narrow view might still allow us to say that it’s worse to bring a worse off individual into existence with a bad life than a better off one, if our reasons against bringing an individual into existence with a bad life are stronger the worse off they would be, a claim I’d expect to be widely accepted. If we apply the view to individual experiences or person-moments, the result seems to be a negative axiology, in which only the negative matters, and with hedonism, only suffering would matter. Whether or not this follows can depend on how the procreation asymmetry is captured, and there are systems in which it would not follow, e.g. the narrow asymmetric views here, although these reject the independence of irrelevant alternatives.

Under standard order assumptions which include the independence of irrelevant alternatives and completeness, the procreation asymmetry does imply a negative axiology.

I think EA hasn’t sufficiently explored the use of different types of empirical studies from which we can rigorously estimate causal effects, other than randomized controlled trials (or other experiments). This leaves us either relying heavily on subjective estimates of the magnitudes of causal effects based on weak evidence, anecdotes, expert opinion or basically guesses, or being skeptical of interventions whose cost-effectiveness estimates don’t come from RCTs. I’d say I’m pretty skeptical, but not so skeptical that I think we need RCTs to conclude anything about the magnitudes of causal effects. There are methods to do causal inference from observational data.

2. Relying too much on guesses and poor studies in the effective animal advocacy space (especially in the past), for example overestimating the value of leafletting. I think things have improved a lot since then, and I thought the evidence presented in the work of Rethink Priorities, Charity Entrepreneurship and Founders Pledge on corporate campaigns was good enough to meet the bar for me to donate to support corporate campaigns specifically. Humane League Labs and some academics have done and are doing research to estimate causal effects from observational data that can inform EAA.

This is an argument against hedonic utility being cardinal and against widespread commensurability between hedonic experiences of different kinds. It seems that our tradeoffs, however we arrive at them, don’t track the moral value of hedonic experiences.

Let X be some method or system by which we think we can establish the cardinality and/or commensurability of our hedonic experiences, and rough tradeoff rates. For example, X=reinforcement learning system in our brains, our actual choices, or our judgements of value (including intensity).

If X is not identical to our hedonic experiences, then it may be the case that X is itself what’s forcing the observed cardinality and/or commensurability onto our hedonic experiences. But if it’s X that’s doing this, and it’s the hedonic experiences themselves that are of moral value, then that cardinality and/or commensurability are properties of X, not our hedonic experiences themselves. So the observed cardinality and/or commensurability is a moral illusion.

Here’s a more specific illustration of this argument:

Do our reinforcement systems have access to our whole experiences (or the whole hedonic component), or only some subsets of those neurons that are firing that are responsible for them? And what if they’re more strongly connected to parts of the brain for certain kinds of experiences than others? It seems like there’s a continuum of ways our reinforcement systems could be off or even badly off, so it would be more surprising to me that it would track true moral tradeoffs perfectly. Change (or add or remove) one connection between a neuron in the hedonic system and one in the reinforcement system, and now the tradeoffs made will be different, without affecting the moral value of the hedonic states. If the link between hedonic intensity and reinforcement strength is so fragile, what are the chances the reinforcement system has got it exactly right in the first place? Should be 0 (assuming my model is right).

At least for similar hedonic experiences of different intensities, if they’re actually cardinal, we might expect the reinforcement system to capture some continuous monotonic transformation and not a linear transformation. But then it could be applying different monotonic transformations to different kinds of hedonic experiences. So why should we trust the tradeoffs between these different kinds of hedonic experiences?

The “cardinal hedonist” might object that X (e.g. introspective judgement of intensity) could be identical to our hedonistic experiences, or does track their cardinality closely enough.

I think, as a matter of fact, X will necessarily involve extra (neural) machinery that can distort our judgements, as I illustrate with the reinforcement learning case. It could be that our judgements are still approximately correct despite this, though.

Most importantly, the accuracy of our judgements depends on there being something fundamental that they’re tracking in the first place, so I think hedonists who use cardinal judgements of intensity owe us a good explanation for where this supposed cardinality comes from, which I expect is not possible with our current understanding of neuroscience, and I’m skeptical that it will ever be possible. I think there’s a great deal of unavoidable arbitrariness in our understanding of consciousness.

Here’s an illustration with math. Let’s consider two kinds of hedonic experiences, A and B, with at least three different (signed) intensities each, a1<a2<a3 and b1<b2<b3, respectively, with IA={a1,a2,a3},IB={b1,b2,b3}. These intensities are at least ordered, but not necessarily cardinal like real numbers or integers and we can’t necessarily compare A and B. For example, A and B might be pleasure and suffering generally (with suffering negatively signed), or more specific experiences of these.

Then, what X does is map these intensities to numbers through some function,

f:IA∪IB→R

satisfying f(a1)<f(a2)<f(a3) and f(b1)<f(b2)<f(b3). We might even let IA and IB be some ordered continuous intervals, isomorphic to a real-valued interval, and have f be continuous and increasing on each of IA and IB, but again, it’s f that’s introducing the cardinalization and commensurability (or a different cardinalization and commensurability from the real one, if any); these aren’t inherent to A and B.
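A toy Python illustration of this point. The two candidate maps and the additive way of comparing bundles are hypothetical choices of mine, not something the argument fixes; the only constraint taken from the text is that each map preserves the order within A and within B.

```python
# Ordinal intensity labels for two kinds of experience; only the order within each kind is given.
A_LEVELS = ["a1", "a2", "a3"]
B_LEVELS = ["b1", "b2", "b3"]

# Two hypothetical candidates for f. Both are increasing on A and on B.
f1 = {"a1": 1, "a2": 2, "a3": 3, "b1": 1, "b2": 2, "b3": 3}
f2 = {"a1": 1, "a2": 2, "a3": 3, "b1": 1, "b2": 10, "b3": 100}


def is_order_preserving(f: dict[str, float]) -> bool:
    """Check f(a1) < f(a2) < f(a3) and f(b1) < f(b2) < f(b3)."""
    return all(
        f[lo] < f[hi]
        for levels in (A_LEVELS, B_LEVELS)
        for lo, hi in zip(levels, levels[1:])
    )


def prefers_first(f: dict[str, float], first: tuple[str, str], second: tuple[str, str]) -> bool:
    """Compare two (A-level, B-level) bundles by summing their f-values (an assumed rule)."""
    return sum(f[level] for level in first) > sum(f[level] for level in second)


more_A = ("a3", "b1")
more_B = ("a1", "b2")
```

Both maps respect the given orderings, yet `f1` prefers `more_A` and `f2` prefers `more_B`. The ordinal facts about A and B alone do not settle the tradeoff; the choice of f does.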

1. each individual can sometimes sacrifice some A for more B for themself,

2. we should be impartial, and

3. transitivity and the independence of irrelevant alternatives hold,

then it’s sometimes ethical to sacrifice A from one individual for more B for another. This isn’t too surprising, but let’s look at the argument, which is pretty simple, and discuss some examples.

Proof. Consider the following three options, with two individuals, x and y, and a+>a amounts of A, b+>b amounts of B:

i. x:(a:A,b+:B), y:(a:A,b:B), read as x has amount a of A and amount b+ of B, while y has amount a of A and amount b of B.

ii. x:(a+:A,b:B), y:(a:A,b:B)

iii. x:(a:A,b:B), y:(a+:A,b:B)

Here we have i > ii by 1 for some a, a+, b and b+, and ii = iii by impartiality, so together i > iii by 3, and we sacrifice some A from y for some B for x. QED

Remark: I did choose the amounts of A and B pretty specifically in this argument to match in certain ways. With continuous personal tradeoffs between A and B, and continuous tradeoffs between amounts of A between different individuals at all base levels of A, I think this should force continuous tradeoffs between one individual’s amount of A and another’s amount of B. We can omit the impartiality assumption in this case.

Possible examples:

A= hedonistic welfare, B= some non-hedonistic values

A= experiential values, B= some non-experiential values

A= absence or negative of suffering, B= knowing the truth, for its own sake (not its instrumental value)

A= absence or negative of suffering, B= pleasure

A= absence or negative of suffering, B= anything else that could be good

A= absence or negative of intense suffering, B= absence or negative of mild suffering

In particular, if you’d be willing to endure torture for some other good, you should be willing to allow others to be tortured for you to get more of that good.

I imagine people will take this either way, e.g. some will accept that it’s actually okay to let some be tortured for some other kind of benefit to different people, and others will accept that nothing can compensate them for torture. I fall into the latter camp.

Others might also reject the independence of irrelevant alternatives or transitivity, or their “spirit”, e.g. by individuating options to option sets. I’m pretty undecided about independence these days.

I’ve been thinking more lately about how I should be thinking about causal effects for cost-effectiveness estimates, in order to clarify my own skepticism of more speculative causes, especially longtermist ones, and better understand how skeptical I ought to be. Maybe I’m far too skeptical. Maybe I just haven’t come across a full model for causal effects that’s convincing since I haven’t been specifically looking. I’ve been referred to this in the past, and plan to get through it, since it might provide some missing pieces for the value of research. This also came up here.

Suppose I have two random variables, X and Y, and I want to know the causal effect of manipulating X on Y, if any.

1. If I’m confident there’s no causal relationship between the two, say due to spatial separation, I assume there is no causal effect, and Y conditional on the manipulation of X to take value A (possibly random), Y|do(X=A), is identical to Y, i.e. Y|do(X=A)=Y. (The do notation is Pearl’s do-calculus notation.)

2. If X could affect Y, but I know nothing else,

a. I might assume, based on symmetry (and chaos?) for Y, that Y|do(X=A) and Y are identical in distribution, but not necessarily literally equal as random variables. They might be slightly “shuffled” or permuted versions of each other (see symmetric decreasing rearrangements for specific examples of such a permutation). The difference in expected values is still 0. This is how I think about the effects of my everyday decisions, like going to the store, breathing at particular times, etc. on future populations. I might assume the same for variables that depend on Y.

b. Or, I might think that manipulating X just injects noise into Y, possibly while preserving some of its statistics, e.g. the mean or median. A simple case is just adding random symmetric noise with mean and median 0 to Y. However, whether or not a statistic is preserved with the extra noise might be sensitive to the scale on which Y is measured. For example, if Y is real-valued, and f:R→R is strictly increasing, then for the median, med(f(Y))=f(med(Y)), but the same is not necessarily true for the expected value of Y, or for other variables that depend on Y.

c. Or, I might think that manipulating X makes Y closer to a “default” distribution over the possible values of Y, often but not always uninformed or uniform. This can shift the mean, median, etc., of Y. For example, Y could be the face of the coin I see on my desk, and X could be whether I flip the coin or not, with not flipping as the default. So, if I do flip the coin and hence manipulate X, this randomizes the value of Y, making my probability distribution for its value uniform instead of concentrated on a known, deterministic value. You might think that some systems are the result of optimization and therefore fragile, so random interventions might return them to prior “defaults”, e.g. naive systemic change or changes to ecosystems. This could be (like) regression to the mean.

I’m not sure how to balance these three possibilities generally. If I do think the effects are symmetric, I might go with a or b or some combination of them. In particular asymmetric cases, I might also combine these with c.

3. Suppose I have a plausible argument for how X could affect Y in a particular way, but no observations that can be used as suitable proxies, even very indirect, for counterfactuals with which to estimate the size of the effect. I lean towards dealing with this case as in 2, rather than just making assumptions about effect sizes without observations.

For example, someone might propose a causal path through which X affects Y with a missing estimate of effect size for at least one step along the path, but an argument that this should increase the value of Y. It is not enough to consider only one such path, since there may be many paths from X to Y, e.g. different considerations for how X could affect Y, and these would need to be combined. Some could have opposite effects. By 2, those other paths, when combined with the proposed causal path, reduce the effects of X on Y through the proposed path. The longer the proposed path, the more unknown alternate paths.

I think this is where I am now with speculative longtermist causes. Part of this may be my ignorance of the proposed causal paths and estimates of effect sizes, since I haven’t looked too deeply at the justifications for these causes, but the dampening from unknown paths also applies when the effect sizes along a path are known, which is the next case.

4. Suppose I have a causal path through some other variable Z, X→Z→Y, so that X causes Z and Z causes Y, and I model both the effects of X→Z and Z→Y, based on observations. Should I just combine the two for the effect of X on Y? In general, not in the straightforward way. As in 3, there could be another causal path, X→Z′→Y (and it could be longer, instead of with just a single intermediate variable).

As in case 3, you can think of X→Z′→Y as dampening the effect of X→Z→Y, and with long proposed causal paths, we might expect the net effect to be small, consistent with the intuition that the predictable impacts on the far future decrease over time due to ignorance/noise and chaos, even though the actual impacts may compound due to chaos.
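The scale-sensitivity point in case 2b (the median commutes with strictly increasing transformations, the mean generally does not) can be checked on a toy sample. The sample values and the choice of the logarithm as the change of scale are assumptions for illustration.

```python
import math
import statistics

# A small, skewed illustrative sample for Y (odd length, so the median is one of the values).
y = [1.0, 2.0, 3.0, 4.0, 100.0]


def f(v: float) -> float:
    """A strictly increasing change of scale."""
    return math.log(v)


# med(f(Y)) == f(med(Y)): the middle value stays in the middle under an increasing map.
median_commutes = statistics.median([f(v) for v in y]) == f(statistics.median(y))

# E[f(Y)] versus f(E[Y]): these generally differ when f is not affine.
mean_commutes = math.isclose(statistics.mean([f(v) for v in y]), f(statistics.mean(y)))
```

Here `median_commutes` holds and `mean_commutes` does not, so whether "adding noise preserves the statistic" can depend on whether we measure Y or some transformation of it.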

Maybe I’ll write this up as a full post after I’ve thought more about it. I imagine there’s been writing related to this, including in the EA and rationality communities.

Let’s consider a given preference from the point of view of a given outcome after choosing it, in which the preference either exists or does not, by cases:

1. The preference exists:

a. If there’s an outcome in which the preference exists and is more satisfied, and all else is equal, it would have been irrational to have chosen this one (over it, and at all).

b. If there’s an outcome in which the preference exists and is less satisfied, and all else is equal, it would have been irrational to have chosen the other outcome (over this one, and at all).

c. If there’s an outcome in which the preference does not exist, and all else is equal, the preference itself does not tell us if either would have been irrational to have chosen.

2. The preference doesn’t exist:

a. If there’s an outcome in which the preference exists, regardless of its degree of satisfaction, and all else equal, the preference itself does not tell us if either would have been irrational to have chosen.

So, all else equal besides the existence or degree of satisfaction of the given preference, it’s always rational to choose an outcome in which the preference does not exist, but it’s irrational to choose an outcome in which the preference exists but is less satisfied than in another outcome.
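The case analysis can be encoded as a small Python function. This is one possible formalization under assumptions of mine: each outcome is summarized only by the given preference’s satisfaction level (or `None` if the preference does not exist there), and all else is equal across outcomes.

```python
from typing import Optional

# An outcome is summarized by the preference's satisfaction level,
# or None if the preference does not exist in that outcome (a hypothetical encoding).
Outcome = Optional[float]


def ruled_out_by_preference(options: list[Outcome]) -> list[bool]:
    """Mark each option as irrational to choose, judged only by the given preference.

    An option is ruled out when the preference exists in it (case 1) and some
    other option has the preference existing and more satisfied (cases 1a/1b).
    Options where the preference does not exist are never ruled out (cases 1c/2a).
    """
    existing = [s for s in options if s is not None]
    best = max(existing, default=None)
    return [s is not None and s < best for s in options]


# Nonexistence, partial satisfaction, full satisfaction.
example = ruled_out_by_preference([None, 0.3, 1.0])
```

For the example option set, only the partially satisfied outcome is ruled out: the outcome without the preference and the outcome with the preference fully satisfied both remain permissible as far as this preference is concerned.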

(I made a similar argument in the thread starting here.)

I also think that antifrustrationism in some sense overrides interests less than symmetric views (not to exclude “preference-affecting” views or mixtures as options, though). Rather than satisfying your existing preferences, according to symmetric views, it can be better to create new preferences in you and satisfy them, against your wishes. This undermines the appeal of autonomy and subjectivity that preference consequentialism had in the first place. If, on the other hand, new preferences don’t add positive value, then they can’t compensate for the violation of preferences, including the violation of preferences to not have your preferences manipulated in certain ways.

Consider the following two options for interests within one individual:

A. Interest 1 exists and is fully satisfied

B. Interest 1 exists and is not fully satisfied, and interest 2 exists and is (fully) satisfied.

A symmetric view would sometimes choose B, so that the creation of interests can take priority over interests that would exist regardless. In particular, the proposed benefit comes from satisfying an interest that would not have existed in the alternative, so it seems like we’re overriding the interests the individual would have in A with a new interest, interest 2. For example, we make someone want something and satisfy that want, at the expense of their other interests.

On the other hand, consider:

A. Interest 1 exists and is partially unsatisfied

B. Interest 1 exists and is fully satisfied, and interest 2 exists and is partially unsatisfied.

In this case, antifrustrationism would sometimes choose A, so that the removal or avoidance of an otherwise unsatisfied interest can take priority over (further) satisfying an interest that would exist anyway. But in this case, if we choose A because of concerns for interest 2, at least interest 2 would exist in the alternative, B, so the benefit comes from the avoidance of an interest that would have otherwise existed. In choosing A over B, I wouldn’t say we’re overriding interests; we’re dealing with an interest, interest 2, that would have existed otherwise.

Smith and Black’s “The morality of creating and eliminating duties” deals with duties rather than preferences, and argues that assigning positive value to duties and their satisfaction leads to perverse conclusions like the above with preferences, and they have a formal proof for this under certain conditions.

Some related writings, although not making the same point I am here:

I also think this argument isn’t specific to preferences, but could be extended to any interests, values or normative standards that are necessarily held by individuals (or other objects), including basically everything people value (see here for a non-exhaustive list). See Johann Frick’s paper and thesis which defend the procreation asymmetry, and my other post here.

Then, if you extend these comparisons to satisfy the independence of irrelevant alternatives by stating that in comparisons of multiple choices in an option set, all permissible options are strictly better than all impermissible options regardless of option set, extending these rankings beyond the option set, the result is antifrustrationism. To show this, you can use the set of the following three options, which are identical except in the ways specified:

A: a preference exists and is fully satisfied,

B: the same preference exists and is not fully satisfied, and

C: the preference doesn’t exist,

and since B is impermissible because of the presence of A, this means C>B, and so it’s always better for a preference to not exist than for it to exist and not be fully satisfied, all else equal.

I think cluster thinking and the use of sensitivity analysis are approaches for decision making under deep uncertainty, when it’s difficult to commit to a particular joint probability distribution or weight considerations. Robust decision making is another. The maximality rule is another: given some set of plausible (empirical or ethical) worldviews/models for which we can’t commit to quantifying our uncertainty, if A is worse in expectation than B under some subset of plausible worldviews/models, and not better than B in expectation under any such set of plausible worldviews/models, we say A < B, and we should rule out A.

It seems like EAs should be more familiar with the field of decision making under deep uncertainty. (Thanks to this post by weeatquince for pointing this out.)

See also:

Deep Uncertainty by Walker, Lempert and Kwakkel for a short review.

The above mentioned papers by Mogensen and Thorstad are critical of the maximality rule for being too permissive, but here’s a half-baked attempt to improve it:

Suppose you have a social welfare function U, and want to compare two options, A and B. Suppose further that you have two sets of probability distributions of size n for the outcome X of each of A and B, PA and PB, respectively. Then A≿B (A is at least as good as B) if (and only if) there is a bijection f:PA→PB such that

$$\mathbb{E}_{X\sim P}[U(X)] \ge \mathbb{E}_{X\sim f(P)}[U(X)], \quad \text{for all } P \in \mathcal{P}_A, \tag{1}$$

and furthermore, A≻B (A is strictly better than B) if the above inequality is strict for some P∈PA.
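Here is a minimal sketch of the comparison in (1), assuming each distribution has already been summarized by its expected value of U. The function names are illustrative, and the reduction to sorting is an added observation rather than part of the proposal: for two equal-size lists of expected values, a bijection satisfying (1) exists exactly when the sorted lists dominate elementwise.

```python
from typing import Sequence


def weakly_better(ev_a: Sequence[float], ev_b: Sequence[float]) -> bool:
    """True if some bijection f pairs each P in P_A with f(P) in P_B
    such that E_P[U] >= E_f(P)[U] for every pair."""
    if len(ev_a) != len(ev_b):
        raise ValueError("both options need the same number of distributions")
    # Pairing the k-th smallest with the k-th smallest works whenever any pairing does.
    return all(a >= b for a, b in zip(sorted(ev_a), sorted(ev_b)))


def strictly_better(ev_a: Sequence[float], ev_b: Sequence[float]) -> bool:
    """True if some bijection satisfies (1) with at least one strict inequality."""
    return weakly_better(ev_a, ev_b) and sorted(ev_a) != sorted(ev_b)


# Hypothetical expected values of U under three paired arguments.
ev_help = [5.0, -2.0, 1.0]   # helping the elderly person cross
ev_skip = [4.0, -2.0, 1.0]   # not helping
helping_wins = strictly_better(ev_help, ev_skip)
```

Two options can also come out incomparable in both directions, e.g. expected values [0, 10] against [1, 2], which is where the permissiveness worry reappears.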

This means pairing asymmetric/complex cluelessness arguments. Suppose you think helping an elderly person cross the street might have some important effect on the far future (you have some P∈PA), but you think not doing so could also have a similar far-future effect (according to P′∈PB), but the short-term consequences are worse, and under some pairing of distributions/arguments f:PA→PB, helping the elderly person always looks at least as good and under one pair (P,f(P)) looks better, so you should do it. Pairing distributions like this in some sense forces us to give equal weight to P and f(P), and maybe this goes too far and assumes away too much of our cluelessness or deep uncertainty?

The maximality rule as described in Maximal Cluelessness effectively assumes a pairing is already given to you, by instead using a single set of distributions P that can each be conditioned on taking action A or B. We’d omit f, and the expression replacing (1) above would be

$$\mathbb{E}_{X\sim P\mid A}[U(X)] \ge \mathbb{E}_{X\sim P\mid B}[U(X)], \quad \text{for all } P \in \mathcal{P}.$$
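As a rough sketch of this single-set version, assume each worldview P has been reduced to a pair of conditional expected values (E[U | A], E[U | B]). The function name and the four return labels are illustrative choices, not terminology from Maximal Cluelessness.

```python
from typing import Sequence, Tuple

# Each worldview is summarized as (E[U | A], E[U | B]).
Worldview = Tuple[float, float]


def maximality_compare(worldviews: Sequence[Worldview]) -> str:
    """Return 'A', 'B', 'equal' or 'incomparable' under the maximality rule."""
    a_never_worse = all(ea >= eb for ea, eb in worldviews)
    b_never_worse = all(eb >= ea for ea, eb in worldviews)
    if a_never_worse and b_never_worse:
        return "equal"
    if a_never_worse:
        return "A"  # at least as good everywhere, strictly better somewhere
    if b_never_worse:
        return "B"
    return "incomparable"  # neither option can be ruled out


verdict = maximality_compare([(1.0, 0.0), (0.0, 1.0)])  # worldviews disagree
```

A single dissenting worldview is enough to make the options incomparable, which is the permissiveness the critics point to.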

I’m not sure what to do for different numbers of distributions for each option or infinitely many distributions. Maybe the function f should be assumed given, as a preferred mapping between distributions, and we could relax the surjectivity, total domain, injectivity and even fact that it’s a function, e.g. we compare for pairs (PA,PB)∈R, for some relation (subset) R⊆PA×PB. But assuming we already have such a function or relation seems to assume away too much of our deep uncertainty.

One plausibly useful first step is to sort PA and PB according to the expected values of U(A) and U(B) under their corresponding probability distributions, respectively. Should the mapping or relation preserve the min and max? How should we deal with everything else? I suspect any proposal will seem arbitrary.

Perhaps we can assume slightly more structure on the sets PA for each option A by assuming multiple probability distributions on PA, and go up a level (and we could repeat). Basically, I want to give probability ranges to the expected value of the action A, and then compare the possible expected values of these expected values. However, if we just multiply our higher-order probability distributions by the lower-order ones, this comes back to the original scenario.

If we accept that

1. it’s always better to improve the welfare of an existing person (or someone who would exist anyway) than to bring others into existence, all else equal, and

2. two outcomes are (comparable and) equivalent if they have the same distribution of welfare levels (but possibly different identities; this is often called Anonymity),

then not only would we reject Mere Addition (the claim that adding good lives, even those which are barely worth living but still worth living, is never bad), but the following would be true:

Given any two nonempty populations A and B, if some individual in B is worse off than some individual in A, then A∪B is worse than A. In other words, we shouldn’t add to a population any individual who isn’t at least as well off as the best off in the population, all else equal.

Intuitively, adding someone with worse welfare than someone who would exist anyway is equivalent to reducing the existing individual’s welfare and adding someone with better welfare than them; you just swap their welfares.

More formally, suppose a, a member of the original population A with welfare u, is better off than b, a member of the added population B with welfare v, so u>v. Then consider

A′ which is A, but has b instead of a, with welfare u.

B′ which is B, but has a instead of b, with welfare v.

Then, A is better than A′∪B′ , by the first hypothesis, because the latter has all the same individuals from A (and extras from B) with exactly the same welfare levels, except for a (from A and B′) who is worse off with welfare v (from B′) instead of u (from A). So A≻A′∪B′.

And A′∪B′ is equivalent to A∪B , by the second hypothesis, because the only difference is that we’ve swapped the welfare levels of a and b. So A′∪B′≃A∪B.

So, by transitivity (and the independence of irrelevant alternatives), A≻A∪B.

If welfare is real-valued (specifically from an interval I⊆R), then Maximin (maximize the welfare of the worst off individual) and theories which assign negative value to the addition of individuals with non-maximal welfare satisfy the properties above.

Furthermore, if along with welfare from a real interval and property 1 in the previous comment (2. Anonymity is not necessary), the following two properties also hold:

3. Extended Continuity, a modest definition of continuity for a theory comparing populations with real-valued welfares which must be satisfied by any order representable by a real-valued function that is continuous with respect to the welfares of the individuals in each population, and

4. Strong Pareto (according to one equivalent definition, under transitivity and the independence of irrelevant alternatives): if two outcomes with the same individuals in their populations differ only by the welfare of one individual, then the outcome in which that individual is better off is strictly better than the other,

then the theory must assign negative value to the addition of individuals with non-maximal welfare (and no positive value to the addition of individuals with maximal welfare) as long as any individual in the initial population has non-maximal welfare. In other words, the theory must be antinatalist in principle, although not necessarily in practice, since all else is rarely equal.

Proof : Suppose A is any population with an individual a with some non-maximal welfare u and consider adding an individual b who would also have some non-maximal welfare v. Denote, for all ϵ>0 small enough (0<ϵ<ϵ0),

A+ϵχa: the population A, but where individual a has welfare u+ϵ (which exists for all sufficiently small ϵ>0, since u is non-maximal, and welfare comes from an interval).

Also denote

B: the population containing only b, with non-maximal welfare v, and

C: the population containing only b, but with some welfare w>v (v is non-maximal, so there must be some greater welfare level).

Then

1. A+ϵχa≻A∪C, for all ϵ, 0<ϵ<ϵ0, and

2. A∪C≻A∪B,

where the first inequality follows from the hypothesis that it’s better to improve the welfare of an existing individual than to add any others, and the second inequality follows from Strong Pareto, because the only difference is b’s welfare.

Then, by Extended Continuity and the first inequality for all (sufficiently small) ϵ>0, we can take the limit (infimum) of A+ϵχa as ϵ→0 to get

A⪰A∪C,

so, it’s no better to add b even if they would have maximal welfare, and by transitivity (and the independence of irrelevant alternatives) with 2.A∪C≻A∪B,

A≻A∪B,

so it’s strictly worse to add b with non-maximal welfare. This completes the proof.

My current best guess on what constitutes welfare/wellbeing/value (setting aside issues of aggregation):

1. Suffering is bad in itself.

2. Pleasure doesn’t matter in itself.

3. Conscious disapproval might be bad in itself. If bad, this could capture the badness of suffering, since I see suffering as affective conscious disapproval (an externalist account).

4. Conscious approval doesn’t matter in itself in an absolute sense (it may matter in a relative sense, as covered by 5). Pleasure is affective conscious approval.

5. Other kinds of preferences might matter, but only comparatively (in a wide/non-identity way) when they exist in both outcomes, i.e. between a preference that’s more satisfied and the same or a different preference (of the same kind?) that’s less satisfied, an outcome with the more satisfied one is better than an outcome with the less satisfied one, ignoring other reasons. This is a kind of preference-affecting principle.

Also, I lean towards experientialism on top of this, so I think the degree of satisfaction/frustration of the preference has to be experienced for it to matter.

To expand on 5, the fact that you have an unsatisfied preference doesn’t mean you disapprove of the outcome, it only means another outcome in which it is satisfied is preferable, all else equal. For example, that someone would like to go to the moon doesn’t necessarily make them worse off than if they didn’t have that desire, all else equal. That someone with a certain kind of disability would like to live without that disability and might even trade away part of their life to do so doesn’t necessarily make them worse off, all else equal. This is incompatible with the way QALYs are estimated and used.

I think this probably can’t be reconciled with the independence of irrelevant alternatives in a way that I would find satisfactory, since it would either give us antifrustrationism (which 5 explicitly rejects) or allow that sometimes having a preference is better than not, all else equal.

I feel increasingly unsympathetic to hedonism (and maybe experientialism generally?). Yes, emotions matter, and the strength of emotions could be taken to mean how much something matters, but if you separate a cow and her calf and they’re distressed by this, the appropriate response for their sake is not to drug or fool them until they feel better, it’s to reunite them. What they want is each other, not to feel better. Sometimes I think about something bad in the world that makes me sad; I don’t think you do me any favour by just taking away my sadness; I don’t want to stop feeling sad, what I want is for the bad in the world to be addressed.

Rather than affect being what matters in itself, maybe affect is a signal for what matters and its intensity tells us how much it matters. Hedonism as normally understood would therefore be like Goodhart’s law: it ignores the objects of our emotions.

Of course, often we do just want to feel better, and that matters, too. If someone wants to not suffer, then of course they should not suffer.

Related: wireheading, the experience machine, complexity of value.

Utility functions (preferential or ethical, e.g. social welfare functions) can have lexicality, so that a difference in category A can be larger than the maximum difference in category B, but we can still make probabilistic tradeoffs between them. This can be done, for example, by having separate utility functions, fA:X→R and fB:X→R for A and B, respectively, such that

fA(x)−fA(y)≥1 for all x satisfying the condition P(x) and all y satisfying Q(y) (e.g. Q(y) can be the negation of P(y), although this would normally lead to discontinuity), and

fB is bounded to have range in the interval [0,1] (or range in an interval of length at most 1).

Then we can define our utility function as the sum f=fA+fB, so

f(x)=fA(x)+fB(x).

This ensures that all outcomes with P(x) are at least as good as all outcomes with Q(x), without being Pascalian/fanatical to maximize fA regardless of what happens to fB. Note, however, that fB may be increasingly difficult to change as the number of moral patients increases, so we may approximate Pascalian fanaticism in this limit, anyway.

For example, fA(x)≤−1 if there is any suffering in x that meets a certain threshold of intensity, Q(x), and fA(x)=0 if there is no suffering at all in x, P(x). f can still be continuous this way.

If the probability that this threshold is met is p, 0≤p<1, and the expected value of fA conditional on this is bounded below by −L, L>0, regardless of p for the choices available to you, then increasing fB by at least pL, which can be small, is better than trying to reduce p.
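A quick arithmetic sketch of this tradeoff, under simplifying assumptions that are mine rather than part of the setup: fA is 0 whenever the threshold is not met, fB is known with certainty, and the numbers for p and L are made up.

```python
def expected_f(p: float, fa_given_met: float, fb: float) -> float:
    """E[f] = p * E[fA | threshold met] + (1 - p) * 0 + fB,
    assuming fA = 0 when the threshold is not met and fB is certain."""
    return p * fa_given_met + fb


p, L = 0.01, 50.0  # hypothetical probability and lower bound -L on E[fA | met]

# Worst case allowed by the bound: E[fA | met] = -L.
eliminate_risk = expected_f(0.0, -L, fb=0.2)    # drive p all the way to 0
boost_fb = expected_f(p, -L, fb=0.2 + p * L)    # raise fB by pL instead

# Milder case: E[fA | met] = -20 > -L, same two interventions.
eliminate_risk_mild = expected_f(0.0, -20.0, fb=0.2)
boost_fb_mild = expected_f(p, -20.0, fb=0.2 + p * L)
```

In the worst case the two interventions tie (up to rounding), and whenever the conditional expectation is strictly above −L, raising fB by pL comes out ahead of eliminating the risk entirely.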

As another example, an AI could be incentivized to ensure it gets monitored by law enforcement. Its reward function could look like

$$f(x) = f_B(x) + \sum_{i=1}^{\infty} I_{M_i}(x),$$

where IMi(x) is 1 if the AI is monitored by law enforcement and passes some test (or did nothing?) in period i, and 0 otherwise. You could put an upper bound on the number of periods or use discounting to ensure the right term can’t evaluate to infinity since that would allow fB to be ignored (maybe the AI will predict its expected lifetime to be infinite), but this would eventually allow fB to overcome the IMi, unless you also discount the future in fB.

This should also allow us to modify the utility function fB, if preventing the modification would cause a test to be failed.

Furthermore, satisfying the IMi(x) strongly lexically dominates increasing fB(x), but we can still make expected tradeoffs between them.

The problem then reduces to designing the AI in such a way that it can’t cheat on the test, which might be something we can hard-code into it (e.g. its internal states and outputs are automatically sent to law enforcement), and so could be easier than getting fB right.

This overall approach can be repeated for any finite number of functions, f1,f2,…,fn. Recursively, you could define

$$g_1 = f_1, \qquad g_{k+1} = \sigma \circ g_k + f_{k+1} \quad (k = 1, \dots, n-1),$$

for σ:R→R increasing and bounded with range in an interval of length at most 1, e.g. some sigmoid function. In this way, each fk dominates the previous ones, as above.
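A minimal sketch of the nesting, assuming the recursion g_1 = f_1, g_k = σ(g_{k−1}) + f_k (one reading of the construction described above) and assuming the later functions only change in steps of at least 1. The logistic σ, the dictionary representation of outcomes and the names `minor` and `major` are all illustrative.

```python
import math
from typing import Callable, Sequence

Outcome = dict  # hypothetical outcome representation


def sigma(z: float) -> float:
    """Logistic function written with tanh so that large |z| cannot overflow."""
    return 0.5 * (1.0 + math.tanh(z / 2.0))


def nested_utility(fs: Sequence[Callable[[Outcome], float]], x: Outcome) -> float:
    """Compute g_n(x) where g_1 = f_1 and g_k = sigma(g_{k-1}) + f_k."""
    g = fs[0](x)
    for f in fs[1:]:
        g = sigma(g) + f(x)  # everything so far is squashed into an interval of length <= 1
    return g


fs = [lambda o: o["minor"], lambda o: o["major"]]
x = {"minor": -3.0, "major": 0.0}   # worse on the minor consideration
y = {"minor": 3.0, "major": -1.0}   # better on minor, one step worse on major
x_preferred = nested_utility(fs, x) > nested_utility(fs, y)
```

One practical caveat: in floating point σ saturates for large inputs, so strict preferences among the dominated functions can collapse into ties even though they would not with exact arithmetic.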

To adapt to a more deontological approach (not rule violation minimization, but according to which you should not break a rule in order to avoid violating a rule later), you could use geometric discounting, and your (moral) utility function could look like:

$$f(x) = -\sum_{i=0}^{\infty} r^i I(x_i),$$

where

1. x is the act and its consequences without uncertainty and you maximize the expected value of f over uncertainty in x,

2. x is broken into infinitely many disjoint intervals xi, with xi coming just before xi+1 temporally (and these intervals are chosen to have the same time endpoints for each possible x),

3. I(xi)=1 if a rule is broken in xi, and 0 otherwise, and

4. r is a constant, 0<r≤0.5.

So, the idea is that f(x)>f(y) if and only if the earliest rule violation in x happens later than the earliest one in y (at the level of precision determined by how the intervals are broken up). The value of r≤0.5 ensures this. (Well, there are some rare exceptions if r=0.5). You essentially count rule violations and minimize the number of them, but you use geometric discounting based on when the rule violation happens in such a way to ensure that it’s always worse to break a rule earlier than to break any number of rules later.
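A small sketch of the discounting idea, assuming the utility is the negative geometrically discounted count of violations, f(x) = −Σ r^i I(x_i), truncated to a finite horizon. The function name, the list encoding and the choice r = 0.4 are illustrative.

```python
from typing import Sequence


def deontic_utility(violations: Sequence[int], r: float = 0.4) -> float:
    """f(x) = -sum_i r**i * I(x_i) over a finite horizon.

    violations[i] is 1 if a rule is broken in interval i, else 0.
    """
    if not 0 < r <= 0.5:
        raise ValueError("r must satisfy 0 < r <= 0.5")
    return -sum(r**i * v for i, v in enumerate(violations))


early_once = [0, 1, 0, 0, 0]  # a single violation, in interval 1
late_many = [0, 0, 1, 1, 1]   # three violations, all later
later_is_better = deontic_utility(late_many) > deontic_utility(early_once)
```

With r ≤ 0.5, the total weight of all intervals after i is at most r^i, so no number of later violations outweighs one earlier violation. At r = 0.5 with infinitely many later violations the two sides can tie, which is the rare exception mentioned above.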

However, breaking x up into intervals this way probably sucks for a lot of reasons, and I doubt it would lead to prescriptions people with deontological views endorse when they maximize expected values.

This approach basically took for granted that a rule is broken not when I act, but when a particular consequence occurs.

If, on the other hand, a rule is broken at the time I act, maybe I need to use some functions Ii(x) instead of the I(xi), because whether or not I act now (in time interval i) and break a rule depends on what happens in the future. This way, however, Ii(x) could basically always be 1, so I don’t think this approach works.

This nesting approach with σ above also allows us to “fix” maximin/leximin under conditions of uncertainty to avoid Pascalian fanaticism, given a finite discretization of welfare levels or finite number of lexical thresholds. Let the welfare levels be t0>t1>⋯>tn, and define:

$$f_k(x) = -\sum_i I(u_i \le t_k),$$

i.e. fk(x) is the negative of the number of individuals with welfare level at most tk, where ui is the welfare of individual i, and I(ui≤tk) is 1 if ui≤tk and 0 otherwise. Alternatively, we could use I(tk+1<ui≤tk).

In situations without uncertainty, this requires us to first choose among options that minimize the number of individuals with welfare at most tn, because fn takes priority over fk, for all k<n, and then, having done that, choose among those that minimize the number of individuals with welfare at most tn−1, since fn−1 takes priority over fk, for all k<n−1, and then choose among those that minimize the number of individuals with welfare at most tn−2, and so on, until t0.
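The ordering just described can be sketched as follows, assuming each fk is the negative count of individuals at or below tk and that the functions are nested with a logistic σ so that fn is applied last. The function names and the example populations are hypothetical.

```python
import math
from typing import Sequence


def sigma(z: float) -> float:
    """Logistic squashing function with range within [0, 1]."""
    return 0.5 * (1.0 + math.tanh(z / 2.0))


def count_at_most(welfares: Sequence[float], t: float) -> int:
    """Number of individuals with welfare at most t."""
    return sum(1 for u in welfares if u <= t)


def nested_leximin(welfares: Sequence[float], thresholds: Sequence[float]) -> float:
    """thresholds = [t_0, ..., t_n], strictly decreasing; f_k = -#{i : u_i <= t_k}.
    f_n is added last, so it takes priority over every earlier f_k."""
    g = -float(count_at_most(welfares, thresholds[0]))
    for t in thresholds[1:]:
        g = sigma(g) - count_at_most(welfares, t)
    return g


levels = [3, 2, 1]                        # t_0 > t_1 > t_2
one_at_bottom = nested_leximin([1, 3, 3], levels)
none_at_bottom = nested_leximin([2, 2, 2], levels)
```

Here the population with nobody at the lowest level gets the higher value, even though the other population has two individuals at the top level. Because each count includes everyone at or below the threshold, adding an extra individual at any level lowers the value.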

This particular social welfare function assigns negative value to new existences when there are no impacts on others, which leximin/maximin need not do in general, although it typically does in practice, anyway.

This approach does not require welfare to be cardinal, i.e. adding and dividing welfare levels need not be defined. It also dodges representation theorems like this one (or the stronger one in Lemma 1 here, see the discussion here), because continuity is not satisfied (and welfare need not have any topological structure at all, let alone be real-valued). Yet, it still satisfies anonymity/symmetry/impartiality, monotonicity/Pareto, and separability/independence. Separability means that whether one outcome is better or worse than another does not depend on individuals unaffected by the choice between the two.

Here’s a way to capture lexical threshold utilitarianism with a separable theory while avoiding Pascalian fanaticism, with a negative threshold t−<0 and a positive threshold t+>0:

$$f(x) = \sigma\Big(\sum_i u_i\Big) + \sum_i I(u_i \ge t_+) - \sum_i I(u_i \le t_-).$$

The first term is just standard utilitarianism, but squashed with a function σ:R→R into an interval of length at most 1.

The second/middle sum is the number of individuals (or experiences or person-moments) with welfare at least t+, which we add to the first term. Any change in number past this threshold dominates the first term.

The third/last sum is the number of individuals with welfare at most t−, which we subtract from the rest. Any change in number past this threshold dominates the first term.

Either of the second or third term can be omitted.

We could require t−≤ui≤t+ for all i, although this isn’t necessary.

More thresholds could be used, as in this comment: we would apply σ to the whole expression above, and then add new terms like the second and/or the third, with thresholds t++>t+ and t−−<t−, and repeat as necessary.
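A short sketch of the single-pair-of-thresholds version, assuming the three terms are combined as σ(Σ ui) + #{ui ≥ t+} − #{ui ≤ t−} with a logistic σ. The threshold values ±10 and the example populations are made up for illustration.

```python
import math
from typing import Sequence


def sigma(z: float) -> float:
    """Logistic squashing function with range within [0, 1]."""
    return 0.5 * (1.0 + math.tanh(z / 2.0))


def lexical_threshold_utility(
    welfares: Sequence[float], t_minus: float = -10.0, t_plus: float = 10.0
) -> float:
    """Squashed total welfare, plus the count at or above t_plus,
    minus the count at or below t_minus."""
    squashed_total = sigma(sum(welfares))
    above = sum(1 for u in welfares if u >= t_plus)
    below = sum(1 for u in welfares if u <= t_minus)
    return squashed_total + above - below


many_mild = lexical_threshold_utility([-9.0] * 100)  # total -900, nobody past t_minus
one_extreme = lexical_threshold_utility([-12.0])     # total -12, one past t_minus
```

Despite a far worse total, the population with many mild harms scores higher than the one with a single individual past the negative threshold, which is the lexical behaviour intended. Under uncertainty the expected values of the two kinds of term can still be traded off.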

The procreation asymmetry can be formulated this way (due to Jeff McMahan):

This is a summary of the argument for the procreation asymmetry here and in the comments, especially this comment, which also looks further at the case of bringing someone into existence with a good life. I think this is an actualist argument, similar to Krister Bykvist’s argument in 2.1 (which cites Dan Brock from this book) and Derek Parfit’s argument on p.150 of Reasons and Persons, and Johann Frick’s argument (although his is not actualist, and he explicitly rejects actualism). The starting claim is that your ethical reasons are in some sense conditional on the existence of individuals, and the asymmetry between existence and nonexistence can lead to the procreation asymmetry.

1. From an outcome in which an individual doesn’t/won’t exist, they don’t have any interests that would give you a reason to believe that another outcome is better on their account (they have no account!). So, ignoring other reasons, this outcome is not dominated by any other, and the welfare of an individual whom we could bring into existence is not in itself a reason to bring them into existence. This is reflected by the absence of arrows starting from the Nonexistence block in the image above.

2. An existing individual (or an individual who will exist) has interests. In an outcome in which they have a bad life, an outcome in which they didn’t exist would have been better for them from the point of view of the outcome in which they do exist with a bad life, so an outcome with a bad life is dominated by one in which they don’t exist, ignoring other reasons. Choosing an outcome which is dominated this way is worse than choosing an outcome that dominates it. So, that an individual would have negative welfare is a reason to prevent them from coming into existence. This is reflected by the arrow from Negative existence to Nonexistence in the image above.

3. If the individual would have had a good life, we could say that this would be better than their nonexistence and dominates it (ignoring other reasons), but this only applies from outcomes in which they exist and have a good life. If they never existed, because of 1, it would not be dominated from that outcome (ignoring other reasons).

Together, 1 and 2 are the procreation asymmetry (reversing the order of the two claims from McMahan’s formulation).

I think my argument builds off the following from “The value of existence” by Gustaf Arrhenius and Wlodek Rabinowicz (2016):

The footnote that expands on this:

You could equally apply this argument to individual experiences, for an asymmetry between suffering and pleasure, as long as whenever an individual suffers, they have an interest in not suffering, and it’s not the case that each individual, at every moment, has an interest in more pleasure, even if they don’t know it or want it.

Something only matters if it matters (or will matter) to someone, and an absence of pleasure doesn’t necessarily matter to someone who isn’t experiencing pleasure* and certainly doesn’t matter to someone who does not and will not exist, and so we have no inherent reason to promote pleasure. On the other hand, there’s no suffering unless someone is experiencing it, and according to some definitions of suffering, it necessarily matters to the sufferer.

* for example, when concentrating in a flow state, while asleep, when content.

See also tranquilism and this post I wrote.

And we can turn this into a wide person-affecting view to solve the Nonidentity problem by claiming that identity doesn’t matter. To make the above argument fit better with this, we can rephrase it slightly to refer to “extra individuals” or “no extra individuals” rather than any specific individuals who will or won’t exist. Frick makes a separate general claim that if exactly one of two normative standards (e.g. people, with interests) will exist, and they are standards of the same kind (e.g. the extent to which people’s interests are satisfied can be compared), then it’s better for the one which will be better satisfied to apply (e.g. the better off person should come to exist).

On the other hand, a narrow view might still allow us to say that it’s worse to bring a worse off individual into existence with a bad life than a better off one, if our reasons against bringing an individual into existence with a bad life are stronger the worse off they would be, a claim I’d expect to be widely accepted. If we apply the view to individual experiences or person-moments, the result seems to be a negative axiology, in which only the negative matters, and, with hedonism, only suffering would matter. Whether or not this follows can depend on how the procreation asymmetry is captured, and there are systems in which it would not follow, e.g. the narrow asymmetric views here, although these reject the independence of irrelevant alternatives.

Under standard order assumptions which include the independence of irrelevant alternatives and completeness, the procreation asymmetry does imply a negative axiology.

I think EA hasn’t sufficiently explored the use of different types of empirical studies from which we can rigorously estimate causal effects, other than randomized controlled trials (or other experiments). This leaves us either relying heavily on subjective estimates of the magnitudes of causal effects based on weak evidence, anecdotes, expert opinion or basically guesses, or being skeptical of interventions whose cost-effectiveness estimates don’t come from RCTs. I’d say I’m pretty skeptical, but not so skeptical that I think we need RCTs to conclude anything about the magnitudes of causal effects. There are methods to do causal inference from observational data.

I think this has led us to:

1. Underexploring the global health and development space. See John Halstead’s and Hauke Hillebrandt’s “Growth and the case against randomista development”. I think GiveWell is starting to look beyond RCTs. There’s probably already a lot of research out there they can look to.

2. Relying too much on guesses and poor studies in the effective animal advocacy space (especially in the past), for example overestimating the value of leafletting. I think things have improved a lot since then, and I thought the evidence presented in the work of Rethink Priorities, Charity Entrepreneurship and Founders Pledge on corporate campaigns was good enough to meet the bar for me to donate to support corporate campaigns specifically. Humane League Labs and some academics have done and are doing research to estimate causal effects from observational data that can inform EAA.

This is an argument against hedonic utility being cardinal and against widespread commensurability between hedonic experiences of different kinds. It seems that our tradeoffs, however we arrive at them, don’t track the moral value of hedonic experiences.

Let X be some method or system by which we think we can establish the cardinality and/or commensurability of our hedonic experiences, and rough tradeoff rates. For example, X=reinforcement learning system in our brains, our actual choices, or our judgements of value (including intensity).

If X is not identical to our hedonic experiences, then it may be the case that X is itself what’s forcing the observed cardinality and/or commensurability onto our hedonic experiences. But if it’s X that’s doing this, and it’s the hedonic experiences themselves that are of moral value, then that cardinality and/or commensurability are properties of X, not our hedonic experiences themselves. So the observed cardinality and/or commensurability is a moral illusion.

Here’s a more specific illustration of this argument:

Do our reinforcement systems have access to our whole experiences (or the whole hedonic component), or only some subsets of those neurons that are firing that are responsible for them? And what if they’re more strongly connected to parts of the brain for certain kinds of experiences than others? It seems like there’s a continuum of ways our reinforcement systems could be off or even badly off, so it would be more surprising to me that it would track true moral tradeoffs perfectly. Change (or add or remove) one connection between a neuron in the hedonic system and one in the reinforcement system, and now the tradeoffs made will be different, without affecting the moral value of the hedonic states. If the link between hedonic intensity and reinforcement strength is so fragile, what are the chances the reinforcement system has got it exactly right in the first place? Should be 0 (assuming my model is right).

At least for similar hedonic experiences of different intensities, if they’re actually cardinal, we might expect the reinforcement system to capture some continuous monotonic transformation and not a linear transformation. But then it could be applying different monotonic transformations to different kinds of hedonic experiences. So why should we trust the tradeoffs between these different kinds of hedonic experiences?

The “cardinal hedonist” might object that X (e.g. introspective judgement of intensity) could be identical to our hedonic experiences, or does track their cardinality closely enough.

I think, as a matter of fact, X will necessarily involve extra (neural) machinery that can distort our judgements, as I illustrate with the reinforcement learning case. It could be that our judgements are still approximately correct despite this, though.

Most importantly, the accuracy of our judgements depends on there being something fundamental that they’re tracking in the first place, so I think hedonists who use cardinal judgements of intensity owe us a good explanation for where this supposed cardinality comes from, which I expect is not possible with our current understanding of neuroscience, and I’m skeptical that it will ever be possible. I think there’s a great deal of unavoidable arbitrariness in our understanding of consciousness.

Here’s an illustration with math. Let’s consider two kinds of hedonic experiences, A and B, with at least three different (signed) intensities each, a1<a2<a3 and b1<b2<b3, respectively, with IA={a1,a2,a3},IB={b1,b2,b3}. These intensities are at least ordered, but not necessarily cardinal like real numbers or integers and we can’t necessarily compare A and B. For example, A and B might be pleasure and suffering generally (with suffering negatively signed), or more specific experiences of these.

Then, what X does is map these intensities to numbers through some function,

f:IA∪IB→R

satisfying f(a1)<f(a2)<f(a3) and f(b1)<f(b2)<f(b3). We might even let IA and IB be some ordered continuous intervals, isomorphic to a real-valued interval, and have f be continuous and increasing on each of IA and IB, but again, it’s f that’s introducing the cardinalization and commensurability (or a different cardinalization and commensurability from the real one, if any); these aren’t inherent to A and B.
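To make this concrete, here is a toy sketch with two cardinalizations that both respect the orderings a1<a2<a3 and b1<b2<b3 but disagree about a tradeoff across kinds. The specific numbers, and the assumption that the value of a bundle of experiences is the sum of its mapped intensities, are illustrative only.

```python
A_LEVELS = ["a1", "a2", "a3"]
B_LEVELS = ["b1", "b2", "b3"]

# Two hypothetical choices of the function f; they differ only on the B intensities.
f = {"a1": 1, "a2": 2, "a3": 3, "b1": -3, "b2": -2, "b3": -1}
g = {"a1": 1, "a2": 2, "a3": 3, "b1": -9, "b2": -4, "b3": -1}


def preserves_order(card: dict) -> bool:
    """True if the mapping is increasing within A and within B."""
    return all(
        card[lo] < card[hi]
        for levels in (A_LEVELS, B_LEVELS)
        for lo, hi in zip(levels, levels[1:])
    )


def prefers(card: dict, x: tuple, y: tuple) -> bool:
    """True if bundle x is ranked above bundle y, summing mapped intensities."""
    return sum(card[k] for k in x) > sum(card[k] for k in y)


bundle_x = ("a3", "b1")  # most of A together with the lowest (most negative) B
bundle_y = ("a1", "b2")
f_prefers_x = prefers(f, bundle_x, bundle_y)
g_prefers_y = prefers(g, bundle_y, bundle_x)
```

Both mappings are consistent with everything the orderings tell us, yet they rank the two bundles in opposite ways, so the ranking reflects the mapping and not A and B themselves.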

If you’re a consequentialist and you think

1. each individual can sometimes sacrifice some A for more B for themself,

2. we should be impartial, and

3. transitivity and the independence of irrelevant alternatives hold,

then it’s sometimes ethical to sacrifice A from one individual for more B for another. This isn’t too surprising, but let’s look at the argument, which is pretty simple, and discuss some examples.

Proof. Consider the following three options, with two individuals, x and y, and a+>a amounts of A, b+>b amounts of B:

i. x:(a:A,b+:B), y:(a:A,b:B) , read as x has amount a of A and amount b+ of B, while y has amount a of A and amount b of B.

ii. x:(a+:A,b:B), y:(a:A,b:B)

iii. x:(a:A,b:B), y:(a+:A,b:B)

Here we have i > ii by 1 for some a, a+, b and b+, and ii = iii by impartiality, so together i > iii by 3, and we sacrifice some A from y for more B for x. QED

Remark: I did choose the amounts of A and B pretty specifically in this argument to match in certain ways. With continuous personal tradeoffs between A and B, and continuous tradeoffs between amounts of A between different individuals at all base levels of A, I think this should force continuous tradeoffs between one individual’s amount of A and another’s amount of B. We can omit the impartiality assumption in this case.

Possible examples:

A= absence or negative of suffering, B= knowing the truth, for its own sake (not its instrumental value)

A= absence or negative of suffering, B= pleasure

A= absence or negative of suffering, B= anything else that could be good

A= absence or negative of intense suffering, B= absence or negative of mild suffering

In particular, if you’d be willing to endure torture for some other good, you should be willing to allow others to be tortured for you to get more of that good.

I imagine people will take this either way, e.g. some will accept that it’s actually okay to let some be tortured for some other kind of benefit to different people, and others will accept that nothing can compensate them for torture. I fall into the latter camp.

Others might also reject the independence of irrelevant alternatives or transitivity, or their “spirit”, e.g. by individuating options to option sets. I’m pretty undecided about independence these days.

I’ve been thinking more lately about how I should be thinking about causal effects for cost-effectiveness estimates, in order to clarify my own skepticism of more speculative causes, especially longtermist ones, and better understand how skeptical I ought to be. Maybe I’m far too skeptical. Maybe I just haven’t come across a full model for causal effects that’s convincing since I haven’t been specifically looking. I’ve been referred to this in the past, and plan to get through it, since it might provide some missing pieces for the value of research. This also came up here.

Suppose I have two random variables, X and Y, and I want to know the causal effect of manipulating X on Y, if any.

1. If I’m confident there’s no causal relationship between the two, say due to spatial separation, I assume there is no causal effect, and Y conditional on the manipulation of X to take value A (possibly random), Y|do(X=A), is identical to Y, i.e. Y|do(X=A)=Y. (The do notation is Pearl’s do-calculus notation.)

2. If X could affect Y, but I know nothing else,

a. I might assume, based on symmetry (and chaos?) for Y, that Y|do(X=A) and Y are identical in distribution, but not necessarily literally equal as random variables. They might be slightly “shuffled” or permuted versions of each other (see symmetric decreasing rearrangements for specific examples of such a permutation). The difference in expected values is still 0. This is how I think about the effects of my everyday decisions, like going to the store, breathing at particular times, etc. on future populations. I might assume the same for variables that depend on Y.

b. Or, I might think that manipulating X just injects noise into Y, possibly while preserving some of its statistics, e.g. the mean or median. A simple case is just adding random symmetric noise with mean and median 0 to Y. However, whether or not a statistic is preserved with the extra noise might be sensitive to the scale on which Y is measured. For example, if Y is real-valued, and f:R→R is strictly increasing, then for the median, med(f(Y))=f(med(Y)), but the same is not necessarily true for the expected value of Y, or for other variables that depend on Y.

c. Or, I might think that manipulating X makes Y closer to a “default” distribution over the possible values of Y, often but not always uninformed or uniform. This can shift the mean, median, etc., of Y. For example, Y could be the face of the coin I see on my desk, and X could be whether I flip the coin or not, with X being “not flipped” by default. So, if I do flip the coin and hence manipulate X, this randomizes the value of Y, making my probability distribution for its value uniform instead of concentrated on a known, deterministic value. You might think that some systems are the result of optimization and therefore fragile, so random interventions might return them to prior “defaults”, e.g. naive systemic change or changes to ecosystems. This could be (like) regression to the mean.

I’m not sure how to balance these three possibilities generally. If I do think the effects are symmetric, I might go with a or b or some combination of them. In particular asymmetric cases, I might also combine these with c.
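The scale-sensitivity mentioned in 2b can be checked on a toy example. This is only a sketch: the sample for Y, the transformation f (cubing) and the ±1 noise are all arbitrary choices made for illustration.

```python
import statistics

# Hypothetical sample standing in for the distribution of Y.
y = [1.0, 2.0, 3.0, 4.0, 10.0]


def f(v):
    """A strictly increasing transformation of the scale (illustrative choice)."""
    return v ** 3


# The median commutes with a strictly increasing transformation...
median_commutes = statistics.median([f(v) for v in y]) == f(statistics.median(y))

# ...but the mean generally does not.
mean_commutes = statistics.mean([f(v) for v in y]) == f(statistics.mean(y))

# Symmetric noise: each value is shifted by -1 or +1 with equal weight.
y_noisy = [v - 1.0 for v in y] + [v + 1.0 for v in y]

# On the original scale, the noise preserves the mean.
mean_preserved_on_y = statistics.mean(y_noisy) == statistics.mean(y)

# On the transformed scale, the same noise shifts the mean.
mean_preserved_on_f = (
    statistics.mean([f(v) for v in y_noisy]) == statistics.mean([f(v) for v in y])
)
```

So whether "mean-preserving noise" is a reasonable default depends on which scale for Y the mean is supposed to matter on.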

3. Suppose I have a plausible argument for how X could affect Y in a particular way, but no observations that can be used as suitable proxies, even very indirect, for counterfactuals with which to estimate the size of the effect. I lean towards dealing with this case as in 2, rather than just making assumptions about effect sizes without observations.

For example, someone might propose a causal path through which X affects Y with a missing estimate of effect size for at least one step along the path, but an argument that this should increase the value of Y. It is not enough to consider only one such path, since there may be many paths from X to Y, e.g. different considerations for how X could affect Y, and these would need to be combined. Some could have opposite effects. By 2, those other paths, when combined with the proposed causal path, reduce the effects of X on Y through the proposed path. The longer the proposed path, the more unknown alternate paths.

I think this is where I am now with speculative longtermist causes. Part of this may be my ignorance of the proposed causal paths and estimates of effect sizes, since I haven’t looked too deeply at the justifications for these causes, but the dampening from unknown paths also applies when the effect sizes along a path are known, which is the next case.

4. Suppose I have a causal path through some other variable Z, X→Z→Y, so that X causes Z and Z causes Y, and I model both the effects of X→Z and Z→Y, based on observations. Should I just combine the two for the effect of X on Y? In general, not in the straightforward way. As in 3, there could be another causal path, X→Z′→Y (and it could be longer, instead of with just a single intermediate variable).

As in case 3, you can think of X→Z′→Y as dampening the effect of X→Z→Y, and with long proposed causal paths, we might expect the net effect to be small, consistent with the intuition that the predictable impacts on the far future decrease over time due to ignorance/noise and chaos, even though the actual impacts may compound due to chaos.

Maybe I’ll write this up as a full post after I’ve thought more about it. I imagine there’s been writing related to this, including in the EA and rationality communities.

Fehige defends the asymmetry between preference satisfaction and frustration on rationality grounds. I start from a “preference-affecting view” in this comment, and in replies, describe how to get to antifrustrationism and argue against a symmetric view.

Let’s consider a given preference from the point of view of a given outcome after choosing it, in which the preference either exists or does not, by cases:

1. The preference exists:

a. If there’s an outcome in which the preference exists and is more satisfied, and all else is equal, it would have been irrational to have chosen this one (over it, and at all).

b. If there’s an outcome in which the preference exists and is less satisfied, and all else is equal, it would have been irrational to have chosen the other outcome (over this one, and at all).

c. If there’s an outcome in which the preference does not exist, and all else is equal, the preference itself does not tell us if either would have been irrational to have chosen.

2. The preference doesn’t exist:

a. If there’s an outcome in which the preference exists, regardless of its degree of satisfaction, and all else equal, the preference itself does not tell us if either would have been irrational to have chosen.

So, all else equal besides the existence or degree of satisfaction of the given preference, it’s always rational to choose an outcome in which the preference does not exist, but it’s irrational to choose an outcome in which the preference exists but is less satisfied than in another outcome.
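The case analysis above can be written as a small decision rule. This is a sketch under my own encoding, not anything from Fehige: an outcome is represented only by the preference’s degree of satisfaction (a number), or by None when the preference does not exist, with everything else held equal.

```python
from typing import Optional


def irrational_to_choose(chosen: Optional[float], alternative: Optional[float]) -> bool:
    """Judge the choice from the point of view of the chosen outcome.

    Each argument is the preference's degree of satisfaction in that outcome,
    or None if the preference does not exist there (hypothetical encoding).
    """
    if chosen is None or alternative is None:
        # Cases 1c and 2a: the preference itself gives no verdict.
        return False
    # Case 1a: the preference exists in both and is more satisfied elsewhere.
    # Case 1b is the same test with the roles of the outcomes reversed.
    return alternative > chosen


# Choosing the less satisfied outcome is irrational (1a)...
less_satisfied = irrational_to_choose(chosen=0.5, alternative=1.0)
# ...choosing the more satisfied one is not (1b)...
more_satisfied = irrational_to_choose(chosen=1.0, alternative=0.5)
# ...and choosing an outcome without the preference is never ruled out (2a),
# nor is keeping the preference when the alternative lacks it (1c).
without_preference = irrational_to_choose(chosen=None, alternative=1.0)
keeping_preference = irrational_to_choose(chosen=0.5, alternative=None)
```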

(I made a similar argument in the thread starting here.)

I also think that antifrustrationism in some sense overrides interests less than symmetric views (not to exclude “preference-affecting” views or mixtures as options, though). Rather than satisfying your existing preferences, according to symmetric views, it can be better to create new preferences in you and satisfy them, against your wishes. This undermines the appeal of autonomy and subjectivity that preference consequentialism had in the first place. If, on the other hand, new preferences don’t add positive value, then they can’t compensate for the violation of preferences, including the violation of preferences to not have your preferences manipulated in certain ways.

Consider the following two options for interests within one individual:

A. Interest 1 exists and is fully satisfied

B. Interest 1 exists and is not fully satisfied, and interest 2 exists and is (fully) satisfied.

A symmetric view would sometimes choose B, so that the creation of interests can take priority over interests that would exist regardless. In particular, the proposed benefit comes from satisfying an interest that would not have existed in the alternative, so it seems like we’re overriding the interests the individual would have in A with a new interest, interest 2. For example, we make someone want something and satisfy that want, at the expense of their other interests.

On the other hand, consider:

A. Interest 1 exists and is partially unsatisfied

B. Interest 1 exists and is fully satisfied, and interest 2 exists and is partially unsatisfied.

In this case, antifrustrationism would sometimes choose A, so that the removal or avoidance of an otherwise unsatisfied interest can take priority over (further) satisfying an interest that would exist anyway. But in this case, if we choose A because of concerns for interest 2, at least interest 2 would exist in the alternative B, so the benefit comes from the avoidance of an interest that would have otherwise existed. In A, compared to B, I wouldn’t say we’re overriding interests, we’re dealing with an interest, interest 2, that would have existed otherwise.

Smith and Black’s “The morality of creating and eliminating duties” deals with duties rather than preferences, and argues that assigning positive value to duties and their satisfaction leads to perverse conclusions like the above with preferences, and they have a formal proof for this under certain conditions.

Some related writings, although not making the same point I am here:

Brian Tomasik’s “Does Negative Utilitarianism Override Individual Preferences?”

Simon Knutsson’s “What is the difference between weak negative and non-negative ethical views?” (On Center for Long-Term Risk’s website)

Toby Ord’s “Why I’m Not a Negative Utilitarian”

I also think this argument isn’t specific to preferences, but could be extended to any interests, values or normative standards that are necessarily held by individuals (or other objects), including basically everything people value (see here for a non-exhaustive list). See Johann Frick’s paper and thesis which defend the procreation asymmetry, and my other post here.

Then, if you extend these comparisons to satisfy the independence of irrelevant alternatives by stating that in comparisons of multiple choices in an option set, all permissible options are strictly better than all impermissible options regardless of option set, extending these rankings beyond the option set, the result is antifrustrationism. To show this, you can use the set of the following three options, which are identical except in the ways specified:

and since B is impermissible because of the presence of A, this means C>B, and so it’s always better for a preference to not exist than for it to exist and not be fully satisfied, all else equal.

I think cluster thinking and the use of sensitivity analysis are approaches for decision making under deep uncertainty, when it’s difficult to commit to a particular joint probability distribution or weight considerations. Robust decision making is another. The maximality rule is another: given some set of plausible (empirical or ethical) worldviews/models for which we can’t commit to quantifying our uncertainty, if A is worse in expectation than B under some subset of plausible worldviews/models, and not better than B in expectation under any such set of plausible worldviews/models, we say A < B, and we should rule out A.
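As a rough illustration of the maximality rule as stated here, the sketch below takes, for each option, a table of expected values under each plausible model and rules out any option that is dominated in this sense. The option names, model names and numbers are all hypothetical.

```python
def maximality_dominated(ev_a, ev_b):
    """A < B: A is worse in expectation under some model and better under none.

    ev_a and ev_b map model names to expected values (same keys assumed).
    """
    worse_somewhere = any(ev_a[m] < ev_b[m] for m in ev_a)
    better_somewhere = any(ev_a[m] > ev_b[m] for m in ev_a)
    return worse_somewhere and not better_somewhere


def permissible(options):
    """Names of options that no other option dominates under maximality."""
    return [
        name
        for name, ev in options.items()
        if not any(
            maximality_dominated(ev, other_ev)
            for other_name, other_ev in options.items()
            if other_name != name
        )
    ]


# Hypothetical expected values under two models we can't assign weights to.
options = {
    "A": {"model_1": 1.0, "model_2": 2.0},
    "B": {"model_1": 1.0, "model_2": 3.0},
    "C": {"model_1": 5.0, "model_2": 0.0},
}

remaining = permissible(options)
```

Here A is ruled out by B, but B and C each beat the other under one model, so the rule leaves both standing. That permissiveness is the usual complaint about it.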

It seems like EAs should be more familiar with the field of decision making under deep uncertainty. (Thanks to this post by weeatquince for pointing this out.)

See also:

Deep Uncertainty by Walker, Lempert and Kwakkel for a short review.

Decision Making under Deep Uncertainty: From Theory to Practice for a comprehensive text.

Heuristics for clueless agents: how to get away with ignoring what matters most in ordinary decision-making by David Thorstad and Andreas Mogensen

Many Weak Arguments vs. One Relatively Strong Argument and Robustness of Cost-Effectiveness Estimates and Philanthropy by Jonah Sinick

Why I’m skeptical about unproven causes (and you should be too) by Peter Hurford (LW, blog)

The Optimizer’s Curse & Wrong-Way Reductions by Chris Smith (blog)

EDIT: I think this approach isn’t very promising.

The above mentioned papers by Mogensen and Thorstad are critical of the maximality rule for being too permissive, but here’s a half-baked attempt to improve it:

Suppose you have a social welfare function U, and want to compare two options, A and B. Suppose further that you have two sets of probability distributions of size n for the outcome X of each of A and B, PA and PB. Then A≿B (A is at least as good as B) if (and only if) there is a bijection f:PA→PB such that

E_{X∼P}[U(X)] ≥ E_{X∼f(P)}[U(X)], for all P∈PA, (1)

and furthermore, A≻B (A is strictly better than B) if the above inequality is strict for some P∈PA.
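For small n, the rule can be checked by brute force over all bijections. In this sketch each distribution is summarized by its expected value of U, so PA and PB become lists of numbers. I also assume that the strict inequality has to hold under the same bijection that witnesses A≿B, which the definition leaves implicit. The numbers are hypothetical.

```python
from itertools import permutations


def at_least_as_good(ev_a, ev_b):
    """A ≿ B: some pairing matches every EV of A with an EV of B that is no larger."""
    if len(ev_a) != len(ev_b):
        raise ValueError("the rule assumes equally many distributions per option")
    return any(
        all(a >= b for a, b in zip(ev_a, paired_b))
        for paired_b in permutations(ev_b)
    )


def strictly_better(ev_a, ev_b):
    """A ≻ B: as above, with a strict inequality for at least one pair."""
    if len(ev_a) != len(ev_b):
        raise ValueError("the rule assumes equally many distributions per option")
    return any(
        all(a >= b for a, b in zip(ev_a, paired_b))
        and any(a > b for a, b in zip(ev_a, paired_b))
        for paired_b in permutations(ev_b)
    )


# Hypothetical expected values of U under each distribution in PA and PB.
ev_a = [3.0, -1.0]
ev_b = [-2.0, 3.0]

a_weakly_beats_b = at_least_as_good(ev_a, ev_b)
a_strictly_beats_b = strictly_better(ev_a, ev_b)
b_weakly_beats_a = at_least_as_good(ev_b, ev_a)
```

Pairing the distributions in the order listed fails here (−1 < 3), but swapping the pairing works, which shows how much the verdict depends on which pairings are allowed.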

This means pairing asymmetric/complex cluelessness arguments. Suppose you think helping an elderly person cross the street might have some important effect on the far future (you have some P∈PA), but you think not doing so could also have a similar far-future effect (according to P′∈PB), but the short-term consequences are worse, and under some pairing of distributions/arguments f:PA→PB, helping the elderly person always looks at least as good and under one pair (P,f(P)) looks better, so you should do it. Pairing distributions like this in some sense forces us to give equal weight to P and f(P), and maybe this goes too far and assumes away too much of our cluelessness or deep uncertainty?

The maximality rule as described in Maximal Cluelessness effectively assumes a pairing is already given to you, by instead using a single set of distributions P that can each be conditioned on taking action A or B. We’d omit f, and the expression replacing (1) above would be

E_{X∼P|A}[U(X)] ≥ E_{X∼P|B}[U(X)], for all P∈P.

I’m not sure what to do for different numbers of distributions for each option or infinitely many distributions. Maybe the function f should be assumed given, as a preferred mapping between distributions, and we could relax the surjectivity, total domain, injectivity and even the fact that it’s a function, e.g. we compare for pairs (PA,PB)∈R, for some relation (subset) R⊆PA×PB. But assuming we already have such a function or relation seems to assume away too much of our deep uncertainty.

One plausibly useful first step is to sort PA and PB according to the expected values of U(A) and U(B) under their corresponding probability distributions, respectively. Should the mapping or relation preserve the min and max? How should we deal with everything else? I suspect any proposal will seem arbitrary.

Perhaps we can assume slightly more structure on the sets PA for each option A by assuming multiple probability distributions on PA, and go up a level (and we could repeat). Basically, I want to give probability ranges to the expected value of the action A, and then compare the possible expected values of these expected values. However, if we just multiply our higher-order probability distributions by the lower-order ones, this comes back to the original scenario.

If we think

1. it’s always better to improve the welfare of an existing person (or someone who would exist anyway) than to bring others into existence, all else equal, and

2. two outcomes are (comparable and) equivalent if they have the same distribution of welfare levels (but possibly different identities; this is often called Anonymity),

then not only would we reject Mere Addition (the claim that adding good lives, even those which are barely worth living but still worth living, is never bad), but the following would be true:

Given any two nonempty populations A and B, if some individual in B is worse off than some individual in A, then A∪B is worse than A. In other words, we shouldn’t add to a population any individual who isn’t at least as well off as the best off in the population, all else equal.

Intuitively, adding someone with worse welfare than someone who would exist anyway is equivalent to reducing the existing individual’s welfare and adding someone with better welfare than them; you just swap their welfares.

More formally, suppose a, a member of the original population A with welfare u, is better off than b, a member of the added population B with welfare v, so u>v. Then consider

A′ which is A, but has b instead of a, with welfare u.

B′ which is B, but has a instead of b, with welfare v.

Then, A is better than A′∪B′ , by the first hypothesis, because the latter has all the same individuals from A (and extras from B) with exactly the same welfare levels, except for a (from A and B′) who is worse off with welfare v (from B′) instead of u (from A). So A≻A′∪B′.

And A′∪B′ is equivalent to A∪B , by the second hypothesis, because the only difference is that we’ve swapped the welfare levels of a and b. So A′∪B′≃A∪B.

So, by transitivity (and the independence of irrelevant alternatives), A≻A∪B.

If welfare is real-valued (specifically from an interval I⊆R), then Maximin (maximize the welfare of the worst off individual) and theories which assign negative value to the addition of individuals with non-maximal welfare satisfy the properties above.
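The swap step in the argument can be checked on a toy example. In this sketch, populations are dictionaries from (hypothetical) individual names to welfare levels, with arbitrary numbers. It only verifies the bookkeeping: A′∪B′ has the same welfare distribution as A∪B, and relative to A it differs for the original members only in that a is worse off.

```python
def swap_welfares(pop_a, pop_b, a, b):
    """Build A' (b takes a's place and welfare) and B' (a takes b's place and welfare)."""
    u, v = pop_a[a], pop_b[b]
    a_prime = {name: w for name, w in pop_a.items() if name != a}
    a_prime[b] = u
    b_prime = {name: w for name, w in pop_b.items() if name != b}
    b_prime[a] = v
    return a_prime, b_prime


# Hypothetical populations: a (in A) is better off than b (in B), so u > v.
A = {"a": 5, "c": 7}
B = {"b": 2, "d": 9}

A_prime, B_prime = swap_welfares(A, B, "a", "b")
union = {**A, **B}
union_swapped = {**A_prime, **B_prime}

# Anonymity: the two unions have the same distribution of welfare levels.
same_distribution = sorted(union.values()) == sorted(union_swapped.values())

# Relative to A, individual a is worse off in A'∪B'...
a_is_worse_off = union_swapped["a"] < A["a"]
# ...and every other member of A keeps their welfare level.
others_unchanged = all(union_swapped[name] == A[name] for name in A if name != "a")
```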

Furthermore, if along with welfare from a real interval and property 1 in the previous comment (2. Anonymity is not necessary), the following two properties also hold:

3. Extended Continuity, a modest definition of continuity for a theory comparing populations with real-valued welfares which must be satisfied by any order representable by a real-valued function that is continuous with respect to the welfares of the individuals in each population, and

4. Strong Pareto (according to one equivalent definition, under transitivity and the independence of irrelevant alternatives): if two outcomes with the same individuals in their populations differ only by the welfare of one individual, then the outcome in which that individual is better off is strictly better than the other,

then the theory must assign negative value to the addition of individuals with non-maximal welfare (and no positive value to the addition of individuals with maximal welfare) as long as any individual in the initial population has non-maximal welfare. In other words, the theory must be antinatalist in principle, although not necessarily in practice, since all else is rarely equal.

Proof: Suppose A is any population with an individual a with some non-maximal welfare u and consider adding an individual b who would also have some non-maximal welfare v. Denote, for all ϵ>0 small enough (0<ϵ<ϵ0),

A+ϵχa: the population A, but where individual a has welfare u+ϵ (which exists for all sufficiently small ϵ>0, since u is non-maximal, and welfare comes from an interval).

Also denote

B: the population containing only b, with non-maximal welfare v, and

C: the population containing only b, but with some welfare w>v (v is non-maximal, so there must be some greater welfare level).

Then

1. A+ϵχa≻A∪C, and

2. A∪C≻A∪B,

where the first inequality follows from the hypothesis that it’s better to improve the welfare of an existing individual than to add any others, and the second inequality follows from Strong Pareto, because the only difference is b’s welfare.

Then, by Extended Continuity and the first inequality for all (sufficiently small) ϵ>0, we can take the limit (infimum) of A+ϵχa as ϵ→0 to get

A≿A∪C,

so, it’s no better to add b even if they would have maximal welfare, and by transitivity (and the independence of irrelevant alternatives) with 2. A∪C≻A∪B, we get A≻A∪B, so it’s strictly worse to add b with non-maximal welfare. This completes the proof.

My current best guess on what constitutes welfare/wellbeing/value (setting aside issues of aggregation):

1. Suffering is bad in itself.

2. Pleasure doesn’t matter in itself.

3. Conscious disapproval might be bad in itself. If bad, this could capture the badness of suffering, since I see suffering as affective conscious disapproval (an externalist account).

4. Conscious approval doesn’t matter in itself in an absolute sense (it may matter in a relative sense, as covered by 5). Pleasure is affective conscious approval.

5. Other kinds of preferences might matter, but only comparatively (in a wide/non-identity way) when they exist in both outcomes, i.e. between a preference that’s more satisfied and the same or a different preference (of the same kind?) that’s less satisfied, an outcome with the more satisfied one is better than an outcome with the less satisfied one, ignoring other reasons. This is a kind of preference-affecting principle.

Also, I lean towards experientialism on top of this, so I think the degree of satisfaction/frustration of the preference has to be experienced for it to matter.

To expand on 5, the fact that you have an unsatisfied preference doesn’t mean you disapprove of the outcome, it only means another outcome in which it is satisfied is preferable, all else equal. For example, that someone would like to go to the moon doesn’t necessarily make them worse off than if they didn’t have that desire, all else equal. That someone with a certain kind of disability would like to live without that disability and might even trade away part of their life to do so doesn’t necessarily make them worse off, all else equal. This is incompatible with the way QALYs are estimated and used.

I think this probably can’t be reconciled with the independence of irrelevant alternatives in a way that I would find satisfactory, since it would either give us antifrustrationism (which 5 explicitly rejects) or allow that sometimes having a preference is better than not, all else equal.

More here, here and here on my shortform, and in this post.