A Happiness Manifesto: Why and How Effective Altruism Should Rethink its Approach to Maximising Human Welfare

TL;DR I ar­gue effec­tive al­tru­ists can and should use hap­piness sur­veys to de­ter­mine cost-effec­tive­ness and show how do­ing this gen­er­ates some sub­stan­tially differ­ent char­ity recom­men­da­tions from those given by GiveWell.

Ab­stract:

I ar­gue that, de­spite some long-stand­ing doubts, hap­piness can be mea­sured through self-re­ports and there­fore hap­piness sur­veys should be used to de­ter­mine how much hap­piness differ­ent out­comes pro­duce. Speci­fi­cally, I recom­mend life satis­fac­tion (LS), found by ask­ing “Over­all, how satis­fied are you with your life nowa­days?” (0 − 10), as a suit­able, though im­perfect, mea­sure of hap­piness. As, pre­sum­ably, ev­ery­one val­ues hap­piness to some ex­tent, it fol­lows that ev­ery­one should in­cor­po­rate in­for­ma­tion from LS scores when de­ter­min­ing how to do the most good. I show how the LS ap­proach could be ap­plied to a novel area: as­sess­ing which char­i­ties pro­duce the most hap­piness. Ac­cord­ing to GiveWell, which does not as­sess char­i­ties solely (or even pri­mar­ily) by LS scores, cer­tain life-sav­ing and poverty-alle­vi­at­ing char­i­ties in the de­vel­op­ing world do the most good per dol­lar. I show that, if we un­der­stand good in terms of max­imis­ing self-re­ported LS, alle­vi­at­ing poverty is sur­pris­ingly un­promis­ing whereas men­tal health in­ter­ven­tions, which have so far been over­looked, seem more effec­tive. Var­i­ous philo­soph­i­cal and method­olog­i­cal is­sues leave it un­clear whether GiveWell’s life-sav­ing recom­men­da­tions are more cost-effec­tive than the men­tal health char­ity I dis­cuss. I con­clude by ex­plain­ing the im­pli­ca­tions of this anal­y­sis for effec­tive al­tru­ists who want to in­crease hu­man hap­piness. 1

Table of contents

1. In­tro­duc­tion

2. The relationship between happiness and measures of subjective well-being

3. Can hap­piness re­ally be mea­sured?

4. Can we com­pare in­di­vi­d­u­als’ hap­piness scores?

5. Us­ing life satis­fac­tion as the com­mon cur­rency for well-be­ing ad­justed life years (WALYs)

6. Eval­u­at­ing the life satis­fac­tion im­pact of (GiveWell) charities

7. What should effec­tive al­tru­ists do next?

An­nex A: How good are QALYs and DALYs as prox­ies for hap­piness?

1. Introduction

How should we com­pare the im­pact of var­i­ous out­comes, such as the treat­ment of a health con­di­tion or poverty re­duc­tion, in terms of how much good they do? The cur­rent method used by effec­tive al­tru­ists (EAs) is, ul­ti­mately, to rely on their own sub­jec­tive judge­ments; the value of these out­comes is weighed in the mind. EAs have of­ten used health met­rics, such as QALYs and DALYs—which rely on other peo­ple’s ag­gre­gated sub­jec­tive judge­ments—but health is not the only item of in­ter­est, there­fore sub­jec­tive judge­ments are still needed to de­cide how to com­pare the value of health out­comes to other out­comes. As they make clear in their cost-effec­tive­ness anal­y­sis, GiveWell de­ter­mine the value of differ­ent in­ter­ven­tions by pol­ling their staff mem­bers. The main ques­tion in their anal­y­sis is “how many years of dou­bled con­sump­tion are as morally valuable as sav­ing the life of an un­der-5-year old child?” GiveWell then use the me­dian an­swer to es­tab­lish the trade-off ra­tio be­tween these two out­comes.
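To make the polling step concrete, here is a minimal sketch of how a median answer becomes a trade-off ratio. The staff answers below are invented for illustration and are not GiveWell's figures; the function name `trade_off_ratio` is likewise hypothetical.

```python
from statistics import median

# Hypothetical staff answers (NOT GiveWell's real figures) to the question:
# "How many years of doubled consumption are as morally valuable as
# saving the life of an under-5-year-old child?"
staff_answers = [20, 35, 50, 50, 80, 100, 250]


def trade_off_ratio(answers):
    """Return the median answer, used as the exchange rate between outcomes."""
    return median(answers)


ratio = trade_off_ratio(staff_answers)
print(f"One life saved is treated as worth {ratio} years of doubled consumption")
```

Using the median rather than the mean means a single extreme answer (such as 250 above) does not move the ratio, but the ratio still blends each respondent's views about facts and about value.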

We might think relying on our subjective judgements is both unobjectionable and unavoidable on the grounds we are making moral evaluations, and there is simply no other way to do this. However, claiming these are moral judgements is only partly true and, indeed, may not be true at all. Some of what appear to be moral evaluations are judgements about facts and so, in principle, empirical questions. Here’s an analogy. Suppose you and I are trying to determine which of two oddly-shaped jars contains more water. What sort of assessment are we making here? Not a moral judgement, but a subjective judgement of fact. We could go on to say the best jar is the one that holds the most water, which would be a moral judgement.2 Yet, regardless of whether we’ve made that moral evaluation, the question of how much water the jars hold is still a factual one. Now, suppose you and I are trying to assess which of two outcomes (curing some health condition or alleviating poverty by some amount) increases happiness by more. Suppose we agree on our concept of happiness: a positive balance of enjoyment over suffering. Are we making a moral evaluation when we state which outcome we think would increase happiness more? As the jar analogy shows, we are not. Note that we need not assume happiness is of any moral importance (perhaps we conclude only liberty has value) to compare the outcomes.

Deciding (a) which thing or things are intrinsically valuable and thus constitute a good, and (b) how to aggregate goods to determine overall value is, certainly, a moral judgement.3 (In philosophical jargon, this is choosing an axiology: a method of ranking states of affairs in terms of their ultimate value.) Suppose, for example, we decide the value of an outcome is the sum total of happiness in it. Once we’ve done that, determining how much good an outcome contains is, in principle, an empirical question. Any subjective assessments one makes thereafter will be judgements of facts, not of value. I hope it’s obvious that we will, where possible, want to measure the good(s) directly, and that objective measurement of the facts ‘trumps’ our subjective evaluations of the facts. If we could measure the water capacity of the jars, that would settle the question of which held more and render our guesswork obsolete.

If we want to have productive disagreements with one another about which outcomes do more good, it’s important to make it clear whether disagreements arise from claims about value or from claims about facts. Suppose Singer and MacAskill disagree about whether option A is better than option B. Given Singer and MacAskill have (I think) the same views on value, the disagreement is a factual one.4 By comparison, because GiveWell’s analysis combines the opinions of multiple people who have different views about value, it’s not possible to tell whether one disagrees with GiveWell (supposing one does) because one differs about facts or about values. To make this plain, as noted, the key question in GiveWell’s cost-effectiveness analysis is “how many years of doubled consumption are as morally valuable as saving the life of an under-5-year old child?” Here is a non-exhaustive list of factors two people could disagree about when answering that question, and whether each factor is a question of fact or value.

| Factor | Value or fact question? |
| --- | --- |
| How much does doubling consumption for a year increase well-being? | Fact, for a given account of well-being |
| What is well-being? | Value |
| The well-being the child would have if it lived? | Fact, given expected well-being |
| How many years the child will live for? | Fact |
| What the badness of death is (i.e. is it better, all else equal, to save a 2-year old or a 20-year old?) | Value |


This document aims to illuminate one factor that, presumably, will be of interest to everyone: how much different outcomes improve happiness. While this has generally been judged subjectively, it is a question of fact, not of value (for a given account of happiness). I argue that, despite long-standing doubts, happiness can be measured through population surveys and therefore we should use data from happiness surveys, rather than relying on our own subjective judgements, to determine what increases happiness and by how much. Specifically, I recommend life satisfaction (‘LS’) scores, which are found by asking “How satisfied are you with your life nowadays?” (on a scale from 0 “not at all” to 10 “completely”), as the most suitable (although not ideal) proxy measure of happiness. While these claims may be unfamiliar and contentious within philosophy and effective altruism, they are common knowledge within (certain corners of) economics and psychology. This part of the document largely restates claims made by others.5 Having argued that we should use LS scores to determine what maximises happiness in general, I apply the approach to the specific case of determining which charities are the most cost-effective at producing happiness, which has not yet been done. I show how the LS approach works differently to, and generates partially different recommendations from, that of GiveWell, who do not primarily use LS scores to determine cost-effectiveness.

The rest of the document proceeds as follows. Section 2 explains how ‘subjective well-being’ (SWB), as it is normally called in social science, is measured, how SWB relates to happiness and why LS is a suitable proxy for happiness. Section 3 argues SWB measures are valid and reliable. Section 4 argues SWB measures can be used to make interpersonally cardinal comparisons. Section 5 outlines how LS can be used to determine what increases happiness in general. Section 6 applies the LS approach to charity evaluation. Section 7 sets out future work for effective altruists interested in increasing happiness.

2. The re­la­tion­ship be­tween hap­piness and mea­sures of sub­jec­tive well-being

Social scientists (mostly economists and psychologists) talk about measures of ‘subjective well-being’ (‘SWB’), which are, to quote Dolan and Metcalfe (2012), ‘ratings of thoughts and feelings about life’. SWB is typically thought to have three components (OECD 2013):

Eval­u­a­tion (some­times called ‘cog­ni­tive’) - re­flec­tive as­sess­ment on a per­son’s life or some spe­cific as­pect of it. Life satis­fac­tion is a life eval­u­a­tion ques­tion but not the only one.

Ex­pe­rience (some­times called ‘af­fec­tive’ or ‘he­do­nic’) - a per­son’s feel­ing or emo­tional states, typ­i­cally mea­sured with refer­ence to a par­tic­u­lar point in time

Eu­daimo­nia—a sense of mean­ing and pur­pose in life, or psy­cholog­i­cal func­tion­ing.

For a list of ex­am­ple SWB ques­tions, see OECD (2013, An­nex A).

What is the re­la­tion­ship be­tween the SWB mea­sures and hap­piness?6 Often, mea­sures of SWB are referred to as mea­sures of ‘hap­piness’. This is tech­ni­cally in­cor­rect and also mis­lead­ing. On the ear­lier defi­ni­tion of hap­piness—a pos­i­tive bal­ance of en­joy­ment over suffer­ing—the ex­pe­rience com­po­nent of SWB is iden­ti­cal with hap­piness. Eval­u­a­tions mea­sure how peo­ple feel about their lives, rather than how happy they feel dur­ing them. Eu­daimonic mea­sures may tap into psy­cholog­i­cal states—ones re­lated to mean­ing, what­ever this is—that, pre­sum­ably, feel en­joy­able to ex­pe­rience and thus com­prise hap­piness, but do not cap­ture all the psy­cholog­i­cal states rele­vant to hap­piness. Hence, SWB is not only a mea­sure of hap­piness.

The ‘gold stan­dard’ for mea­sur­ing hap­piness is the ex­pe­rience sam­pling method (ESM), where par­ti­ci­pants are prompted to record their feel­ings and pos­si­bly their ac­tivi­ties one or more times a day.7 While this is an ac­cu­rate record of how peo­ple feel, it is ex­pen­sive to im­ple­ment and in­tru­sive for re­spon­dents. A more vi­able ap­proach is the day re­con­struc­tion method (DRM) where re­spon­dents use a time-di­ary to record and rate their pre­vi­ous day. DRM pro­duces com­pa­rable re­sults to ESM, but is less bur­den­some to use (Kah­ne­man et al. 2004).

Given we are in­ter­ested in mea­sur­ing hap­piness, we might think we should ig­nore the non-ex­pe­rience com­po­nents al­to­gether. Prac­ti­cally, how­ever, this is un­fea­si­ble and we are forced to rely on life satis­fac­tion mea­sures as the main proxy mea­sure for hap­piness (a ‘proxy’ mea­sure is an in­di­rect mea­sure of the phe­nomenon of in­ter­est). It is much eas­ier to col­lect LS data as it re­quires just one quick ques­tion that takes sub­jects around 30 sec­onds to an­swer, whereas the DRM takes ap­prox­i­mately 40 min­utes to fill out. As a re­sult of this ease of use, it is the SWB mea­sure on which most data has been col­lected and most anal­y­sis done. It is now pos­si­ble to say, a point I re­turn to later, to what ex­tent var­i­ous out­comes cause an ab­solute in­crease in life satis­fac­tion on a 0-10 scale, which is what we need to de­ter­mine cost-effec­tive­ness (see La­yard et al. 2018). By con­trast, to the best of my knowl­edge, there is in­suffi­cient re­search on ex­pe­rience mea­sures to draw the same con­clu­sions.

How much of a prob­lem is it to use eval­u­a­tive mea­sures in lieu of ex­pe­rience ones? Ex­pe­rience and eval­u­a­tive mea­sures are con­cep­tu­ally differ­ent and an­swered in some­what differ­ent ways. As Deaton and Stone (2013) ex­plain:

Hedonic [i.e. experience] measures are uncorrelated with education, vary over the days of the week, improve with age, and respond to income only up to a threshold. Evaluative measures remain correlated with income even at high levels of income, are strongly correlated with education, are often U-shaped in age, and do not vary over the days of the week (Stone et al. 2010; Kahneman and Deaton 2010).

This doesn’t mean evaluative measures can’t be used as proxies for happiness. The evaluative and experience measures do correlate, suggesting evaluative judgements are, in part if not in whole, determined by how happy people are (OECD 2013, p32-34). While Deaton and Stone identify some cases where they come apart, it’s hard to think of instances where there would be different priorities, either for governments or for effective altruists, if the goal was trying to maximise life satisfaction scores rather than happiness scores. The sensible approach seems to be to use happiness data where it’s available and LS data where it isn’t and, when using LS data to determine cost-effectiveness, to keep in mind how the two might differ. Further work to investigate if and when using one measure over another would generate different priorities seems valuable.

While eu­daimo­nia mea­sures are re­garded as a com­po­nent of SWB, I will not re­fer to them again. Not only are they not the most rele­vant com­po­nent, lit­tle data has been col­lected on them and it’s not con­cep­tu­ally clear what they cap­ture.

3. Can hap­piness re­ally be mea­sured?

There are long-stand­ing doubts (in eco­nomics) that hap­piness ei­ther can be or needs to be mea­sured. Some his­tor­i­cal con­text may be helpful here. Ac­cord­ing to La­yard (2003):

In the eigh­teenth cen­tury Ben­tham and oth­ers pro­posed that the ob­ject of pub­lic policy should be to max­imise the sum of hap­piness in so­ciety. So eco­nomics evolved as the study of util­ity or hap­piness, which was as­sumed to be in prin­ci­ple mea­surable and com­pa­rable across peo­ple. It was also as­sumed that the marginal util­ity of in­come was higher for poor peo­ple than for rich peo­ple, so that in­come ought to be re­dis­tributed un­less the effi­ciency cost was too high.

All these as­sump­tions were challenged by Lionel Rob­bins in his fa­mous book on the Na­ture and Sig­nifi­cance of Eco­nomic Science pub­lished in 1932. Rob­bins ar­gued cor­rectly that, if you wanted to pre­dict a per­son’s be­havi­our, you need only as­sume he has a sta­ble set of prefer­ences. His level of hap­piness need not be mea­surable nor need it be com­pared with other peo­ple. More­over eco­nomics was, as Rob­bins put it, about “the re­la­tion­ship be­tween given ends and scarce means”, and how the “ends” or prefer­ences came to be formed was out­side its scope.

In­ter­est in mea­sur­ing hap­piness has re­turned in re­cent decades.8 This seems to be caused by 1) the Easter­lin para­dox, the (con­tested) find­ing that while richer peo­ple are more satis­fied with their lives than poor peo­ple, an in­crease in av­er­age wealth does not raise av­er­age life satis­fac­tion; 2) the be­havi­oural eco­nomic work of Tver­sky, Kah­ne­man and oth­ers sug­gest­ing in­di­vi­d­u­als do not, when left to their own de­vices, seem to max­imise their own util­ity9 and 3) grow­ing dis­satis­fac­tion with GDP as a mea­sure of progress.10

The idea gov­ern­ments should mea­sure SWB and use it to guide policy has started to take root. In 2013 the OECD is­sued guidelines recom­mend­ing its mem­ber-na­tions col­lect SWB data:

There is now wide­spread ac­knowl­edge­ment that mea­sur­ing sub­jec­tive well-be­ing is an es­sen­tial part of mea­sur­ing qual­ity of life alongside other so­cial and eco­nomic di­men­sions [...] The Guidelines also out­line why mea­sures of sub­jec­tive well-be­ing are rele­vant for mon­i­tor­ing and policy mak­ing.

The UK’s Office for National Statistics has been collecting data on SWB since 2012 and currently polls 158,000 people a year. (Readers unfamiliar with SWB measures may find its FAQs helpful.)

Some scholars who argue we shouldn’t use SWB measures, such as Fleurbaey, Schokkaert and Decancq (2009), nevertheless accept such measures are meaningful:

With the mass of data accumulated on happiness and satisfaction and the development of their econometric exploitation, subjective utility seems more measurable than ever. There now seem to be good reasons to trust the existence of sufficient regularity in human psychology, so that interpersonal comparisons appear feasible in principle. These new developments have triggered a revival of welfarism as well. If utility can be measured after all, why not take it as the metric of social welfare? Several authors have taken this line (Kahneman et al. 2004b, Layard 2005). However, none of the recent developments in the field of measurement directly undermine the arguments that were raised against welfarism in the philosophical debates of the previous decades. The fact that something becomes easier to measure does not give any new normative reason to rely on it.

While the above ex­plains there has been a shift in opinion re­gard­ing the mea­sura­bil­ity of SWB, I have not yet said why we would think the SWB mea­sures are ac­cu­rate—they suc­ceed in mea­sur­ing what they set out to mea­sure.

The accuracy of a measure is usually assessed in terms of its validity and reliability. Validity refers to whether the measure captures the underlying concept that it purports to measure. Suppose I try to measure your height by weighing you on a set of bathroom scales. The scales might be a valid measure of weight but it’s clear, I hope, they are not a valid measure of height. Reliability is about whether the measure gives consistent results in identical circumstances (i.e. it has a high signal-to-noise ratio). If my scales produce a random number every time I step on them, they are not reliable. Reliability is necessary but not sufficient for validity; if I used a normal, non-broken set of scales to measure your height, they would give me the same score each time, and so be reliable (assuming your weight doesn’t fluctuate), but they still wouldn’t be valid. As the reliability and validity of SWB scales has been covered at great length in OECD (2013) and elsewhere, I will largely confine myself to explaining the key ideas and providing several illustrative quotations from that document.

Reliability can be assessed in two ways: by internal consistency (whether the items within a multi-item scale correlate, or different scales of the same measure correlate) and by test-retest reliability, where the same question is given to the same respondent more than once at different times. Note that if the item in question genuinely does change between measurements, we would expect the test-retest reliability to be low.
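As a rough illustration of the two reliability checks, the sketch below computes a test-retest correlation (Pearson's r between two survey waves) and Cronbach's alpha, a standard internal-consistency statistic. All scores are invented, and the function and variable names are my own assumptions rather than anything taken from the OECD guidelines.

```python
from statistics import mean, pvariance


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of per-item score lists."""
    k = len(items)
    # Each respondent's total score across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


# Hypothetical 0-10 life satisfaction answers from six respondents,
# asked the same question two weeks apart (test-retest).
wave_1 = [7, 5, 8, 3, 6, 9]
wave_2 = [6, 5, 8, 4, 7, 8]
print(f"test-retest r = {pearson_r(wave_1, wave_2):.2f}")

# Hypothetical answers to three items of a multi-item scale.
scale_items = [[7, 5, 8, 3, 6, 9], [6, 5, 8, 4, 7, 8], [7, 4, 9, 3, 6, 8]]
print(f"Cronbach's alpha = {cronbach_alpha(scale_items):.2f}")
```

Both statistics approach 1 as answers become more consistent, which is why the correlations and alphas quoted below are read as evidence of reliability.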

Re­gard­ing life eval­u­a­tions, quot­ing (OECD 2013, pp47):

Bjørnskov (2010), for example, finds a correlation of 0.75 between the average Cantril Ladder measure of life evaluation from the Gallup World Poll and life satisfaction as measured in the World Values Survey for a sample of over 90 countries. [...] Test-retest results for single item life evaluation measure tend to yield correlations of between 0.5 and 0.7 for time period of 1 day to 2 weeks (Krueger and Schkade, 2008). Michalos and Kahlke (2010) report that a single-item measure of life satisfaction had a correlation of 0.65 for a one year period and of 0.65 for a two-year period.

And re­gard­ing af­fect/​ex­pe­rience mea­sures:

There is less in­for­ma­tion available on the re­li­a­bil­ity of mea­sure of af­fect and eu­daimonic well-be­ing than is the case for mea­sures of life eval­u­a­tion. How­ever, the available in­for­ma­tion is largely con­sis­tent with the pic­ture for life satis­fac­tion. In terms of in­ter­nal con­sis­tency re­li­a­bil­ity, Diener et al. (2009) re­port [...] the pos­i­tive, nega­tive and af­fec­tive bal­ance sub­scale of their Scale of Pos­i­tive and Nega­tive Ex­pe­rience (SPANE) have alphas of 0.84, 0.88, and 0.88 re­spec­tively. [...] In the case of test-retest re­li­a­bil­ity, [...] Krueger and Schkade (2008) re­port test-retest scores of 0.5 and 0.7 for a range of differ­ent mea­sures of af­fect over a 2-week pe­riod.

The au­thors con­clude the life eval­u­a­tion and af­fect mea­sures ex­hibit suffi­cient cor­re­la­tion, by the stan­dards of so­cial sci­ence, to be deemed ac­cept­ably re­li­able.

Validity, by contrast, is somewhat harder to test than reliability because the underlying phenomena SWB measures attempt to capture are subjective, hence there is no objective way to demonstrate success. Nevertheless, there are three ways to assess validity. All of these ultimately rely on whether the measures conform to our expectations about the item we are intending to measure.

The first is face validity: do respondents judge the questions as an appropriate way to measure the concept of interest? If not, it’s likely the measures aren’t valid. In the case of SWB measures, it’s somewhat obvious this is the case, e.g. that asking people whether they felt happy yesterday is a good way to assess whether they felt happy yesterday. Participants aren’t generally asked about face validity, but this can be tested by (a) response speed and (b) non-response rates: if people take a long time, or don’t answer, that suggests they don’t understand the question. Median response times for SWB questions are around 30 seconds for single-item measures, suggesting the questions are not conceptually difficult (ONS, 2011). Quoting from OECD (2013, p49): “in a large analysis by Smith (2013) covering three datasets [...] and over 400,000 observations, item-specific non-response rates for life evaluation and affect were found to be similar for those for [the straightforward] measures of educational attainment, marital and labour force status” which, again, supports the face validity of the questions.

The sec­ond is con­ver­gent val­idity—does the item cor­re­late with other proxy mea­sures for the same con­cept? Kah­ne­man and Krueger (2006) list the fol­low­ing as cor­re­lates of both high life satis­fac­tion and hap­piness: smil­ing fre­quency; smil­ing with the eyes (“un­fake­able smile”); rat­ing of one’s hap­piness made by friends; fre­quent ver­bal ex­pres­sions of pos­i­tive emo­tions; hap­piness of close rel­a­tives; self-re­ported health. In ad­di­tion, OECD (2013) states “Diener (2011), sum­maris­ing the re­search in this area, notes that life satis­fac­tion pre­dicts suici­dal ideation (r=0.44) and the low life satis­fac­tion scores pre­dicted suicide 20 years later in a later epi­demiolog­i­cal sur­vey from Fin­land (af­ter con­trol­ling for other risk fac­tors [..]” Such items al­low us to as­sess the mea­sures from the per­spec­tive of falsifi­a­bil­ity: if we ex­pect that (say) those with low life satis­fac­tion would com­mit suicide more of­ten, but our mea­sure of life satis­fac­tion found those with high LS com­mit suicide more of­ten, that would sug­gest the mea­sure lacked val­idity. As it stands, the re­sults sup­port the val­idity of the ex­pe­rience and eval­u­a­tion mea­sures of SWB.

The third is con­struct val­idity—while con­ver­gent val­idity as­sesses how closely the mea­sure cor­re­lates with other proxy mea­sures of the same con­cept, con­struct val­idity con­cerns it­self with whether the mea­sure performs in the way we ex­pect it to. From OECD (2013, pp51):

Mea­sures of [SWB] broadly show the ex­pected re­la­tion­ship with other in­di­vi­d­ual, so­cial and eco­nomic de­ter­mi­nants. Among in­di­vi­d­u­als, higher in­comes are as­so­ci­ated with higher lev­els of life satis­fac­tion and af­fect, and wealthier coun­tries have higher av­er­age lev­els of both types of sub­jec­tive well-be­ing than poorer coun­tries (Sacks, Steven­son and Wolfers, 2010). At the in­di­vi­d­ual level, health sta­tus, so­cial con­tact and ed­u­ca­tion and be­ing in a sta­ble re­la­tion­ship with a part­ner are all as­so­ci­ated with higher lev­els of life satis­fac­tion (Dolan, Peas­good and White, 2008), while un­em­ploy­ment has a large nega­tive im­pact on life satis­fac­tion (Winkel­mann and Winkel­mann, 1998). Kah­ne­man and Krueger (2006) re­port in­ti­mate re­la­tions, so­cial­is­ing, re­lax­ing, eat­ing and pray­ing are as­so­ci­ated with higher lev­els of pos­i­tive af­fect; con­versely, com­mut­ing, work­ing and child­care and house­work are as­so­ci­ated with low level of net pos­i­tive af­fect. Boar­ini et al. (2012) find that af­fect mea­sures have the same broad set of drivers as mea­sures of life satis­fac­tion, al­though the rel­a­tive im­por­tance of some fac­tors changes.

Major life events, such as unemployment, marriage, divorce and widowhood, are shown to result in long-term, substantial changes to SWB, just as one would expect them to. The time-series in figure 1, from Clark, Diener, Georgellis and Lucas (2007), displays the LS-impact of such events for males (controlling for other variables) before, during and after they occur (the y-axis records the change in LS on a 0-10 scale; the results are similar for females). Note the time series shows anticipation of the event. We can see, for example, a decrease in LS leading up to a divorce, whereas widowhood is barely anticipated and comes as a huge shock.

Figure 1. The dynamic effects of life and labour market events on life satisfaction (male), from Clark, Diener, Georgellis and Lucas (2007). The y-axis represents the absolute change in life satisfaction on a 0-10 scale.

Figure 2, from Clark, Fleche, Layard, Powdthavee and Ward (2017, p100), shows a similar time-series, this time for disability, from three different data-sets. Individuals seem to partially, rather than fully, adapt to disability.11 This is what we might suppose would happen: becoming disabled is very bad, but being disabled is somewhat less bad as one’s lifestyle and mindset adjust. It’s worth noting here that one major potential objection to the use of SWB measures is that people do not really adapt to changes in circumstances; they simply change how they use their scales. However, if scale re-norming did take place, we would expect to see adaptation to all conditions. Yet we do not see this: the LS scores in figure 1 above show people adapt to some things and not others. Further, Oswald and Powdthavee (2008) find there is less adaptation to severe disability than to mild or moderate disability, suggesting scale re-norming is not occurring and that the SWB scores are reflecting reality.

Figure 2. Adaptation to disability in different country data-sets, from Clark et al. (2017, p100)

As men­tioned be­fore, if the SWB mea­sures had pro­duced counter-in­tu­itive re­sults (got the ‘wrong an­swers’) that could lead us to con­clude they were not valid. The above seems to match our ex­pec­ta­tions.

One finding that might, at least at first, seem counterintuitive is the relationship between SWB and income. While there is little disagreement that richer people within a given country report higher SWB (both on experience and evaluation measures), and richer countries report higher SWB, there is less consensus over whether SWB increases over time as countries become wealthier. This is the so-called ‘Easterlin Paradox’, displayed in figure 3 below (Clark et al. 2018, p203). A critical response to SWB measures could be made as follows: “the Easterlin Paradox shows increasing overall economic prosperity doesn’t increase SWB. But it’s obvious increasing overall economic prosperity should raise SWB. Therefore, the SWB measures must be wrong”.

Such a response is too quick. First, the debate still rages over whether the Easterlin Paradox holds: Stevenson and Wolfers (2008) argue it does not, and Easterlin et al. (2016) reply. Second, as Clark (2016) notes, a large body of research finds individual SWB depends not just on the individual’s own income, but also on their income relative to that of the reference group they compare themselves to. Thus, if I am wealthier than you, I should expect to have higher SWB. However, if my income rises but the income of those I compare my income to also rises, these effects cancel out, leaving my SWB unchanged. Hence the Easterlin paradox can be explained in large part by the phenomenon of social comparison: we judge our lives against those of others.

Figure 3. Change in sub­jec­tive well-be­ing and GDP/​head over time

In a par­tic­u­larly in­sight­ful study, Solnick and He­men­way (2005), in­di­vi­d­u­als were asked to choose be­tween differ­ent states of the world, as fol­lows.

A: Your cur­rent yearly in­come is $50,000; oth­ers earn $25,000

B: Your cur­rent yearly in­come is $100,000; oth­ers earn $200,000

Absolute income is higher in B than in A, while relative income is higher in A than in B. Individuals express a clear preference for A, suggesting the importance of relative income. Hence, with further analysis, the Easterlin paradox is not as counter-intuitive as it might seem.
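The social comparison explanation can be made concrete with a toy model. The functional form below (log own income minus log reference-group income, with equal weights) is purely my own simplifying assumption for illustration; it is not an estimate from Clark (2016) or any other study cited here, and the function name `toy_swb` is hypothetical.

```python
import math


def toy_swb(own_income, reference_income, a=1.0, b=1.0):
    """Toy SWB index: rises with own income, falls with the reference group's.

    Setting a == b is the pure relative-income case (an assumption): only the
    ratio of own income to reference income matters.
    """
    return a * math.log(own_income) - b * math.log(reference_income)


# Everyone's income doubles: the two effects cancel and SWB is unchanged.
before = toy_swb(50_000, 25_000)
after = toy_swb(100_000, 50_000)
print(f"before = {before:.3f}, after = {after:.3f}")

# The two states of the world from the choice experiment above.
state_a = toy_swb(50_000, 25_000)    # lower absolute, higher relative income
state_b = toy_swb(100_000, 200_000)  # higher absolute, lower relative income
print("A preferred to B" if state_a > state_b else "B preferred to A")
```

With `b` smaller than `a`, general income growth would still raise the index somewhat, which is one way to represent the claim that social comparison explains the paradox in large part rather than entirely.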

Over­all, the eval­u­a­tion and ex­pe­rience SWB mea­sures seem both re­li­able and valid.

4. Can we com­pare in­di­vi­d­u­als’ hap­piness scores?

We now move on from whether SWB mea­sures are ac­cu­rate, to whether they are com­pa­rable be­tween in­di­vi­d­u­als. Here’s a po­ten­tial con­cern: for a given scale, say life satis­fac­tion, is go­ing from 7 to 8 for one per­son equiv­a­lent to an­other per­son go­ing from 2 to 3? In jar­gon, this is the ques­tion of whether the scales ex­hibit in­ter­per­sonal car­di­nal­ity. Read­ers who are not in­ter­ested in or con­cerned by this prob­lem are wel­come to skip to sec­tion 5.

I un­pack this con­cern in stages.

The first question to ask is: does the underlying phenomenon of interest (the thing the SWB scales are trying to measure) have a cardinal structure, or is it merely ordinal? That is, does it represent something that can be quantified, like length, height or weight, or does it merely represent an ordering, like ‘A is taller than B’? (1st, 2nd, 3rd … are the ordinal numbers; 1, 2, 3, … are the cardinal numbers.)

It is intuitively obvious happiness is cardinal, as revealed by our linguistic use. It is entirely sensible to say “X hurt twice as much as Y” or “I feel 10 times better than I did yesterday”.12 If happiness were ordinal, the most we could say would be “X hurts worse than Y” and “I feel better today than I did yesterday”.

If life satis­fac­tion scales cap­ture a psy­cholog­i­cal state of satis­fac­tion, then this would be car­di­nal; as above, in­tu­itively, one can feel twice as satis­fied about X vs Y.13

Given the underlying phenomenon of interest has a cardinal structure, the next question is whether individuals’ reporting on the scale is equal-interval (another term for this is linear), i.e. whether going from 5/10 to 6/10 is an equivalent improvement to going from 7/10 to 8/10. One worry is that individuals interpret SWB scales as logarithmic, like the Richter scale, where the magnitude of going from 6/10 to 7/10 is 10 times that of going from 5/10 to 6/10, rather than as linear/equal-interval.14

While possible, non-linear reporting seems unlikely.15 Experimental evidence from Van Praag (1993) suggests that when presented with a number of (non SWB-related) points, respondents automatically treat the difference between points as roughly equal-interval. Further, it is intuitively much harder for ordinary people (i.e. non-mathematicians) to report how happy/satisfied they feel on a logarithmic scale than a linear one. If I ask myself “how happy am I right now on a 0-10 logarithmic scale?”, to try to answer this question I first have to think “how happy am I on a linear 0-10 scale?” I then try to remember how logarithms work and convert from there. This is so much harder to do that I assume scale use must be equal-interval.

Given the scales have in­trap­er­sonal car­di­nal­ity, the fi­nal ques­tion is whether they have in­ter­per­sonal car­di­nal­ity: is one per­son’s re­ported one point in­crease on a 0 to 10 scale equiv­a­lent to a one point in­crease for some­one else?

There are two different concerns here. First, individuals could correctly report where they are between the minimum and maximum points of the scales, but have different capacities for SWB. There could be ‘utility monsters’ who experience 1000 times more happiness than others. Second, individuals could have the same maximum and minimum capacities, but use the scales differently. Suppose almost everyone reports a given sensation as 6/10, but a few people report the same feeling as an 8/10; keeping the same terminology, this latter group are ‘language monsters’.

We can make three replies. The first applies to both concerns: so long as these differences are randomly distributed, they will wash out as ‘noise’ across large numbers of people. There will be as many people with a greater capacity for SWB as those with less, and as many who use the scale too conservatively as use it too generously. Second, in response to utility monsters, it seems unlikely, given our shared biology, that the utility capacities of humans will, in practice, vary by very much.16 Third, regarding language, I observe we do tend to, in general, regulate one another’s language use. For instance, if I say “I’m having a terrible day: I stubbed my toe” you are likely to say “Hold on. That’s not a terrible day. That’s a mildly bad day”. A hypothesis, which could conceivably be tested, is that this language regulation pushes us towards using SWB scales in a similar way. If language did not have a shared meaning, it would be of no use at all.
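The ‘wash out’ claim can be illustrated with a small simulation. The sketch below is purely hypothetical: the data are synthetic, the true gain of 0.5 points, the baseline of 5 and the normally distributed reporting offsets are all assumptions chosen for illustration, and reports are not clipped to the 0-10 range. It only shows that, if individual differences in scale use are random, the average reported difference between two large groups stays close to the true difference.

```python
import random
from statistics import mean

def simulate_reported_gain(n_per_group=50_000, true_gain=0.5, offset_sd=1.0, seed=0):
    """Return the difference in mean reported LS between a treated and a control group.

    Each person reports their true score plus a random personal offset, standing in
    for idiosyncratic scale use (too generous or too conservative). All values are
    synthetic and chosen only for illustration.
    """
    rng = random.Random(seed)
    baseline = 5.0  # assumed true score in the control group
    control = [baseline + rng.gauss(0.0, offset_sd) for _ in range(n_per_group)]
    treated = [baseline + true_gain + rng.gauss(0.0, offset_sd) for _ in range(n_per_group)]
    return mean(treated) - mean(control)

estimated_gain = simulate_reported_gain()
print(f"true gain: 0.5, estimated from noisy reports: {estimated_gain:.2f}")
```

The simulation says nothing about differences in scale use that are systematic rather than random, for instance between cultures.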

We might object to this last point that, even if groups regulate their members’ language use, different groups could still use scales differently. As an empirical test of this, a study by Helliwell et al. (2016) of immigrants moving from 100 different countries to Canada found that, regardless of country of origin, the average levels and distributions of life satisfaction among immigrants mimic those of Canadians, suggesting LS reports are primarily driven by life circumstances. If there really were substantial cultural differences in LS scale use, this result would not occur.

There­fore, it seems rea­son­able to in­ter­pret SWB data as in­ter­per­son­ally car­di­nal. How­ever, as this point seems im­por­tant, more work here would be wel­come.

5. Us­ing life satis­fac­tion as the com­mon cur­rency for well-be­ing ad­justed life years (WALYs)

Suppose we accept we can use LS scores to measure happiness. What next? One straightforward option would be to measure LS (and other SWB metrics) impacts directly in RCTs. If we know the costs of a programme, we could then establish how much it costs to produce one ‘life satisfaction point-year’ or ‘LSP’ - equivalent to increasing life satisfaction for one person by one point on a 10 point scale for a year. This method is structurally similar to assessing cost per Quality-Adjusted Life Year (QALY), which effective altruists are already familiar with and I won’t go into, except QALYs are measured on a 0-1 scale whereas LS is on a 0-10 scale.

Table 1. How adult life satis­fac­tion (0-10) is af­fected by cur­rent cir­cum­stances (BHPS) (cross-sec­tion) (Clark et al. 2018, p199)

I expect many effective altruists will welcome the idea of using LSPs instead of QALYs. Most EAs already accept that, in principle, we need a measure of ‘well-being adjusted life-years’ (WALYs).17 QALYs capture health and, as I noted at the start, health is not all that matters: we still need a common currency that allows health and non-health outcomes to be traded off against one another, and a non-arbitrary method to determine the value of outcomes in this currency. LSPs could partially or fully fulfill the role of being the WALY metric. For those who think happiness is the only intrinsic good, LSPs should be sufficient, unless and until a better measure of happiness can be found. Those that value goods other than happiness will, presumably, value happiness to some extent, and inasmuch as they do, LSPs will be one aspect of WALYs they need to consider alongside other goods.18

RCT data us­ing LS will not always be available. Where it is not, an al­ter­nate way to de­ter­mine how differ­ent out­comes af­fect LS is to rely on data from large pop­u­la­tion sur­veys. Us­ing a mul­ti­vari­ate re­gres­sion anal­y­sis that con­trols for differ­ent cir­cum­stances, re­searchers can then es­ti­mate the strength of the cor­re­la­tions be­tween LS and var­i­ous other fac­tors. Table 1 from Clark et al. (2018, p199) con­tains the re­sults of such an anal­y­sis both for the im­pact a given change has on an in­di­vi­d­ual’s LS and that which it has on oth­ers.

This information can be used to make inferences about the expected LS effect of a given outcome without requiring an RCT, at least if it’s straightforward to measure the outcome, as it is in cases of unemployment. In other cases, the relationship between life satisfaction and other measures, such as particular health metrics, will need to be established so those metrics can be converted into LS scores. Some of this work has been done: see Layard (2016) for a table converting LS scores into both other SWB measures and various health metrics.

Two sets of comments are worth mentioning before we turn to charity analysis. First, three remarks on the results in the table that will be relevant again shortly: 1) doubling income is associated with a constant increase in life satisfaction; 2) the gain one individual receives from a doubled income causes a nearly equal loss in LS to others; 3) mental health, employment and partnership have a much bigger per-person impact than a doubling of income does.

Se­cond, while it is already pos­si­ble to es­ti­mate the LS effect of many out­comes, if effec­tive al­tru­ists want to use SWB data to as­sess effec­tive­ness, they should en­courage re­searchers—most ob­vi­ously those work­ing in global de­vel­op­ment—to col­lect it alongside other vari­ables. This only re­quires quickly sur­vey­ing in­di­vi­d­u­als at the start and end of an im­pact as­sess­ment. This gen­er­ates ex­tra work, but also al­lows di­rect mea­sure­ment of the out­come that is (pre­sum­ably) of most in­ter­est.

6. Eval­u­at­ing the life satis­fac­tion im­pact of (GiveWell) charities

GiveWell has identified the charities it considers do the most good per dollar. We can use the LS lens to assess how good these charities are at increasing life satisfaction. GiveWell’s top charities can be divided into (1) life-saving charities, such as the Against Malaria Foundation, and (2) life-improving charities, those that increase individuals’ well-being during their lives, such as GiveDirectly and the Schistosomiasis Control Initiative (SCI). According to GiveWell, the vast majority of the benefit of their recommended life-improving charities arises from eventual income and consumption gains, rather than gains to health.19 We’ve already seen that making people richer is a surprisingly unpromising way of increasing LS in developed countries (as increasing the wealth of some reduces the happiness of others). I show this also seems to happen even at low levels of income, such that treating mental health via a charity like StrongMinds looks much more cost-effective. Then I illustrate how to compare life-saving to life-improving charities. I claim it’s unclear, due to some methodological issues, which of the interventions increases LS more effectively.

6.1 Life-im­prov­ing charities

Let’s start with GiveDirectly, a charity which provides unconditional cash transfers to Kenyan farmers, as three studies have now been conducted using life satisfaction data (alongside other SWB metrics). Research suggests GiveDirectly’s cash transfers increase life satisfaction by about 0.3 life satisfaction points (LSPs) on a 10 point scale.20 This was measured after 4.3 months on average, but let’s assume this effect lasts a whole year, that it applies to everyone in the recipient household, and that there are 5 people per household on average. The average cash transfer is $750, which generates 1.5 LSPs with our assumptions (0.3 x 1 x 5), implying a cost-effectiveness of 2 LSPs/$1000.
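The arithmetic behind this estimate can be laid out explicitly. The following is a minimal sketch using only the figures stated above; the function name `lsps_per_1000_dollars` and its parameter names are illustrative, not taken from any published analysis.

```python
def lsps_per_1000_dollars(ls_gain, years, people, cost_usd):
    """Life satisfaction point-years (LSPs) produced per $1,000 spent."""
    # Point-years summed over everyone assumed to be affected.
    total_lsps = ls_gain * years * people
    return total_lsps / cost_usd * 1000

# GiveDirectly, under the assumptions in the text:
# a 0.3-point gain, lasting 1 year, for 5 household members, per $750 transfer.
give_directly = lsps_per_1000_dollars(ls_gain=0.3, years=1, people=5, cost_usd=750)
print(f"GiveDirectly: {give_directly:.1f} LSPs per $1,000")
```

Each argument is one of the assumptions in the paragraph above, so shortening the duration or shrinking the household size lowers the result proportionally.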

However, this estimate is likely to substantially overstate the effectiveness of cash transfers. It only accounts for the life satisfaction increase of recipients. Research into GiveDirectly has suggested that their cash transfers, while making some people wealthier (and so more satisfied with life), have negative spillovers: they make non-recipients less satisfied. As Haushofer, Reisinger and Shapiro (2015, p1) state:

The de­crease in life satis­fac­tion in­duced by trans­fers to neigh­bors more than offsets the di­rect pos­i­tive effect of trans­fers, and is largest for in­di­vi­d­u­als who did not re­ceive a di­rect trans­fer them­selves.

This might seem sur­pris­ing, but the find­ing that in­creas­ing wealth leaves ag­gre­gate LS rel­a­tively un­changed is en­tirely con­sis­tent with the find­ings in table 1 and re­sults men­tioned in Clark (2016) above.

We might hope these nega­tive spillovers would dis­si­pate even­tu­ally and, over the long run, cash trans­fers would be effec­tive in in­creas­ing life satis­fac­tion. How­ever, a new 2018 study on the long-term (3 year) effects of GiveDirectly by Haushofer and Shapiro (2018, p. 22) finds re­cip­i­ents, com­pared to non-re­cip­i­ents in dis­tant villages, have 40% more as­sets but that re­cip­i­ents do no bet­ter on a psy­cholog­i­cal well-be­ing in­dex. GiveWell dis­cuss this study, note it sug­gests cash trans­fers are less effec­tive than they thought, but state they are await­ing the re­sults of GiveDirectly’s “gen­eral equil­ibrium” study, which aims to as­sess spillover effects, be­fore up­dat­ing their cost-effec­tive­ness as­sess­ment.21

I also look for­ward to fur­ther re­search, but for the mo­ment I think the ev­i­dence sug­gests it’s far from ob­vi­ous cash trans­fers have a ro­bust, pos­i­tive effect on hap­piness (mea­sured as life satis­fac­tion) in ei­ther the short or the long-term. Many peo­ple as­sume cash trans­fers must in­crease hap­piness, but it’s un­clear what ev­i­dence some­one could pro­duce to sup­port this in­tu­ition.

There are a few objections one could make to defend the effectiveness of cash transfers here.

First, one could sim­ply ig­nore the nega­tive spillovers. It’s un­clear how this would be jus­tified. Even if we did do this, as I will shortly show, StrongMinds, a men­tal health char­ity, looks more cost-effec­tive any­way.

Second, we could say that LS scores have got the intuitively wrong answer here and therefore they are not a valid measure of happiness (specifically, the claim is they lack construct validity). This response seems unconvincing: the critic would need to explain how LS seemed to get the wrong result in this case whilst getting the right result in many other areas. If LS measures succeed in measuring individuals’ life satisfaction, presumably they do so in general.

Third, one could claim that hap­piness is not all that mat­ters. Yet, pre­sum­ably, it is one of the things that mat­ters. Hence, most (if not all) peo­ple will want to con­sider the im­pact on hap­piness. Plau­si­bly, cash trans­fers could be jus­tified on non-hap­piness grounds, such as au­ton­omy pro­mo­tion. Some­one who pressed this would need to de­ter­mine how to trade off hap­piness against au­ton­omy and come to an over­all de­ci­sion about which char­ity did the most good; this is not a con­cern I can ad­dress here.22

Fourth, we might hope there are long-term, societal effects of increasing wealth, even if it doesn’t increase the aggregate life satisfaction of the immediate recipients and their neighbours over the first 3 or so years. Note this would be a very different justification for donating to GiveDirectly from the usual one given, which is that it benefits the recipients. It would require substantially altering the cost-effectiveness analysis. Further, it’s not obviously true that increasing a country’s wealth will increase aggregate life satisfaction. I’ve already mentioned the Easterlin paradox, which refers to developed countries, but a particularly arresting case of development from a low level failing to increase happiness is China: its SWB seems to have gone down between 1990 and 2015, even though per capita GDP increased fivefold (Easterlin, Wang and Wang 2017).23

Fifth, we might think this prob­lem could be avoided if money were given to ev­ery­one in the village, rather than just to some. This ig­nores the con­cerns about so­cial com­par­i­sons. If so­cial com­par­i­sons oc­cur, then we would ex­pect mak­ing ev­ery­one in village A richer to re­duce life satis­fac­tion in village B, an ad­ja­cent non-re­cip­i­ent village. We could re­spond to this with “Fine. But what if we made ev­ery­one richer?” which is a restate­ment of the pre­vi­ous ob­jec­tion.

To be clear, the con­cern about GiveDirectly isn’t that it is an in­effec­tive way of alle­vi­at­ing poverty. Rather, the con­cern is that alle­vi­at­ing poverty is sur­pris­ingly in­effec­tive at in­creas­ing hap­piness (mea­sured as LS). Thus, from the LS per­spec­tive, it is un­satis­fac­tory to ob­ject that other top-rated GiveWell char­i­ties are more effec­tive than GiveDirectly at alle­vi­at­ing poverty. What needs to be shown is that alle­vi­at­ing poverty in­creases hap­piness, and it’s un­clear what ev­i­dence sup­ports this the­sis.

We should now be concerned about the happiness-increasing effectiveness of all of GiveWell’s life-improving charities, as those charities are deemed effective on the assumption they increase wealth and increasing wealth does good overall. To illustrate, the Schistosomiasis Control Initiative (SCI), a charity which treats children for intestinal worms, is a top-rated GiveWell charity that, in fact, GiveWell consider more cost-effective at doing good than GiveDirectly. We might think the worries about the ineffectiveness of poverty alleviation would not apply here as SCI provides a physical health treatment. Yet, although SCI provides a health intervention, GiveWell claim that only 2% of SCI’s impact comes from ‘short-term health gains’. The remaining 98% arises from ‘eventual income and consumption gains’: dewormed children earn more in later life, and their well-being rises as a result of this additional income. Hence, the same doubts extend to SCI as well.

An ob­jec­tion here is that we should ex­pect in­ter­ven­tions which help peo­ple earn their own money—such as by im­prov­ing their health—to in­crease hap­piness by more than cash trans­fers, which sim­ply give money to them.24 It seems un­likely this would be true: it’s gen­er­ally ar­gued the merit of cash trans­fers is peo­ple can in­vest the money and use it to earn more for them­selves later. Hence the long-term value of cash trans­fers is from earned in­come too.

Now we turn to StrongMinds, a mental health charity that provides interpersonal group therapy to women in Uganda. As the LS analysis of Clark et al. (2018) suggests treating mental health is among the most cost-effective ways for developed-world governments to increase happiness, it is the natural first place to look when searching for an effective charitable intervention. There is no research which has directly measured the LS impact of treating mental health, so I estimate its effectiveness using other data.25 To save space I have put this analysis into a spreadsheet. I infer that the treatment effect is 0.2 LSPs per year for 4 years.[41] StrongMinds say their per-participant costs are $102 (StrongMinds Q1.2018 report). That suggests the impact is 0.8 LSPs (4 years * 0.2 LS gain) per $102, or 8 LSPs/$1000 (rounding up from 7.84). There is not space here to go into the details of mental health treatments or argue they are effective.26
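The same calculation can be sketched for StrongMinds. All inputs below are the estimates quoted above (the 0.2-point effect and 4-year duration are the author's inferences, not measured values), and the variable names are illustrative.

```python
# StrongMinds, using the figures in the text.
ls_gain_per_year = 0.2      # inferred treatment effect, in life satisfaction points
duration_years = 4          # assumed duration of the effect
cost_per_participant = 102  # USD, from the StrongMinds Q1 2018 report

total_lsps = ls_gain_per_year * duration_years  # 0.8 LSPs per participant
strongminds_per_1000 = total_lsps / cost_per_participant * 1000

# Recipients-only GiveDirectly estimate from earlier in this section.
give_directly_per_1000 = 2.0
ratio = strongminds_per_1000 / give_directly_per_1000

print(f"StrongMinds: {strongminds_per_1000:.2f} LSPs per $1,000")
print(f"Ratio to GiveDirectly (ignoring spillovers): {ratio:.1f}x")
```

On these inputs StrongMinds comes out at roughly four times the recipients-only GiveDirectly figure, before any adjustment for negative spillovers.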

Through the LS lens, StrongMinds is more effec­tive than GiveDirectly (and other poverty-alle­vi­at­ing char­i­ties) sim­ply be­cause it seems to clearly in­crease net hap­piness.27 It’s pos­si­ble an­other char­ity or or­gani­sa­tion will be much more effec­tive at in­creas­ing hap­piness than StrongMinds, but that is a topic I plan to cover el­se­where.28

For those interested in increasing happiness, this result is important. Using the empirical data on happiness illuminates a new category of intervention (treating mental health) that effective altruists have so far overlooked and that now appears more cost-effective than alleviating poverty.29 Part of this must be due to effective altruists’ historic reliance on health metrics, QALYs and DALYs, which seem to underrate the happiness impact of mental health conditions relative to physical ones. I discuss this in Annex A.

6.2 Life-sav­ing charities

As noted in the in­tro­duc­tion, the key ques­tion in GiveWell’s cost effec­tive­ness eval­u­a­tion is “how many years of dou­bled con­sump­tion are as morally valuable as sav­ing the life of an un­der-5-year old child?” This is what they need to com­pare the cost-effec­tive­ness of life-sav­ing to life-im­prov­ing char­i­ties; I stated that it com­bines judge­ments about facts and judge­ments about value to an­swer this ques­tion.

Us­ing LS scores we can take a differ­ent ap­proach to mak­ing this com­par­i­son, which is to work out the cost-effec­tive­ness of life-sav­ing in­ter­ven­tions in LSPs. First we need to know the cost to save a life. Ac­cord­ing to GiveWell’s es­ti­mates, the Against Malaria Foun­da­tion (AMF) saves a life for around $3,500 (i.e. pre­vents a pre­ma­ture death).30

Se­cond, we need the num­ber of years that per­son would have lived for. Sup­pose AMF grants 60 coun­ter­fac­tual years of life.

Third, we need to establish how many net LSPs the person gains per year: how much better their lives are than the ‘neutral point’ equivalent to being dead. Average life satisfaction in Kenya, where AMF operates, is 4.4/10 (Helliwell, Layard and Sachs 2017, p28). Now we run into a problem. Life satisfaction surveys don’t ask people to specify what point on the 0 to 10 scale they would consider equivalent to not being alive. 0 is labelled ‘not at all’ and 10 ‘completely satisfied’. Intuitively, the midpoint in the scale, 5, would be the neutral point. Yet, if that’s true, then saving lives through AMF would, in fact, be bad. 4.4 is below the neutral point, so AMF would be prolonging bad lives, lives not worth living.31

Let’s sup­pose in­stead the neu­tral point is 4. If this is so, sav­ing the child is worth 0.4 life satis­fac­tion points a year for 60 years, thus 24 LSPs (0.4 x 60).

Given the $3,500 cost, we can calculate cost-effectiveness as 6.9 LSPs/$1,000. Earlier, I estimated StrongMinds’ cost-effectiveness was around 8 LSPs/$1,000. Hence, we can now compare our life-saving and life-improving interventions in the same units of cost-effectiveness.

A problem for our analysis is that these cost-effectiveness numbers are highly dependent on a (so far) arbitrary decision about where the neutral point goes. If someone instead set the neutral point at 3, which intuitively seems too low, then AMF’s cost-effectiveness would leap to 24.4 LSPs/$1,000 and it would be more cost-effective than StrongMinds.
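To make the sensitivity to the neutral point concrete, here is a minimal sketch using the rounded inputs given in the text (average LS of 4.4, 60 counterfactual years, $3,500 per life saved). The function name and defaults are illustrative. With these rounded inputs a neutral point of 3 gives about 24 LSPs/$1,000, slightly below the 24.4 quoted above; the gap may reflect unrounded inputs in the original calculation.

```python
def amf_lsps_per_1000(neutral_point, avg_ls=4.4, years=60, cost_per_life=3500):
    """LSPs per $1,000 from saving one life, on the life-comparative account."""
    # Net well-being per year of life; negative if average LS is below neutral.
    net_ls_per_year = avg_ls - neutral_point
    return net_ls_per_year * years / cost_per_life * 1000

STRONGMINDS = 8.0  # LSPs per $1,000, from the earlier estimate

for neutral in (5, 4, 3):
    print(f"neutral point {neutral}: {amf_lsps_per_1000(neutral):.1f} LSPs per $1,000")

# Neutral point at which AMF would exactly match the StrongMinds estimate.
break_even = 4.4 - STRONGMINDS * 3500 / (60 * 1000)
print(f"break-even neutral point: {break_even:.2f}")
```

On these inputs the ranking of AMF and StrongMinds flips at a neutral point just below 4, which is why the location of the neutral point matters so much for the comparison.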

How could we settle where the neutral point is? Two strategies seem possible. First, we could find out at what LS scores individuals report their lives are neutral on experience measures of SWB. Second, we could poll people to ask at what LS score out of 10 they would be indifferent between living with that score for the rest of their life or dying. I am unaware of any analysis which has been conducted along either of these lines. Thus, for the moment, this unfortunately remains a point of armchair conjecture.

Thus, us­ing the LS scores, we can es­tab­lish the net hap­piness of differ­ent out­comes. This is a ques­tion of fact, and while we are able to make much of the calcu­la­tion with­out rely­ing on our sub­jec­tive judge­ments, we have had to do so re­gard­ing the lo­ca­tion of the neu­tral point.

How­ever, what is a moral judge­ment, and must be made im­plic­itly or ex­plic­itly, is what the bad­ness of death is. On the ‘life com­par­a­tive’ ac­count of the bad­ness of death, the value of sav­ing a life is the to­tal well-be­ing the per­son would have had if they’d lived. The num­bers I’ve pro­duced above im­plic­itly as­sumed this was the cor­rect view.

One popular, alternative view about the badness of death, and the view GiveWell staffers take, is the Time-Relative Interest Account (TRIA).32 According to TRIA, it’s more important to save (say) 20-year olds than 2-year olds even though saving the 2-year old would, we suppose, cause around 18 more years of life to be lived. The reason to count the 2-year old for less is that very young children, being relatively underdeveloped, will have a weaker interest in continuing to live than a fully-developed 20-year old. We can see from the ‘moral weights’ tab of GiveWell’s cost-effectiveness analysis that GiveWell staffers seem to adopt TRIA:33 the value of saving an over-5 year old is, depending on the staff member, 100% to 400% of that of saving an under-5 year old; the median weight is 200%.34

Note that advocates of TRIA will still need to know the total well-being the person would have had: TRIA takes that as an input to the equation, where it is then discounted by the age of the person to determine the value of saving the life (ascertaining this discount is a moral judgement).

Let’s suppose we are TRIA advocates and now reduce the cost-effectiveness of AMF by half. If the neutral point is 3, then the cost-effectiveness of AMF is 12.2 LSPs/$1,000, and only about 50% more cost-effective than the estimate for StrongMinds.35 If the neutral point is 4, AMF is 3.5 LSPs/$1,000 and thus less cost-effective.
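The TRIA adjustment is a single multiplication, sketched below. The weight of 0.5 corresponds to the halving assumed in the text (the median staff weight of 200% for over-5s relative to under-5s); the undiscounted AMF figures are those quoted earlier, and the names are illustrative.

```python
TRIA_WEIGHT = 0.5   # assumed discount for saving an under-5, per the halving in the text
STRONGMINDS = 8.0   # LSPs per $1,000, from the earlier estimate

# Undiscounted AMF figures quoted in the text, keyed by assumed neutral point.
amf_undiscounted = {3: 24.4, 4: 6.9}

for neutral, lsps in amf_undiscounted.items():
    discounted = lsps * TRIA_WEIGHT
    relative = discounted / STRONGMINDS
    print(f"neutral point {neutral}: {discounted:.2f} LSPs per $1,000, "
          f"{relative:.2f}x the StrongMinds estimate")
```

A different moral weight, such as 0.25 for the 400% view, can be substituted for `TRIA_WEIGHT` to see how the comparison shifts.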

Further complications arise when we try to account for the ‘social value’ of saving lives: the impact saving a life has on everyone apart from the saved individual. We can divide this up into: (1) the effects of the death on friends and family; (2) concerns about under- or overpopulation;36 (3) the meat eater problem, the impact saving lives has on increasing animal suffering due to meat consumption.37 (2) and (3) are important but too much of a diversion from our discussion here on using LS measures. Figures for (1) could be derived using LS data. Oswald and Powdthavee (2008) estimate that, shortly after the event, the death of a child causes a 0.6 loss (on a 0-10 LS scale), a spouse a 1.3 loss, and a parent a 0.4 loss. Further work would be needed to assess what the total counterfactual LS impact of saving a life on friends and family is over time, and to adjust these figures for the developing country setting.38

As I’ve explained the LS approach and shown how it reaches different results from GiveWell, perhaps the natural thing to do would be to explain what GiveWell’s method is and whether the differences emerge from disagreements about value or disagreements about facts. Unfortunately, this is not straightforward to do as GiveWell’s key metric (again, trading off years of doubled consumption against the value of saving an under-5-year old) combines judgements of facts and value. I understand some GiveWell staff incorporate SWB surveys into their judgements, but I am not aware of any staff member who bases their analysis solely on them. What’s more, as GiveWell take the median answer of their staff, the ‘GiveWell view’ is a composite held by no one in particular. Hence it’s impossible to know exactly where the disagreements lie. My approach has been to make it clear, on the best available evidence about happiness, what we can say about the happiness impact of various outcomes, information everyone, I expect, will need to take into account. Readers who do not believe the best outcome is the one with the largest sum of happiness will need to adjust the analysis accordingly.

To sum­marise this sec­tion: it is now prac­ti­cally pos­si­ble to use self-re­ported life satis­fac­tion scores to de­ter­mine how much var­i­ous out­comes in­crease hap­piness. I ap­plied the LS ap­proach to eval­u­at­ing the cost-effec­tive­ness of GiveWell’s top char­i­ties and showed poverty-alle­vi­at­ing char­i­ties are un­promis­ing com­pared to men­tal health in­ter­ven­tions; em­piri­cal and philo­soph­i­cal ques­tions re­main over com­par­ing the value of life-sav­ing to life-im­prov­ing in­ter­ven­tions.

7. What should effec­tive al­tru­ists do next?

Relying on life satisfaction (and other SWB measures) to tell us how to maximise human happiness is a new approach for effective altruists. If, as I have argued we should, we think this is the correct method and we accept its results, then it challenges the current assumptions within EA about how to increase happiness. Most obviously, it suggests looking at the best ways to improve mental health, which is currently not regarded as a priority. The above analysis, comparing developing world charities, is just the first step in ascertaining how individuals can maximise happiness with their time and money. Much more work is required. We will need new evaluations of interventions and charities. We will need to identify the relevant and useful players within the happiness-increasing space and build a community around it. We will need to think what high-impact careers look like for those who want to maximise human welfare.39 We will need to develop a research strategy and determine what the priorities on it are.

This is a large challenge and af­ter EAGxNether­lands in 2018 a small ‘Hu­man Welfare Task Force’ (HWTF) formed to think how we might take this for­ward. It cur­rently con­sists of my­self, Alex Lintz, Denisa Pop, Robin van Dalen, Siebe Rozen­dal, Peter Bri­et­bart and Jes­sica van Haften. If you wish to be in­volved, please email siebe[at]ea­gron­ingen.org and michael.plant[at]philos­o­phy.ox.ac.uk.

If you agree with the Man­i­festo, here are some other con­crete ac­tions you can take:

Ar­range a lo­cal EA gath­er­ing where you dis­cuss this is­sue, pos­si­bly watch­ing my talk on Max­imis­ing World Hap­piness from EA Global Lon­don 2017.

Get your­self up to speed on the lat­est re­search. I’ve pro­duced a Read­ing List: Hap­piness for Effec­tive Altru­ists and Other Hu­mans.

We have started work­ing on a Hu­man Welfare Re­search Agenda of ques­tions we think need an­swer­ing. You are wel­come to make sug­ges­tions or pick some­thing from the list to be­gin in­ves­ti­gat­ing. If you would like to in­ves­ti­gate some­thing, please get in touch so we can co­or­di­nate more effec­tively. We are also col­lect­ing some other re­search doc­u­ments in our Google Drive folder.

Join the Face­book group Effec­tive Altru­ism, Men­tal Health and Hap­piness.

If you currently donate to anti-poverty charities, you could switch your donation to an effective mental health charity instead. That said, given the uncertainty about what the best way to promote human welfare is, you may want to wait for further information. We are considering the possibility of setting up a research organisation focused on human happiness. If you’re interested in funding research into this area, please email me.

An­nex A: How good are QALYs and DALYs as prox­ies for hap­piness?

Effective altruists have tended to use health metrics, QALYs and DALYs, as the proxy for WALYs. However, these standard health metrics are misleading proxies for happiness. For ease, I quote at length from Clark et al. (2018, p85):

In the QALY system, the impact of a given illness in reducing the quality of life is measured using the replies of patients to a questionnaire known as the EQ5D. Patients with each illness give a score of 1, 2, or 3 to each of five questions (on Mobility, Self-care, Usual Activities, Physical Pain, and Mental Pain). To get an overall aggregate score for each illness a weight has to be attached to each of the scores. For this purpose members of the public are shown 45 cards on each of which an illness is described in terms of the five EQ5D dimensions. For each illness members of the public are then asked, “Suppose you had this illness for ten years. How many years of healthy life would you consider as of equivalent value to you?” The replies to this question provide 45N valuations, where there are N respondents. The evaluations can then be regressed on the different EQ5D dimensions. These “Time Trade-Off” valuations measure the proportional Quality of Life Lost (measured by equivalent changes in life expectancy) that results from each EQ5D dimension.

As can be seen, these QALY val­ues re­flect how peo­ple who have mostly never ex­pe­rienced these ill­nesses imag­ine they would feel if they did so. A bet­ter al­ter­na­tive is to mea­sure di­rectly how peo­ple ac­tu­ally feel when they ac­tu­ally do ex­pe­rience the ill­ness.

The re­sult would be very differ­ent. Figure [4] con­trasts the out­comes from these two differ­ent ap­proaches. The ex­ist­ing QALY weights are shown by the shaded bars of Figure [4]. This scale has been nor­mal­ized so that the bars can be com­pared with those from a re­gres­sion of life-satis­fac­tion on the same vari­ables. This lat­ter re­gres­sion is shown in the black bars in the figure—the mag­ni­tudes here are not β-statis­tics but the ab­solute im­pact of each vari­able on life-satis­fac­tion (0–1). As can be seen from the lower part of the figure, the pub­lic hugely un­der­es­ti­mated by how much men­tal pain (com­pared with phys­i­cal pain) would re­duce their satis­fac­tion with life.

Figure 4. How life satis­fac­tion (0-1) is af­fected by the EQ5D, com­pared with weights used in QALYs

QALYs are not a very good guide to what makes people satisfied because they are based on people’s preferences over how bad they imagine various health states are, rather than how bad those states are when people experience them, and, as noted earlier, we are not very good at imagining what makes us or others happy. To highlight a particularly outstanding discrepancy, Dolan and Metcalfe (2012, from whom Figure 4 above is derived) report that subjects agreed to hypothetically give up as many years of their remaining life, about 15%, to be cured of ‘some difficulty walking’ as they would to be cured of ‘moderate anxiety or depression.’ However, on SWB measures ‘moderate anxiety or depression’ is associated with a 10 times greater loss to life satisfaction, and an 18 times greater loss to daily affect, than ‘some difficulty walking’ is. This seems compelling evidence, if we needed any, that if we rely on people’s preferences about imagined futures we will get the wrong answers about what makes individuals happy. The explanation here is that, when imagining the future, we fail to anticipate that our ‘psychological immune system’ will ‘kick in’ and cause us to adapt to some circumstances but not others: what Gilbert et al. (2009) call ‘immune neglect’. Conditions such as mobility impairment are things we stop paying attention to, whereas mental illnesses are comparatively ‘full-time’ and continue to affect our subjective experiences.

I’m un­aware of any stud­ies com­par­ing DALYs and SWB mea­sures di­rectly, but given how DALYs are con­structed—typ­i­cally by ask­ing ex­perts for rat­ings—we would ex­pect the same prob­lems to oc­cur. See Sassi (2006) for a com­par­i­son of the method­olo­gies for QALYs and DALYs.

The implication of this analysis is that we should substantially reduce our estimate of how cost-effective physical health interventions are compared to mental health interventions, assuming we’d previously judged them by QALYs and DALYs, as Giving What We Can did in their reports into mental health (GWWC 2015, 2016). Thus, unless we find physical health interventions that are incredibly cheap compared to mental health treatments, we should be sceptical that physical health interventions will turn out to be comparatively more cost-effective. The possible exception would be using opiates to treat severe pain: pain is clearly very bad for well-being and opiates can be very cheap (Knaul et al. 2018).

Perhaps effective altruism has largely overlooked mental illness because of the movement’s early reliance on QALYs/DALYs as an approximation of well-being. Given how much QALYs underrate the badness of mental illness, it’s not much of a surprise that individuals using those metrics would be led to the (false) conclusion that mental health is comparatively unimportant.



1 I am very grateful to the following for their helpful comments on this document: James Snowden, John Halstead, Sjir Hoeijmakers, Kellie Liket, Denisa Pop, Peter Brietbart, Jessica van Haaften-Carr, Siebe Rozendal and Alex Lintz. Special thanks must go to the latter two, who also proofread multiple drafts and removed my innumerable mistakes.

2 As­sum­ing we mean ‘best’ in a moral, rather than merely aes­thetic, sense.

3 Technically, (a) and (b) suffice only to give us a fixed-population axiology. A third component (c), specifying who the bearers of value are (i.e. present, actual, necessary or possible people), is needed to give us a variable-population axiology. Perhaps confusingly, what philosophers call a ‘population axiology’ usually just specifies (b) and (c).

4 Both think the value of a state of af­fairs is the to­tal hap­piness of those who ex­ist in that state of af­fairs (he­do­nic to­tal­ism?)

5 See Bronsteen et al. (2013) for the argument that cost-benefit analysis should be replaced with ‘well-being analysis’, i.e. the use of SWB measures. Clark et al. (2018) argue SWB should be used to determine policy and set out the state of the art on how this could be done. Dolan and Metcalfe (2012) and Dolan, Layard and Metcalfe (2011) make recommendations for how governments should measure and use well-being.

6 The re­la­tion­ship be­tween what so­cial sci­en­tists call ‘SWB’ and philoso­phers call ‘well-be­ing’ is too big a di­gres­sion for this doc­u­ment.

7 Ar­guably, an even bet­ter method would be to mea­sure brain waves, as­sum­ing we could cor­re­late hap­piness with brain states.

8 OECD (2013, p20) notes “dur­ing the 1990s there was an av­er­age of less than five ar­ti­cles on hap­piness or re­lated sub­jects each year in the jour­nals cov­ered by the Econ­lit database. By 2008 this had risen to over fifty each year”

9 E.g. Think­ing, Fast and Slow by Kah­ne­man (2011)

10 E.g. see the Sen-Fi­toussi-Stiglitz re­port: Com­mis­sion on the Mea­sure­ment of Eco­nomic and So­cial Progress (2009)

11 In a meta-anal­y­sis, Luh­mann et al. (2012) com­pare the rates of adap­ta­tion on eval­u­a­tive and ex­pe­rience mea­sures of SWB, find­ing some differ­ences.

12 A doubt here is whether, given the di­ver­sity of ex­pe­riences, there is any one prop­erty that all happy (and un­happy) ex­pe­riences have, such that we can quan­tify them on a sin­gle scale. Fol­low­ing Crisp (2006) I think there is: the prop­erty of pleas­ant­ness, or ‘he­do­nic tone’. Even though headaches and heart­breaks feel differ­ent, I find noth­ing con­fus­ing in say­ing one can feel as bad as an­other. For dis­sent, see, e.g. Nuss­baum (2012)

13 An alternative motivation for measuring life satisfaction is that it captures a cognitive judgement of the extent to which someone’s preferences are satisfied (i.e. the world is going the way they want it to) rather than a feeling. It is unclear if this cognitive judgement could be cardinal. As Hausmann (1995) argues at length, while we can order preferences (preferences are about how entire worlds go), there is in principle no conceptual unit of distance between our ranked preferences to generate cardinality. This is too much of a diversion to discuss further here. Given we are ultimately interested in happiness here, and evaluations are relevant only as proxies for it, it seems convenient to side-step the problem and say evaluations capture a felt strength of satisfaction.

14 The worry in­di­vi­d­u­als may self-re­port in this way has been raised with me sep­a­rately by Toby Ord and James Snow­den.

15 If it turned out re­ported SWB was a func­tion of log(ac­tual SWB) that wouldn’t make it im­pos­si­ble to make in­ter­per­sonal car­di­nal com­par­i­sons: we sim­ply need to con­vert in­di­vi­d­u­als’ scores onto a lin­ear scale.

16 This same think­ing will not ex­tend to com­par­ing hu­mans to an­i­mals, or com­par­ing hu­mans to some hy­po­thet­i­cal hu­mans ge­net­i­cally mod­ified to ex­pe­rience greater hap­piness.

17 MacAskill ex­plic­itly states this in Do­ing Good Better

18 Suppose someone thought there are two intrinsic goods, happiness and autonomy. They would need a measure of autonomy (autonomy-adjusted life years, or AALYs?) and they would need to set up a conversion rate between LSPs and AALYs to determine which outcome did the most good on the composite measure.

19 See the ‘results’ table of v4 of GiveWell’s CEA: only 2% of the benefit of deworming charities is due to health. I return to this momentarily.

20 Refer­ences and calcu­la­tions available in this spread­sheet: Life satis­fac­tion im­pact of treat­ing men­tal health vs alle­vi­at­ing poverty

21 Available at: https://​​blog.givewell.org/​​2018/​​05/​​04/​​new-re­search-on-cash-trans­fers/​​

22 See foot­note 13.

23 The au­thors at­tribute this de­cline (or, lack of in­crease) to re­duc­tions in job se­cu­rity and vast labour move­ments which eroded the so­cial fabric of so­ciety.

24 John Halstead sug­gested this con­cern to me.

25 According to Layard (personal conversation) no work has yet been done to measure the impact of mental health treatment using LS scores.

26 See Thrive by La­yard and Clark (2015) for a good book-length sum­mary.

27 StrongMinds is still four times more effec­tive than GiveDirectly even if, for some un­clear rea­son, GiveDirectly’s nega­tive spillovers are ig­nored.

28 For example, Friendship Bench, a Zimbabwean mental health programme, has had an RCT conducted on its intervention (Chibanda et al. 2015). The study shows that the intervention is more effective, in terms of per-participant improvement in standardised mental health scores, than either StrongMinds’ evaluation of their own programme or my estimate of StrongMinds’ programme. I do not (yet) have per-participant costs for Friendship Bench so I cannot estimate cost-effectiveness.

29 Neither MacAskill nor Singer mentions it in their 2015 books. GWWC does have two blog posts on mental health (2015, 2016), but both say, based on a DALY analysis, that it looks less cost-effective than other health interventions. Of the more than 700 international charities GiveWell has listed as ‘contacted’ or ‘considered’, only one appears related to mental health.

30 GiveWell, “2018 GiveWell Cost-Effec­tive­ness Anal­y­sis — Ver­sion 4,” 2018, https://​​docs.google.com/​​spread­sheets/​​d/​​1moyxmsn4UjhH3CzFJmPwAN7LUAk­m­maoXDb6bdW3WILg/​​edit#gid=1364064522.

31 Some people I’ve spoken to have suggested it’s bad to save lives solely on the grounds that those in the developing world lead lives below the neutral point.

32 TRIA is most as­so­ci­ated with Jeff McMa­han. See Liao (2007) for a dis­cus­sion.

33 An al­ter­na­tive pos­si­bil­ity is that they adopt the life com­par­a­tive ac­count but think it’s bet­ter to save older peo­ple be­cause the ‘so­cial value’ of sav­ing lives—the im­pact on oth­ers—is greater for older than younger peo­ple.

34 Interestingly, we can also use LS scores as an intuition check on the relative weights GiveWell staffers attach to saving an under-5 vs doubling consumption for 1 year. The median GW staffer thinks saving an under-5 is 50 times more valuable than doubling income for 1 year. Assuming the child would live 60 years, that implies the child is 0.83 LSPs above the neutral line (plausible if the child lives at 4.8/10 and death is 4) and that doubling 1 person’s income for a year has a 0.83 LSP effect (not at all plausible given earlier evidence).
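
The arithmetic behind this footnote can be sketched roughly as follows. This is only an illustration under simplifying assumptions that are mine rather than GiveWell’s: that the value of saving a life is the years lived multiplied by the child’s yearly LS margin above the neutral point, and that the value of doubling income is its LS effect over one year. On those assumptions the 50:1 weighting pins down only the ratio between the two quantities. The function name and parameters are hypothetical.

```python
def implied_ls_ratio(life_to_income_weight: float, years_lived: float) -> float:
    """Ratio of (yearly LS margin above neutral) to (one-year LS effect of doubling income).

    Assumed model (illustrative, not GiveWell's own):
        value_of_life   = years_lived * margin_above_neutral
        value_of_income = income_effect            (over one year)
        value_of_life   = life_to_income_weight * value_of_income
    Rearranging gives margin_above_neutral / income_effect = weight / years.
    """
    if years_lived <= 0:
        raise ValueError("years_lived must be positive")
    return life_to_income_weight / years_lived


# Figures from the footnote: a 50:1 weighting and 60 remaining years of life.
ratio = implied_ls_ratio(50, 60)
print(round(ratio, 2))
```

The point of the check is that the two quantities move together: the smaller one believes the LS effect of doubling income to be, the closer to the neutral point the weighting implies the child’s life is.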

35 It would be bet­ter to say that AMF is deemed as hav­ing a cost-effec­tive­ness morally equiv­a­lent to a life-im­prov­ing in­ter­ven­tion of 12.2LSP/​$1,000.

36 See Greaves (2018) for a sum­mary ar­gu­ing it’s un­clear whether the Earth is un­der- or overpopulated

37 See e.g. Weathers (2016) and EA concepts (n.d.). Note that the meat eater problem also provides a further reason to reduce the value of economic development, not just the value of saving lives: the longer people live and the richer they become, the more meat they eat.

38 If a loss of life is to be ex­pected, it is pre­sum­ably less bad.

39 Outside the EA world, there are people working on human happiness. Some economists work on what governments can do for their citizens. Some psychologists work on how individuals can find happiness for themselves. Some physicians work on how to improve mental health treatments. Some organisations, such as Action for Happiness, seek to teach individuals to promote happiness in their own lives and communities. What is lacking are any organisations that seek to answer the EA question: how can individuals use their spare resources to make other people as happy as possible? One answer to this question is, of course, ‘coordinate with one another’.