I am a researcher at the Happier Lives Institute. In my work, I assess the cost-effectiveness of interventions in terms of subjective wellbeing.
JoelMcGuire
Finding before funding: Why EA should probably invest more in research
Hello, Vanessa
To complement Michael’s reply, I think there’s been some decent work related to two of your points, all of which happens to be by the same group.
I would be more optimistic of measurements based on revealed preferences, i.e. what people actually choose given several options when they are well-informed or what people think of their past choices in hindsight (or at least what they say they would choose in hypothetical situations, but this is less reliable).
In Benjamin et al. (2012; 2014a), they find that what people choose is well predicted by what they think would make them happier or more satisfied with their life, so there may not be too much tension between these measures as is. However, if you’re interested in a measure of wellbeing more in line with people’s revealed preferences, then your best bet may still lie within the realm of SWB. See Benjamin et al. (2014b), whose title hints at the thrust of their argument: “Beyond Happiness and Satisfaction: Toward Well-Being Indices Based on Stated Preference”. Note that this doesn’t mean going beyond subjective wellbeing, as the approach is still based on asking people about their lives. They discuss their approach to SWB further in Benjamin et al. (2021).
I suspect that the meaning of the answer might differ substantially between people in different cultures and/or be normalized w.r.t. some complicated implicit baseline, such as what a person thinks they should “expect” or “deserve”.
The difference in the meaning of SWB questions across respondents is still, as we note in Section 5, an area of active exploration. For instance, some recent work finds that people interpret ambiguously worded questions about their wellbeing as including considerations of how their family is doing (Benjamin et al., 2021, which contains a few other interesting findings!).
I wouldn’t be surprised if we discover that we need to do some fine-tuning to make these questions more precise, but that seems to me like the normal hard work of iterative refinement rather than an indictment of the whole enterprise!
To WELLBY or not to WELLBY? Measuring non-health, non-pecuniary benefits using subjective wellbeing
We’d like to express our sincere thanks to GiveWell for providing such a detailed and generous response. We are delighted that our work may lead to substantive changes, and echoing GiveWell, we encourage others to critique HLI’s work with the same level of rigour.
In response to the substantive points raised by Alex:
Using a different starting value: Our post does not present a strong argument for how exactly to include the decay. Instead, we aimed to do the closest ‘apples-to-apples’ comparison possible using the same values that GiveWell uses in their original analysis. Our main point was that including decay makes a difference, and we are encouraged to see that GiveWell will consider incorporating this into their analysis.
We don’t have a strong view of the best way to incorporate decay in the CEA. However, we intend to develop and defend our views about how the benefits change over time as we finalise our analysis of deworming in terms of subjective wellbeing.
How to weigh the decay model: We agree with Alex’s proposal to put some weight on the effects being constant. Again, we haven’t formed a strong view on how to do this yet and recognise the challenges that GiveWell faces in doing so. We look forward to seeing more of GiveWell’s thinking on this.
Improving reasoning transparency: We strongly support the plans quoted below and look forward to reading future publications that clearly lay out the importance of key judgements and assumptions.
We plan to update our website to make it clearer what key judgment calls are driving our cost-effectiveness estimates, why we’ve chosen specific parameters or made key assumptions, and how we’ve prioritized research questions that could potentially change our bottom line.
Quantifying the uncertainty of a single intervention may not be too informative on its own. Still, I think it’s more informative than you imply for comparing interventions, especially if you’re considering decision frameworks for allocating funds beyond giving all your money to the intervention with the highest average cost-effectiveness.
E.g., if you have a framework where you allocate your money across interventions in proportion to the probability that each has the highest cost-effectiveness, then uncertainty quantification would be essential. (I’m not sure anyone actually supports a rule like this.)
Another, potentially more real-world, example: imagine you’re a grantmaker choosing between 10 interventions that are all estimated to be 10x more cost-effective than GiveDirectly but vary in uncertainty. If you’re a Bayesian with more sceptical priors than the analysts, you will favour the relatively less uncertain analyses.
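To make this concrete, here is a toy sketch (all numbers are invented for illustration; this is not anything GiveWell or HLI actually computes): a simple normal-normal Bayesian update in which two interventions share the same 10x estimate but differ in precision, and a sceptical prior centred at 1x pulls the noisier one down further.

```python
# Toy normal-normal Bayesian update; every number here is invented for illustration
# and is not anything GiveWell or HLI actually computes.
# Two interventions are both estimated at 10x GiveDirectly, but one estimate is much
# noisier. A sceptical prior centred at 1x pulls the noisier estimate down further,
# so the Bayesian grantmaker ends up favouring the better-measured intervention.

def posterior_mean(estimate: float, se: float, prior_mean: float = 1.0, prior_sd: float = 3.0) -> float:
    """Precision-weighted average of the prior and the estimate (conjugate normal model)."""
    w_prior = 1 / prior_sd**2
    w_data = 1 / se**2
    return (w_prior * prior_mean + w_data * estimate) / (w_prior + w_data)

tight = posterior_mean(estimate=10.0, se=2.0)  # well-measured intervention
noisy = posterior_mean(estimate=10.0, se=8.0)  # highly uncertain intervention

print(f"Posterior for the tight estimate: {tight:.1f}x GiveDirectly")  # ~7.2x
print(f"Posterior for the noisy estimate: {noisy:.1f}x GiveDirectly")  # ~2.1x
```

The exact numbers don’t matter; the point is that, under a sceptical prior, the posterior for the noisier estimate gets pulled much closer to 1x.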
For instance, I never expected in my quantification of uncertainty in GiveDirectly that there would be practically any probability mass of it being more effective than AMF.
Really? What do you mean by practically? If we crunched the numbers, I’d guess there would be a single-digit percentage chance that GiveDirectly would be more impactful than AMF.
I’m always excited to hear about another push to establish strong EA working communities outside of the Bay and Oxbridge. But this makes me realize a concern I have with the proliferation of “hub projects”: are these separate efforts cancelling each other out?
Hubs are good[1] because densifying talent increases innovation and community-building. However, if hubs are funded to draw people in, and they draw those people entirely from other hubs (which are themselves funded to do the same)… I think we can see this would be a poor use of funding.
However, I assume this concern loses its teeth upon contact with reality. I bet what is happening with 2nd tier hub projects like this[2] is that they’re primarily drawing people near them and catching some wandering, hub-less talent for short periods. In that case, they seem like a reasonable use of resources.
But insofar as hubs are good meta-EA projects, I wonder how much coordination there should be to ensure we avoid the silly zero-sum, race-to-the-bottom dynamics we see in the USA, where states outbid each other to lure companies to their area. Is this coordination happening already? One rule that seems reasonable (and may already be implemented) is to prioritise offering residencies to hub-less people (all else equal).
1. ^ The logical conclusion of developing one hyper-hub seems bad because EA knowledge and talent should be geographically diverse, assuming we want EA to survive a catastrophe.
2. ^ I’ve recently heard about Boston, Cape Town, the Bahamas, and Berlin, but I’m unsure how many of them are offering residencies (the FTX Bahamas one is).
Thanks for collecting those quotes here. Because of some of what you quoted, I was confused for a while as to how much weight they actually put on their cost-effectiveness estimates. Elie’s appearance on Spencer Greenberg’s Clearer Thinking Podcast should be the most recent view on the issue.
In my experience, GiveWell is one of the few institutions that’s trying to make decisions based on cost-effectiveness analyses and trying to do that in a consistent and principled way. GiveWell’s cost-effectiveness estimates are not the only input into our decisions to fund programs, there are some other factors, but they’re certainly 80% plus of the case. I think we’re relatively unique in that way.
(Time at the start of the quote: 29:14).
Thank you for your comment Lucas! Looking forward to seeing your forthcoming report.
Firstly, to clarify, we are comparing GiveWell’s model without decay to the same model with decay. So, to make the closest comparison possible, we use the starting value and the time values that GiveWell uses; rows 17, 18, and 19 of their CEA show these values. They model the effects as starting 8 years after the deworming ends (~when participants start joining the labour force, see here) and continuing for 40 years at 0.006 each year. We get essentially the same total effect as GiveWell for their model, 0.113 versus their 0.115 (the small difference comes from our discretisation), and show that if we use exponential decay instead, we get a ~60% smaller total effect of 0.047.
While it’s plausible there’s a better value to start with, we’re trying to illustrate what would happen if GiveWell added decay to their model. It’s unclear if they would also change the starting value, but that seems like a plausible choice.
The advantage of exponential decay is that it is expressed as a percentage, so we can extract it from the study and apply it to any starting value and period. As long as we use the same starting value and period as GiveWell, we get a proportional decrease in the effect.
We also considered linear decay, which produces a more dramatic reduction in benefits: 88%. With linear decay we had to change the starting value, but we did this for both the constant-effect model and the decay models so that we could compare the proportional change.
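For concreteness, here is a minimal sketch of the constant-effect-versus-exponential-decay comparison. The 0.006 annual effect, the 8-year delay, and the 40-year duration are taken from the description above; the discount rate and the decay rate are placeholders of mine rather than the values in GiveWell’s or HLI’s spreadsheets, so the totals here are illustrative and won’t exactly match the 0.113 and 0.047 quoted above.

```python
import numpy as np

# The 0.006 ln-income effect per year, the 8-year delay, and the 40-year duration are
# from the comment above; the discount rate and the annual decay rate below are
# placeholders, not GiveWell's or HLI's actual values, so the totals are illustrative.
annual_effect = 0.006   # ln-income units gained per year (constant-effect model)
start_year = 8          # years after deworming ends before benefits begin
duration = 40           # years over which benefits accrue
discount_rate = 0.04    # placeholder annual discount rate
decay_rate = 0.10       # placeholder annual exponential decay of the effect

years = np.arange(start_year, start_year + duration)
discount = (1 + discount_rate) ** (-years)

# Constant-effect model: the same effect every year, discounted back to today.
constant_total = np.sum(annual_effect * discount)

# Exponential-decay model: the effect shrinks by decay_rate each year after onset.
decayed_effect = annual_effect * (1 - decay_rate) ** (years - start_year)
decay_total = np.sum(decayed_effect * discount)

print(f"Constant model total: {constant_total:.3f}")
print(f"Decay model total:    {decay_total:.3f}")
print(f"Reduction from adding decay: {1 - decay_total / constant_total:.0%}")
```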
Of course, a more complex analysis, which neither we nor GiveWell present, would be to model this using the full individual-level data.
The main point here is that the total effect is very sensitive to how the effect is modelled over time, and this sensitivity should therefore be explicitly mentioned in GiveWell’s analysis and reporting. I think this point holds.
Deworming and decay: replicating GiveWell’s cost-effectiveness analysis
This is a fascinating topic, and I’d be interested in the feasibility of other interventions to improve the quality of our dream-lives (lucid dreaming can also be extremely pleasant!).
I’m uncertain how intense our conscious experience of dreams is relative to waking. I think a less vivid experience could justify a discount.
I think I would trade several dreaming hours of 9/10 for a waking hour of 9/10 pleasantness. This may be biased by my poor memory of my dreams. Maybe I’d make the same tradeoff if asked while in REM. I’ve also had a friend tell me she’d trade half a day to prevent an hour of her nightmares, which is troubling.
Would anyone else chime in on how they’d make this tradeoff from a completely self-interested hedonic perspective?
This is a subtle point that people often miss (and I regularly forget!).
This is also a general issue whenever your simulation involves ratios. A positive denominator whose lower bound is close to zero will introduce huge and often implausible numbers. These are the situations where the divergence between E(x)/E(y) and E(x/y) will be largest.
How can we express the uncertainty around cost/effectiveness if the ratio distribution is hard to reason about and has misleading moments?
This is my question too: mean(x)/mean(y) has no variance! The whole point of doing simulations was to quantify the uncertainty around our cost-effectiveness calculations!
The approach Sam (a colleague at HLI) and I have taken is:
1. Report point estimates as calculated from point estimates (i.e., run the calculation without simulating uncertainty).
2. For reporting CIs of cost-effectiveness, first check whether mean(x/y) approximates mean(x)/mean(y). If so, use the simulated uncertainty (a small illustration of this check is sketched below).
Edit: mean(x)/mean(y) has some variance, but it’s not quite what we’re after. Thank you Caspar Kaiser for pointing this out.
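As a small illustration of why that check matters, here is a sketch with made-up inputs: when the denominator sits comfortably away from zero, mean(x/y) and mean(x)/mean(y) roughly agree and the simulated interval is usable; when the denominator’s lower bound is near zero, the ratio produces huge draws and its mean becomes misleading.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Made-up inputs purely for illustration: 'effect' plays the numerator (benefit),
# 'cost' the denominator, and cost-effectiveness is their ratio.
effect = rng.normal(5.0, 1.0, n)

# Case 1: a denominator comfortably away from zero.
cost_safe = rng.normal(10.0, 1.0, n)
# Case 2: a positive denominator whose lower bound is close to zero.
cost_risky = rng.uniform(0.05, 10.0, n)

for label, cost in [("safe denominator", cost_safe), ("near-zero denominator", cost_risky)]:
    ratio = effect / cost
    lo, hi = np.percentile(ratio, [2.5, 97.5])
    print(label)
    print(f"  mean(x)/mean(y): {effect.mean() / cost.mean():.2f}")
    print(f"  mean(x/y):       {ratio.mean():.2f}")
    print(f"  95% interval of x/y: [{lo:.2f}, {hi:.2f}]")
```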
Estimating the cost-effectiveness of scientific research
Do you know why fan fiction appears to be the go-to medium for rationalists? This seems odd.
Yes, but it only exists for the past 20 years. I’m not sure anyone has done an analysis just looking at growth and changes to subjective wellbeing in low-income countries over the past two decades. At HLI, we started a project related to this last summer, but it stalled and we haven’t picked it up since. I’m afraid 20 years isn’t enough time.
I haven’t really thought of this before! At first this felt like “cheating” in some way, but on reflection it seems more reasonable. Maybe this is what most gratitude interventions get at?
[Caveat: I’m not an expert on this topic, just interested]
I think that if people calibrate their scales to be consistent with the people they speak to, and use that same scale across time, then for this not to imply similar scales across generations, people would have to rarely speak to anyone outside their generation. Which could be plausible; I’m not really sure. It seems like a relatively heavy lift to ask someone, “Hey, could you make sure the best and worst life you reference are the same as your grandparents’?”
But really I think what we need is more data! We can speculate til the cows come home.
(i.e. I don’t even know what it would mean for your function to be linear)
That people attach equal value to each unit change in a scale?
And indeed people could change functions over time while keeping those 0 and 10 pegs fixed.
That’d be pretty odd, wouldn’t it? I think it’s likelier that the scale stretches or shifts instead of the reporting function changing.
I would change my mind more fully that scale norming is not occurring if I saw evidence that experience-sampling-type measures of affect also did not change over the course of decades as countries become/became wealthier (and earned more leisure time etc).
Why do you think that ESM scales wouldn’t change over time if other scales did? Alas, I’m not sure this data exists. I think the closest thing is time-use data. Using time-use happiness data, Han & Kaiser (2021) seem to suggest that Americans are a tiny, tiny bit happier than they were in the 1980s?
I’d also change my mind if I saw some experiment where people were asked to rate how their lives were going in relation to some shared reference point(s), such as other people’s lives described in a good amount of detail, and where people’s ratings of how their lives were going relative to those reference points also didn’t change as countries became significantly wealthier.
Interesting point. What about if there was evidence that people across ages tended to use subjective wellbeing scales in similar ways?
[Note, I can’t speak for Michael, but I work with Michael at the Happier Lives Institute.]
Without taking a stance on the broader thesis of this report, I think the evidence for hedonic adaptation is easy to overstate. Latent state-trait models show that changes in circumstances have detectable effects on well-being at least 10 years later, especially for affective measures. Winning the lottery also has long-term effects on well-being (though more so on life satisfaction), contra the Brickman study that played a role in popularizing hedonic adaptation.
Fair enough, hedonic adaptation in its stronger form, the claim that people will quickly return to a set point of wellbeing, has holes. But I don’t think the strong form was what was referenced in the text. I also see it as pretty uncontroversial that many things we could buy as a country would only give us fleeting enjoyment—which is, I think, Michael’s point.
The claim I think is most consistent with the paradox and the existing evidence is that people do not fully adapt to higher levels of income, AND they also don’t fully adapt to their neighbors getting richer (see Kaiser, 2020). So really, what we care about isn’t adaptation per se but whether the benefits of income gains for everyone last. And maybe, if everyone else gets more prosperous at the same time and by similar amounts (unlike in most lottery studies or cash transfer RCTs), those benefits don’t last. How long benefits and harms last is a complex topic that I think the literature doesn’t study enough. But even if we had an excellent model of individual adaptation to change, I’m not sure how well that would predict what happens when everyone’s circumstances change.
“Also in this context, the research by my own organisation, the Happier Lives Institute, finds that cash transfers to the very poor — those on the global poverty line — actually do have a small but significant effect on subjective wellbeing, one that continues over several years (McGuire, Kaiser, Bach-Mortensen, 2022).”
“I would suggest also citing the evidence that this result may be an artifact of publication bias.”

In our meta-analysis, we did check for publication bias using methods that existed at the time of the analysis, and we found nothing major. The Bartos et al. (2022) comment uses a novel method of unclear merit that reaches a puzzling conclusion: cash transfers to impoverished people don’t make them happier. I’m not sure what to make of it. If they’re right, I wouldn’t be surprised if most social science meta-analyses suffered the same fate.
Wild Idea #4 Simultaneously Solve American Education, Politics and Maybe Housing
2050: It’s been twenty years since ground broke on Franklin University, nestled in the foothills of Medicine Bow National Forest. Cal Newport was right: skilled researchers and professors flocked to the promise of 70%+ research time. The recent national spotlight on several successful alumni caused a surge of interest in the school. Enrollment swelled. The school was beginning to turn a profit and wean itself off those tech millions.
Its students aren’t that different from those attending good state schools, and the school doesn’t improve them all that much. Only 5% of each class goes into direct EA work (much higher for those who get a degree in Global Priorities Studies or Wellbeing Decision Science). The average student would look much the same if they’d gone elsewhere, but they’re slightly more sensible and expansive in their mindset.
The research is where the university really shines. Professors get plenty of time to do research, as long as 40% of it is spent on the department’s high-impact research agenda. They can spend the rest of their time exploring an esoteric and, in expectation, unimpactful area of knowledge, but the university probably won’t fund them to do it.
The accompanying city of Longwell was planned with sane ideas about zoning and transportation. The relatively infertile land and embrace of building made it one of the only cities with genuinely cheap housing in a place worth living. Miles of hiking and biking trails sprawl out of the city and stretch into the surrounding hills and woodlands. The promise of cheap housing tempts many remote workers to the charmingly walkable city. Many remark that the town feels “old”, comparing it favorably to cozy north-eastern or European towns. The same appeal keeps a share of graduates hanging around, starting companies.
The school’s commitment to ideological diversity somewhat appeases conservatives, who sometimes point approvingly to the university’s distinct lack of wokeness. From the beginning, many of the institute’s founders have stressed the non-partisan aspects of their shared research agenda. Despite this firm and careful guidance, the project draws the ire of pundits warning of a liberal plot to take over Wyoming and steal its securely Republican electoral and congressional votes.
The success of Frank-U and Longwell spawns imitators across the country.
2090: Longwell and its suburbs, long the fastest-growing MSA in the United States, are now large enough to be a political force in the state. For the first time, both Senators from Wyoming come from a liberal democratic party. Sitting near the center, they exert disproportionate political influence. The region still rides the goodwill earned after several technologies stemming from projects related to the city blunted a potentially catastrophic pandemic in 2084.
---
Note that Montana, the second-smallest red state, with a population of ~1 million rather than Wyoming’s ~500k, voted 40% for Biden compared to 25% in Wyoming. This led to a smaller absolute vote gap: roughly 100k votes instead of 120k.
https://en.wikipedia.org/wiki/2020_United_States_presidential_election_in_Montana
https://en.wikipedia.org/wiki/2020_United_States_presidential_election_in_Wyoming
“In LEEP’s second year (this year), we started working in six more countries, bringing the total to nine. We received another government commitment and paint manufacturers in two countries started switching to lead-free.”
I would just like to point out that this is incredible. Policy advocacy seems very hard. To have two commitments from two countries in two years seems very unusual for any domain of policy or regulation. Keep in mind that this domain involves one of the most potent poisons we’ve managed to spread everywhere. Even if these countries drag their feet, they would have to drag very hard for the value of these two years, and the expected value of LEEP, not to be very high.
The cost-effectiveness looks strong with this one.
Hi Karthik, for what it’s worth, I would be interested to hear your reasons and what sort of evidence would change your mind, more than us cranking out analyses.