I am a researcher at the Happier Lives Institute. In my work, I assess the cost-effectiveness of interventions in terms of subjective wellbeing.
JoelMcGuire
The effect of cash transfers on subjective well-being and mental health
...a lack of good resources on how to actually do research.
Yes! It’s hard to convey that you need to have already done a literature search to know what you need to search in the first place.
I second “Focus on breadth first!”. Googling is cheap. Search longer than you think you need to. An additional good paper can be decisive in forming a view on a new topic.
Searching smartly is often more effective than going down the citation trail.
I think “going down the citation trail” can often be very fruitful, especially if you search citations within a foundational article. E.g.,
Also: a good template can help you organize and focus your search. I only sorted the studies I found by their most salient features (the 4 colored 0/1 columns) after I’d gathered quite a few.
I did not know about http://connectedpapers.com/. Seems useful!
Hello,
Glad to hear you’re excited!
Unfortunately, we do not yet have a clear picture of how many WELLBYs per dollar is a good deal. Cash transfers are the first intervention we (and, I think, anyone) have analyzed in this manner. Figuring this out is my priority, and I will soon review the cost-effectiveness of other interventions, which should give more context. To give a sneak peek, cataract surgery is looking promising in terms of cost-effectiveness compared to cash transfers.
Those estimates are still in the works, but stay tuned!
I realized my previous reply might have been a bit misleading so I am adding this as a bit of an addendum.
There are previous WELLBY-like calculations, such as Michael’s comparison of StrongMinds to GiveDirectly in his 2018 Mental Health cause profile, or in Origins of Happiness / Handbook for WellBeing Policy Making in the UK. Why do we not compare our effects to these previous efforts? Most previous estimates looked at correlational effects and give no clear estimate of the total effect through time.
An aside: an example of these results communicated well is Micah Kaats’ thesis (which I think was related to HRI’s WALY report). It shows the relationship of different maladies to life satisfaction and contextualizes it with the effects of common life events.
Moving from standard deviations to points on a 0-11 scale is a further difficulty.
Something else worth noting is that different estimation methods can lead to systematically different effect sizes. In the same thesis, Kaats shows that fixed-effects models tend to produce smaller effects. This may make it seem as if the non-fixed-effects estimates are over-estimates, but that’s only the case if you “are willing to assume the absence of dynamic causal relationships”; whether that’s reasonable will depend on the outcome.
There have been estimates of cost-effectiveness that take the duration of effects into consideration, as Michael did in his report on StrongMinds, and as Clark et al. did for two studies in Origins of Happiness (moving to a better neighborhood and building cement floors in Mexico, p. 207), but they address only single studies. We wish to have a good understanding of the evidence base as a whole before presenting estimates.
To further explain this last point: my view is that the more scrutiny is applied to an effect, the more it diminishes (I can’t cite a good discussion of this at the moment). Comparing the cost-effectiveness of a single study to our synthesis could give the wrong impression. In our synthesis we try hard to include all the relevant studies, whereas it’s plausible that the first study we come across of an alternative well-being-enhancing intervention is exceptionally optimistic in its effects.
I think the Charter Cities Institute is an example of an ecosystem coordinator; they really seem to be the go-to people for the area.
Hi Akash, it’s been a few months since your comment, but I’m replying in case it’s still useful.
I’d be curious if you have a “rough sense” of why some programs seem to be so much better than others.
A general note: for at least the next year, I am mostly staying away from comparing programs and will instead compare interventions. Hopefully one can estimate the impact of a program from the work I do modeling interventions.
That being said, let me try to answer your question.
One of the reasons CTs make an elegant benchmark is that there are relatively few moving parts on both ends. You inform someone they will receive cash. They then do what needs to be done to receive it, which at most means a long walk. Issues with “quality” seem to arise primarily from (a) how convenient the provider makes it, and (b) whether the provider reliably follows through with the transfers. The biggest variation I’m concerned with comes from administrative costs as a share of the CT, on which we still have very little information. But that’s a factor on the cost side, not the effect side.
From this simple description, I expect the programs that do best are those that use digital or otherwise automatic transfers and are reliable. I don’t think this is a situation where the best is 10x as good as the average; I’m not sure there’s enough play in the system (although 3-5x variation in cost-effectiveness seems possible).
I think GiveDirectly is a good program and quite a bit better than the average government unconditional CT (can put a number on that in private if you’d like). I’m not saying it’s the “best” because as I started this comment by saying, I’m not actively searching for the best program right now. I have some ideas for how we’d quickly compare programs though, I’d be happy to talk about that in private.
However, I can’t help but comment that there are some hard-to-quantify factors I haven’t incorporated that could favor government programs. For instance, there’s evidence that CTs, when reliably run, can increase trust in governments.
But the decision maker isn’t always a donor. It may be a mid-level bureaucrat that can allocate money between programs, in which case intervention level analyses could be useful.
This might become even more important in analyses of other kinds of interventions, where the implementation factors might matter more.
Yes!
But if they do, I think they could inflate the effect size of CT programs on life satisfaction (relative to the effect size that would be found if we used a measure of life satisfaction that was less likely to prime people to think materialistically).
I agree. It may be worth roughly classifying the “materialness” of different measures and seeing whether it predicts larger effects of a cash transfer.
The post has been updated in what way?
Hi George (presumably),
I found this report very helpful as someone who mostly thinks about measuring the well-being of humans. I think it lays things out nicely: philosophical foundations, then the types of measurement instruments, ending with a discussion of the “state of the art” of what’s commonly used.
I also appreciate how the report lays out assessments of reliability, validity and interpersonal comparisons of utility for each class of welfare indicators. However, I think a reader like myself would feel more oriented if each section concluded with a summary assessment of the measure.
This is obviously difficult. Ideally we’d have some sort of model like
well-being measured = accuracy × importance = (reliability × cardinality) × (validity × well-being account)
Where we just plug values between 0 and 1 into each parameter and, presto, get the best measures; but I’m not sure if that’s coherent.
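To make the idea concrete, here is a toy sketch of that scoring model. Every rating below is a made-up placeholder (as are the two indicator classes); the point is only the structure of multiplying 0-1 parameters:

```python
# Toy sketch of the model above: each parameter is a guessed value in
# [0, 1], and a measure's overall score is
# (reliability * cardinality) * (validity * wellbeing_account).
# All names and numbers are hypothetical illustrations, not real ratings.

def score(reliability, cardinality, validity, wellbeing_account):
    accuracy = reliability * cardinality
    importance = validity * wellbeing_account
    return accuracy * importance

# Hypothetical ratings for two classes of welfare indicators.
measures = {
    "behavioral": score(0.7, 0.6, 0.8, 0.7),
    "physiological": score(0.8, 0.9, 0.4, 0.3),
}

best = max(measures, key=measures.get)
print(best, round(measures[best], 3))
```

One obvious wrinkle with a pure product is that a zero on any one parameter zeroes out the whole measure, which may or may not be the behavior we want.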
I only really got a sense of your overall judgement in the executive summary, and somewhat in the concluding discussion. For this overall judgement I would have enjoyed reading more reasoning (e.g., how much better are physical health and behavioral indicators than physiological indicators?).
You say:
I argue that rather than relying on any given assessment the best solution is to use a combination of methods that rely on different techniques. The ideal system would use a combination of qualitative measures, expert opinion based measures, an index of animal-based measures, and standalone measures such as preference testing or qualitative behavioural assessment.
But I’m not sure I follow. Would the ideal system use a combination? I think the ideal system would have a single measure that perfectly tracks what matters, no? Could you explain what you’re thinking here?
My last question is: what are y’all’s thoughts on making across-species comparisons? This is the question that really interests me, and most of the indicators presented seem much, much more suitable for within-species assessments of welfare.
Please keep up the great work!
Hello, this is quite exciting!
How do you expect this to change the marginal cost of delivering an additional cash transfer compared to the method GiveDirectly currently uses? Assuming GD spends $100 delivering every $1000 CT using its previous method, what would it look like with MobileAid?
Has GiveDirectly received any pushback about this new method, such as negative effects on those who self-enroll but are excluded, or general privacy concerns from users (regardless of their merit)?
George,
You’re welcome. I’m excited to see what comes next!
(I deleted a previous comment of the same content that was posted using another account. I reposted the same comment using this account to clarify that I am a researcher with the Happier Lives Institute.)
Hello and thank you for your post! I know this is just a book review, but I have some quibbles with your comments on measuring SWB / happiness.
First quote to comment on:
However, it is very difficult to measure utility. Our best studies produce counterintuitive results, such as that income only increases life satisfaction to the extent that you are richer than people around you.
Broadly, I disagree that the best studies using SWB produce counterintuitive results. Having engaged pretty broadly with the literature, I’ve been pleasantly surprised by how often subjective well-being (measuring well-being by asking people how they feel about their lives) conforms with intuitions.
Specifically, I don’t think the book you linked (or the SWB literature more broadly), implies that “income only increases life satisfaction to the extent that you are richer than people around you.” This would mean that 100% of the benefit of an increase in income is due to comparison / relative income effects. I think that claim is stronger than what the evidence supports. And even if there was evidence that a large share of the benefit to income gains came from favorable comparisons—I’m not sure that’s too counterintuitive for high income countries (LMICs would be another story!).
Some papers find that including a measure of relative income diminishes the magnitude or significance of the absolute-income coefficient (normally log(income); e.g., Boyce et al., 2010). But other papers find that a large absolute-income effect remains after adding a measure of relative position (income relative to average area income; e.g., Tsui et al., 2014).
Aside: Most of the evidence about the importance of relative versus absolute income is from high income countries. We don’t know much about the relative versus absolute income question in low and middle income countries.
Hi Michael and thank you for your comments and engaging with our work!
I may be misremembering, but doesn’t GiveDirectly give to whole villages at a time, anyway, making negative spillover very unlikely? If that’s the case, it seems like all of the spillover effects should be positive (in expectation).
To my understanding, GiveDirectly gives cash transfers to everyone in a village who is eligible. GiveWell says this means almost everyone in a village receives CTs in Kenya and Uganda but not Rwanda (note that GiveDirectly no longer works in Uganda). So it seems like negative spillovers are still possible. However, I think you’re still right that it makes negative spillovers less likely.
Do you have any thoughts on how the spillover effects of these interventions might compare, and is there any interest in looking further into this?
It’s tough to say how the (intra-household) spillovers compare. I guess that CTs could provide a bit more benefit to the household than psychotherapy, but I am very uncertain about this.
My thinking is that household spillovers are at least what your family gets for having you be happier. As you say, “people’s mental health can affect others (especially family, and parents’ mental health on children in particular) in important ways.” I expect this to be about balanced across interventions. Then there are the other benefits, which I think will mostly be pecuniary. In this case, it seems like cash transfers, if shared, will boost the household’s consumption more than psychotherapy. Again, we are quite uncertain about how spillovers compare across interventions, but it seems important to figure out what’s going on at least within the household. I can go into more detail if you’d like.
We are very interested in looking further into how the spillover effects of these interventions might compare, particularly intra-household spillovers. But as you might guess, the existing evidence is very slim. To advance the question we need to either wait for more primary research to be done or ask researchers for their data and do the analysis ourselves. We will revisit this topic after we’ve looked into other interventions.
I would guess the effects on SWB through increased income for the direct beneficiaries of StrongMinds are already included in the measurements of effects on SWB, assuming the research participants were similar demographically (including in income, importantly) as the beneficiaries of StrongMinds
I think that’s pretty much correct!
Hi Siebe, thank you for the kind words! We agree that using SWB could help us find new opportunities! We’re excited to explore more of this area.
I was also surprised by the things you mention, but I think they make sense on reflection. I can share more of my reasoning if you’d like (but I’m unsure if that’s what you were asking for).
We don’t have enough information to estimate the relationship between cost and effectiveness, but this is an interesting question! The issue is that we lack studies that contain both the effects and the costs of psychotherapy. However, we should be able to get cost information from another psychotherapy NGO operating in LMICs, so we hope to analyze that too.
I will let Michael comment on the funding situation!
Brian, I am glad to see your interest in our work!
1.) We have discussed our work with GiveWell. But we will let them respond :).
2.) We’re also excited to wade deeper into deworming. The analysis has opened up a lot of interesting questions.
3.) I’m excited about your search for new charities! Very cool. I would be interested to discuss this further and learn more about this project.
4.) You’re right that in both the case of CTs and psychotherapy we estimate that the effects eventually become zero. We show the trajectory of StrongMinds effects over time in Figure 5. I think you’re asking if we could interpret this as an eventual tendency towards depression relapse. If so, I think you’re correct since most individuals in the studies we summarize are depressed, and relapse seems very common in longitudinal studies. However, it’s worth noting that this is an average. Some people may never relapse after treatment and some may simply receive no effect.
5.) I’ll message you privately about this for the time being.
6.) In general we hope to get more people to make decisions using SWB.
7.) I am going to pass the buck on making a comment on this :P. This decision will depend heavily on your view of the badness of death for the person dying and if the world is over or underpopulated. We discuss this a bit more in our moral weights piece. In my (admittedly limited) understanding, the goodness of improving the wellbeing of presently existing people is less sensitive to the philosophical view you take.
Hi Michael,
I try to avoid the problem by discounting the average effect of psychotherapy. The point isn’t to find the “true effect”; the goal is to adjust for the risk of bias present in psychotherapy’s evidence base relative to that of cash transfers. We judge the CT evidence to be higher quality. Psychotherapy has smaller sample sizes on average and fewer unpublished studies, both of which are associated with larger effect sizes in meta-analyses (MetaPsy, 2020; Vivalt, 2020; Dechartres et al., 2018; Slavin et al., 2016). FWIW, I discuss this more in Appendix C of the psychotherapy report.
I should note that I think the tool I use needs development. This issue of detecting and adjusting for the bias present in a study is a more general issue in social science.
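As a toy illustration of the kind of adjustment I mean, here is a minimal sketch. Both numbers are hypothetical placeholders I made up for the example, not our actual figures:

```python
# Minimal sketch of discounting psychotherapy's average effect for risk
# of bias relative to the cash-transfer evidence base. The 0.50 SD
# effect and 15% discount are hypothetical placeholders, not HLI figures.

raw_effect_sd = 0.50    # hypothetical meta-analytic effect (in SDs)
bias_discount = 0.15    # hypothetical relative risk-of-bias adjustment

adjusted_effect_sd = raw_effect_sd * (1 - bias_discount)
print(adjusted_effect_sd)
```

The single multiplicative discount is the crude part; a more developed tool would estimate it from study-level features (sample size, publication status) rather than assert it.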
I do worry about the effect sizes decreasing, but the hope is that the cost will drop to a greater degree as StrongMinds scales up.
We say “post-treatment effect” because it makes it clear the time point we are discussing. “Treatment effect” could refer either to the post-treatment effect or to the total effect of psychotherapy, where the total effect is the decision-relevant effect.
Yes! We’ve looked into this a bit already in our report on comparing the value of doubling consumption to saving the life of a child using SWB. We plan to revisit and expand on this work.
Hi Derek, it’s good to hear from you, and I appreciate your detailed comments. You suggest several features we should consider in our next intervention comparison and the next version of these analyses. I think testing the robustness of our results to more fundamental assumptions is where we are likeliest to see our uncertainty expand. But I moderately disagree that adapting our model in these ways is straightforward. I’ll address your points in turn.
Time discounting: We omitted time-discounting because we only look at effects lasting ten years or less. Given our limited time available, adding a section discussing time-discounting would not be worth the effort. It’s worth noting that adding time discounting would only make psychotherapy look better because cash transfers’ benefits last longer.
Cost of StrongMinds: We include all costs StrongMinds incurs. The cost is “total expenditure of StrongMinds” / “number of people treated”. We don’t record any monetary cost to the beneficiary. If an expense to a beneficiary is bad because it decreases their wellbeing, we expect subjective well-being to account for that.
Only depression data? We have subjective well-being and mental health measures for cash transfers, but only the latter for psychotherapy. We discuss why we don’t think differences between MH and SWB measures will make much difference in section 3.1 of the CT CEA and in Appendix A of the psychotherapy report. Section 4.4 of the psychotherapy report discusses the literature on social desirability/experimenter demand (what I take you’re pointing to with your concern about “loading the dice”). The limited evidence suggests, perhaps surprisingly, that people don’t seem very responsive to the perceived demands of the experimenter, in general or in LMIC settings.
Spillovers: We are working on updating our analysis to include household spillovers. We discuss the intra-village spillovers in the cost-effectiveness analysis and the meta-analysis. I think we agree that the community spillovers do not appear likely to be influential.
Sensitivity / robustness: You are correct that we haven’t run as many robustness tests as we could have. These seem like reasonable candidates to consider in an updated version of the CEA comparison. Adding these tests can be conceptually straightforward and sometimes time-efficient. I especially think it’d be good to add another framing of the cost-effectiveness analysis that outputs the likelihood of surpassing the 5x-8x bar.
On the other hand, adding robustness checks for model-level assumptions seems like it could take a decent amount of time. In my view it doesn’t seem straightforward to, for example, operationalise moral views, the value of information, reasonable bounds for discount rates, the differences in “conversion rates” between MH and SWB data, etc. But maybe we should be more willing to make semi-uninformed guesses at the range of these values and include these in our robustness tests.
Hi Derek, thank you for your comment and for clarifying a few things.
Time discounting: We will revisit time discounting when looking at interventions with longer time scales. To be clear, we plan to update these analyses for backwards compatibility as we introduce refinements to our models and analyse new interventions.
Costs: You’re right, expenses in an organisation can be lumpy over time. If costs were high in all previous years but low in 2019, and we only use the 2019 figures, we’d probably be making a wrong prediction about future costs. I think a reasonable way to account for this is to treat the cost for an organisation as an average over previous years, giving increasingly more weight to years closer to the present.
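A minimal sketch of that weighting scheme, using linearly increasing weights and made-up yearly cost figures (the years and dollar amounts are purely illustrative):

```python
# Smooth lumpy yearly costs with a weighted average of cost-per-person,
# where weights rise linearly so recent years count more.
# All figures below are hypothetical, not StrongMinds' actual costs.

def weighted_cost(costs_by_year):
    """costs_by_year: list of yearly costs, ordered oldest -> newest."""
    weights = range(1, len(costs_by_year) + 1)  # 1, 2, ..., n
    total_weight = sum(weights)
    return sum(w * c for w, c in zip(weights, costs_by_year)) / total_weight

# Hypothetical cost per person treated over four years, oldest first.
costs = [210, 190, 170, 120]
print(round(weighted_cost(costs), 2))
```

Linear weights are just one choice; exponential decay (weighting each older year by some factor < 1) would discount the past more aggressively, and which is more appropriate depends on how fast you expect an organisation’s cost structure to change.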
Depression data: Thanks for the clarification; I think I understand better now. We make a critical assumption that a one-unit improvement in depression scales corresponds to the same improvement in well-being as a one-unit change in subjective well-being scales. If SWB is our gold standard, we can ask if depression scale changes predict SWB scale changes. Our preliminary analyses suggest that the difference here would, in any case, be pretty small. For cash transfers, we found the ‘SWB only’ effect would be about 13% larger than the pooled ‘SWB-and-MH’ effect (see page 10, footnote 16). To assess therapy, we looked at some psychological interventions that had outcome measures in SWB and MH and found the SWB effect was 11% smaller (see p27-8). We’d like to dig further into this in the future. But these are not result-reversing differences.
This is an interesting topic, but one I haven’t looked into much. I would like to see more work on this, because while some claim that the link between prosocial spending and well-being is universal (Aknin et al., 2013), I wonder if that was a bit premature. The study I reference found cross-sectional correlations between subjective well-being and prosocial spending in 136 countries and followed this up with a few small experiments that concurred.
Some other literature in the area, for what it’s worth: a series of recent pre-registered experiments (n ≈ 7,000) found mixed results (two positive, one null) on the effect of prosocial spending (not giving exactly) on happiness (Aknin et al., 2020). Another experiment (n = 615) found that people do not adapt to giving the way they adapt to spending on themselves (O’Brien and Kassirer, 2018). Several studies find that the degree of warm glow is increased by being informed about one’s impact and having a greater orientation towards “meaning and authenticity” (n = 126; Lai et al., 2020); another found that happier giving experiences were marked by feeling that the choice was freely made, had a clear impact, or was made towards a cause the giver is connected to (Lok & Dunn, 2020).
Chapter four of the 2019 World Happiness Report reviews some of the evidence of prosocial behavior and subjective-well being (although it does not appear to mention the studies I reference above).
Now comes the controversial line from a recent study (n = 325) that takes a different tack: “Regression results showed that saving a life decreased long-run happiness by 0.26 SD (P < 0.01) (Table 1, column 4) relative to receiving money, conditional on individual-specific baseline levels of happiness.” from Falk & Graeber (2020).
Some comments on the above study (I haven’t looked at it in detail): by “long-run” they mean four weeks, and they assume that saving a life means actually saving a life.
Another relevant quote from the Falk & Graeber paper: