I am a researcher at the Happier Lives Institute. In my work, I assess the cost-effectiveness of interventions in terms of subjective wellbeing.
JoelMcGuire
[Not answering on behalf of HLI, but I am an HLI employee]
Hi Michal,
We are interested in exploring more systematic solutions to aligning institutions with wellbeing. This topic regularly arises during strategic conversations.
Our aim is to eventually influence policy, for many of the reasons you mention, but we’re currently focusing on research and philanthropy. There’s still a lot we need to learn about how to measure and best improve wellbeing, and before we attempt to influence how large amounts of resources are spent, I think we should be confident that our advice is sound.
The institution that I’m most interested in changing is academia. I think:
Wellbeing science should be a field.
There should be academic incentives to publish papers about wellbeing priorities and cost-effectiveness analyses.
The reasoning is that if academia produced more high-quality cost-effectiveness analyses of interventions and charities, HLI could spend less time doing that research and more time figuring out what it implies about our priorities.
This also strikes me as likely more valuable than the average project of those who tend to work on wellbeing related topics.
It doesn’t seem implausible that we need to wait for (or accelerate) research converging on best practices before we make strong, high-stakes recommendations.
Following up on that last point, Folk and Dunn (2023) reviewed the statistical power and pre-registration status of research on the five most popular media recommendations for individuals to increase their happiness. The results, captured in the figure below, are humbling.
That said, there is an organization that attempts to popularize and disseminate findings from the wellbeing literature: https://actionforhappiness.org. We haven’t evaluated them yet, but they’re on our long list. I expect they’ll be challenging to evaluate.
I disagree because I think writing text to indicate a sentiment is a stronger signal than pressing a button. So while it’s somewhat redundant, it adds new information IMO.
As a writer, I pay attention to these signals when processing feedback.
Hi again Jason,
When we said “Excluding outliers is thought sensible practice here; two related meta-analyses, Cuijpers et al., 2020c; Tong et al., 2023, used a similar approach”—I can see that what we meant by “similar approach” was unclear. We meant that, conditional on removing outliers, they identify a similar or greater range of effect sizes as outliers as we do.
This was primarily meant to address the question raised by Gregory about whether to include outliers: “The cut data by and large doesn’t look visually ‘outlying’ to me.”
To rephrase, I think that Cuijpers et al. and Tong et al. would agree that the data we cut looks outlying. Obviously, this is a milder claim than our comment could be interpreted as making.
Turning to the wider implications of these meta-analyses: as you rightly point out, they don’t have a “preferred specification” and are mostly presenting the options for doing the analysis. They present analyses with and without outlier removal in their main analysis, and they adjust for publication bias without outliers removed (which is not what we do). The first analytic choice doesn’t clearly support including or excluding outliers, and the second – if it supports any option – favors Greg’s proposed approach of correcting for publication bias without removing outliers.
I think one takeaway is that we should consider surveying the literature and some experts in the field, in a non-leading way, about what choices they’d make if they didn’t have “the luxury of not having to reach a conclusion”.
I think it seems plausible to give some weight to analyses both with and without excluding outliers – if we can find a reasonable way to treat the 2 out of 7 publication bias correction methods that suggest the effect of psychotherapy is in fact sizably negative. We’ll look into this more before our next update.
Cutting the outliers here was part of our first-pass attempt at minimising the influence of dubious effects, which we’ll follow up with a Risk of Bias analysis in the next version. Our working assumption was that effects greater than ~2 standard deviations are suspect on theoretical grounds (that is, if they behave anything like SDs in a normal distribution) and seemed more likely to be the result of some error-generating process (e.g., data-entry error, bias) than a genuine effect.
We’ll look into this more in our next pass, but for this version we felt outlier removal was the most sensible choice.
Hi Jason,
“Would it have been better to start with a stipulated prior based on evidence of short-course general-purpose[1] psychotherapy’s effect size generally, update that prior based on the LMIC data, and then update that on charity-specific data?”
1. To your first point, I think adding another layer of priors is a plausible way to do things, but given that the effects of psychotherapy in general appear to be similar to the estimates we come up with,[1] it’s not clear how much this would change our estimates.
There are probably two issues with using HIC RCTs as a prior. First, the incentives that could bias results probably differ across countries; I’m not sure how this would pan out. Second, in HICs, the control group (“treatment as usual”) is probably a lot better off. In a HIC RCT, there’s not much you can do to stop someone in the control group of a psychotherapy trial from getting prescribed antidepressants. However, the standard of care in LMICs is much lower (antidepressants typically aren’t an option), so we shouldn’t be terribly surprised if control groups appear to do worse (and the treatment effect is thus larger).
“To my not-very-well-trained eyes, one hint to me that there’s an issue with application of Bayesian analysis here is the failure of the LMIC effect-size model to come anywhere close to predicting the effect size suggested by the SM-specific evidence.”
2. To your second point, does our model predict charity specific effects?
In general, I think it’s a fair test of a model that it should do a reasonable job of predicting new observations. We can’t yet discuss the forthcoming StrongMinds RCT – we’ll know how well our model predicts that RCT when it’s released. But for the Friendship Bench (FB) situation, it is true that we predict a considerably lower effect for FB than the FB-specific evidence would suggest. This is in part because we use charity-specific evidence to inform both our prior and the data. Let me explain.
We have two sources of charity specific evidence. First, we have the RCTs, which are based on a charity programme but not as it’s deployed at scale. Second, we have monitoring and evaluation data, which can show how well the charity intervention is implemented in the real world. We don’t have a psychotherapy charity at present that has RCT evidence of the programme as it’s deployed in the real world. This matters because I think placing a very high weight on the charity-specific evidence would require that it has a high ecological validity. While the ecological validity of these RCTs is obviously higher than the average study, we still think it’s limited. I’ll explain our concern with FB.
For Friendship Bench, the most recent RCT (Haas et al. 2023, n = 516) reports an attendance rate of around 90% to psychotherapy sessions, but the Friendship Bench M&E data reports an attendance rate more like 30%. We discuss this in Section 8 of the report.
So in the Friendship Bench case we have a couple of reasonable-quality RCTs, but the M&E data suggest that something is wrong with the implementation. This evidence of lower implementation quality should be adjusted for, which we do – but we include this adjustment in the prior. So we’re injecting charity-specific evidence into both the prior and the data. Note that this is part of the reason why we don’t think it’s wild to place a decent amount of weight on the prior. This is something we should probably clean up in a future version.
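As a sketch of the kind of updating being described here (this is not our actual model, and all the numbers are purely illustrative), a standard normal-normal conjugate update combines a prior and new evidence by precision-weighting:

```python
# Minimal sketch of a normal-normal Bayesian update: the posterior mean is a
# precision-weighted average of the prior mean and the data mean. All values
# below are illustrative placeholders, not HLI's actual estimates.

def normal_update(prior_mean, prior_sd, data_mean, data_sd):
    """Combine a normal prior with a normal likelihood (known variances)."""
    prior_prec = 1 / prior_sd**2
    data_prec = 1 / data_sd**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_mean * prior_prec + data_mean * data_prec) / post_prec
    post_sd = post_prec ** -0.5
    return post_mean, post_sd

# Illustrative: a fairly precise prior (e.g., informed by the wider evidence
# plus implementation adjustments) updated on noisier charity-specific data.
post_mean, post_sd = normal_update(prior_mean=0.5, prior_sd=0.1,
                                   data_mean=0.9, data_sd=0.3)
# The posterior sits much closer to the prior because the prior is more precise.
```

The point of the sketch is just that when the prior is made more precise (partly by charity-specific adjustments, as described above), the posterior will sit closer to it than to the charity-specific data alone.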
We can’t discuss the details of the Baird et al. RCT until it’s published, but we think there may be an analogous situation to Friendship Bench where the RCT and M&E data tell conflicting stories about implementation quality.
This is all to say, judging how well our predictions fare against the charity-specific effects isn’t entirely straightforward, since we are trying to predict the effects of the charity as it is actually implemented (something we don’t directly observe), not simply the effects found in an RCT.
If we try to predict the RCT effects for Friendship Bench (which have much higher attendance than the “real” programme), then the gap between the predicted RCT effects and actual RCT effects is much smaller, but it still suggests that we can’t completely explain why the Friendship Bench RCTs find their large effects.
So, we think the error in our prediction isn’t quite as bad as it seems if we’re predicting the RCTs, and stems in large part from the fact that we are actually predicting the charity implementation.
- ^
Cuijpers et al. 2023 find an effect of psychotherapy of 0.49 SDs for studies with low RoB in low-, middle-, and high-income countries (comparisons = 218), and Tong et al. 2023 find an effect of 0.69 SDs for studies with low RoB in non-Western countries (primarily low- and middle-income; comparisons = 36). Our estimate of the initial effect is 0.70 SDs (before publication bias adjustments). The results tend to be lower (between 0.27 and 0.57, or 0.42 and 0.60, SDs) when the authors of the meta-analyses correct for publication bias. In both meta-analyses (Tong et al. and Cuijpers et al.), the authors present the effects after using three publication bias correction methods: trim-and-fill (0.60; 0.38 SDs), a limit meta-analysis (0.42; 0.28 SDs), and a selection model (0.49; 0.57 SDs). If we average their publication-bias-corrected results (which they produced without removing outliers beforehand), the estimated effect of psychotherapy is 0.50 SDs and 0.41 SDs for the two meta-analyses. Our estimate of the initial effect (which is most comparable to these meta-analyses), after removing outliers, is 0.70 SDs, and our publication bias correction is 36%, implying that we estimate the corrected effect to be 0.46 SDs. You can play around with the data they use on the metapsy website.
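The averaging in this footnote is easy to reproduce (values listed in the footnote’s order for the two meta-analyses):

```python
# The three publication bias correction methods (trim-and-fill, limit
# meta-analysis, selection model), in the footnote's reporting order.
tong = [0.60, 0.42, 0.49]      # SDs
cuijpers = [0.38, 0.28, 0.57]  # SDs

avg_tong = sum(tong) / len(tong)              # ~0.50 SDs
avg_cuijpers = sum(cuijpers) / len(cuijpers)  # ~0.41 SDs

# HLI's comparable figure: a 0.70 SD initial effect with a 36% correction.
hli_corrected = 0.70 * (1 - 0.36)  # ~0.45 with these rounded inputs
```

With the rounded inputs shown here the last figure lands at about 0.45, slightly below the 0.46 quoted above, which presumably reflects unrounded inputs in the original calculation.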
Hi Victor,
The updated operationalization of psychotherapy we use in our new report (page 12) is:
“For the purposes of this review, we defined psychotherapy as an intervention with a structured, face-to-face talk format, grounded in an accepted and plausible psychological theory, and delivered by someone with some level of training. We excluded interventions where psychotherapy was one of several components in a programme.”
So basically this is “psychotherapy delivered to groups or individuals by anyone with some amount of training”.
Does that clarify things?
Also, you should be able to use our new model to calculate the WELLBYs of more traditional one-to-one psychotherapy, since we include one-to-one studies in our model. Friendship Bench, for instance, uses that delivery model (albeit with lay mental health workers with relatively brief training). Note that in this update our finding about group versus individual therapy has reversed: we now find one-to-one is more effective than group delivery (page 33). This is a bit of a puzzle, since it disagrees somewhat with the broader literature, but we haven’t had time to look into this further.
They only include costs to the legal entity of StrongMinds. To my understanding, this includes the relatively generous stipend they provide to the community health workers and teachers who are “volunteering” to deliver the StrongMinds programme, as well as the grants StrongMinds makes to NGOs to support their delivery of StrongMinds programs.
Note that 61% of their partnership treatments are through these volunteer-plus arrangements with community health workers and teachers. I’m not too worried about this, since I’m pretty sure there aren’t meaningful additional costs to consider: these partnership treatments appear to be based on individual CHWs and teachers opting in. I also don’t think that delivering psychotherapy is meaningfully leading them to do less of their core health or educational work.
I’d be more concerned if these treatments were happening because a higher authority (say, school administrators) was saying “Instead of teaching, you’ll be delivering therapy.” The costs to deliver therapy could then reasonably be seen to include the teacher’s time and the decrease in teaching they’d do.
But what about the remaining 39% of partnerships (representing 24% of total treatments)? These are through NGOs. I think around 40% of these are delivered because StrongMinds is giving grants to the NGOs to deliver therapy in areas that StrongMinds can’t reach for various reasons. The other 60% of NGO cases appear to be instances where the NGO is paying StrongMinds to train it to deliver psychotherapy. The case for causally attributing these treatments to StrongMinds seems more dubious, and I haven’t gotten all the information I’d like, so to be conservative I assumed that none of the cases StrongMinds claims as its own are attributable to it. This increases the costs by around 14%[1] because it reduces the total number treated by around 14%.
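To make the share arithmetic explicit, here is a quick check using the shares quoted above:

```python
# The excluded treatments: NGO cases where the NGO pays StrongMinds for training.
ngo_share_of_total = 0.24     # NGO partnerships as a share of all treatments
training_share_of_ngo = 0.60  # NGO cases where the NGO pays for training

excluded_share = ngo_share_of_total * training_share_of_ngo  # ~0.14
# Dropping ~14% of claimed treatments while keeping total costs fixed raises
# the cost per person treated by roughly the same proportion, to first order.
```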
Some preemptive hedging: I think my approach so far is reasonable, but my world wouldn’t be rocked if I was later convinced this isn’t quite the way to think about incorporating costs in a situation with more decentralized delivery and more unclear causal attribution for treatment.
- ^
But 1.14 * 59 is 67, not 63! Indeed. The cost we report is lower than $67 because we also include an offsetting 7.4% discount to the costs to harmonize the cost figures of StrongMinds (which are more stringent about who counts as treated – more than half of sessions must be completed) with Friendship Bench (which counts anyone receiving at least one session as treated). So 59 * (1 − 0.074) * 1.14 is roughly $63. See page 69 of the report for the section where we discuss this.
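For reference, here is the arithmetic with the rounded figures quoted above; the small remaining gap to the reported $63 presumably comes from unrounded inputs:

```python
base_cost = 59         # cost per person treated before adjustments, USD
attribution = 1.14     # +14% from excluding NGO-training treatments
harmonize = 1 - 0.074  # -7.4% discount to harmonize "treated" definitions

cost = base_cost * harmonize * attribution  # ~62.3 with these rounded inputs
```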
Hi Nick,
Good question. I haven’t dug into this in depth, so consider this primarily my understanding of the story; I haven’t gone through an itemized, year-by-year breakdown of StrongMinds’ costs to investigate further.
It is a big drop from our previous costs. But I originally did the research in Spring 2021, when 2020 was the last full year. That was a year with unusually high costs. I didn’t use those costs because I assumed this was mostly a pandemic related aberration, but I wasn’t sure how long they’d keep the more expensive practices like teletherapy they started during COVID (programmes can be sticky). But they paused their expensive teletherapy programme this year because of cost concerns (p. 5).
So $63 is a big change from $170, but a smaller change from $109 -- their pre-COVID costs.
What else accounts for the drop, though? “Scale” seems like a plausible explanation. The first part of the story is fixed/overhead costs being spread over a larger number of people treated, with variable (per-person) costs remaining stable. StrongMinds spends at least $1 million on overhead (office, salaries, etc.); the more people are treated, the lower the per-person cost (all else equal). The second part of the story is that variable costs (i.e., training and supporting the person delivering the therapy) are plausibly also decreasing. They’ve also shifted away from a staff-centric delivery model towards using more volunteers (e.g., community health workers), which likely depresses costs somewhat further. We discuss their scaling strategy and the complexities it introduces into our analysis around page 70 of the report.
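The fixed-cost part of the story can be illustrated with a toy model. The ~$1 million overhead is from above; the $40 variable cost and the treatment counts are made-up illustrative numbers:

```python
def cost_per_person(n_treated, fixed=1_000_000, variable=40):
    """Toy model: per-person cost = overhead spread over n people, plus a
    stable variable cost. The $40 figure is purely illustrative."""
    return (fixed + variable * n_treated) / n_treated

# Per-person cost falls as the programme scales, approaching the variable cost.
small_scale = cost_per_person(20_000)   # overhead dominates
large_scale = cost_per_person(100_000)  # overhead diluted
```

With these hypothetical numbers the per-person cost falls from $90 at 20,000 treated to $50 at 100,000 treated, which is the shape of the effect being described, not our actual cost breakdown.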
Below I’ve attached StrongMinds’ most recent reporting on their number treated and cost per person treated, which gives a decent overall picture of how the costs and the number treated have changed over time.
Hi Jason,
The bars for AMF in Figure 2 should represent the range of cost-effectiveness estimates that come from inputting different neutral points and, for TRIA, different ages of connectedness.
This differs from the values given in Table 25 on page 80 because, as we note below that table, the values there assume a neutral point of 2 and a TRIA age of connectedness of 15.
The bar also differs from the range given in Figure 13 on page 83 because the lowest TRIA value there has an age of connectedness of 5 years, whereas in Figure 2 (here) we allow it to go as low as 2 years, I believe[1].
I see that the footnote explaining this is broken, I’ll fix that.
- ^
When I plug (deprivationism, neutral point = 0; TRIA, neutral point = 0, age of connectedness = 2) into our calculator, it spits out a cost-effectiveness of 91 WELLBYs per $1,000 for deprivationism and 78 for TRIA (age of connectedness = 2) -- this appears to match the upper end of the bar.
Neat work. I wouldn’t be surprised if this ends up positively updating my view on the cost-effectiveness of advocacy work.
What’s your take on the possibility that someone could empirically tackle a related issue we also tend to do a lot of guessing at – the likelihood of $X million spent on advocacy in a certain domain leading to reform?
The prospect of a nuclear conflict is so terrifying I sometimes think we should be willing to pay almost any price to prevent such a possibility.
But when I think of withdrawing support for Ukraine or Taiwan to reduce the likelihood of nuclear war, that doesn’t seem right either—as it’d signal that we could be threatened into any concession if nuclear threats were sufficiently credible.
How would you suggest policymakers navigate such terrible tradeoffs?
How much do you think the risk of nuclear war would increase over the century if Iran acquired nuclear weapons? And what measures, if any, do you think are appropriate to attempt to prevent this or other examples of nuclear proliferation?
Note that a large portion of Somaliland appears to be occupied by rebels at the moment. But other than that, it has indeed been much more peaceful.
“Thank you for the comment. There’s a lot here. Could you highlight what you think the main takeaway is? I don’t have time to dig into this at present, so any condensing would be appreciated. Thanks again for the time and effort.” ??
“I believe that large tech companies are, on average, more efficient at converting talent into market cap value than small companies or startups are. They typically offer higher salaries, for one.”
This may be true for market cap, but let’s be careful when translating this to do-goodery. E.g., wages don’t necessarily relate to productivity. Higher wages could also reflect higher rents, which seems plausibly self-reinforcing by drawing (and shelving) innovative talent from smaller firms. A quote from a recent paper by Akcigit and Goldschlag (2023) is suggestive:
“when an inventor is hired by an incumbent, compared to a young firm, their earnings increase by 12.6 percent and their innovative output declines by 6 to 11 percent.”
I don’t have a good grasp of the literature. Still, the impression I got hanging around economists interested in innovation during my PhD led me to believe the opposite: that smaller firms were more innovative than larger firms, and the increasing size of firms over the past few decades is a leading candidate for explaining the decline in productivity and innovation.
Speaking from my own experience working in a tiny research organisation, I wish I could have started as a researcher with the structure and guidance of a larger organization, but I really doubt I’d have pursued as important research if we hadn’t tried to challenge other, larger organizations. Do you feel differently with QURI?
“I don’t think this is right – ‘Russia’ doesn’t take actions, Vladimir Putin does; Putin is 70, so he seems unlikely to be in power once Russia has recovered from the current war; there’s some evidence that other Russian elites didn’t actively want the war, so I don’t think it’s right to generalize to ‘Russia’.”
Even if it’s true that many elites were anti-war before the invasion, I think the war has probably accelerated a preexisting process of ideological purification. So even when Putin kicks the bucket, I think the elites will be just as likely to say “We didn’t go far enough” as “We went too far”. I expect at least some continuity in Putin’s successor’s willingness to go to war.
“A US-China war would be fought almost entirely in the air and at sea; Ukraine is fighting almost entirely on land. The weapons Ukraine has received are mostly irrelevant for a potential US-China war; e.g., the Marines have already decided to stop using tanks entirely, and the US being capable of shipping the vast amounts of artillery ammunition being consumed in Ukraine to a combat zone would require the US-China war to already be essentially won.”
Why would it be fought almost entirely in the air and at sea? That sounds like a best- or worst-case scenario, i.e., China isn’t able to actually land, or China achieves air and naval superiority around Taiwan. The advanced weapons systems Ukraine has received seem very relevant: Storm Shadow, HIMARS, Abrams and Leopard tanks, Patriot, Javelin, etc. And shipping weapons doesn’t seem to require the war to be essentially won, just that the US can achieve local air and naval superiority over a part of Taiwan with a harbour. Complete dominance of the skies in a conflict is rare.
“Weapons being sent to Ukraine are from drawdown stocks, which Taiwan itself hasn’t previously been eligible to receive. Taiwan instead purchases new weapons, but there are many, many other countries purchasing similar types of weapons, and if the US were to become concerned, I’d expect it to prioritize both Ukraine and Taiwan over e.g. Saudi Arabia or Egypt.”
My concern is that these US stocks seem to be regenerating very, very slowly.
Hah! Yeah, stepping back, I think these events are a distraction for most people. Especially if they worsen one’s mental health. For me, reflecting on the war makes me feel so grateful and lucky to live where I do.
Another reason to pay attention is when it seems like it could shortly and sharply affect the chances of catastrophe. At the beginning of the war, I kept asking myself, “At what probability of nuclear war should I: make a plan, consider switching jobs, move to Argentina, etc.” But I think we’ve moved out of the scary zone for a while.
Fair jabs, but the PRC-Taiwan comparison was because it was the clearest natural experiment that came to mind where different bits of a nation (shared language, culture, etc.) were somewhat randomly assigned to authoritarianism or pluralistic democracy. I’m sure you could make more comparisons with further statistical jiggery-pokery.
The PRC-Taiwan comparison is also because, if we want to think of things in terms of life satisfaction, it’s not clear there’d be a huge (war-justifying) loss in wellbeing if annexation by the PRC only meant a relatively small dip in life satisfaction. This is the possibility I found distressing. Surely there’s something we’re missing, no?
I think inhabitants of both countries probably have similar response styles to surveys with these scales. Still, if a state is totalitarian, we should probably not be surprised if people are suspicious of surveys.
Sure, Taiwan could be invaded, and that could put a dampener on things. But, notably, Taiwan is more satisfied than its similarly wealthy and democratic peers that are less likely to be invaded: Japan and South Korea.
I expect one response is, “well, we shouldn’t use these silly surveys”. But what other existing single type of measure is a better assessment of how people’s lives are going?
Taiwan has about a 0.7-point advantage on a 0-to-10 life satisfaction scale, and, most recently, 5% more of its population reports being happy.
I agree that the agency of newer NATO members (or Ukraine) has been neglected. Still, I don’t think this was a primary driver of underestimating Ukraine’s chances—unless I’m missing what “agency” means here.
I assume predictions were dim about Ukraine’s chances at the beginning of the war primarily because Russia and the West had done an excellent job of convincing us that Russia’s military was highly capable. E.g., I was disconcerted by the awe/dread with which my family members in the US Army spoke about Russian technical capabilities across multiple domains.
That said, I think some of these predictions came from a sense that Ukraine would just “give up”. In which case, missing the agency factor was a mistake.
What’s the track record of secular eschatology?
A recent SSC blog post depicts a dialogue about eugenics. This raised the question: historically, how good has the track record of communities of reasonable people been at identifying the risks of previous would-be catastrophes?
As noted in the post, at different times:
Many people were concerned about overpopulation posing an existential threat (cf. The Population Bomb, discussed at length in The Wizard and the Prophet). It now seems widely accepted that the risk overpopulation posed was overblown. But this depends on how contingent the Green Revolution was. If there hadn’t been a Norman Borlaug, would someone else have tried a little bit harder than others to find more productive cultivars of wheat?
Historically, there also appeared to be more worry about the perceived threat posed by a potential decline in population IQ. This flowed from the reasonable-sounding argument: “Smart people seem to have fewer kids than their less intellectually endowed peers. Extrapolate this over many generations, and we have an idiocracy that at best will be marooned on Earth or at worst will no longer be capable of complex civilization.” I don’t hear these concerns much these days (an exception being a recent Clearer Thinking podcast episode). I assume the dismissal would sound something like: “A. The Flynn effect.[1] B. If the effect exists, it will take a long time to bite into technological progress, and by the time it poses a threat, we should have more elegant ways of increasing IQ than selective breeding. Or C. Technological progress may depend more on total population size than average IQ, since we need a few von Neumanns rather than hordes of B-grade thinkers.”
I think many EAs would characterize global warming as tentatively in the same class: “We weren’t worried enough when action would have been high leverage, but now we’re relatively too worried because we seem to be making good progress (see the decline in solar cost), and we should predict this progress to continue.”
There have also been concerns about the catastrophic consequences of: A. Depletion of key resources such as water, fertilizer, oil, etc. B. Ecological collapse. C. Nanotechnology(???). These concerns are also considered overblown in the EA community relative to the preoccupation with AI and engineered pathogens.
Would communism’s prediction of an inevitable collapse of capitalism count? I don’t know how harmful this would have been considered in the short run, since most attention was on the utopia it would afford.
Most of the examples I’ve come up with make me lean towards the view that “these past fears were overblown because they consistently discount the likelihood that someone will fix the problem in ways we can’t yet imagine.”
But I’d be curious to know if someone has examples or interpretations that lean more towards “We were right to worry! And in hindsight, these issues received about the right amount of resources. Heck they should have got more!”
What would an ideal EA have done if teleported back in time and mindwiped of foresight when these issues were discovered? If reasonable people acted in folly then, and EAs would have acted in folly as well, what does that mean for our priors?
- ^
I can’t find an OWID page on this, despite Google image searches making it apparent one once existed. Perhaps allowing people to compare IQs across countries didn’t feed the right kind of conversations?
Keep up the good struggle my dear fish loving friends.