I am a researcher at the Happier Lives Institute. In my work, I assess the cost-effectiveness of interventions in terms of subjective wellbeing.
Excruciating pain is 1,000 times as bad as disabling pain[3].
Disabling pain is 100 times as bad as hurtful pain.
Hurtful pain is 10 times as bad as annoying pain.
Call me a linearity-pilled likert-maxer, but this seems a bit wild. I’ve previously read the articles you linked to, and think it’s plausible that intense suffering can be way worse than we’d be inclined to imagine, but I don’t think it’s obvious by any means, or that it necessarily implies the profound possibilities of suffering indicated here.
After some introspection, which seems near state of the art on this question, I’d guess a range of hedonic experience of −1000 to 100 where:
100 is peak experience.
5 is my average experience.
2 would be an average day with annoying pain.
-20 is depression.
-600 is probably my worst experience.
-1000 seems like the worst conceivable.
If I take your scale seriously (linearized below) and normalize hurtful pain to −5 (which would be more compatible with your assumptions), then I have to accept that my worst experienced suffering (which certainly felt like I was maxing out my psychological capacity for distress), was 0.12% of what’s possible. That strikes me as a bit odd. But hey, intuitions differ.
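For transparency, here’s the arithmetic behind that 0.12% figure (a quick check using the linearized scale above; the −5 normalization and the −600 worst-experience value are my own guesses from the list above):

```python
# Linearize the quoted pain scale, normalizing hurtful pain to -5
hurtful = -5
disabling = 100 * hurtful        # 100x as bad as hurtful   -> -500
excruciating = 1000 * disabling  # 1,000x as bad as disabling -> -500,000

worst_experienced = -600  # my guess for my worst experience, from the list above
print(worst_experienced / excruciating)  # 0.0012, i.e. my worst is 0.12% of the scale's worst
```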
Keep up the good struggle, my dear fish-loving friends.
[Not answering on behalf of HLI, but I am an HLI employee]
Hi Michal,
We are interested in exploring more systematic solutions to aligning institutions with wellbeing. This topic regularly arises during strategic conversations.
Our aim is to eventually influence policy, for many of the reasons you mention, but we’re currently focusing on research and philanthropy. There’s still a lot we need to learn about how to measure and best improve wellbeing, and before we attempt to influence how large amounts of resources are spent, I think we should be confident that our advice is sound.
The institution that I’m most interested in changing is academia. I think:
Wellbeing science should be a field.
There should be academic incentives to publish papers about wellbeing priorities and cost-effectiveness analyses.
The reasoning is that if academia produced more high-quality cost-effectiveness analyses of interventions and charities, HLI could spend less time doing that research and more time figuring out what it implies for our priorities.
This also strikes me as likely more valuable than the average project of those who tend to work on wellbeing-related topics.
It doesn’t seem implausible that we need to wait for research to converge on best practices (or accelerate it) before we make strong, high-stakes recommendations.
Following up on that last point, Folk and Dunn (2023) reviewed the power and pre-registration status of research on the 5 most popular recommendations in the media for individuals to increase their happiness. The results, captured in the figure below, are humbling.
That said, there is an organization that attempts to popularize and disseminate findings from the wellbeing literature: https://actionforhappiness.org. We haven’t evaluated them yet, but they’re on our long list. I expect they’ll be challenging to evaluate.
I disagree because I think writing text to indicate a sentiment is a stronger signal than pressing a button. So while it’s somewhat redundant, it adds new information IMO.
As a writer, I pay attention to these signals when processing feedback.
Hi again Jason,
When we said “Excluding outliers is thought sensible practice here; two related meta-analyses, Cuijpers et al., 2020c; Tong et al., 2023, used a similar approach”—I can see that what we meant by “similar approach” was unclear. We meant that, conditional on removing outliers, they identify a similar or greater range of effect sizes as outliers as we do.
This was primarily meant to address the question raised by Gregory about whether to include outliers: “The cut data by and large doesn’t look visually ‘outlying’ to me.”
To rephrase, I think that Cuijpers et al. and Tong et al. would agree that the data we cut looks outlying. Obviously, this is a milder claim than our comment could be interpreted as making.
Turning to wider implications of these meta-analyses: as you rightly point out, they don’t have a “preferred specification” and are mostly presenting the options for doing the analysis. They present analyses with and without outlier removal in their main analysis, and they adjust for publication bias without outliers removed (which is not what we do). The first analytic choice doesn’t clearly support including or excluding outliers, and the second – if it supports any option – favors Greg’s proposed approach of correcting for publication bias without outliers removed.
I think one takeaway is that we should consider surveying the literature and some experts in the field, in a non-leading way, about what choices they’d make if they didn’t have “the luxury of not having to reach a conclusion”.
I think it seems plausible to give some weight to analyses with and without excluding outliers – if we are able to find a reasonable way to treat the 2 out of 7 publication bias correction methods that produce results suggesting the effect of psychotherapy is in fact sizably negative. We’ll look into this more before our next update.
Cutting the outliers here was part of our first-pass attempt at minimising the influence of dubious effects, which we’ll follow up with a Risk of Bias analysis in the next version. Our working assumption was that effects greater than ~2 standard deviations are suspect on theoretical grounds (that is, if they behave anything like SDs in a normal distribution), and seem more likely to be the result of some error-generating process (e.g., data-entry error, bias) than a genuine effect.
We’ll look into this more in our next pass, but for this version we felt outlier removal was the most sensible choice.
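For concreteness, here’s a minimal sketch of the kind of cutoff rule described above (illustrative only: the effect sizes below are made up, and this is not the actual code or data from the report):

```python
import numpy as np

# Hypothetical standardised effect sizes (in SD units) from a meta-analysis
effects = np.array([0.2, 0.45, 0.7, 0.9, 1.1, 2.4, 3.1])

# Working assumption from above: effects larger than ~2 SDs are suspect
# on theoretical grounds, so flag them as outliers and drop them.
threshold = 2.0
kept = effects[np.abs(effects) <= threshold]
dropped = effects[np.abs(effects) > threshold]

print("kept:", kept)        # [0.2  0.45 0.7  0.9  1.1 ]
print("dropped:", dropped)  # [2.4 3.1]
```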
Hi Jason,
“Would it have been better to start with a stipulated prior based on evidence of short-course general-purpose[1] psychotherapy’s effect size generally, update that prior based on the LMIC data, and then update that on charity-specific data?”
1. To your first point, I think adding another layer of priors is a plausible way to do things – but given that the effects of psychotherapy in general appear to be similar to the estimates we come up with[1], it’s not clear how much this would change our estimates.
There are probably two issues with using HIC RCTs as a prior. First, the incentives that could bias results probably differ across countries. I’m not sure how this would pan out. Second, in HICs, the control group (“treatment as usual”) is probably a lot better off. In an HIC RCT, there’s not much you can do to stop someone in the control group of a psychotherapy trial from going out and getting prescribed antidepressants. However, the standard of care in LMICs is much lower (antidepressants typically aren’t an option), so we shouldn’t be terribly surprised if control groups appear to do worse (and the treatment effect is thus larger).
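To make the “extra layer of priors” idea concrete, here is a minimal sketch of sequential normal–normal updating (the means and variances are purely illustrative placeholders, not the estimates from our report):

```python
def update(prior_mean, prior_var, data_mean, data_var):
    """Conjugate normal-normal update: combine a prior and a new estimate,
    weighting each by its precision (1 / variance)."""
    w_prior, w_data = 1 / prior_var, 1 / data_var
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * data_mean)
    return post_mean, post_var

# Purely illustrative numbers (effect sizes in SDs, made-up variances)
hic_prior = (0.50, 0.05)                        # stipulated prior from HIC psychotherapy evidence
lmic_post = update(*hic_prior, 0.70, 0.03)      # update on the LMIC meta-analytic estimate
charity_post = update(*lmic_post, 0.40, 0.08)   # update again on charity-specific data
print(lmic_post, charity_post)
```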
“To my not-very-well-trained eyes, one hint to me that there’s an issue with application of Bayesian analysis here is the failure of the LMIC effect-size model to come anywhere close to predicting the effect size suggested by the SM-specific evidence.”
2. To your second point, does our model predict charity-specific effects?
In general, I think it’s a fair test of a model to say it should do a reasonable job of predicting new observations. We can’t yet discuss the forthcoming StrongMinds RCT – we’ll know how well our model predicts that RCT once it’s released – but for the Friendship Bench (FB) situation, it is true that we predict a considerably lower effect for FB than the FB-specific evidence would suggest. This is in part because we use charity-specific evidence to inform both our prior and the data. Let me explain.
We have two sources of charity-specific evidence. First, we have the RCTs, which are based on a charity’s programme but not as it’s deployed at scale. Second, we have monitoring and evaluation (M&E) data, which can show how well the charity’s intervention is implemented in the real world. We don’t have a psychotherapy charity at present that has RCT evidence of the programme as it’s deployed in the real world. This matters because I think placing a very high weight on the charity-specific evidence would require that it have high ecological validity. While the ecological validity of these RCTs is obviously higher than that of the average study, we still think it’s limited. I’ll explain our concern with FB.
For Friendship Bench, the most recent RCT (Haas et al. 2023, n = 516) reports an attendance rate at psychotherapy sessions of around 90%, but the Friendship Bench M&E data report an attendance rate closer to 30%. We discuss this in Section 8 of the report.
So in the Friendship Bench case we have a couple of reasonable-quality RCTs, but based on the M&E data it seems like something is wrong with the implementation. This evidence of lower implementation quality should be adjusted for, which we do – but we include this adjustment in the prior. So we’re injecting charity-specific evidence into both the prior and the data. Note that this is part of the reason why we don’t think it’s wild to place a decent amount of weight on the prior. This is something we should probably clean up in a future version.
We can’t discuss the details of the Baird et al. RCT until it’s published, but we think there may be an analogous situation to Friendship Bench where the RCT and M&E data tell conflicting stories about implementation quality.
This is all to say, judging how well our predictions fare when predicting the charity-specific effects isn’t entirely straightforward, since we are trying to predict the effects of the charity as it is actually implemented (something we don’t directly observe), not simply the effects from an RCT.
If we try to predict the RCT effects for Friendship Bench (which have much higher attendance than the “real” programme), then the gap between the predicted RCT effects and the actual RCT effects is much smaller, but it still suggests that we can’t completely explain why the Friendship Bench RCTs find their large effects.
So, we think the error in our prediction isn’t quite as bad as it seems if we’re predicting the RCTs, and stems in large part from the fact that we are actually predicting the charity implementation.
- ^
Cuijpers et al. 2023 find an effect of psychotherapy of 0.49 SDs for studies with low RoB in low, middle, and high income countries (comparisons = 218), and Tong et al. 2023 find an effect of 0.69 SDs for studies with low RoB in non-western countries (primarily low and middle income; comparisons = 36). Our estimate of the initial effect is 0.70 SDs (before publication bias adjustments). The results tend to be lower (between 0.27 and 0.57 SDs, or 0.42 and 0.60 SDs) when the authors of the meta-analyses correct for publication bias. In both meta-analyses (Tong et al. and Cuijpers et al.) the authors present the effects after using three publication bias correction methods: trim-and-fill (0.6; 0.38 SDs), a limit meta-analysis (0.42; 0.28 SDs), and a selection model (0.49; 0.57 SDs). If we averaged their publication-bias-corrected results (which they produced without removing outliers beforehand), the estimated effect of psychotherapy would be 0.5 SDs and 0.41 SDs for the two meta-analyses. Our estimate of the initial effect (which is most comparable to these meta-analyses), after removing outliers, is 0.70 SDs, and our publication bias correction is 36%, implying that we estimate the corrected effect to be 0.46 SDs. You can play around with the data they use on the metapsy website.
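As a quick check on the averaging described in this footnote (using only the corrected effect sizes quoted above):

```python
# Publication-bias-corrected effects quoted above (SDs), in the order
# trim-and-fill, limit meta-analysis, selection model
tong = [0.60, 0.42, 0.49]
cuijpers = [0.38, 0.28, 0.57]

print(round(sum(tong) / 3, 2))      # 0.5  (Tong et al.)
print(round(sum(cuijpers) / 3, 2))  # 0.41 (Cuijpers et al.)
```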
Hi Victor,
The updated operationalization of psychotherapy we use in our new report (page 12) is:
“For the purposes of this review, we defined psychotherapy as an intervention with a structured, face-to-face talk format, grounded in an accepted and plausible psychological theory, and delivered by someone with some level of training. We excluded interventions where psychotherapy was one of several components in a programme.”
So basically this is “psychotherapy delivered to groups or individuals by anyone with some amount of training”.
Does that clarify things?
Also, you should be able to use our new model to calculate the WELLBYs of more traditional 1-to-1 psychotherapy, since we include 1-to-1 studies in our model. Friendship Bench, for instance, uses that delivery model (albeit with lay mental health workers with relatively brief training). Note that in this update our findings about group versus individual therapy have reversed: we now find 1-to-1 is more effective than group delivery (page 33). This is a bit of a puzzle since it disagrees somewhat with the broader literature, but we haven’t had time to look into it further.
They only include costs to the legal entity of StrongMinds. To my understanding, this includes a relatively generous stipend they provide to the community health workers and teachers who are “volunteering” to deliver StrongMinds’ programme, as well as the grants StrongMinds makes to NGOs to support their delivery of StrongMinds programs.
Note that 61% of their partnership treatments are through these volunteer+ arrangements with community health workers and teachers. I’m not too worried about this, since I’m pretty sure there aren’t meaningful additional costs to consider: these partnership treatments appear to be based on individual CHWs and teachers opting in. I also don’t think that the delivery of psychotherapy is meaningfully leading them to do less of their core health or educational work.
I’d be more concerned if these treatments were happening because a higher authority (say, school administrators) was saying “Instead of teaching, you’ll be delivering therapy”. The costs to deliver therapy could then reasonably be seen to include the teacher’s time and the decrease in teaching they’d do.
But what about the remaining 39% of partnership treatments (representing 24% of total treatments)? These are through NGOs. I think around 40% of these are delivered because StrongMinds is giving grants to the NGOs to deliver therapy in areas that StrongMinds can’t reach for various reasons. The other 60% of NGO cases appear to be instances where the NGO is paying StrongMinds to train it to deliver psychotherapy. The case for causally attributing these treatments to StrongMinds seems more dubious, and I haven’t gotten all the information I’d like, so to be conservative I assumed that none of these cases that StrongMinds claims as its own are attributable to it. This increases the costs by around 14%[1] because it reduces the total number treated by around 14%.
Some preemptive hedging: I think my approach so far is reasonable, but my world wouldn’t be rocked if I was later convinced this isn’t quite the way to think about incorporating costs in a situation with more decentralized delivery and more unclear causal attribution for treatment.
- ^
But 1.14 * 59 is 67, not 63! Indeed. The cost we report is lower than $67 because we include an offsetting 7.4% discount to the costs to harmonize the cost figures of StrongMinds (which are more stringent about who counts as treated: more than half of sessions must be completed) with those of Friendship Bench (who count anyone receiving at least 1 session as treated). So 59 * (1 − 0.074) * 1.14 gives roughly the $63 we report. See page 69 of the report for the section where we discuss this.
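For readers who want to trace the arithmetic, here’s the rough calculation implied above, using only the rounded figures quoted in this comment and footnote (so it won’t exactly reproduce the report’s numbers):

```python
# Share of total treatments not attributed to StrongMinds:
# NGO partnerships are 24% of total treatments, and 60% of those are
# training arrangements we don't attribute to StrongMinds.
ngo_share = 0.24
training_share = 0.60
excluded = ngo_share * training_share
print(round(excluded, 3))  # 0.144 -> "around 14%" fewer treatments attributed

# Cost per person treated, starting from the ~$59 base figure:
# apply the 7.4% harmonization discount, then the ~14% attribution markup.
cost = 59 * (1 - 0.074) * 1.14
print(round(cost))  # ~62 with these rounded inputs, close to the ~$63 reported
```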
Hi Nick,
Good question. I haven’t dug into this in depth, so consider this primarily my understanding of the story; I haven’t gone through an itemized, year-by-year breakdown of StrongMinds’ costs to investigate this further.
It is a big drop from our previous costs. But I originally did the research in Spring 2021, when 2020 was the last full year. That was a year with unusually high costs. I didn’t use those costs because I assumed this was mostly a pandemic-related aberration, but I wasn’t sure how long they’d keep the more expensive practices, like teletherapy, that they started during COVID (programmes can be sticky). As it turns out, they paused their expensive teletherapy programme this year because of cost concerns (p. 5).
So $63 is a big change from $170, but a smaller change from $109, their pre-COVID costs.
What else accounts for the drop, though? I think “scale” seems like a plausible explanation. The first part of the story is fixed/overhead costs being spread over a larger number of people treated, with variable (per-person) costs remaining stable. StrongMinds spends at least $1 million on overhead costs (office, salaries, etc.). The more people are treated, the lower the per-person costs (all else equal). The second part of the story is that I think it’s plausible that variable costs (i.e., training and supporting the person delivering the therapy) are also decreasing. They’ve also shifted away from a staff-centric delivery model towards using more volunteers (e.g., community health workers), which likely depresses costs somewhat further. We discuss their scaling strategy and the complexities it introduces into our analysis a bit more around page 70 of the report.
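As a toy illustration of the scale effect described above (the ~$1 million overhead figure comes from this comment; the per-person variable cost and the treatment counts are made-up placeholders, not StrongMinds’ actual figures):

```python
def cost_per_person(n_treated, fixed=1_000_000, variable=45):
    """Cost per person treated: overhead spread over everyone treated,
    plus a stable per-person (variable) cost."""
    return fixed / n_treated + variable

for n in (10_000, 30_000, 60_000):
    print(n, round(cost_per_person(n)))
# 10000 145
# 30000 78
# 60000 62
```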
Below I’ve attached StrongMinds’ most recent reporting on their number treated and cost per person treated, which gives a decent overall picture of how the costs and the number treated have changed over time.
Hi Jason,
The bars for AMF in Figure 2 should represent the range of cost-effectiveness estimates that come from inputting different neutral points and, for TRIA, different ages of connectedness.
This differs from the values given in Table 25 on page 80 because, as we note below that table, the values there assume a neutral point of 2 and a TRIA age of connectedness of 15.
The bar also differs from the range given in Figure 13 on page 83 because the lowest TRIA value there has an age of connectedness of 5 years, whereas in Figure 2 (here) we allow it to go as low as 2 years, I believe[1].
I see that the footnote explaining this is broken, I’ll fix that.
- ^
When I plug (deprivationism, neutral point = 0; TRIA, neutral point = 0, age of connectedness = 2) into our calculator, it spits out a cost-effectiveness of 91 WELLBYs per $1,000 for deprivationism and 78 for TRIA (age of connectedness = 2) – this appears to match the upper end of the bar.
Talking through depression: The cost-effectiveness of psychotherapy in LMICs, revised and expanded
Neat work. I wouldn’t be surprised if this ends up positively updating my view on the cost-effectiveness of advocacy work.
What’s your take on the possibility that someone could empirically tackle a related issue we also tend to do a lot of guessing at: the likelihood of $X million spent on advocacy in a certain domain leading to reform?
[Question] High impact charities in Gaza or Israel?
The prospect of a nuclear conflict is so terrifying I sometimes think we should be willing to pay almost any price to prevent such a possibility.
But when I think of withdrawing support for Ukraine or Taiwan to reduce the likelihood of nuclear war, that doesn’t seem right either—as it’d signal that we could be threatened into any concession if nuclear threats were sufficiently credible.
How would you suggest policymakers navigate such terrible tradeoffs?
How much do you think the risk of nuclear war would increase over the century if Iran acquired nuclear weapons? And what measures, if any, do you think are appropriate to attempt to prevent this or other examples of nuclear proliferation?
Note that a large portion of Somaliland appears to be occupied by rebels at the moment. But other than that, it has indeed been much more peaceful.
“Thank you for the comment. There’s a lot here. Could you highlight what you think the main takeaway is? I don’t have time to dig into this at present, so any condensing would be appreciated. Thanks again for the time and effort.” ??
“I believe that large tech companies are, on average, more efficient at converting talent into market cap value than small companies or startups are. They typically offer higher salaries, for one.”
This may be true for market cap, but let’s be careful when translating this to do-goodery. E.g., wages don’t necessarily relate to productivity. Higher wages could also reflect higher rents, which seems plausibly self-reinforcing by drawing (and shelving) innovative talent from smaller firms. A quote from a recent paper by Akcigit and Goldschlag (2023) is suggestive:
“when an inventor is hired by an incumbent, compared to a young firm, their earnings increase by 12.6 percent and their innovative output declines by 6 to 11 percent.”
I don’t have a good grasp of the literature. Still, the impression I got from hanging around economists interested in innovation during my PhD led me to believe the opposite: that smaller firms were more innovative than larger firms, and that the increasing size of firms over the past few decades is a leading candidate for explaining the decline in productivity and innovation.
Speaking from my own experience working in a tiny research organisation, I wish I could have started as a researcher with the structure and guidance of a larger organization, but I really doubt I’d have pursued research as important if we hadn’t tried to challenge other, larger organizations. Do you feel differently with QURI?
“I don’t think this is right – ‘Russia’ doesn’t make actions, Vladimir Putin does; Putin is 70, so he seems unlikely to be in power once Russia has recovered from the current war; there’s some evidence that other Russian elites didn’t actively want the war, so I don’t think it’s right to generalize to ‘Russia’.”
Even if it was true that many elites were anti-war before the invasion, I think the war has probably accelerated a preexisting process of ideological purification. So even after Putin is gone, I think the elites will be just as likely to say “We didn’t go far enough” as “We went too far”. I expect at least some continuity in Putin’s successor’s willingness to go to war.
“A US-China war would be fought almost entirely in the air and sea; Ukraine is fighting almost entirely on land. The weapons Ukraine has received are mostly irrelevant for a potential US-China war; e.g. the Marines have already decided to stop using tanks entirely, and the US being capable of shipping the vast amounts of artillery ammunition being consumed in Ukraine to a combat zone would require the US-China war to already be essentially won.”
Why would it be fought almost entirely in the air and sea? That sounds like a best- or worst-case scenario, i.e., China isn’t able to actually land or China achieves air and naval superiority around Taiwan. The advanced weapons systems Ukraine has received seem very relevant: Storm Shadow, HIMARS, Abrams + Leopard, Patriot, Javelin, etc. And shipping weapons doesn’t seem to require the war to be essentially won, just that the US can achieve local air and naval superiority over part of Taiwan with a harbour. Complete dominance of the skies in a conflict is rare.
“Weapons being sent to Ukraine are from drawdown stocks, which Taiwan itself hasn’t previously been eligible to receive. Taiwan instead purchases new weapons, but there are many, many other countries purchasing similar types of weapons, and if the US were to become concerned, I’d expect it to prioritize both Ukraine and Taiwan over e.g. Saudi Arabia or Egypt.”
My concern is that these US stocks seem to be regenerating very, very slowly.
Sounds about right. 10 minutes means a bad life. 5 minutes means a life still worth living.
Good to know. It seems it mostly makes a difference for humans and cows, going from ~0 to ~5% of disability in your model?
And I appreciate you continuing to bang the drum for animal welfare stuff. It’s made me think about it more.