I’m going to leave aside discussing HLI here. Whilst I think they have some of the deepest analysis of StrongMinds, I am still confused by some of their methodology, and it’s not clear to me what their relationship to StrongMinds is. I plan on going into more detail there in future posts. The key thing to understand about the HLI methodology is that it follows the same structure as the Founders Pledge analysis, so all the problems I mention above regarding data apply just as much to them as to FP.
Thanks for writing this, Simon. I’m always pleased to see people scrutinising StrongMinds because it helps us all to build a better understanding of the most cost-effective ways to address the huge (and severely neglected) burden of disease from mental health conditions.
HLI’s researchers are currently enjoying some well-deserved holiday but they’ll be back next week and will respond in more detail then. In the meantime, I want to recommend the following resources (and discussion) for people reading this post:
HLI’s 2022 charity recommendation
HLI’s cost-effectiveness analysis of StrongMinds
AMA with Sean Mayberry (Founder & CEO of StrongMinds)
I also want to clarify two things related to the quote above:
HLI’s relationship with StrongMinds is no different to GiveWell’s relationship with the charities they recommend. We are separate organisations and HLI’s evaluation of StrongMinds is entirely independent.
HLI’s methodology follows a meta-analytic approach. We don’t take the results from StrongMinds’ own trials at face value. We explain this further here.
HLI’s relationship with StrongMinds is no different to GiveWell’s relationship with the charities they recommend.
From an outside view, I see the Happier Lives Institute as an advocacy organisation for mental health interventions, although I can imagine HLI see themselves as a research organisation working on communicating the effectiveness of mental health interventions. Ultimately, I am not sure there’s a lot distinguishing these roles.
GiveWell, however, is primarily a research and donor advisory organisation. Unlike HLI, it does not favour a particular intervention, or pioneer new metrics in support of said interventions.
Some things HLI does that make me think it is an advocacy org:
Recommend only 1 charity (StrongMinds)
Appear publicly on podcasts etc., and recommend StrongMinds
Write to effective giving platforms, requesting they add StrongMinds to their list of recommended organisations
Edit: Fixed acronym in first paragraph
I agree with all of these reasons. My other reasons for being unclear about the relationship are the (to my eye) cynical timing and the aggressive comparisons published annually during peak giving season.
StrongMinds vs Worms (Dec 22)
StrongMinds vs Bednets (Nov 22)
StrongMinds vs Cash transfers 2 (Nov 21)
StrongMinds vs Cash transfers (Oct 21)
Last year when this happened I thought it was a coincidence; twice is enemy action.
(Edit: I didn’t mean to imply that HLI is an “enemy” in some sense, it’s just a turn of phrase)
Simon,
It’s helpful to know why you thought the relationship was unclear.
But I don’t think us (HLI) publishing research during the giving season is “cynical timing” any more than you publishing this piece when many people from GWWC, FP, and HLI are on vacation is “cynical timing”.
When you’re an organization without guaranteed funding, it seems strategic to try to make yourself salient to people when they reach for their pocketbooks. I don’t see that as cynical.
FWIW, the explanation is rather mundane: the giving season acts as a hard deadline which pushes us to finish our reports.
To add to this, even if it were timed, I don’t think that timing the publication outputs to coincide with peak giving season will necessarily differentiate between a funding-constrained research organisation and a funding-constrained advocacy organisation, if both groups think that peak giving season will lead to more donations that are instrumentally useful for their goals.
you publishing this piece when many people from GWWC, FP, and HLI are on vacation
I think the reason I’m publishing it now is because it’s when I’m on vacation! (But yes, that’s a fair point).
I think the timing makes sense for HLI, but given how adversarial the articles come across (to me), it seems like they are trying to shift funding away from [generic top charity] to StrongMinds, which is why it seems to me it’s more about StrongMinds than HLI. I expect HLI could get just as much salience publishing about bednets on their own at that time as by adding the comparison to StrongMinds. (Not sure about this though, but the strategy does seem to involve generating lots of heat rather than light.)
FWIW, the explanation is rather mundane: the giving season acts as a hard deadline which pushes us to finish our reports.
Yes, that does make sense (and probably is about as mundane as my reason for publishing whilst GWWC, FP and HLI are on vacation)
I think the reason I’m publishing it now is because it’s when I’m on vacation! (But yes, that’s a fair point).
To be clear, that’s what I meant to imply—I assumed you published this when you had time, not because the guards were asleep.
I think the timing makes sense for HLI, but given how adversarial the articles come across (to me), it seems like they are trying to shift funding away from [generic top charity] to StrongMinds, which is why it seems to me it’s more about StrongMinds than HLI.
Everything is compared to StrongMinds because that’s what our models currently say is best. When (and I expect it’s only a matter of when) something else takes StrongMinds’ place, we will compare the charities we review to that one. The point is to frame the charities we review in terms of how they compare to our current best bet. I guess this is an alternative to putting everything in terms of GiveDirectly cash transfers—which IMO would generate less heat and light.
Everything is compared to StrongMinds because that’s what our models currently say is best. [...] I guess this is an alternative to putting everything in terms of GiveDirectly cash transfers—which IMO would generate less heat and light.
GW compares everything to GiveDirectly (which isn’t considered their best charity). I like that approach because:
Giving people cash is really easy to understand
It’s high capacity
It’s not a moving target (unlike, say, worms or bednets, which change all the time based on how the charities are executing)
I think for HLI (at their current stage) everything is going to be a moving target (because there’s so much uncertainty about the WELLBY effect of every action) but I’d rather have only one moving target than two.
FWIW, I’m not unsympathetic to comparing everything to GiveDirectly CTs, and this is probably something we will (continue to) discuss internally at HLI.
I’m seeing a lot of accusations flying around in this thread (e.g. cynical, aggressive, enemy action, secret info etc.). This doesn’t strike me as a ‘scout mindset’ and I was glad to see Bruce’s comment that “it’s important to recognise that everyone here does share the same overarching goal of ‘how do we do good better’”.
HLI has always been transparent about our goals and future plans. The front page of our website seems clear to me:
The Happier Lives Institute connects donors, researchers, and policymakers with the most cost-effective opportunities to increase global wellbeing.
Our recommended charity for 2022 is StrongMinds, a non-profit providing cheap, effective treatment for women struggling with depression in Uganda and Zambia.
Our research agenda is also very clear about our priorities:
Area 1: Foundational research into the measurement of wellbeing
Area 2: Applied research to identify and evaluate the most cost-effective ways to increase wellbeing
Area 3: Understanding the wider global priorities context
And our 2022 charity recommendation post makes it clear that we plan to investigate a wider range of interventions and charities in 2023:
So far, we’ve looked at four ‘micro-interventions’, those where you help one person at a time. These were all in low-income countries. However, it’s highly unlikely that we’ve found the best ways to improve happiness already. Phase 2 is to expand our search.
We already have a pipeline of promising charities and interventions to analyse next year:
Interventions we want to evaluate: air pollution, child development, digital mental health, and surgery for cataracts and fistula repair.
We have a long list of charities and interventions that we won’t get to in 2023 but plan to examine in future years. Eventually, we plan to consider systemic interventions and policy reforms that could affect the wellbeing of larger populations at all income levels. There’s plenty to do!
My role as Communications Manager is to communicate the findings from our research to decision-makers to help them allocate their resources more effectively. There’s nothing suspicious about doing that in Giving Season. That’s what all the charity evaluators do.
We only recommend one charity because StrongMinds is the most cost-effective charity we’ve identified (so far) and they have a $20m funding gap which is very unlikely to be filled in this giving season. GiveWell has a lot more money to allocate so they have to find multiple charities with room for more funding. I hope that HLI will face (and solve) that problem in the future too!
In my personal opinion, GiveWell has been hugely successful and inspirational but it’s clear that their methodology cannot handle interventions that have benefits beyond health and wealth. That’s why HLI is bringing the WELLBY methodology from the academic and policy world into the global health field. It’s the same reason that Open Philanthropy ran an essay prize to find suggestions for measuring non-health, non-pecuniary benefits. Our entry to that competition set out the pros AND the cons of the WELLBY approach as well as our plans for further foundational research on subjective wellbeing measures.
There’s a lot more I could say, but this comment is already getting too long. The key thing I want to get across is that if you (the reader) are confused about HLI’s mission, strategy, or research findings, then please talk to us. I’m always happy to talk to people about HLI’s work on a call or via email.
It’s helpful to know how we come across. I’d encourage people to agree or disagree with Elliot’s comment as a straw poll on how readers’ perceptions of HLI accord with that characterization.
p.s. I think you meant to write “HLI” instead of “FHI”.
I like that idea!
Edited, thanks
I agreed with Elliott’s comment, but for a somewhat different reason that I thought might be worth sharing. The “Don’t just give well, give WELLBYs” post gave me a clear feeling that HLI was trying to position itself as the Happiness/Well-Being GiveWell, including by promoting StrongMinds as more effective than programs run by classic GW top charities. A skim of HLI’s website gives me the same impression, although somewhat less strongly than that post.
The problem as I see it is that when you set GiveWell up as your comparison point, people are likely to expect a GiveWell-type balance in your presentation (and I think that expectation is generally reasonable). For instance, when GiveWell had deworming programs as a top charity option, it was pretty clear to me within a few minutes of reading their material that the evidence base for this intervention had some issues and its top-charity status was based on a huge potential upside-for-cost. When GiveWell had standout charities, it was very clear that the depth of research and investigation behind those programs was roughly an order of magnitude or so less than for the top charities. Although I didn’t read everything on HLI’s website, I did not walk away with the impression that the methodological weaknesses discussed in this and other threads were disclosed and discussed very much (or nearly as much as I would expect GiveWell to have done in analogous circumstances).
The fact that HLI seems to be consciously positioning itself in the GiveWellian tradition yet lacks this balance in its presentations is, I think, what gives off the “advocacy organisation” vibes to me. (Of course, it’s not reasonable for anyone to expect HLI to have done the level of vetting that GiveWell has done for its top charities—so I don’t mean to suggest the lesser degree of vetting at this point is the issue.)
“Happiness/Wellbeing GiveWell” is a fair description of HLI in my opinion. However, I want to push back on your claim that GiveWell is more open and balanced.
As far as I can tell, there is nothing new in Simon’s post or subsequent comments that we haven’t already discussed in our psychotherapy and StrongMinds cost-effectiveness analyses. I’m looking forward to reading his future blog post on our analysis and I’m glad it’s being subjected to external scrutiny.
Whereas GiveWell acknowledges that they need to improve their reasoning transparency:
Where we’d like to improve on reasoning transparency
We also agree with HLI that we have room for improvement on explaining our cost-effectiveness models. The decision about how to model whether benefits decline is an example of that—the reasoning I outlined above isn’t on our website. We only wrote, “the KLPS 4 results are smaller in magnitude (on a percentage increase basis) and higher variance than earlier survey rounds.”
We plan to update our website to make it clearer what key judgment calls are driving our cost-effectiveness estimates, why we’ve chosen specific parameters or made key assumptions, and how we’ve prioritized research questions that could potentially change our bottom line.
That’s just my opinion though and I don’t want to get into a debate about it here. Instead, I think we should all wait for GWWC to complete their independent evaluation of evaluators before drawing any strong conclusions about the relative strengths and weaknesses of the GiveWell and HLI methodologies.
To clarify, the bar I am suggesting here is something like: “After engaging with the recommender’s donor-facing materials about the recommended charity for 7-10 minutes, most potential donors should have a solid understanding of the quality of evidence and degree of uncertainty behind the recommendation; this will often include at least a brief mention of any major technical issues that might significantly alter the decision of a significant number of donors.”
Information in a CEA does not affect my evaluation of this bar very much. To qualify in my mind as “primarily a research and donor advisory organisation” (to use Elliot’s terminology), the organization should be communicating balanced information about evidence quality and degree of uncertainty fairly early in the donor-communication process. It’s not enough that the underlying information can be found somewhere in 77 pages of the CEAs you linked.
To analogize, if I were looking for information about a prescription drug, and visited a website I thought was patient-advisory rather than advocacy, I would expect to see a fair discussion of major risks and downsides within the first ten minutes of patient-friendly material rather than being only in the prescribing information (which, like the CEA, is a technical document).
I recognize that meeting the bar I suggested above will require HLI to communicate more doubt than GiveWell needs to communicate about its four currently recommended charities; that is an unavoidable effect of the fact that GiveWell has had many years and millions of dollars to target the major sources of doubt about those interventions as applied to their effectiveness metrics, and HLI has not.
I want to close by affirming that HLI is asking important questions, and that there is real value in not being too tied to a single evaluator or evaluation methodology. That’s why I (and I assume others) took the time to write what I think is actionable feedback on how HLI can better present itself as a donor-advisory organization and give off fewer “advocacy group” vibes. So none of this is intended as a broad criticism of HLI’s existence. Rather, it is specifically about my perception that HLI is not adequately communicating information about evidence quality and degree of uncertainty in medium-form communications to donors.
I read this comment as implying that HLI’s reasoning transparency is currently better than GiveWell’s, and think that this is both:
False.
Not the sort of thing it is reasonable to bring up before immediately hiding behind “that’s just my opinion and I don’t want to get into a debate about it here”.
I therefore downvoted, as well as disagree voting. I don’t think downvotes always need comments, but this one seemed worth explaining as the comment contains several statements people might reasonably disagree with.
Thanks for explaining your reasoning for the downvote.
I don’t expect everyone to agree with my comment but if you think it is false then you should explain why you think that. I value all feedback on how HLI can improve our reasoning transparency.
However, like I said, I’m going to wait for GWWC’s evaluation before expressing any further personal opinions on this matter.
TL;DR
I think an outsider may reasonably get the impression that HLI thinks its value is correlated with its ability to showcase the effectiveness of mental health charities, or of WELLBYs as an alternative metric for cause prioritisation. It might also be the case that HLI believes this, based on their published approach, which seems to assume that 1) happiness is what ultimately matters and 2) subjective wellbeing scores are the best way of measuring this. But I don’t personally think this is the case—I think the main value of an organisation like HLI is to help the GH research community work out the extent to which SWB scores are valuable in cause prioritisation, and how we best integrate these with existing measures (or indeed, replace them if appropriate). In a world where HLI works out that WELLBYs actually aren’t the best way of measuring SWB, or that we should weight DALYs against SWB at a 1:5 or a 4:1 ratio instead of replacing existing measures wholesale or disregarding them entirely, I’d still see these research conclusions as highly valuable (even if the money-shifted metric might not be similarly high). And I think these are possibilities that HLI should remain open to in its research and consider in its theory of change going forward—though this is based mainly on a truth-seeking / epistemics perspective, not on a deep enough knowledge of the SWB / happiness literature to have a well-formed view (though my sense is that it’s also not a settled question). I’m not suggesting that HLI is not already considering or doing this, just that from reading the HLI website and published comments, it’s hard to tell clearly that this is the case (and I haven’t looked through the entire website, so I may have missed it).
======
Longer:
I think some things that may support Elliot’s views here:
HLI was founded with the mission of finding something better than GiveWell’s top charities under a subjective wellbeing (SWB) method. That means it’s beneficial for HLI, in terms of achieving its phase 1 goal and mission, that StrongMinds is highly effective. GiveWell doesn’t have this pressure of finding something better than its current best charities (or not to the same degree).
HLI’s investigation of various mental health programmes led to its strong endorsement of StrongMinds. This was in part based on StrongMinds being the only organisation on HLI’s shortlist (of 13 orgs) to respond and engage with HLI’s request for information. Two potential scenarios for this:
HLI’s hypothesis that mental health charities are systematically undervalued is right, and thus it’s not necessarily that StrongMinds is uniquely good (acknowledged by HLI here), but that the very best mental health charities are all better than non-mental health charities under WELLBY measurements, which is HLI’s preferred approach to “how to do the most good”. However, this might bump up against priors or base rates about how good the mental health charities on HLI’s shortlist are likely to be compared with existing GiveWell charities, against views on whether all of global health prioritisation (aid generally, or EA aid) has been getting things wrong and is in need of a paradigm shift, and against doubts about whether WELLBYs and SWB scores alone should be a sufficient metric for “doing the most good”.
Mental health charities are not systematically undervalued, and current aid / EA global health work isn’t in need of a huge paradigm shift, but StrongMinds is uniquely good, and HLI were fortunate that the one organisation that responded happened to be that uniquely good one. However, if an outsider’s priors on the effectiveness of good mental health interventions generally are much lower than HLI’s, it might seem like this result is very fortuitous for HLI’s mission and goals. On the other hand, there are some reasons to think responding and being good might be at least somewhat correlated:
well-run organisations are more likely to have capacity to respond to outside requests for information
organisations with good numbers are more likely to share their numbers etc
HLI have never published any conclusions that are net harmful for WELLBYs or mental health interventions. Depending on how much an outsider thinks GiveWell is wrong here, they might expect GiveWell to be wrong in different directions, and not only in one direction. Some pushback: HLI is young, and would reasonably focus on organisations that are most likely to be successful and most likely to change GiveWell funding priorities. These results are also what you’d expect if GiveWell IS in fact wrong about how charities should be measured.
I think ultimately the combination could contribute to an outsider’s uncertainty about whether they can take HLI’s conclusions at face value, and whether they believe these are the result of an unbiased search optimising for truth-seeking, e.g. if they don’t know who HLI’s researchers are or don’t have any reason to trust them beyond what they see of HLI’s outputs.
Some important disclaimers:
-All of these discussions are made possible because of HLI’s (and SM’s) transparency, which should be acknowledged.
-It seems much harder to defend against claims that paint HLI as an “advocacy org” or suggest motivated reasoning than it is to make those claims. It’s also the case that these findings are consistent with what we would expect if the claims 1) “WELLBYs or subjective wellbeing scores alone are the best metric for ‘doing the most good’” and 2) “existing metrics systematically undervalue mental health charities” are true, and HLI is taking a dispassionate, unbiased view. All I’m saying is that an outsider might prefer not to default to believing this.
-It’s hard to be in the position of challenging the status quo, in a community where reputation is important and the status quo is highly trusted. Ultimately, I think this kind of work is worth doing, and I’m happy to see this level of engagement and hope it continues in the future.
-Lastly, I don’t want this message (or any of my other messages) to be interpreted as an attack on HLI itself. For example, I found HLI’s Deworming and decay: replicating GiveWell’s cost-effectiveness analysis to be very helpful and valuable. I personally am excited about more work on subjective wellbeing measures generally (though I’m less certain I’d personally subscribe to HLI’s founding beliefs), and I think this is a valuable niche in the EA research ecosystem. I also think it’s easy for these conversations to accidentally become too adversarial, and it’s important to recognise that everyone here does share the same overarching goal of “how do we do good better”.
(commenting in personal capacity etc)
Thanks—I had looked at the HLI research and I do have a bunch of issues with the analysis (both presentation and research). My biggest issue at the moment is I can’t join up the dots between:
“a universal metric called wellbeing-adjusted life years (WELLBYs). One WELLBY is equivalent to a 1-point increase on a 0-10 life satisfaction scale for one year” (here)
“First, we define a ΔWELLBY to denote a one SD change in wellbeing lasting for one year” (Appendix D here)
In all the HLI research, everything seems to be calculated in the latter terms, which isn’t something meaningful at all (to the best of my understanding). The standard deviations you are using aren’t some global “variance in subjective well-being” but the sample variance of subjective well-being, which is going to be materially lower. It’s also not clear to me that this is even a meaningful quantity, especially when your metric for subjective well-being is a mental health survey on which a mentally healthy person in San Francisco would answer the same as a mentally healthy person in the most acute poverty.
Hi Simon, I’m one of the authors of HLI’s cost-effectiveness analysis of psychotherapy and StrongMinds. I’ll be able to engage more when I return from vacation next week.
I see why there could be some confusion there. Regarding the two specifications of WELLBYs, the latter was unique to that appendix, and we consider the first specification to be conventional. In an attempt to avoid this confusion, we denoted all the effects as changes in ‘SDs’ or ‘SD-years’ of subjective wellbeing / affective mental health in all the reports (1,2,3,4,5) that were direct results in the intervention comparison.
Regarding whether these changes are “meaningful at all” -- it’s unclear what you mean. Which of the following are you concerned with?
That standard deviation differences (i.e., Cohen’s d or Hedges’ g effect sizes) are reasonable ways to do meta-analyses?
Or is your concern more that even if SDs are reasonable for meta-analyses, they aren’t appropriate for comparing the effectiveness of interventions? We flag some possible concerns in Section 7 of the psychotherapy report. But we haven’t found sufficient evidence after several shallow dives to change our minds.
Or, you may be concerned that similar changes in subjective wellbeing and affective mental health don’t represent similar changes in wellbeing? (We discuss this in Appendix A of the psychotherapy report).
Or is it something else I haven’t articulated?
Most of these issues are technical, and we recognise that our views could change with further work. However, we aren’t convinced there’s a ready-to-use method that is a better alternative for use with subjective wellbeing analyses.
I also welcome further explanation of your issues with our analysis, public or private. If you’d like to have low stakes chat about our work, you can schedule a time here. If that doesn’t work, email or message me, and we can make something work.
In an attempt to avoid this confusion, we denoted all the effects as changes in ‘SDs’ or ‘SD-years’ of subjective wellbeing / affective mental health in all the reports (1,2,3,4,5) that were direct results in the intervention comparison.
This is exactly what confused me. In all the analytical pieces (and the places linked to in the reports defining the WELLBY on the 0-10 scale) you use SDs, but then there’s a chart which uses WELLBYs, and I couldn’t find where you convert from one to the other.
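For what it’s worth, my best guess at the intended conversion is purely dimensional: multiply SD-years by the scale’s SD in raw points. As a sketch, assuming (purely for illustration; I couldn’t confirm the figure HLI actually uses) an SD of 2 points on the 0-10 life-satisfaction scale:

\[
\text{WELLBYs} = \text{SD-years} \times \sigma_{\text{scale}}, \qquad \text{e.g. } 0.5 \text{ SD-years} \times 2\ \tfrac{\text{points}}{\text{SD}} = 1 \text{ point-year} = 1 \text{ WELLBY}.
\]

But if that is the intended step, the choice of which SD to use (sample or population) matters a lot, which is my point below.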
That standard deviation differences (i.e., Cohen’s d or Hedges’ g effect sizes) are reasonable ways to do meta-analyses?
I think this is a very reasonable way to do meta-analyses
Or is your concern more that even if SDs are reasonable for meta-analyses, they aren’t appropriate for comparing the effectiveness of interventions? We flag some possible concerns in Section 7 of the psychotherapy report. But we haven’t found sufficient evidence after several shallow dives to change our minds.
Yes. This is exactly my confusion, specifically:
A potential issue with using SD changes is that the mental health (MH) scores for recipients of different programmes might have different size standard deviations – e.g. SD could be 15 for cash transfers and 20 for psychotherapy, on a given mental health scale. We currently do not have much evidence on this. If we had more time we would test and adjust for any bias stemming from differences in variances of psychological distress between intervention samples by comparing the average SD for equivalent measures across intervention samples
In the absence of evidence, my prior is very strong that a group of people selected to have a certain level of depression is going to have a lower SD than a group of randomly sampled people. This is exactly my confusion. Furthermore, I would expect the SD of “generally healthy people” to be quite low and interventions to have low impact on it. For example, giving a healthy person a PS5 for Christmas might massively boost their subjective well-being, but probably doesn’t do much for their mental health. (This is related to your third point, but is more about the magnitude of changes I’d expect to see rather than anything else.) See the toy simulation below.
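To make that prior concrete, here is a minimal sketch with entirely made-up numbers (not HLI’s data): select the most depressed fifth of a hypothetical population, then express the same absolute improvement in sample-SD and population-SD units.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 0-10 life-satisfaction scores for a general population.
population = rng.normal(loc=5.0, scale=2.0, size=100_000).clip(0, 10)

# A trial sample screened for depression: the bottom 20% of scorers.
cutoff = np.quantile(population, 0.20)
screened = population[population <= cutoff]

print(f"population SD:      {population.std():.2f}")  # ~2.0
print(f"screened-sample SD: {screened.std():.2f}")    # materially lower (~0.9 here)

# The same assumed absolute improvement looks much larger in sample-SD units.
improvement = 0.6  # assume a 0.6-point average gain from the intervention
print(f"effect in population SDs: {improvement / population.std():.2f}")  # ~0.30
print(f"effect in sample SDs:     {improvement / screened.std():.2f}")    # ~0.65
```

If something like this holds in the real data, effect sizes expressed in sample SDs will overstate the improvement relative to a WELLBY measured against population variability, and by different amounts for differently-selected samples.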
Or, you may be concerned that similar changes in subjective wellbeing and affective mental health don’t represent similar changes in wellbeing? (We discuss this in Appendix A of the psychotherapy report).
So I also have issues with this, although it’s not the specific issue I’m raising here.
Or is it something else I haven’t articulated?
Nope—it’s pretty much exactly point 2.
Most of these issues are technical, and we recognise that our views could change with further work. However, we aren’t convinced there’s a ready-to-use method that is a better alternative for use with subjective wellbeing analyses.
Well, my contention is subjective wellbeing analyses shouldn’t be the sole basis for evaluation (but again, that’s probably a separate point).
I also welcome further explanation of your issues with our analysis, public or private. If you’d like to have low stakes chat about our work, you can schedule a time here. If that doesn’t work, email or message me, and we can make something work.
Thanks! I’ve (hopefully) signed up to speak to you tomorrow