I’m a researcher at SoGive conducting an independent evaluation of StrongMinds which will be published soon. I think the factual contents of your post here are correct. However, I suspect that after completing the research, I would be willing to defend the inclusion of StrongMinds on the GWWC list, and that the SoGive write-up will probably have a more optimistic tone than your post. Most of our credence comes from the wider academic literature on psychotherapy, rather than direct evidence from StrongMinds (which we agree suffers from problems, as you have outlined).
Regarding HLI’s analysis, I think it’s a bit confusing to talk about this without going into the details, because there are both “estimating the impact” and “reframing how we think about moral weights” aspects to the research. Ascertaining the cost and magnitude of therapy’s effects must be considered separately from the issue that “therapy will score well when you use subjective well-being as the standard by which therapy, cash transfers, and malaria nets are graded”. As of now, I roughly think that HLI’s numbers for the costs of therapy and its effect sizes on patients are in the right ballpark. We are borrowing the same basic methodology for our own analysis. You mentioned being confused by the methodology; there are a few points that still confuse me as well, but we’ll soon be publishing a spreadsheet model with a step-by-step explainer on the aspects of the model that we are borrowing, which may help.
If you (@Simon_M, or anyone else wishing to work at a similar level of analysis) are planning on diving into these topics in depth, I’d love to get in touch on the Forum and exchange notes.
Regarding the level of evidence: SoGive’s analysis framework outlines a “gold standard” for high impact, with “silver” and “bronze” ratings assigned to charities with lower-but-still-impressive cost-effectiveness. We also distinguish between “tentative” ratings and “firm” ratings, to acknowledge that some high-impact opportunities are based on more speculative estimates which may be revised as more evidence comes in. I don’t want to pre-empt our final conclusions on StrongMinds, but I wouldn’t be surprised if “Silver (rather than Gold)” and/or “Tentative (rather than Firm)” ended up featuring in our final rating. Such a conclusion would still be a positive one, on the basis of which donation and grant recommendations could be made.
There is precedent for effective altruists recommending donations to charities for which the evidence is still more tentative. Consider that GiveWell recommends “top charities”, but also recommends less-proven, potentially cost-effective and scalable programs (formerly incubation grants). Identifying these opportunities allows the community to explore new interventions, and can unlock donations that counterfactually would not have been made, as different donors may make different subjective judgment calls about some interventions, or may be under constraints as to what they can donate to.
Having established that there are different criteria one might use to decide when an organization should be included in a list, and that more than one set of standards may be applied, the question arises: what standards does the GWWC top charities list follow, and is StrongMinds really out of place among the others? Speaking now personally and informally, not on behalf of any current or former employer: I would actually say that StrongMinds has much more evidence backing it than many of the other charities on this list (such as THL, Faunalytics, GFI, and WAI, which by their nature don’t easily lend themselves to RCT data). Even if we restrict our scope to direct global health interventions (excluding, e.g., pandemic research orgs), I wouldn’t be surprised if bright and promising potential stars such as Suvita and LEEP are at a somewhat similar stage to StrongMinds: generally evidence-based enough to deserve their endorsement on this list, but perhaps not as thoroughly vetted by external evaluators as more established organizations such as Malaria Consortium. Because of all this, I don’t think StrongMinds seems particularly out of place next to the other GWWC recommendations. (Bearing in mind again that I am speaking casually as an individual in this last paragraph, and I am not claiming special knowledge of all the orgs mentioned.)
Finally, it’s great to see posts like this on the EA forum, thanks for writing it!
I might be being a bit dim here (I don’t have the time this week to do a good job of this), but I think that, of all the orgs evaluating StrongMinds, SoGive’s moral weights are the most favourable to StrongMinds. Given that, I wonder what you expect you’d rate them at if you altered your moral weights to be more in line with FP’s and HLI’s?
SoGive’s Gold Standard Benchmarks are:
£5,000 per life saved
£50 to double someone’s consumption (spending) for one year
£200 to avert one year of severe depression
£5 to avert the suffering of one chicken who is living in very poor conditions
This is a ratio of 4:1 for averting a year of severe depression vs doubling someone’s consumption.
For context, Founders Pledge has a ratio of somewhere around 1.3:1: in their CEA, income doubling : DALY is 0.5 : 1, and severe depression corresponds to a DALY weighting of 0.658. (I understand they are shifting to a WELLBY framework like HLI’s, but I don’t think it will make much difference.)
HLI is harder to piece together, but roughly speaking they see doubling income as having a 1.3 WELLBY effect and severe depression as also having a 1.3 WELLBY effect: a ratio of roughly 1:1, in the same region as FP’s.
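To make that concrete, here is the arithmetic behind the three ratios as I’ve stated them (a quick sketch; the FP and HLI inputs are my readings of their reports rather than official figures):

```python
# Back-of-envelope comparison of the implied moral-weight ratios
# (year of severe depression averted : income doubling for a year).
# Inputs are my readings of each organisation's published figures,
# not official outputs.

sogive_ratio = 200 / 50   # £200 per depression-year / £50 per doubling = 4.0
fp_ratio = 0.658 / 0.5    # severe-depression DALY weight / doubling DALY weight ~ 1.32
hli_ratio = 1.3 / 1.3     # ~1.3 WELLBYs for each, as I read their reports = 1.0

print(f"SoGive {sogive_ratio:.1f}:1, FP {fp_ratio:.2f}:1, HLI {hli_ratio:.1f}:1")

# Re-weighting StrongMinds from SoGive's 4:1 down to the FP/HLI region
# would scale its modelled cost-effectiveness by roughly:
print(f"scaling factor ~ {1.3 / 4:.2f}")  # ~0.33, i.e. down to about a third
```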
Thanks for your question, Simon, and it was very eagle-eyed of you to notice the difference in moral weights. Good sleuthing! (And more generally, thank you for provoking a very valuable discussion about StrongMinds.)
I run SoGive and oversaw the work (then led by Alex Lawsen) to produce our moral weights. I’d be happy to provide further comment on our moral weights; however, that might not be the most helpful thing. Here’s my interpretation of (the essence of) your very reasonable question:
“SoGive has a tendency to put a quite high value on tackling depression. Is this enough to explain why SoGive sounds like they might be more positive about StrongMinds than Simon M is?”
I have a simple answer to this: no, it isn’t.
Let me flesh that out. We have (at least) two sources of information:
Academic literature
Data from StrongMinds (e.g. their own evaluation report on themselves, or their regular reporting)
And we have (at least) two things we might ask about:
(a) How effective is the intervention that StrongMinds does, including the quality of evidence for it?
(b) How effective is the management team at StrongMinds?
I’d say that the main crux is the fact that our assessment of the quality of evidence for the intervention (item (a)) is based mostly on item 1 (the academic literature) and not on item 2 (data from StrongMinds).
This is the driver of the comments made by Ishaan above, not the moral weights.
And just to avoid any misunderstandings, I have not said here that the evidence base from the academic literature is really robust—we haven’t finished our assessment yet. I am saying that (unless our remaining work throws up some surprises) it will warrant a more positive tone than your post, and that it may well demonstrate a strong enough evidence base and good enough cost-effectiveness to be in the same ballpark as other charities on the GWWC list.
I don’t understand how that’s possible. If you put 3x the weight on StrongMinds’ cost-effectiveness vis-à-vis other charities, changing this must move the needle on cost-effectiveness more than anything else. I can see that it could have been “well into the range of gold-standard” and is now “just gold-standard” or “silver-standard”. However, if something is silver-standard, I can’t see any way in which your cost-effectiveness being adjusted down to a third doesn’t massively shift your rating.
I’d say that the main crux is the fact that our assessment of the quality of evidence for the intervention (item (a)) is based mostly on item 1 (the academic literature) and not on item 2 (data from StrongMinds).
I feel like I’m being misunderstood here. I would be very happy to speak to you (or Ishaan) on the academic literature. I think probably best done in a more private forum so we can tease out our differences on this topic. (I can think of at least one surprise you might not have come across yet).
Ishaan’s work isn’t finished yet, and he has not yet converted his findings into the SoGive framework, or applied the SoGive moral weights to the problem. (Note that we generally try to express our findings in terms of the SoGive framework and other frameworks, such as multiples of cash, so that our results are meaningful to multiple audiences).
Just to reiterate, neither Ishaan nor I have made very strong statements about cost-effectiveness, because our work isn’t finished yet.
I would be very happy to speak to you (or Ishaan) on the academic literature.
That sounds great, I’ll message you directly. Definitely not wishing to misunderstand or misinterpret—thank you for your engagement on this topic :-)
To expand a little on “this seems implausible”: I feel like there is probably a mistake somewhere in the notion that anyone involved thinks that <doubling income has a 1.3 WELLBY effect and severe depression has a 1.3 WELLBY effect>.
The mistake might be in your interpretation of HLI’s document (it does look like the 1.3 figure is a small part of some more complicated calculation regarding the economic impacts of AMF and their effect on well-being, rather than a headline finding about the cash-to-well-being conversion rate). Or it could be that HLI has an error, or inconsistencies between reports. Or it could be that it’s not valid to apply that 1.3 number to SoGive’s “income doubling” weights, because it doesn’t actually refer to the WELLBY value of doubling.
I’m not sure exactly where the mistake is, so it’s quite possible that you’re right, or that we are both missing something about how the math behind this works, but I’m suspicious because it doesn’t really fit together with various other pieces of information that I know. For instance, it doesn’t really square with HLI’s report that psychotherapy is 9x GiveDirectly when the cost of treating one person with therapy is around $80, or with their estimate that it took $1,000 worth of cash transfers to produce 0.92 SD-years of subjective well-being improvement (“totally curing just one case of severe depression for a year” should correspond to something more like 2-5 SD-years).
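As a rough consistency check, here is that reasoning in numbers (a sketch using only the approximate figures quoted in this thread, not official HLI outputs):

```python
# Approximate figures as quoted in this thread, not official HLI outputs.
gd_sd_years_per_1000 = 0.92  # $1,000 of cash transfers -> ~0.92 SD-years of SWB gain
therapy_multiple = 9         # psychotherapy reported as ~9x GiveDirectly
therapy_cost = 80            # ~$80 to treat one person with therapy

# Implied subjective-well-being gain per course of therapy:
therapy_sd_years = therapy_multiple * (gd_sd_years_per_1000 / 1000) * therapy_cost
print(f"~{therapy_sd_years:.2f} SD-years per course of therapy")  # ~0.66

# "Totally curing one case of severe depression for a year" should be more
# like 2-5 SD-years, so in cash-transfer-equivalent terms a full cure is worth:
for cure_sd_years in (2, 5):
    cash_equivalent = 1000 * cure_sd_years / gd_sd_years_per_1000
    print(f"{cure_sd_years} SD-years ~ ${cash_equivalent:,.0f} of cash transfers")
# i.e. thousands of dollars of cash transfers per year-long cure, which sits
# awkwardly with reading HLI as valuing one income doubling at one cure.
```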
I wish I could give you a clearer “ah, here is where I think the mistake is” or perhaps an “oh, you’re right after all”, but I too am finding the linked analysis a little hard to follow and am a bit short on time (ironically, because I’m trying to publish a different piece of StrongMinds analysis before a deadline). Maybe one of the things we can talk about once we schedule a call is how you calculated this and whether it works? Or maybe HLI will comment and clear things up regarding the 1.3 figure you pulled out and what it really means.
Good stuff. I haven’t spent that much time looking at HLI’s moral weights work, but I think the answer is “something is wrong with how you’ve constructed the weights; HLI is in fact weighting mental health more heavily than SoGive”. A complete answer to this question requires me to check your calculations carefully, which I haven’t done yet, so it’s possible that you’re right.
If it were true that HLI found that roughly doubling someone’s consumption improves well-being as much as averting one case of depression, that would be very important, as it would mean that SoGive’s moral weights fail some basic sanity checks. It would imply that we should raise our moral weight on cash-doubling to at least match the cost of therapy, even under a purely subjective-well-being-oriented framework for weighting. (Why not pay £200 to double income, if it’s as good as averting depression and you would pay £200 to avert depression?) This seems implausible.
I haven’t actually been directly researching the comparative moral weights aspect, personally—I’ve been focusing primarily on <what’s the impact of therapy on depression in terms of effect size> rather than on the “what should the moral weights be” question (though I have put some attention into the “how to translate effect sizes into subjective intuitions” question, which is not quite the same thing). That said, when I have more time I will look more deeply into this and check whether our moral weights are failing some sort of sanity check of this kind, but I don’t think that they are.
Regarding the more general question of “where would we stand if we altered our moral weights to be something else”: ask me again in a month or so when all the spreadsheets are finalized; moral weights should be relatively easy to adjust once the analysis is done.
(As Sanjay alludes to in the other thread, I do think all this is a somewhat separate discussion from the GWWC list—my main point with the GWWC list was that StrongMinds is not, in the big picture, actually super out of place with the others in terms of how evidence-backed it is, especially when you consider the background academic literature about the intervention rather than their internal data. But I wanted to address the moral weights issue directly, as it does seem like an important and separate point.)
that would be very important, as it would mean that SoGive’s moral weights fail some basic sanity checks
I would recommend my post here. My opinion is—yes—SoGive’s moral weights do fail a basic sanity check.
1 year of averted depression is 4 income doublings
1 additional year of life (using GW life-expectancies for over-5s) is 1.95 income doublings.
i.e. SoGive would think a year of severe depression is worse than death. Maybe this isn’t quite a “sanity check”, but I doubt many people have that moral view.
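For transparency, here is the arithmetic behind those two figures (a sketch; the ~51 remaining life-years is inferred so as to reproduce the 1.95 figure, in line with GiveWell-style life expectancies for over-5s):

```python
# SoGive benchmarks: £5,000 per life saved, £50 per income doubling,
# £200 per year of severe depression averted.
doublings_per_life = 5000 / 50      # 100 income doublings per life saved
remaining_life_years = 51.3         # inferred: roughly GiveWell's remaining
                                    # life expectancy for over-5 deaths
doublings_per_life_year = doublings_per_life / remaining_life_years
print(f"{doublings_per_life_year:.2f} doublings per life-year")              # ~1.95

doublings_per_depression_year = 200 / 50
print(f"{doublings_per_depression_year:.1f} doublings per depression-year")  # 4.0

# 4.0 > 1.95: a year of severe depression is weighted as worse than losing
# a year of life -- the "worse than death" implication above.
```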
I do think all this is a somewhat separate discussion from the GWWC list
I think cost-effectiveness is very important for this. StrongMinds isn’t so obviously great that we don’t need to consider the cost.
my main point with the GWWC list was that StrongMinds is not, in the big picture, actually super out of place with the others in terms of how evidence-backed it is, especially when you consider the background academic literature about the intervention rather than their internal data
Yes, this is a great point, which I think Jeff has addressed rather nicely in his new post. When I posted this it wasn’t supposed to be a critique of GWWC (I didn’t realise how bad the situation there was at the time) so much as a critique of StrongMinds. Now that I see quite how bad it is, I’m honestly at a loss for words.
i.e. SoGive would think a year of severe depression is worse than death. Maybe this isn’t quite a “sanity check”, but I doubt many people have that moral view.
I replied in the moral weights post w.r.t. the “worse than death” thing. (I think that’s a fair point, but a fundamentally different one from what I meant by sanity checks, which were about not crossing hard lower bounds implied by the empirical effects of cash on well-being vs the empirical effects of mental health interventions on well-being.)
This is a great, balanced post which I appreciate, thanks. Especially the point that there is a decent amount of RCT data for StrongMinds compared to other charities on the list.