I’m also pretty skeptical about the astronomical success rate SM professes, particularly because of some serious methodology issues. The most important, I think, is the heavy confounding introduced by the recruitment method: recruiting from microfinance and employment training programs means the sample is predisposed to improvements in depression symptoms simply because their material conditions are improving, or could plausibly improve. The lackluster follow-through with control groups and long-term assessment is also significant. I would love to see a qualitative study with participants to understand the mechanisms of improvement and what participants themselves feel has been significant in alleviating their depressive symptoms.
That being said, it’s worth mentioning that SM is not the first to try this method of treatment, and there are a considerable number of studies with similar results (their methods also leave something to be desired, in my opinion, but not so much that they should be disregarded). Meta-analyses of IPT have found it effective in treating depression and noteworthy as an empirically tested treatment [1]. The original Bolton et al. study, which I believe SM takes as the model for its intervention, and its six-month follow-up claim that benefits held for that long. Another study in Uganda, conducted ten years later by Bolton and some other members of the initial research team, gives more insight into how the intervention actually works through a qualitative study (to be clear, this is not a follow-up with the participants from the initial papers). There are more than a dozen other studies implementing GIPT in a variety of contexts and for different target groups, and they all point to large decreases in symptoms of depression, although they all suffer from confounding factors and either lack long-term follow-ups or lack meaningfully tracked control groups.
This is not to say that SM’s estimates are probably correct, or that they definitely deserve their spot as a top-rated charity. I do think they’re doing valuable work overall, and I have some confidence that this type of intervention is cost-effective for mental health treatment, especially in underserved communities: it is somewhat self-sustaining, focusing on managing symptoms of depression and creating in-community support groups that can continue after the initial intervention. But I remain doubtful about the size and mechanism of the impact until I see more evidence backing it up. That said, I was surprised to find that SM’s purported results weren’t as far out of left field as I initially assumed, so I wanted to share this for context.
Two of these studies (1, 2) are for IPT and not GIPT, and are notably conducted by a group led by the same person; the third focuses on postpartum depression specifically.
I am not doubting that IPT-G is an effective method for treating depression (I hope that came across in my article). I am doubting the data, and by extension the effect size, that they are seeing versus other methods.
Their effects are somewhere between 1.4x and 3.7x higher than the meta-analysis effects from HLI, where I would expect them to be lower. (It’s not clear to me that Cohen’s d is the right metric here, which I want to say more about in future posts.) tl;dr: Cohen’s d is more about saying “there is an effect” than about how big the effect is.
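For anyone less familiar with the metric, here’s a minimal sketch of how Cohen’s d is typically computed (the pooled-SD version); the groups and numbers below are made up purely for illustration.

```python
import numpy as np

def cohens_d(treatment, control):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    # Pooled variance from the two sample variances (Bessel-corrected)
    pooled_var = ((n1 - 1) * np.var(treatment, ddof=1)
                  + (n2 - 1) * np.var(control, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(treatment) - np.mean(control)) / np.sqrt(pooled_var)

# Made-up numbers purely for illustration: improvement in depression scores
rng = np.random.default_rng(0)
treated = rng.normal(loc=8.0, scale=5.0, size=200)    # hypothetical treated group
untreated = rng.normal(loc=4.0, scale=5.0, size=200)  # hypothetical control group
print(cohens_d(treated, untreated))  # ~0.8, "large" by Cohen's heuristics
```

The point is that the result is denominated in SDs of whatever the underlying scale happens to be, which is where my unease comes from.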
Could you clarify your comment about Cohen’s d? In my experience with experimental work, p-values are used to establish the ‘existence’ of an effect, but low (<0.05) p-values do not inherently mean an effect size is meaningful. Cohen’s d is meant to gauge effect sizes and their meaningfulness (usually against Cohen’s heuristics of 0.2, 0.5, and 0.8 for small, medium, and large effects). However, Cohen himself argued that interpretation is literature- and context-dependent; sometimes tiny effects are meaningful. The best example I can think of is the Milkman et al. megastudy on text-based vaccine nudges.
Does this comment answer your question?
I wasn’t taking issue with your skepticism of SM. I was just confused by your comments about Cohen’s d, given that it isn’t typically used to demonstrate the existence of an effect. I’m just curious about your reasons why it might not be an ideal metric!
Yes, it was a fair question, and what I wrote was phrased badly. I was just wondering whether my explanation there was sufficient. (Basically, my issue is that Cohen’s d only gives you information in SD terms, and it’s not easy to say whether SDs are a useful unit in this context.)
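To make the “SD terms” point concrete, here’s a toy conversion from d to raw scale points, assuming, purely hypothetically, a PHQ-9-style 0–27 depression scale with a pooled SD of about 5 points (I don’t know the actual SD in SM’s data):

```python
# Toy conversion: what a given Cohen's d means in raw scale points.
# ASSUMPTION: a PHQ-9-style 0-27 scale with a pooled SD of ~5 points;
# this SD is illustrative, not taken from SM's data.
assumed_sd = 5.0

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {d * assumed_sd:.1f} points on the raw scale")
```

Whether moving people that many raw points is clinically meaningful is exactly the question that SD units alone can’t answer.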
Like you and many other commenters here, I also find the large effect sizes quite puzzling. It definitely gives me “Hilgard’s Lament” vibes: “there’s no way to contest ridiculous data because ‘the data are ridiculous’ is not an empirical argument”. On the usefulness of Cohen’s d/SD, I’m not sure; I guess it has little to no meaning if there are issues surrounding the reliability and validity of the data. Bruce linked to their recruitment guidelines, and it doesn’t look very good.
I agree, that’s essentially the thing I want to resolve. I have basically put forward a bunch of potential explanations:
1. The data is dubious.
2. The data isn’t dubious, but isn’t saying what we think it’s saying: for example, moving people 1 SD on [unclear metric] might not be that surprising, depending on what [unclear metric] is.
3. The data isn’t dubious, and StrongMinds really is a great charity.
For option 3 to be compelling, we certainly need a whole lot more than what’s been given. Many EA charities have a lot of RCT and qualitative work buttressing them, while this one doesn’t. It seems fundamentally strange, then, that EA orgs are pitching SM as the next great thing without the strong evidence we expect from EA causes.
I strongly agree—hence my title
Oh no, I wasn’t trying to imply that that’s what you were doing. I commented because, when I first came across SM, I was extremely doubtful that any kind of intervention could have such a high impact (even something around 70–75% would have surprised me, let alone what SM claims) and considered it very implausible, until I saw the evidence base for GIPT, which made me think it’s not so outlandish as to be totally implausible (although, as I said, I still have my doubts and don’t think SM makes a strong enough case for their figures). I just wanted to share this for anyone else who was in my position.