I am not doubting that IPT-G is an effective method for treating depression. (I hope that came across in my article.) I am doubting the data (and, by extension, the effect size) that they are seeing versus other methods.
Their effect sizes are somewhere between 1.4x and 3.7x higher than the meta-analytic estimates from HLI, whereas I would expect them to be lower than the meta-analysis effects. (It’s not clear to me that Cohen’s d is the right metric here, which I want to say more about in future posts.) tl;dr: Cohen’s d is more about saying “there is an effect” than about how big the effect is.
Could you clarify your comment about Cohen’s d? In my experience with experimental work, p-values are used to establish the ‘existence’ of an effect, but a low (<0.05) p-value does not inherently mean the effect size is meaningful. Cohen’s d is meant to gauge effect sizes and their meaningfulness (usually against Cohen’s heuristics of 0.2, 0.5, and 0.8 for small, medium, and large effects). However, Cohen himself argued that interpretation is literature- and context-dependent; sometimes tiny effects are meaningful. The best example I can think of is the Milkman et al. megastudy on text-based vaccine nudges.
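To illustrate the p-value point with a toy example (the numbers below are invented for illustration and have nothing to do with SM’s data): with a large enough sample, a trivially small effect can be highly ‘significant’ while d stays tiny.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two large groups whose true means differ by only 0.05 SD.
control = rng.normal(loc=0.00, scale=1.0, size=20_000)
treated = rng.normal(loc=0.05, scale=1.0, size=20_000)

# Welch's t-test: with n this large, even a tiny effect is "significant".
t, p = stats.ttest_ind(treated, control, equal_var=False)

# Cohen's d from the pooled standard deviation.
pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
d = (treated.mean() - control.mean()) / pooled_sd

print(f"p = {p:.3g}, d = {d:.3f}")  # p is tiny, but d ~ 0.05: trivial by Cohen's heuristics
```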
Does this comment answer your question or not?
I wasn’t taking issue with your skepticism of SM. I was just confused by your comments about Cohen’s d, given that it is not typically used to demonstrate the existence of an effect. I’m just curious about your reasons as to why it might not be an ideal metric!
Yes, it was a fair question, and what I wrote was phrased badly. I was just wondering if my explanation there was sufficient? (Basically, my issue is that Cohen’s d only gives you information in SD terms, and it’s not easy to say whether SDs are a useful unit in this context.)
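To make that concrete, here’s a minimal sketch (with made-up numbers, not SM’s data) of how the same raw improvement can look medium or huge in d terms purely because of the sample’s spread:

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent samples, using a pooled SD."""
    pooled_sd = np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2)
    return (np.mean(a) - np.mean(b)) / pooled_sd

rng = np.random.default_rng(1)

# The same 3-point raw improvement on some hypothetical questionnaire
# (lower score = less depressed), in a narrow-spread vs. wide-spread sample.
for spread in (2.0, 6.0):
    control = rng.normal(20.0, spread, size=500)
    treated = rng.normal(17.0, spread, size=500)
    print(f"SD ~{spread}: d = {cohens_d(control, treated):.2f}")
# SD ~2 -> d ~ 1.5 ("huge"); SD ~6 -> d ~ 0.5 ("medium").
```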
Like you and many other commenters here, I also find the large effect sizes quite puzzling. It definitely gives me “Hilgard’s Lament” vibes: “there’s no way to contest ridiculous data because ‘the data are ridiculous’ is not an empirical argument”. On the usefulness of Cohen’s d/SD, I’m not sure. I guess it has little to no meaning if there seem to be issues surrounding the reliability and validity of the data. Bruce linked to their recruitment guidelines, and they don’t look very good.
Edit: Grammar and typos.
I agree; that’s essentially the thing I want to resolve. I have basically put forward a few potential explanations:
1. The data is dubious.
2. The data isn’t dubious, but it isn’t saying what we think it’s saying. For example, moving [unclear metric] by 1 SD might be easy, so a 1-SD effect might not be that surprising, depending on what [unclear metric] is (see the sketch at the end of this comment).
3. The data isn’t dubious, and StrongMinds really is a great charity.
For option 3 to be compelling, we certainly need a whole lot more than what’s been given. Many EA charities have a lot of RCT/qualitative work buttressing them, while this one doesn’t. It seems fundamentally strange, then, that EA orgs are pitching SM as the next great thing without the strong evidence that we expect from EA causes.
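As a purely hypothetical sketch of option 2 (the scales and SDs below are assumptions for illustration, not SM’s actual instruments or data): what a “1 SD” shift means in raw terms, and in terms of overlap with the control group, depends entirely on the metric.

```python
from scipy.stats import norm

d = 1.0  # a "1 SD" improvement

# What d = 1 means in raw points depends on the scale's spread.
# These scales and SDs are made up for illustration, not SM's instruments.
for name, sd in {"wide depression scale (SD ~6)": 6.0,
                 "narrow mood item (SD ~1.5)": 1.5}.items():
    print(f"d = {d} on a {name}: ~{d * sd:.1f} raw points")

# Cohen's U3 for d = 1 (assuming normality): the average treated person
# scores better than ~84% of the control group.
print(f"U3 = {norm.cdf(d):.2f}")
```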
I strongly agree; hence my title.
Oh no, I wasn’t trying to imply that that’s what you were doing. I wanted to comment on it because, when I first came across SM, I was extremely doubtful that any kind of intervention could have very high impact (not even as high as SM claims; even something around 70-75% would have been surprising to me), and I considered it very implausible until I saw the evidence base for IPT-G. That made me think it’s not quite so outlandish as to be totally implausible (although, as I said, I still have my doubts and don’t think SM makes a strong enough case for their figures). Just wanted to share this for anyone else who was in my position.