Could you clarify your comment about Cohen’s d? In my experience with experimental work, p-values are used to establish the ‘existence’ of an effect. But low (<0.05) p-values do not inherently mean an effect size is meaningful. Cohen’s d is meant to gauge effect size and meaningfulness (usually in relation to Cohen’s heuristics of 0.2, 0.5, and 0.8 for small, medium, and large effects). However, Cohen argued that these thresholds depend on the literature and context. Sometimes tiny effects are meaningful. The best example I can think of is the Milkman et al. megastudy on text-based vaccine nudges.
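To make the distinction between "existence" and "size" concrete, here is a rough sketch rather than anything from the studies discussed. It assumes two equal-sized groups, and it uses a normal approximation for the p-value instead of an exact t-test. The function names and the sample sizes are made up for illustration.

```python
import math
import statistics


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample SD."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def approx_two_sided_p(d, n_per_group):
    """Normal approximation to a two-sample test with equal group sizes.

    With n participants per group, the test statistic is roughly
    d * sqrt(n / 2), so the same d gives a smaller p as n grows.
    """
    z = d * math.sqrt(n_per_group / 2)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical numbers: the same "tiny" effect at two sample sizes.
tiny_d = 0.05
for n in (100, 10_000):
    print(f"d = {tiny_d}, n per group = {n}: p ~ {approx_two_sided_p(tiny_d, n):.4f}")
```

The point of the sketch is only that the p-value depends on sample size while d does not, so a small p-value by itself says nothing about whether the effect is large or meaningful.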
Does this comment answer your question or not?
I wasn’t taking issue with your skepticism of SM. I was just confused by your comments about Cohen’s d, given that it is not typically used to demonstrate the existence of an effect. I’m just curious about why you think it might not be an ideal metric!
Yes, it was a fair question, and what I wrote was phrased badly. I was just wondering whether my explanation there was sufficient. (Basically, my issue is that Cohen’s d only gives you information in SD terms, and it’s not easy to say whether SDs are a useful unit in this context.)
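As a small illustration of the "SD terms" concern, the sketch below converts the same d back into raw points under two different SDs. The SD values and the idea of a questionnaire score are hypothetical; nothing here comes from the actual data.

```python
def raw_change(d, sd):
    """Convert a standardized effect back into the outcome's own units."""
    return d * sd


# Hypothetical SDs for the same questionnaire in two different samples.
narrow_sample_sd = 2.0
wide_sample_sd = 6.0

for label, sd in [("narrow", narrow_sample_sd), ("wide", wide_sample_sd)]:
    print(f"d = 1.0 in the {label} sample = {raw_change(1.0, sd):.1f} points")
```

So a d of 1.0 can correspond to quite different raw changes depending on how spread out the sample is, which is why d alone may not say much without knowing the metric and its SD.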
Like you and many other commenters here, I also find the large effect sizes quite puzzling. It definitely gives me “Hilgard’s Lament” vibes: “there’s no way to contest ridiculous data because ‘the data are ridiculous’ is not an empirical argument”. On the usefulness of Cohen’s d/SD, I’m not sure. I guess it has little to no meaning if there seem to be issues with the reliability and validity of the data. Bruce linked to their recruitment guidelines, and it doesn’t look very good.
Edit: Grammar and typos.
I agree; that’s essentially the thing I want to resolve. I have basically thrown out a bunch of potential reasons:
1. The data is dubious.
2. The data isn’t dubious, but isn’t saying what we think it’s saying. For example, it might be easy to move [unclear metric] by 1 SD, which might not be that surprising depending on what [unclear metric] is.
3. The data isn’t dubious, and StrongMinds really is a great charity.
For option 3 to be compelling, we certainly need a whole lot more than what’s been given. Many EA charities have a lot of RCT/qual work buttressing them, while this one doesn’t. It seems fundamentally strange, then, that EA orgs are pitching SM as the next greatest thing without the strong evidence that we expect from EA causes.
I strongly agree—hence my title