Seth and Benny, many thanks for this extremely interesting and thought-provoking piece. This is a major contribution to the field. It is especially helpful to have the quantitative meta-analyses and meta-regressions; the typically low within-study power in this literature can obscure the picture in some other reviews that just count significant studies. It’s also heartening to see how far this literature has come in the past few years in terms of measuring objective outcomes.
A few thoughts and questions:
1.) The meta-regression on self-reported vs. objectively measured outcomes is very interesting and, as you say, a little counter-intuitive. In a previous set of RCTs (Mathur 2021 in the forest plot), we found suggestive evidence of strong social desirability bias in the context of an online-administered documentary intervention. There, we only considered self-reported outcomes, but compared two types of outcomes: (1) stated intentions measured immediately (high potential for social desirability bias); vs. (2) reported consumption measured after 2 weeks (lower potential for social desirability bias). In light of your results, it could be that ours primarily reflected effects decaying over time, or genuine differences between intentions and behavior, more than pure social desirability bias. Methodologically, I think your findings point to the importance of head-to-head comparisons of self-reported vs. objective outcomes in studies that are capable of measuring both. If these findings continue to suggest little difference between these modes of outcome measurement, that would be great news for interpreting the existing literature using self-report measures and for doing future studies on the cheap, using self-report.
2.) Was there a systematic database search in addition to the thorough snowballing and manual searches? I kind of doubt that you would have found many additional studies this way, but this seems likely to come up in peer review if the paper is described as a systematic review.
3.) Very minor point: I think the argument about Glass delta = 0.3 corresponding to a 10% reduction in MAP consumption is not quite right. For a binary treatment X and continuous outcome Y, the relationship between Cohen’s d (not quite the same as Glass, as you say) and Pearson’s r is given by d = 2r / sqrt(1-r^2), such that d = 0.3 corresponds to r^2 (proportion of variance explained) = 0.02. Even so, the 2% of variation explained does not necessarily mean a 2% reduction in Y itself. Since Glass standardizes by only the control group SD, the same relationship will hold under equal SDs between the treatment and control group, and otherwise I do not think there will be a 1-1 relationship between delta and r.
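For reference, solving the cited conversion for r gives $r = d / \sqrt{d^2 + 4}$, so $d = 0.3$ corresponds to $r = 0.3 / \sqrt{4.09} \approx 0.148$ and hence $r^2 \approx 0.022$.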
Again, congratulations on this very well-conducted analysis, and best of luck with the journal submissions. I am very glad you are pursuing that.
Thank you very much for your kind words. Your two 2021 papers were a major source of inspiration and guidance for this post.
I still believe that self-reported outcomes are at serious risk of social desirability bias, even though our test doesn't detect it. The second experiment in your 3-RCT paper is, as far as I am concerned, dispositive. I also agree that what we need now is an assessment of the correlation between attitudes, intentions, and self-reported outcomes on the one hand, and objectively measured outcomes on the other.
Apropos of this, if we have time and get more funding, I'd like to go back, code up the attitude or intention outcomes from every paper in our dataset that has them, and assess the overall within-study correlation between attitudes and behaviors (even if the behaviors are self-reported). At least then we'd know how well the Knowledge-Attitude-Practice model holds in this literature, which is another way of asking: when we alter stated attitudes, are we influencing behaviors down the line? Another paper I just wrapped up found very little on this front.
No systematic searches yet, and a discarded draft had a little quip about why: “Ordinarily, this is where a meta-analysis tells you about an exhaustive search of many different databases, reports on how many abstracts the team read, and then how many papers they ended up with. We, however, don’t have a team of RAs to help with this, so we took a slightly more indirect approach.” I’m going to put this off for as long as possible because I also think it won’t be that fruitful, but if an editor or peer reviewer asks, we’ll hop to it 😃
edit: the following reply is based on a misread. I thought Maya was saying that the conversion didn't work for binary outcomes, but she was saying that it didn't work for continuous outcomes and a binary treatment. (She was perfectly clear; I was reading too fast.) So the remainder of this section is not apropos of Maya's question, and is now a de facto appendix on an unrelated question: how do we convert binary outcomes into Glass's ∆?

Thank you for this. I should have specified (and will now modify the paper to say) that this applies, roughly, to **continuous** normally distributed outcomes. For binary DVs, our procedure is to treat outcomes as draws from a Bernoulli distribution whose variance is $p(1-p)$, where $p$ is the proportion experiencing the event, and whose standard deviation is the square root of that variance. So our estimator is $\Delta = \frac{p_1 - p_2}{\sqrt{p_2(1 - p_2)}}$, where $p_1$ is the proportion in the treatment group and $p_2$ is the proportion in the control group. As you noted, the correspondence between ∆ and real-life effect size depends on the underlying variance. But for reference's sake: if 50% of people in the control group order a meat-free meal and 60% do in the treatment group, then ∆ = (0.6 − 0.5) / sqrt(0.5 * 0.5) = 0.1 / 0.5 = 0.2. If instead the numbers are 20% in the treatment group and 10% in the control group, then ∆ = (0.2 − 0.1) / sqrt(0.1 * 0.9) = 0.1 / 0.3 ≈ 0.33.
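In R, that estimator is just (a minimal sketch; the function name is mine, for illustration):

# Glass's delta for a binary outcome: difference in proportions,
# standardized by the control group's Bernoulli SD
glass_delta_binary = function(p_treat, p_control) {
  (p_treat - p_control) / sqrt( p_control * (1 - p_control) )
}
glass_delta_binary(0.6, 0.5)   # 0.2
glass_delta_binary(0.2, 0.1)   # 0.333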
This estimator has the nice property that the computed effect size grows larger as the underlying variance grows smaller, unlike conversions from odds ratios to ∆. As Robin Gomila writes, any given odds ratio "can translate into different Cohen's d values, depending on the probability" of event incidence for the "dependent variable for control group participants" (p. 8).
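For instance (made-up numbers): an odds ratio of 2 implies noticeably different values of ∆ depending on the control group's baseline rate.

# treatment-group proportions implied by an odds ratio of 2 at two baseline rates
p1_a = (2 * 0.5/0.5) / (1 + 2 * 0.5/0.5)   # control at 50% -> treatment ~0.67
p1_b = (2 * 0.1/0.9) / (1 + 2 * 0.1/0.9)   # control at 10% -> treatment ~0.18
# the same odds ratio yields different values of Glass's delta
(p1_a - 0.5) / sqrt( 0.5 * (1 - 0.5) )     # ~0.33
(p1_b - 0.1) / sqrt( 0.1 * (1 - 0.1) )     # ~0.27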
I believe this is the first time I/we have proposed this estimator to an actual statistician. WDYT?
Thanks so much for the thoughtful and interesting response, and I’m honored to hear that the 2021 papers helped lead into this. Cumulative science at work!
I fully agree. Our study was at best comparing a measure with presumably less social desirability bias to one with presumably more, and lacked any gold-standard benchmark. In any case, it was also only one particular intervention and setting. I think your proposed exercise of coding attitude and intention measures for each study would be very valuable. A while back, we had tossed around some similar ideas in my lab. I’d be happy to chat offline about how we could try to help support you in this project, if that would be helpful.
Makes sense.
For binary outcomes, yes, I think your analog to delta is reasonable. Often these proportion-involving estimates are not normal across studies, but that’s easy enough to deal with using robust meta-analysis or log-transforms, etc. I guess you approximated the variance of this estimate with the delta method or similar, which makes sense. For continuous outcomes, this actually was the case I was referring to (a binary treatment X and continuous outcome Y), since that is the setting where the d-to-r conversion I cited holds. Below is an MWE in R, and please do let me know if I’ve misinterpreted what you were proposing. I hope not to give the impression of harping on a very minor point – again, I found your analysis very thoughtful and rigorous throughout; I’m just indulging a personal interest in effect-size conversions.
Thanks again, Seth!
Maya
library(dplyr)
# sample size
N = 10^5
# population parameter
delta = .3
# assume same SD conditional on X=0 and X=1 so that Glass = Cohen
sd.within = .5
# E[Y | X=0] and E[Y | X=1]
m0 = .5
m1 = m0 + delta*sd.within
# generate data
d = data.frame( X = c( rep(0, N/2), rep(1, N/2) ) )
d$Y = rnorm( n = N, mean = ifelse( d$X == 0, m0, m1 ), sd = sd.within )
# empirical Glass delta (= Cohen's d here, since the conditional SDs are equal)
delta.emp = ( mean( d$Y[ d$X == 1 ] ) - mean( d$Y[ d$X == 0 ] ) ) / sd( d$Y[ d$X == 0 ] )
# point-biserial r between X and Y, and the d-to-r conversion cited above
r = cor( d$X, d$Y )
d.from.r = 2*r / sqrt( 1 - r^2 )
# delta.emp and d.from.r are both close to 0.3, while r^2 is only about 0.02
c( delta.emp = delta.emp, d.from.r = d.from.r, r.squared = r^2 )
You are 100% right about this, my mistake. First, I read your first comment too fast (I placed ‘binary’ on the wrong side of the equation, as you noticed), and second, I think that the original paragraph confuses percentage change with percentile change. I removed the section.
I still want the final draft to present some intuitive, drawing-on-stats-that-we-learned-in-HS way to put standardized mean effect sizes into impact estimate terms, but I think we need to think more about this.
Thanks for engaging! FWIW, I ran through your code and everything makes sense to me.
No worries. Effect-size conversions are very confusing. Thanks for doing this important project and for the exchange!