Hi Maya,

Thank you very much for your kind words. Your two 2021 papers were a major inspiration and source of guidance for this post.
I still believe that self-reported outcomes are at serious risk of social desirability bias, even though our test doesn’t detect it. The second experiment in your 3-RCT paper is, as far as I’m concerned, dispositive. I also agree that what we need now is an assessment of the correlation between attitudes, intentions, and self-reported outcomes on the one hand, and objectively measured outcomes on the other.
Apropos of this, if we have time and get more funding, I’d like to go back and code up attitude and intention outcomes from every paper in our dataset that has them and assess the overall within-study correlation between attitudes and behaviors (even if the behaviors are self-reported). At least then we’d know how well the Knowledge-Attitude-Practice model holds for this literature, which is another way of asking: when we shift stated attitudes, are we influencing behaviors down the line? Another paper I just wrapped up found very little on this front.
No systematic searches yet, and a discarded draft had a little quip about why: “Ordinarily, this is where a meta-analysis tells you about an exhaustive search of many different databases, reports on how many abstracts the team read, and then how many papers they ended up with. We, however, don’t have a team of RAs to help with this, so we took a slightly more indirect approach.” I’m going to put this off for as long as possible because I also think it won’t be that fruitful, but if an editor or peer reviewer asks, we’ll hop to it 😃
Edit: the following reply is based on a misread. I thought Maya was saying that the conversion didn’t work for binary outcomes, but she was saying that it didn’t work for continuous outcomes and a binary treatment. (She was perfectly clear; I was reading too fast.) So the remainder of this section is not apropos of Maya’s question, and is now a de facto appendix on an unrelated question: how do we convert binary outcomes into Glass’s ∆?

Thank you for this. I should have specified (and will now modify the paper to say) that this applies, roughly, to **continuous**, normally distributed outcomes. For binary DVs, our procedure is to treat outcomes as draws from a Bernoulli distribution whose variance is $p(1-p)$, where $p$ is the proportion of some event, and whose standard deviation is the square root of that variance. So our estimator is $\Delta = \frac{p_1 - p_2}{\sqrt{p_2(1 - p_2)}}$, where $p_1$ is the treatment-group proportion and $p_2$ is the control-group proportion. As you noted, the correspondence between ∆ and real-life effect size depends on the underlying variance. But for reference’s sake, if 50% of people in the control group order a meat-free meal and 60% do in the treatment group, ∆ = (0.6 − 0.5) / sqrt(0.5 × 0.5) = 0.1 / 0.5 = 0.2. If the numbers are 20% in the treatment group and 10% in control, ∆ = (0.2 − 0.1) / sqrt(0.1 × 0.9) = 0.1 / 0.3 ≈ 0.33.
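For concreteness, here is a tiny R sketch of that calculation (it just applies the formula above to the two examples; the function name is mine):

glass_delta = function(p1, p2) {
  # p1 = treatment-group proportion, p2 = control-group proportion
  # standardize the risk difference by the control group's Bernoulli SD
  (p1 - p2) / sqrt( p2 * (1 - p2) )
}
glass_delta(p1 = 0.6, p2 = 0.5)  # 0.2
glass_delta(p1 = 0.2, p2 = 0.1)  # ~0.33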
This estimator has the nice property that the computed effect size grows larger as the underlying variance grows smaller, which you don’t get when converting from odds ratios to ∆. As Robin Gomila writes, any given odds ratio “can translate into different Cohen’s d values, depending on the probability” of event incidence for the “dependent variable for control group participants” (p. 8).
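To make Gomila’s point concrete with the ∆ estimator above (a quick sketch; the odds ratio and incidence levels are just illustrative):

glass_delta = function(p1, p2) (p1 - p2) / sqrt( p2 * (1 - p2) )  # as above
# invert an odds ratio to get the implied treatment-group proportion
or_to_p1 = function(or, p2) { odds = or * p2 / (1 - p2); odds / (1 + odds) }
# the same odds ratio (OR = 2) at two different control-group incidence levels
glass_delta( or_to_p1(2, p2 = 0.1), p2 = 0.1 )  # ~0.27
glass_delta( or_to_p1(2, p2 = 0.5), p2 = 0.5 )  # ~0.33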
I believe this is the first time I/we have proposed this estimator to an actual statistician. WDYT?
Hi Seth,

Thanks so much for the thoughtful and interesting response, and I’m honored to hear that the 2021 papers helped lead into this. Cumulative science at work!
I fully agree. Our study was at best comparing a measure with presumably less social desirability bias to one with presumably more, and lacked any gold-standard benchmark. In any case, it was also only one particular intervention and setting. I think your proposed exercise of coding attitude and intention measures for each study would be very valuable. A while back, we had tossed around some similar ideas in my lab. I’d be happy to chat offline about how we could try to help support you in this project, if that would be helpful.
Makes sense.
For binary outcomes, yes, I think your analog to delta is reasonable. Estimates involving proportions like these are often not normal across studies, but that’s easy enough to deal with using robust meta-analysis, log transforms, etc. I guess you approximated the variance of this estimate with the delta method or similar, which makes sense. As for continuous outcomes, that is actually the case I was referring to (a binary treatment X and a continuous outcome Y), since that is the setting where the d-to-r conversion I cited holds. Below is an MWE in R; please do let me know if I’ve misinterpreted what you were proposing. I hope not to give the impression of harping on a very minor point – again, I found your analysis very thoughtful and rigorous throughout; I’m just indulging a personal interest in effect-size conversions.
Thanks again, Seth!
Maya
library(dplyr)
# sample size
N = 10^5
# population parameter
delta = .3
# assume same SD conditional on X=0 and X=1 so that Glass = Cohen
sd.within = .5
# E[Y | X=0] and E[Y | X=1]
m0 = .5
m1 = m0 + delta*sd.within
# generate data: binary treatment X, continuous outcome Y
d = data.frame( X = c( rep(0, N/2), rep(1, N/2) ) )
d$Y = rnorm( n = N, mean = ifelse( d$X == 1, m1, m0 ), sd = sd.within )
# empirical standardized mean difference (Glass = Cohen by construction)
stats = d %>% group_by(X) %>% summarise( m = mean(Y), s = sd(Y) )
d.emp = ( stats$m[2] - stats$m[1] ) / stats$s[1]
# point-biserial correlation between X and Y
r.emp = cor(d$X, d$Y)
# one standard d-to-r conversion for two equal-sized groups: r = d / sqrt(d^2 + 4)
r.conv = d.emp / sqrt( d.emp^2 + 4 )
c( d.emp = d.emp, r.emp = r.emp, r.conv = r.conv )
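P.S. For your binary-outcome ∆, the delta-method variance I was imagining would look roughly like this (treating p1 and p2 as independent sample proportions, which may or may not match what you actually did):

# rough delta-method SE for Delta = (p1 - p2) / sqrt( p2 * (1 - p2) ),
# treating p1 and p2 as independent sample proportions from groups of size n1 and n2
delta.se = function(p1, n1, p2, n2) {
  q2 = 1 - p2
  # partial derivatives of Delta with respect to p1 and p2
  d.p1 = 1 / sqrt( p2 * q2 )
  d.p2 = -1 / sqrt( p2 * q2 ) - ( p1 - p2 ) * ( 1 - 2 * p2 ) / ( 2 * ( p2 * q2 )^1.5 )
  # first-order (delta-method) variance approximation
  v = d.p1^2 * p1 * (1 - p1) / n1 + d.p2^2 * p2 * q2 / n2
  sqrt(v)
}
# example: 60% vs. 50% with 200 participants per arm
delta.se( p1 = 0.6, n1 = 200, p2 = 0.5, n2 = 200 )  # ~0.10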
You’re 100% right about this; my mistake. First, I read your first comment too fast (I placed ‘binary’ on the wrong side of the equation, as you noticed), and second, I think the original paragraph confused percentage change with percentile change. I’ve removed the section.
I still want the final draft to present some intuitive, drawing-on-stats-that-we-learned-in-HS way to put standardized mean effect sizes into impact estimate terms, but I think we need to think more about this.
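One candidate, assuming roughly normal outcomes, is the familiar percentile interpretation of a standardized mean difference (just a sketch, not something we’ve settled on for the paper):

# with normal outcomes, a shift of Delta standard deviations puts the average
# treated person at the pnorm(Delta) quantile of the control distribution
pnorm(0.2)    # ~0.58, i.e., roughly the 58th percentile of the control group
pnorm(0.333)  # ~0.63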
Thanks for engaging! FWIW, I ran through your code and everything makes sense to me.
No worries. Effect-size conversions are very confusing. Thanks for doing this important project and for the exchange!