Gregory,
Thank you for pointing out two errors.
First, the coding mistake in the standard error correction calculation.
Second, and I didn’t pick this up in the last comment, that the CT effect size changes were all calculated with reference to the same model, while the PT effect size changes were calculated with reference to their non-publication-bias analogs.
______________________________________________________
After correcting these errors, the picture does shift a bit, but the quantitative changes are relatively small.
Here are the results where only the change due to the publication bias correction adjusts the cost-effectiveness comparison. More of the tests indicate a downwards correction, and the average / median test now indicates an adjustment from 9.4x to 8x. However, when we remove all adjustments that favor PT in the comparison (models 19, 25, 23, 21, 17, 27, 15), the (average / median) ratio of PT to CT is now (7x / 8x). This is the same as it was before the corrections.
Note: I added vertical reference lines to mark the 3x, 7x and 9.44x multiples.
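To make concrete how per-model corrections like these roll up into an average / median multiple, here is a minimal sketch, assuming purely illustrative proportional effect-size changes for PT and CT and assuming cost-effectiveness scales linearly with the pooled effect; none of the numbers below come from the actual analysis.

```python
# Illustrative sketch only: how per-model publication-bias corrections could
# be aggregated into a mean / median cost-effectiveness multiple.
# All inputs are placeholder values, not figures from the report.
import numpy as np

baseline_multiple = 9.44  # original PT-vs-CT cost-effectiveness multiple

# Hypothetical corrected/uncorrected effect-size ratios implied by each test.
pt_changes = np.array([0.85, 0.90, 1.02, 0.75, 0.95])  # psychotherapy (PT)
ct_changes = np.array([0.97, 0.99, 1.00, 0.96, 0.98])  # cash transfers (CT)

# If benefits scale ~linearly with the pooled effect, each test adjusts the
# multiple by the relative change in PT versus CT.
adjusted = baseline_multiple * (pt_changes / ct_changes)

print(f"mean:   {adjusted.mean():.1f}x")
print(f"median: {np.median(adjusted):.1f}x")
```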
Next, I present the changes where we include the model choices as publication bias adjustments (e.g., any reduction in effect size that comes from using a fixed effect model or outlier removal is counted against PT; Gregory and Ryan support this approach, and while I’m still unsure, it seems plausible and I’ll read more about it). The mean / median adjustment leads to a 6x / 7x comparison ratio. Excluding all PT-favorable results leads to an average / median correction of 5.6x / 5.8x, slightly below the 6x I previously reported.
Note: I added vertical reference lines to mark the 3x, 7x and 9.44x multiples.
Since the second approach bites into the cost-effectiveness comparison more, and to a degree that’s worth mentioning if true, I’ll read more and raise with my colleagues whether using fixed effect models or discarding outliers are appropriate responses to suspicion of publication bias.
If it turns out this is a more appropriate approach, then I should eat my hat re:

“My complete guess is that if StrongMinds went below 7x GiveDirectly we’d qualitatively soften our recommendation of StrongMinds and maybe recommend bednets to more donors.”
The issue re comparators is less how good dropping outliers or fixed effects are as remedies to publication bias (or how appropriate either would be as an analytic choice here, all things considered), and more the similarity of these models to the original analysis.
We are not, after all, adjusting or correcting the original meta-regression analysis directly, but rather indirectly inferring the likely impact of small study effects on the original analysis by reference to the impact they have in simpler models.
The original analysis, of course, did not exclude outliers or follow-ups, and used random effects, not fixed effects. So of Models 1-6, Model 1 bears the closest similarity to the analysis being indirectly assessed, and thus seems the most appropriate baseline.
The point about outlier removal and fixed effects reducing the impact of small study effects is meant to illustrate that cycling comparators introduces a bias into the assessment instead of just adding noise. Of Models 2-6, we would expect 2, 4, 5, and 6 to be more resilient to small study effects than Model 1, because they either remove outliers, use fixed effects, or both (Model 3 should be ~ a wash). The second figure provides some (further) evidence of this, as (e.g.) the random effects models (hatched) strongly tend to report greater effect sizes than the fixed effect ones, regardless of additional statistical method.
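A minimal sketch of the fixed- versus random-effects point, with made-up data rather than the actual StrongMinds studies: when small studies report inflated effects, a random-effects pooled estimate (which weights small studies relatively more) is pulled upwards more than a fixed-effect one.

```python
# Toy comparison of fixed-effect vs random-effects pooling under a
# small-study effect. Data are hypothetical, not the real meta-analysis.
import numpy as np

# Small studies (large SE) report bigger effects: the classic small-study pattern.
d  = np.array([0.15, 0.20, 0.25, 0.80, 0.90, 1.10])  # effect sizes (Cohen's d)
se = np.array([0.05, 0.06, 0.08, 0.30, 0.35, 0.40])  # standard errors
v  = se**2

# Fixed-effect estimate: inverse-variance weights.
w_fe = 1 / v
fe = np.sum(w_fe * d) / np.sum(w_fe)

# Random-effects estimate with a DerSimonian-Laird tau^2.
q = np.sum(w_fe * (d - fe)**2)
c = np.sum(w_fe) - np.sum(w_fe**2) / np.sum(w_fe)
tau2 = max(0.0, (q - (len(d) - 1)) / c)
w_re = 1 / (v + tau2)
re = np.sum(w_re * d) / np.sum(w_re)

print(f"fixed effect:   {fe:.2f}")  # dominated by the large, precise studies
print(f"random effects: {re:.2f}")  # pulled up by the small, noisy studies
```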
So noting that the discount from a statistical small study effect correction is not so large versus comparators which are already less biased (due to analysis choices contrary to those made in the original analysis) misses the mark.
If the original analysis had (somehow) used fixed effects, these worries would (largely) not apply. Of course, if the original analysis had used fixed effects, the effect size would have been a lot smaller in the first place.
--
Perhaps also worth noting is that, with a discounted effect size, the overall impact of the intervention now becomes very sensitive to linear versus exponential decay of effect, given that the definite integral of the linear method scales with the square of the intercept, whilst for exponential decay the integral is ~linear in the intercept. Although the two methods line up fairly well at the original intercept value of ~0.5, they diverge at lower values. If (e.g.) the intercept is 0.3, over a 5 year period the exponential method (with correction) returns ~1 SD-years (vs. 1.56 originally), whilst the linear method gives ~0.4 SD-years (vs. 1.59 originally).
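A rough sketch of that decay-shape sensitivity, with decay parameters that are assumptions chosen only to show the scaling behaviour (not the fitted values from the original analysis): the linear total shrinks roughly with the square of the intercept, while the exponential total shrinks roughly in proportion to it.

```python
# Illustrative decay-shape comparison; the slope and rate below are assumed values.
import numpy as np

def linear_total(b, slope, horizon=5.0):
    """Total SD-years under linear decay, effect(t) = b - slope*t floored at 0.
    If the effect hits zero within the horizon, the area is the triangle
    b**2 / (2*slope), i.e. it scales with the square of the intercept."""
    t_zero = min(b / slope, horizon)
    return b * t_zero - 0.5 * slope * t_zero**2

def exponential_total(b, rate, horizon=5.0):
    """Total SD-years under exponential decay, effect(t) = b * exp(-rate*t).
    The area is b * (1 - exp(-rate*horizon)) / rate, i.e. ~linear in b."""
    return b * (1 - np.exp(-rate * horizon)) / rate

for b in (0.5, 0.3):
    print(f"intercept {b}: linear {linear_total(b, slope=0.15):.2f} SD-years, "
          f"exponential {exponential_total(b, rate=0.7):.2f} SD-years")
```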
(And, for what it is worth, if you plug corrected SE or SE² values into the original multilevel meta-regressions, PET/PEESE style, you do drop the intercept by around these amounts, either vs. follow-up alone or vs. the later models which add other covariates.)
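For readers unfamiliar with what plugging SE or SE² into a meta-regression amounts to, here is a minimal single-level sketch in the PET/PEESE spirit; it does not reproduce the original multilevel model, and the data, like the helper function, are hypothetical placeholders.

```python
# PET/PEESE-style sketch: regress effect sizes on SE (PET) or SE^2 (PEESE)
# with inverse-variance weights; the intercept estimates the effect at SE = 0.
import numpy as np
import statsmodels.api as sm

d  = np.array([0.15, 0.20, 0.25, 0.80, 0.90, 1.10])  # hypothetical effects
se = np.array([0.05, 0.06, 0.08, 0.30, 0.35, 0.40])  # hypothetical SEs

def pet_peese_intercept(d, se, squared=False):
    predictor = se**2 if squared else se
    X = sm.add_constant(predictor)                 # intercept + SE (or SE^2)
    fit = sm.WLS(d, X, weights=1 / se**2).fit()
    return fit.params[0]                           # estimated effect at SE -> 0

print(f"PET intercept:   {pet_peese_intercept(d, se):.2f}")
print(f"PEESE intercept: {pet_peese_intercept(d, se, squared=True):.2f}")
```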