Epistemic status: tentative, it’s been a long time since reading social science papers was a significant part of my life. Happy to edit/retract this preliminary view as appropriate if someone is able to identify mistakes.
Excluding outliers is thought sensible practice here; two related meta-analyses, Cuijpers et al., 2020c; Tong et al., 2023, used a similar approach.
I can’t access Cuijpers et al., but I don’t read Tong et al. as supporting what HLI has done here.
In their article, Tong et al. provide the effect size with no exclusions, then with outliers excluded, then with “extreme outliers” excluded (the last of which seems to track HLI’s removal criterion). They also provide effect sizes with various publication-bias measures applied. See PDF at 5-6. If I’m not mistaken, the publication-bias measures are applied to the no-exclusions version, not to a version with outliers removed or one limited to studies with lower RoB. See id. at 6 tbl.2 (n = 117 for combined and for 2 of 3 publication-bias effect sizes; n = 153 with trim-and-fill adding 36 studies; n = 74 for outliers removed and n = 104 for extreme outliers removed; the effect sizes after publication-bias measures, which range from 0.42 to 0.60, seem to be those mentioned in HLI’s footnote above).
Tong et al. “conducted sensitive analyses comparing the results with and without the inclusion of extreme outliers,” PDF at 5, discussing the results without exclusion first and then the results with exclusion. See id. at 5-6. Tables 3-5 are based on data without exclusion of extreme outliers; the versions of Tables 4 and 5 that exclude extreme outliers are relegated to the supplemental tables (not in PDF). See id. at 6. This reads to my eyes as treating both the all-inclusive and extreme-outliers-excluded data seriously, with some pride of place to the all-inclusive data.
I don’t read Tong et al. as having concluded that either the all-inclusive or the extreme-outliers-excluded results were more authoritative. They say things like:
Lastly, we were unable to explain the different findings in the analyses with vs. without extreme outliers. The full analyses that included extreme outliers may reflect the true differences in study characteristics, or they may imply the methodological issues raised by studies with effect sizes that were significantly higher than expected.
and
Therefore, the larger treatment effects observed in non-Western trials may not necessarily imply superior treatment outcomes. On the other hand, it could stem from variations in study design and quality.
and
Further research is required to explain the reasons for the differences in study design and quality between Western and non-Western trials, as well as the different results in the analyses with and without extreme outliers.
PDF at 10.
Of course, “further research needed” is an almost inevitable conclusion of the majority of academic papers, and Tong et al. have the luxury of not needing to reach any conclusions to inform the recommended distribution of charitable dollars. But I don’t read the article by Tong et al. as supporting the proposition that it is appropriate to just run with the outliers-excluded data. Rather, I read the article as suggesting that—at least in the absence of compelling reasons to the contrary—one should take both analyses seriously, but neither definitively.
I am not confident about what taking both analyses seriously, but neither definitively, would mean for purposes of conducting a cost-effectiveness analysis. But I speculate that it would likely involve some sort of weighting of the two views.
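To make "some sort of weighting" concrete, here is a minimal sketch, assuming the simplest possible scheme: a credence-weighted average of the two pooled estimates. The function name, the weight, and the input values are hypothetical placeholders of mine, not figures from Tong et al. or HLI.

```python
def blend_estimates(d_all: float, d_trimmed: float, w_all: float) -> float:
    """Credence-weighted average of two pooled effect sizes.

    w_all is the (subjective) weight placed on the all-inclusive analysis;
    the outliers-excluded analysis receives 1 - w_all.
    """
    if not 0.0 <= w_all <= 1.0:
        raise ValueError("w_all must be between 0 and 1")
    return w_all * d_all + (1.0 - w_all) * d_trimmed


# Hypothetical inputs, purely for illustration.
blended = blend_estimates(d_all=1.0, d_trimmed=0.5, w_all=0.5)
print(blended)
```

The hard part is of course choosing the weight, which this sketch does nothing to settle.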
Hi again Jason,
When we said “Excluding outliers is thought sensible practice here; two related meta-analyses, Cuijpers et al., 2020c; Tong et al., 2023, used a similar approach”, I can see that what we meant by “similar approach” was unclear. We meant that, conditional on removing outliers, the range of effect sizes they treat as outliers is similar to or wider than ours.
This was primarily meant to address the question raised by Gregory about whether to include outliers: “The cut data by and large doesn’t look visually ‘outlying’ to me.”
To rephrase, I think that Cuijpers et al. and Tong et al. would agree that the data we cut looks outlying. Obviously, this is a milder claim than our comment could be interpreted as making.
Turning to the wider implications of these meta-analyses: as you rightly point out, they don’t have a “preferred specification” and are mostly presenting the options for doing the analysis. They present analyses with and without outlier removal in their main analysis, and they adjust for publication bias without outliers removed (which is not what we do). The first analytic choice doesn’t clearly support including or excluding outliers, and the second, if it supports any option, favors Greg’s proposed approach of correcting for publication bias without outliers removed.
I think one takeaway is that we should consider surveying the literature and some experts in the field, in a non-leading way, about what choices they’d make if they didn’t have “the luxury of not having to reach a conclusion”.
I think it seems plausible to give some weight to analyses both with and without outliers excluded, provided we are able to find a reasonable way to treat the 2 out of 7 publication bias correction methods that produce results suggesting that the effect of psychotherapy is in fact sizably negative. We’ll look into this more before our next update.
Cutting the outliers here was part of our first-pass attempt at minimising the influence of dubious effects, which we’ll follow up with a Risk of Bias analysis in the next version. Our working assumption was that effects greater than ~2 standard deviations are suspect on theoretical grounds (that is, if they behave anything like SDs in a normal distribution), and seemed more likely to be the result of some error-generating process (e.g. data-entry error, bias) than a genuine effect.
We’ll look into this more in our next pass, but for this version we felt outlier removal was the most sensible choice.
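For readers who want to see what this kind of comparison looks like mechanically, here is a rough sketch that pools effects with and without studies above a 2 SD cutoff. It assumes a simple fixed-effect inverse-variance model, which is a simplification and not necessarily the model used in our analysis. The function names are illustrative and the data are made up.

```python
def pooled_effect(effects, variances):
    """Fixed-effect inverse-variance pooled estimate (a simplification)."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def drop_large_effects(effects, variances, cutoff=2.0):
    """Keep only studies whose effect size does not exceed the cutoff (in SDs)."""
    kept = [(e, v) for e, v in zip(effects, variances) if e <= cutoff]
    return [e for e, _ in kept], [v for _, v in kept]


# Hypothetical effect sizes (in SDs) and sampling variances, for illustration only.
effects = [0.2, 0.4, 0.6, 2.5]
variances = [0.04, 0.04, 0.04, 0.04]

all_inclusive = pooled_effect(effects, variances)
trimmed = pooled_effect(*drop_large_effects(effects, variances))
print(f"all-inclusive: {all_inclusive:.3f}, outliers excluded: {trimmed:.3f}")
```

Reporting both numbers side by side, rather than only the trimmed one, is the sensitivity-analysis framing discussed above.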
I recently discovered that GiveWell decided to exclude an outlier in their water chlorination meta-analysis. I’m not qualified to judge their reasoning, but maybe others with sufficient expertise will weigh in?
We excluded one RCT that meets our other criteria because we think the results are implausibly high such that we don’t believe they represent the true effect of chlorination interventions (more in footnote).[4] It’s unorthodox to exclude studies for this reason when conducting a meta-analysis, but we chose to do so because we think it gives us an overall estimate that is more likely to represent the true effect size.
It looks like the same comment got posted several times?
Thanks Rebecca. I will delete the duplicates.