GiveWell should fund an SMC replication

Abstract: This essay argues that the evidence supporting GiveWell’s top cause area – Seasonal Malaria Chemoprevention, or SMC – is much weaker than it appears at first glance and would benefit from high-quality replication. Specifically, GiveWell’s assertion that every $5,000 spent on SMC saves a life is a stronger claim than the literature warrants on three grounds: 1) the effect size is small and imprecisely estimated; 2) co-interventions delivered simultaneously pose a threat to external validity; and 3) the research lacks the quality markers of the replication/credibility revolution. I conclude by arguing that any replication of SMC should meet the standards of rigor and transparency set by GiveDirectly, whose evaluations clearly demonstrate contemporary best practices in open science.

1. Introduction: the evidence for Seasonal Malaria Chemoprevention

GiveWell currently endorses four top charities, with first place going to the Malaria Consortium, a charity that delivers Seasonal Malaria Chemoprevention (SMC). GiveWell provides more context on its Malaria Consortium – Seasonal Malaria Chemoprevention page and its Seasonal Malaria Chemoprevention intervention report. That report is built around a Cochrane review of seven randomized controlled trials (Meremikwu et al. 2012). GiveWell discounts one of those studies (Dicko et al. 2008) for technical reasons and includes an additional trial published later (Tagbor et al. 2016) in its evidence base.

No new research has been added to that evidence base since then, and GiveWell’s SMC report was last updated in 2018. It appears as though GiveWell treats the question of “does SMC work?” as effectively settled.

I argue that GiveWell should revisit its conclusions about SMC and should fund and/or oversee a high-quality replication study on the subject. While there is very strong evidence that SMC prevents the majority of malaria episodes, “including severe episodes” (Meremikwu et al. 2012, p. 2), GiveWell’s estimate that every $5,000 of SMC saves a life in expectation is shaky on three grounds related to research quality: 1) the underlying effect size is small, relative to the sample size, and statistically imprecise; 2) SMC is often tested in places receiving other interventions, which threatens external validity because we don’t know which set of interventions best maps onto the target population; and 3) the evidence comes from studies that predate the credibility revolution, and therefore lack quality controls such as detailed pre-registration, open code and data, and sufficient statistical power.

2. Three grounds for doubting the relationship between SMC and mortality

2.1 The effect size is small and imprecisely estimated

Across an N of 12,589, Meremikwu et al. record 10 deaths in the combined treatment groups and 16 in the combined control groups. Subtracting the one study that GiveWell discounts (Dicko et al. 2008) and adding the later trial it includes (Tagbor et al. 2016), we arrive at 10 deaths for treatment and 15 for control. As the authors note, “the difference was not statistically significant” (p. 12), “and none of the trials were adequately powered to detect an effect on mortality…However, a reduction in death would be consistent with the high quality evidence of a reduction in severe malaria” (p. 4).[1]

Overall, the authors conclude, SMC “probably prevents some deaths,” but “[l]arger trials are necessary to have full confidence in this effect” (p. 4).
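To make the imprecision concrete, here is a minimal sketch of the arithmetic in Python. The death counts are the review’s; the equal per-arm denominators of roughly 6,300 children are my approximation, since the review pools trials with different mortality denominators (see footnote 1):

```python
# A minimal sketch of the pooled mortality comparison. Death counts are
# from Meremikwu et al. (2012); the equal arm sizes of ~6,300 are my
# assumption for illustration (see footnote 1 on the true denominators).
import numpy as np
from scipy.stats import fisher_exact

deaths_treat, n_treat = 10, 6300
deaths_ctrl, n_ctrl = 16, 6300

# Fisher's exact test on the 2x2 table of deaths vs. survivors
table = [[deaths_treat, n_treat - deaths_treat],
         [deaths_ctrl, n_ctrl - deaths_ctrl]]
_, p = fisher_exact(table)

# Wald 95% CI for the risk ratio, computed on the log scale
rr = (deaths_treat / n_treat) / (deaths_ctrl / n_ctrl)
se = np.sqrt(1/deaths_treat - 1/n_treat + 1/deaths_ctrl - 1/n_ctrl)
lo, hi = np.exp(np.log(rr) + np.array([-1.96, 1.96]) * se)

print(f"risk ratio = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f}), p = {p:.2f}")
# The p-value is far above 0.05 and the CI comfortably spans 1: the data
# are consistent with anything from a large mortality reduction to a
# modest increase.
```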

As a benchmark, a recent study on deworming (N = 14,172) estimates that deworming saves 18 lives per 1000 childbirths, versus about 0.4 for SMC (roughly the five-death difference above spread over the pooled N of 12,589).

GiveWell forthrightly acknowledges this “limited evidence” on its SMC page, and explains why it believes SMC reduces mortality to a larger degree than the assembled studies directly suggest. This is laudably transparent, but the question is foundational to all of GiveWell’s subsequent analyses of SMC. Especially given the organization’s strong funding position, GiveWell should devote resources toward bolstering that limited evidence through replication.

2.2 It’s unclear which studies map directly to the target population

Of the seven studies analyzed by Meremikwu et al., four test SMC in settings where both treatment and control samples are already receiving anti-malaria interventions. Two studies test SMC along with “home-based management of malaria (HMM)” while two others test SMC “alongside ITN [insecticide treated nets] distribution and promotion” (p. 9).

GiveWell’s SMC intervention report notes that SMC + ITN trials found “similar proportional reduction in malaria incidence to trials which did not promote ITNs.” This finding is useful and interesting, but it does not self-evidently help us estimate the effect of SMC alone on mortality, which is the basis of GiveWell’s cost-benefit analyses. To make the leap between the four studies that include co-interventions and those that don’t, we need an additional identifying assumption about external validity, such as:

  • any interaction effect between SMC and co-interventions is negligible or negative, in which case these estimates are at most minimally biased, or biased downward (i.e., conservative); or

  • the target population will have a mix of people receiving ITNs, HMM, or neither, and, therefore, we should aggregate these studies to mirror the target population.

GiveWell does not take a position on this. The SMC intervention report says that the organization has “not carefully considered whether Malaria Consortium’s SMC program is operating in areas where ITN coverage is being expanded.” It does not mention HMM.

If we look only at the two studies that estimate the relationship between SMC alone and mortality, we see that five children died in the combined control group (N = 1,139), while four died in the combined treatment group (N = 1,122). Every child’s death is a tragedy, but this difference is not a strong basis for determining where the marginal dollar is most likely to save a life, and we are always triaging.
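Applying the same Wald-style risk-ratio interval as in the sketch above to these counts shows just how little they pin down:

```python
# Risk-ratio CI restricted to the two SMC-only trials
# (4/1,122 deaths in treatment vs. 5/1,139 in control).
import numpy as np

d_t, n_t, d_c, n_c = 4, 1122, 5, 1139
rr = (d_t / n_t) / (d_c / n_c)
se = np.sqrt(1/d_t - 1/n_t + 1/d_c - 1/n_c)  # SE of the log risk ratio
lo, hi = np.exp(np.log(rr) + np.array([-1.96, 1.96]) * se)
print(f"risk ratio = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# With so few events, the interval runs from roughly a four-fold
# reduction in mortality risk to roughly a three-fold increase.
```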

This issue merits more careful attention than it currently receives from GiveWell. At a minimum, the SMC intervention page might be amended to note GiveWell’s position on the relationship between co-interventions and external validity. More broadly, an SMC replication could have multiple treatment arms to tease out the effects of both SMC and SMC + co-interventions.

2.3 The provided studies lack the quality markers of the credibility revolution

As Andrew Gelman puts it, “What has happened down here is the winds have changed”; as recently as 2011, “the replication crisis was barely a cloud on the horizon.” In the ten years since Meremikwu et al. was published – as well as in the six years since Tagbor et al. (2016) – we’ve learned a lot about what good research looks like. We’ve also learned, as described in a recent essay by Michael Nielsen and Kanjun Qiu, that studies meeting contemporary best practices – detailed pre-registration, “large samples, and open sharing of code, data and other methodological materials” – are systematically more likely to replicate successfully.

The studies cited by GiveWell in support of SMC do not clearly meet these criteria.

  • While the original trials are large enough to detect an effect on the incidence of malaria, for effects on mortality “the trials were underpowered to reach statistical significance” (Meremikwu et al. 2012, p. 2); see the power-calculation sketch after this list.

  • The code, data, and materials are not publicly available (as far as I can tell);

  • These studies were indeed preregistered (e.g. here and here), but not in ways that would meaningfully constrain researcher degrees of freedom.[2]
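To put a rough number on the power problem, here is a back-of-envelope sketch, again assuming the pooled mortality rates and approximate arm sizes used above:

```python
# Rough power calculation: how many children per arm would a trial need
# to detect the pooled mortality difference (10/6,300 vs. 16/6,300,
# with the equal arm sizes being my approximation) at 80% power and
# alpha = 0.05?
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

p_treat, p_ctrl = 10 / 6300, 16 / 6300
h = abs(proportion_effectsize(p_treat, p_ctrl))  # Cohen's h, made positive
n_per_arm = NormalIndPower().solve_power(
    effect_size=h, alpha=0.05, power=0.8, alternative='two-sided')
print(f"required n per arm: {n_per_arm:,.0f}")
# This comes out to roughly 17,000-18,000 children per arm -- about
# 35,000 in total, nearly three times the pooled sample of the Cochrane
# review, consistent with the authors' statement that the trials were
# underpowered for mortality.
```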

This isn’t to say they won’t replicate if we run them again using contemporary best practices. But given the literally hundreds of millions of dollars at stake, let’s verify rather than assume.

3. Conclusion

One of the unsettling conclusions of the replication revolution is that when studies implement stringent quality standards, they’re more likely to produce null results.[3]

As it happens, GiveDirectly, formerly one of GiveWell’s top charities, already has evaluations that meet the highest standards of credibility. Haushofer and Shapiro (2016), for instance, have a meticulous pre-registration plan, large sample sizes, and publicly available code and data; they also “hired two graduate students to audit the data and code” for reproducibility (p. 1977). A subsequent evaluation by the same authors found more mixed results: some positive, enduring changes but also some negative spillovers within treated communities. But GiveDirectly was far more likely to surface null and contradictory findings precisely because its evaluations were so carefully done.

GiveWell argues that it only recommends charities that are at least 10X as effective as cash. Right now, that comparison is confounded by large differences in research quality between GiveDirectly’s evaluations and those supporting SMC.

GiveWell can remedy this by funding an equally high-quality replication for SMC – and then, ideally, for each of its top cause areas.

Thanks to Alix Winter and Daniel Waldinger for comments on an early draft.

  1. ^

    In the text, the Ns accompanying the mortality numbers are slightly different because not all studies recorded deaths (for those that did, N = 9533). I am assuming that if any deaths had occurred in the remaining studies, the authors would have mentioned them as a matter of data collection/attrition.

  2. ^

    This is a complicated subject on which many people have weighed in (e.g. here, here, here and here). For starters, here is an example of a pre-registration plan that meaningfully constrains its authors.

  3. ^

    I learned this first-hand when contributing to two meta-analyses.