The Case for Funding New Long-Term Randomized Controlled Trials of Deworming

Summary

Despite significant uncertainty in the cost-effectiveness of mass deworming, GiveWell has directed over a hundred million dollars in donations to deworming initiatives since 2011. Almost all the data underlying GiveWell’s cost-effectiveness estimate comes from a single 1998 randomized trial of deworming in 75 Kenyan schools. Errors in GiveWell’s estimate of cost-effectiveness (in either direction) could be driving an impactful misallocation of funding in the global health and development space, reducing the total welfare created by Effective Altruism (EA)-linked donations. A randomized controlled trial replicating the 1998 Kenya deworming trial could provide a substantial improvement in the accuracy of cost-effectiveness estimates, with a simplified model indicating the expected value of such a trial is in the millions of dollars per year. Therefore, EA-aligned donors may have made an error by not performing replication studies on the long-run economic impact of deworming and should prioritize running them in the future. More generally, this finding suggests that EA organizations may be undervaluing the information that could be gained from running experiments to replicate existing published results.

Introduction

Chronic parasitic infections are common in many regions of the world, including sub-Saharan Africa and parts of East Asia. Two common types of parasitic disease are schistosomiasis, which is transmitted by contaminated water, and the soil-transmitted helminth infections (STHs) trichuriasis, ascariasis, and hookworm. Mass deworming is the process of treating these diseases in areas of high prevalence by administering antiparasitic medications to large groups of people without first testing each individual for infection. The antiparasitic medications involved, praziquantel for schistosomiasis and albendazole for STHs, are cheap, have relatively few side effects, and are considered safe to administer on a large scale. There is strong evidence that deworming campaigns reduce the prevalence of parasitic disease, as well as weaker evidence that deworming campaigns improve broader life outcomes.

GiveWell has included charities working on deworming in its top charities list for over a decade, with the SCI Foundation (formerly the Schistosomiasis Control Initiative) and Evidence Action’s Deworm the World Initiative being the top recipients of GiveWell-directed deworming donations. As of 2020, GiveWell has directed $163 million to charities working on deworming, with this funding coming from individual donors giving to deworming organizations based on GiveWell’s recommendation, GiveWell funding deworming organizations directly via its Maximum Impact Fund, and Open Philanthropy donating to deworming organizations based on GiveWell’s research.[1]

GiveWell’s recommendation of deworming-focused charities is based almost entirely on the limited evidence linking deworming to long-term economic benefits, particularly increases in income and consumption. Regarding impacts on health, the GiveWell brief on deworming states “evidence for the impact of deworming on short-term general health is thin. We would guess that deworming has small impacts on weight, but the evidence for its impact on other health outcomes is weak.” So-called “supplemental factors” other than the effect on income change GiveWell’s overall cost-effectiveness estimate for Deworm the World by 7%.[2]

GiveWell’s estimate of the long-term economic benefit produced by deworming comes from “Twenty-Year Economic Impacts of Deworming” (2021), by Joan Hamory, Edward Miguel, Michael Walker, Michael Kremer, and Sarah Baird. This paper is a 20-year follow-up to “Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities” (2004) by Edward Miguel and Michael Kremer, which analyzed the results of the randomized introduction of deworming into 75 Kenyan schools over 3 years starting in 1998. The 20-year follow-up found that “Individuals who received two to three additional years of childhood deworming experienced a 14% gain in consumption expenditures and 13% increase in hourly earnings.” GiveWell uses the average of the point estimates from the 20-year follow-up on the increases in individual earnings and consumption to derive its estimate of the long-term economic benefit produced by deworming.[3] However, there is enormous uncertainty in these point estimates: the mean consumption increase is $199/year with a standard error of $130/year, while the mean income increase is $85/year with a standard error of $171/year.[4]

In light of this uncertainty, researchers and commentators have extensively debated whether deworming produces real benefits. The so-called “worm wars,” as these debates have come to be known, have involved multiple re-analyses of the data from the 1998 Kenya study and have led some researchers to conclude that deworming does not have significant economic effects. For example, Paul Garner, David Taylor-Robinson, and Harshpal Singh Sachdev of the Liverpool School of Tropical Medicine wrote “it seems implausible that deworming itself would have an independent effect on school attendance or economic development.”

The goal of this essay is not to relitigate these discussions, but to argue that in light of this uncertainty, efforts to run new randomized controlled trials of deworming would likely have a high expected value. The next section provides an argument for this position in general terms, and the section after introduces a simplified statistical model of the potential benefits of such studies.

The Case for Additional Studies

The size of the uncertainty in deworming’s cost-effectiveness cannot easily be overstated. Assuming a normal distribution, the mean and standard error in Hamory et al. (2021) indicate that the 95% confidence interval (taken as roughly two standard errors either side of the mean) for the impact of deworming on income is between −$257/year and +$427/year, or alternately between −12% and +20%. This uncertainty is further compounded by the fact that many of the regions in which Deworm the World and the SCI Foundation operate differ in important ways from the region studied in Hamory et al. (2021), such as in rates of parasitic infection or in other population health factors such as nutrition. In computing cost-effectiveness estimates, GiveWell attempts to adjust for some of these factors, but notes that its estimates are highly sensitive to assumptions about how these adjustments are made. These adjustments therefore further increase the uncertainty in deworming’s cost-effectiveness.
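The income interval quoted above can be reproduced in a few lines. This is a minimal sketch, assuming the interval was formed as the point estimate plus or minus two standard errors (an assumption inferred from the quoted endpoints; the conventional multiplier of 1.96 gives a slightly narrower range). The function name `normal_interval` is illustrative only.

```python
def normal_interval(mean, std_err, z=2.0):
    """Return (low, high) = mean -/+ z * std_err under a normal approximation."""
    half_width = z * std_err
    return mean - half_width, mean + half_width

# Income effect from Hamory et al. (2021): $85/year, standard error $171/year.
low, high = normal_interval(85, 171)                    # z = 2 (assumed multiplier)
low_196, high_196 = normal_interval(85, 171, z=1.96)    # conventional 95% multiplier

print(f"z = 2.00: {low:.0f} to {high:.0f} dollars/year")
print(f"z = 1.96: {low_196:.0f} to {high_196:.0f} dollars/year")
```

Either multiplier leaves zero well inside the interval, which is the point the paragraph relies on.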

A naive interpretation of the confidence intervals in Hamory et al. (2021) would indicate that the true impacts of deworming on economic outcomes are as likely to be higher than the nominal point estimates as they are to be lower. However, there are reasons to believe that the point estimates given in that paper are likely to be high. Two that are especially worth highlighting are publication bias and the lack of a clear causal mechanism for the economic effects. Publication bias in the public health field manifests as a tendency for negative results to not be published. This means that the lack of studies showing zero or negative effects of deworming on economic outcomes does not mean that no such studies were conducted; they may have been conducted and not published. Moreover, it is possible that the Miguel and Kremer (2004) study would not have been published if it had not found a substantial positive effect. Given the existence of publication bias, it is expected that the average effect size seen in published studies is an overestimate of the true positive effect of an intervention. The lack of a clear causal mechanism relates to the issue that overall data on the short-term effects of deworming do not indicate large enough changes in weight, cognition, and years of schooling to fully explain the economic effects seen in Hamory et al. (2021). This short-term data comes both from the Miguel and Kremer (2004) study (which showed some decreases in effect sizes when reanalyzed) and from other studies of deworming that only performed short-term follow-ups on participants. Deworming may still produce positive outcomes via harder-to-measure or less well understood effects than the ones discussed here. But the lack of a clear and currently-understood causal mechanism somewhat increases the likelihood that the economic effects of deworming seen in Hamory et al. (2021) are a statistical outlier rather than reflecting the true effectiveness.

To handle these issues, GiveWell applies an extremely large “replicability adjustment” that downweights the cost-effectiveness estimate produced from the data in Hamory et al. (2021) in order to arrive at its final cost-effectiveness estimate. Currently, GiveWell multiplies the raw cost-effectiveness estimate by 0.13 to produce the adjusted estimate, decreasing the effectiveness by a factor of roughly 7.7.[5] This adjustment has two implications. First, it further highlights the large uncertainties involved in current estimates of deworming’s effectiveness. Second, the fact that GiveWell has already applied such a large negative correction greatly increases the likelihood that further studies would increase GiveWell’s cost-effectiveness estimate, rather than decreasing it.

Regardless of the direction, a substantial error in GiveWell’s current estimate of the cost-effectiveness of deworming would significantly reduce global well-being. If GiveWell’s current estimate of deworming’s cost-effectiveness is too high, then well-being could likely be substantially improved by shifting donations away from deworming to other cause areas. Given that GiveWell reported in early 2022 that “we don’t expect to have enough funding to support all the cost-effective opportunities we find,” money diverted to ineffective causes directly detracts from highly effective organizations. However, the opposite case would also be a very serious issue. If deworming is substantially more cost-effective than GiveWell currently estimates, then funding other initiatives at the expense of deworming is forgoing a major opportunity to do good.

Further randomized controlled trials of deworming would not perfectly identify the impact of deworming on life outcomes. Just like Miguel and Kremer (2004) and its follow-ups, future trials will likely have substantial uncertainties on any estimates. But if these trials were run well, they would provide important evidence on the effectiveness of deworming that could be aggregated with existing research to produce more accurate cost-effectiveness estimates. Since in expectation, a replication trial will drive GiveWell’s estimate of the cost-effectiveness of deworming closer to the truth, one can expect a replication trial to improve funding allocations in a way that improves overall well-being.

Of course, a replication study would carry costs of its own. However, since deworming is already occurring, the only costs of running a controlled trial would be the costs of administering the randomization, collecting the additional data for both the treatment and control groups, and analyzing the results. Including the costs of the deworming drugs, existing studies analyzing the short-term effects of deworming have had treatment and data collection costs of approximately $1 per participant per year. At that rate, a study that enrolled 10,000 participants (to have similar statistical power to the 1998 trial in Kenya) and followed them for 10 years would incur roughly $100,000 in treatment and data collection costs, and would likely cost less than $0.5 million in total even after accounting for the costs of data analysis. When compared to the $163 million GiveWell directed to deworming in the last decade, as well as GiveWell’s aim of directing up to a billion dollars per year by 2025, it seems clear that a further randomized controlled trial of deworming would be a cost-effective use of funding.

Model

This section introduces a highly simplified model that compares the total well-being created by GiveWell-directed donations with and without a new RCT investigating the long-term effects of deworming. The model operates in terms of the “units of value” used in GiveWell’s cost effectiveness analysis. To give a sense of scale, GiveWell values preventing the death of a child under 5 from malaria at 116 units of value and increasing the natural log of consumption of one person by one for one year at 1.44 units of value.[6] Working in these units, the model makes the following introductory assumptions:

  • All dollars spent on non-deworming GiveWell top charities generate six times as many units of value per dollar as donations to GiveDirectly. Denote this constant value \(G = 0.0034 \times 6 = 0.0204\) units of value per dollar, where 0.0034 is the value per dollar of donations to GiveDirectly. This is based on GiveWell’s 2022 statement “we think we’ll be able to recommend up to approximately $750 million in grants that are at least 6x as cost-effective as cash transfers.” Since the overall room for funding in non-deworming causes is much greater than that of deworming, it is assumed that shifting funding towards or away from deworming does not change the effectiveness of donations to other causes.

  • Denote the value generated by the first dollar of spending on deworming as \(D\). The prior probability distribution (the probability distribution without a new RCT) for \(D\) is assumed to be normal: \(D \sim N(\mu_{\text{prior}}, \sigma_{\text{prior}}^2)\). Its mean \(\mu_{\text{prior}}\) is the average of GiveWell’s value per dollar estimates across all regions in which the SCI Foundation and Deworm the World operate (0.049 units of value). Its standard deviation \(\sigma_{\text{prior}}\) has the same ratio to the mean as in the Hamory et al. (2021) data on the consumption increase from deworming (giving a standard deviation of 0.032 units of value).

  • The value per marginal dollar spent on deworming decreases linearly. The slope \(s\) at which the value per dollar decreases is estimated at \(-1.825 \times 10^{-9}\) units of value per dollar for each additional dollar spent. It was computed by assuming that GiveWell’s directed funding was perfectly rational in 2020, such that under the linearly-decreasing value assumption, the last dollar spent by GiveWell on deworming in 2020 (GiveWell directed $15,699,622 to deworming that year) generated the same value as dollars spent on the average non-deworming intervention (6x the value per dollar generated by donations to GiveDirectly).

  • This prior distribution is assumed to be well-calibrated, so a true value for the first dollar spent on deworming, \(D_{\text{true}}\), is generated by taking a random sample from the prior.

  • The total amount of funding available per year is \(F_T\) = $224,500,413, which is the amount of money GiveWell directed in 2020 (the last year for which full information is available). Denote the portion of funding allocated to deworming \(F_D\) and the portion of funding allocated to general non-deworming causes \(F_G\).
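The slope calibration described in the assumptions above can be sketched as follows. This is an illustrative reconstruction rather than the author's notebook code: the function and constant names are hypothetical, and it uses the rounded prior mean of 0.049, so the result lands close to, but not exactly on, the quoted slope.

```python
GIVEDIRECTLY_VALUE = 0.0034      # units of value per dollar (from the text)
G = 6 * GIVEDIRECTLY_VALUE       # non-deworming benchmark, about 0.0204
PRIOR_MEAN = 0.049               # first-dollar value of deworming (rounded in the text)
DEWORMING_2020 = 15_699_622      # dollars GiveWell directed to deworming in 2020

def calibrate_slope(first_dollar_value, benchmark_value, dollars_spent):
    """Slope of a linear marginal-value curve that starts at
    `first_dollar_value` and falls to `benchmark_value` after
    `dollars_spent` dollars."""
    return (benchmark_value - first_dollar_value) / dollars_spent

slope = calibrate_slope(PRIOR_MEAN, G, DEWORMING_2020)
print(f"G = {G:.4f} units/dollar, slope = {slope:.3e}")
```

The small gap between this value and the quoted slope is presumably due to rounding of the prior mean in the text.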

With these assumptions introduced, two scenarios are considered:[7]

Scenario 1: No new trial of deworming.

  • Without a new trial of deworming, the true cost-effectiveness number for deworming is unknown.

  • The fraction of funding per year distributed to each cause stays the same as it currently is. Describing this split in terms of the model’s assumptions, the amount of funding distributed to deworming is in this case computed based on the prior distribution. The amount is such that the marginal dollar given to deworming is no more effective than the marginal dollar given to other causes. All other funding is given to non-deworming causes.
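    \(F_D = (G - \mu_{\text{prior}})/s\) = $15,699,622 and \(F_G = F_T - F_D\) = $208,800,791, where \(\mu_{\text{prior}}\) is the mean of the prior.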

Scenario 2: New trial of deworming.

  • Another randomized controlled trial is run with the same statistical power as the trial referenced in Hamory et al. (2021). The mean first-dollar cost-effectiveness estimate produced by this RCT, \(X_r\), is drawn from a normal distribution \(X_r \sim N(D_{\text{true}}, \sigma_{\text{prior}}^2)\) with a mean equal to the true value of the first dollar and the same standard deviation as the prior probability distribution (\(\sigma_{\text{prior}}\)).

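  • A posterior probability distribution for the first-dollar cost-effectiveness of deworming is computed by performing a Bayesian update from the prior using the new evidence. Since the prior and likelihood function are both normal, the posterior distribution is also normal. Because the trial estimate \(X_r\) is assumed to have the same standard deviation as the prior, the posterior mean \(\mu_{\text{post}}\) and standard deviation \(\sigma_{\text{post}}\) reduce to \(\mu_{\text{post}} = (\mu_{\text{prior}} + X_r)/2\) and \(\sigma_{\text{post}} = \sigma_{\text{prior}}/\sqrt{2}\), where \(\mu_{\text{prior}}\) and \(\sigma_{\text{prior}}\) are the prior mean and standard deviation.[8]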

  • The amount of funding distributed to deworming is computed based on the posterior distribution. The amount donated to deworming is once again computed such that the marginal dollar given to deworming is no more effective than the marginal dollar given to other causes (additional dollars spent on deworming are still assumed to obey the same linear decrease in effectiveness). If the posterior estimate for the value of the first dollar given to deworming is less than the value of each dollar given to non-deworming causes, no money is given to deworming. All funding not spent on deworming is spent on non-deworming causes.
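The update and allocation steps in Scenario 2 can be sketched as below. This is a hedged illustration, not the author's code: `posterior_normal` is the standard normal-normal update, `deworming_allocation` is a hypothetical helper implementing the "marginal dollar equals the benchmark" rule, and the trial estimate of 0.0 is an invented example of a disappointing replication.

```python
import math

def posterior_normal(prior_mean, prior_sd, estimate, estimate_sd):
    """Normal-normal Bayesian update: precision-weighted mean and combined SD."""
    prior_prec = 1.0 / prior_sd ** 2
    est_prec = 1.0 / estimate_sd ** 2
    post_var = 1.0 / (prior_prec + est_prec)
    post_mean = post_var * (prior_prec * prior_mean + est_prec * estimate)
    return post_mean, math.sqrt(post_var)

def deworming_allocation(first_dollar_estimate, benchmark, slope, total_funding):
    """Dollars to deworming such that the marginal dollar matches the benchmark.
    Returns 0 when even the first dollar is estimated at or below the benchmark."""
    if first_dollar_estimate <= benchmark:
        return 0.0
    # slope is negative, so this quotient is positive
    return min((benchmark - first_dollar_estimate) / slope, total_funding)

# Hypothetical example: the new trial's point estimate is zero.
post_mean, post_sd = posterior_normal(0.049, 0.032, 0.0, 0.032)
dollars = deworming_allocation(post_mean, 0.0204, -1.825e-9, 224_500_413)
print(f"posterior mean {post_mean:.4f}, SD {post_sd:.4f}, allocation ${dollars:,.0f}")
```

In this example the posterior mean stays above the benchmark, so deworming funding shrinks rather than stopping; a sufficiently negative trial estimate would push the allocation to zero.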

In both scenarios, the average value per dollar provided by deworming is computed by averaging the true value of the first dollar spent on deworming with the true value of the last dollar spent on deworming (since the value per dollar is modeled as decreasing linearly). Then the total value V created by GiveWell-directed donations is computed by multiplying the average value per dollar spent on deworming by the dollars spent on deworming and adding the average value per dollar spent on non-deworming interventions multiplied by the dollars spent on non-deworming interventions.
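The value calculation just described can be written as a short function. This is a minimal sketch with illustrative names; the example call plugs in the Scenario 1 allocation and, as an assumption made purely for illustration, sets the true first-dollar value equal to the rounded prior mean.

```python
def total_value(deworming_dollars, true_first_dollar, slope, benchmark, total_funding):
    """Total units of value: deworming dollars times their average value per
    dollar, plus the remaining dollars times the benchmark value per dollar."""
    last_dollar = true_first_dollar + slope * deworming_dollars
    average_deworming = (true_first_dollar + last_dollar) / 2
    other_dollars = total_funding - deworming_dollars
    return deworming_dollars * average_deworming + other_dollars * benchmark

# Scenario 1 allocation, with the true value set to the prior mean (illustrative).
example_value = total_value(
    deworming_dollars=15_699_622,
    true_first_dollar=0.049,
    slope=-1.825e-9,
    benchmark=0.0204,
    total_funding=224_500_413,
)
print(f"{example_value / 1e6:.2f} million units of value per year")
```

With these rounded inputs the result is roughly 4.8 million units per year, in line with the no-replication mean reported from the full simulation.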

This model was run for one million randomized cases with these parameters, producing the distributions of value created per year shown in Figure 1. Note that the spike seen in the “With Replication Study” curve includes the cases in which the replication study leads GiveWell to stop funding deworming. The mean value created without the replication study was 4.81 million units/year, while the mean value created with the replication study was 4.94 million units/year. This difference in value created of approximately 130,000 units/year is equivalent to the value created by $6.6 million/year in donations to causes with 6x the effectiveness of GiveDirectly. This is a very large return for a study with a one-time cost of $0.5 million, and implies that such a study would be highly cost-effective.
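A compact re-implementation of the simulation is sketched below. It is not the author's R notebook: it is a Python approximation that uses the rounded parameters quoted in the text, only the standard library, 200,000 draws instead of one million, and an arbitrary seed, so its output should be close to, but not identical with, the figures above.

```python
import random

G = 0.0204                 # value per dollar, non-deworming causes
PRIOR_MEAN = 0.049         # prior mean, first-dollar value of deworming
PRIOR_SD = 0.032           # prior SD (also used as the new trial's SD)
SLOPE = -1.825e-9          # change in marginal value per dollar spent
F_TOTAL = 224_500_413      # total funding per year
F_DEWORM_NOW = 15_699_622  # Scenario 1 deworming allocation

def value(f_d, d_true):
    """Total units of value for a deworming allocation and a true first-dollar value."""
    average_deworming = d_true + SLOPE * f_d / 2
    return f_d * average_deworming + (F_TOTAL - f_d) * G

def simulate(n_draws=200_000, seed=0):
    """Return (mean value without trial, mean value with trial)."""
    rng = random.Random(seed)
    total_without = total_with = 0.0
    for _ in range(n_draws):
        d_true = rng.gauss(PRIOR_MEAN, PRIOR_SD)   # truth drawn from the prior
        x_r = rng.gauss(d_true, PRIOR_SD)          # new trial's point estimate
        post_mean = (PRIOR_MEAN + x_r) / 2         # equal-variance normal update
        # Fund deworming until the marginal dollar equals G; never below zero.
        f_d = min(max((G - post_mean) / SLOPE, 0.0), F_TOTAL)
        total_without += value(F_DEWORM_NOW, d_true)
        total_with += value(f_d, d_true)
    return total_without / n_draws, total_with / n_draws

mean_without, mean_with = simulate()
gain = mean_with - mean_without
print(f"without: {mean_without / 1e6:.2f}M, with: {mean_with / 1e6:.2f}M units/year")
print(f"gain: {gain:,.0f} units/year, about ${gain / G / 1e6:.1f}M/year at the benchmark")
```

Both scenarios are evaluated on the same random draws, which keeps the noise in the estimated gain small relative to the gain itself.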

Figure 1: Histogram of Value Generated by GiveWell Donations at 2020 Funding Levels With and Without a Deworming Replication Study

Robustness

The model introduced in the previous section makes a number of assumptions and simplifications, which this section discusses in more detail. To start off, one of the more significant factors not addressed in the previous section is that a new deworming trial will not deliver significant information on long-run effects until sufficient time has passed to measure them. The first time at which a follow-up study’s results would likely be helpful in determining deworming’s cost-effectiveness is when 5-year follow-up data could be measured to compare to Miguel and Kremer (2004). This means that the value generated by a replication study would not be realized for several years, requiring some adjustment for temporal discounting. However, given that the cost of a study is likely to be under half a million dollars and the benefits are in the millions of dollars per year, the expected value will be positive under any reasonable choice of discount rate.
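The discounting argument can be checked with a simple net-present-value calculation. This is a sketch under stated assumptions: the $6.6 million/year benefit and $0.5 million cost come from the text, while the 5-year delay before benefits begin, the 10-year benefit horizon, and the discount rates tried are illustrative choices not taken from the source.

```python
def net_present_value(annual_benefit, study_cost, discount_rate,
                      delay_years, benefit_years):
    """NPV in dollars: the cost is paid today, and benefits arrive at the end of
    each year from delay_years + 1 through delay_years + benefit_years."""
    pv_benefits = sum(
        annual_benefit / (1 + discount_rate) ** t
        for t in range(delay_years + 1, delay_years + benefit_years + 1)
    )
    return pv_benefits - study_cost

# Illustrative discount rates; horizon and delay are assumptions (see lead-in).
npv_by_rate = {
    rate: net_present_value(6.6e6, 0.5e6, rate, delay_years=5, benefit_years=10)
    for rate in (0.03, 0.10, 0.20)
}
for rate, npv in npv_by_rate.items():
    print(f"discount rate {rate:.0%}: NPV = ${npv / 1e6:.1f}M")
```

Under these assumptions the net present value stays positive even at a 20% annual discount rate, because the one-time cost is small relative to a single year of benefits.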

A further aspect of the model that is especially simplified is the assumption that deworming’s cost-effectiveness decreases linearly as spending on deworming increases. GiveWell typically models decreasing returns to scale in terms of its “room for more funding” metric, which is similar to modeling the cost-effectiveness of donations to an organization as constant up until some funding level, at which point it becomes zero. GiveWell’s modeling approach is reasonable when considering a single organization, since a single nonprofit is likely to face bottlenecks on its expansion due to non-monetary factors such as the time required to onboard new employees. However, the approach taken by GiveWell is unlikely to be the best way to model effectiveness of an overall cause area, since spending on a cause can also be increased by adding new organizations to GiveWell’s top charities list (for example, both Partners in Health and Doctors Without Borders distribute deworming medication in some cases, but neither is a GiveWell top charity). Funding given to organizations that are not current top charities is likely to be less effective per dollar, but it should not be valued at zero. The exact shape of the relationship between funding level and marginal cost-effectiveness is unknown, but modeling it as linear is a common approach for this type of simplified modeling.

A third aspect of the model that is highly simplified is the assumption that GiveWell-directed donations are perfectly optimized for effectiveness. This is a simplification primarily because many of the donations directed by GiveWell are made by individual donors rather than through the Maximum Impact Fund. This assumption particularly affects the estimate of the slope for deworming’s decreasing cost-effectiveness as funding increases. Given that GiveWell did not fully fill deworming organizations’ room for additional funding in 2020, it is likely that if GiveWell’s current cost-effectiveness model is assumed to be correct, the last dollar spent on deworming in 2020 was more than 6x as effective as donations to GiveDirectly. Therefore, the slope used in the model is likely an overestimate of the rate of decrease in cost-effectiveness per dollar. By manually varying the slope parameter, it was found that making the slope steeper decreases the estimate of the value produced by the replication study, and vice versa, so an overestimate of the slope would lead to an underestimate of the study’s value. Additionally, in modeling the impact of new information, it is possible that non-optimal funding allocations in the future will reduce the value of gaining new information. However, GiveWell does try to optimize its donations from the Maximum Impact Fund, and informed donors are also likely to shift their donations in response to new information. Modeling this behavior as perfect optimization is likely sufficient for this kind of simplified model.

The results of the model are also affected by the choice of numerical parameters, but the general conclusion that a replication study delivers over $1m/year in equivalent donation value is robust to reasonable variation of the parameters. Increasing the uncertainty in the current estimate of deworming’s cost-effectiveness increases the expected value of a replication study, while decreasing the uncertainty reduces it. However, the uncertainty would have to be reduced by a factor of approximately 2.5 before the expected value of a replication study falls below $1m/year. This kind of decrease seems unreasonable given that it would remove zero from the 95% confidence interval of the value per dollar of deworming, and as discussed in the introduction, there are a substantial number of researchers who think the true effectiveness is roughly zero. Changes in the mean estimate for the first-dollar value of deworming do not change the general conclusion unless the prior estimate is reduced to be very close to or below the 6x GiveDirectly value used for the value per dollar of non-deworming interventions. Similarly, changing the estimate used for the value of non-deworming interventions does not change the general conclusion unless it is increased above the mean estimate of the first-dollar value of deworming. Finally, a significantly steeper slope for the declining marginal value of donations spent on deworming would reduce the expected value of new information, but approximately a 6x increase in slope would be required before the expected value of a replication study falls below $1m/year. Moreover, as discussed above, the model assumptions likely lead to an overestimate of the steepness of the slope, not an underestimate.
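These sensitivity claims can be cross-checked without simulation. The sketch below is not part of the original analysis: it uses a closed-form expression for the expected gain that follows from the model's assumptions, plus three further assumptions stated here, namely that the trial's standard deviation scales together with the prior's, that the no-trial allocation is set where the prior-mean marginal value equals the benchmark, and that the total funding cap never binds. All names are illustrative.

```python
import math

G = 0.0204           # benchmark value per dollar
PRIOR_MEAN = 0.049   # prior mean, first-dollar value of deworming
PRIOR_SD = 0.032     # prior SD
SLOPE = -1.825e-9    # change in marginal value per dollar spent

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def expected_gain_dollars(sd_scale=1.0, slope_scale=1.0):
    """Expected yearly gain from one replication, in benchmark-equivalent dollars.

    With Z = posterior mean minus G ~ N(m, tau^2), the expected surplus from
    deworming is E[max(Z, 0)^2] / (2k) with the trial and m^2 / (2k) without it.
    """
    k = -SLOPE * slope_scale                   # steepness as a positive number
    m = PRIOR_MEAN - G
    tau = PRIOR_SD * sd_scale / math.sqrt(2)   # SD of the posterior mean
    z = m / tau
    second_moment = (m * m + tau * tau) * normal_cdf(z) + m * tau * normal_pdf(z)
    gain_units = (second_moment - m * m) / (2 * k)
    return gain_units / G

print(f"baseline:        ${expected_gain_dollars() / 1e6:.1f}M/year")
print(f"uncertainty/2.5: ${expected_gain_dollars(sd_scale=1 / 2.5) / 1e6:.1f}M/year")
print(f"slope x 6:       ${expected_gain_dollars(slope_scale=6) / 1e6:.1f}M/year")
```

Under these assumptions the baseline comes out near the simulated $6.6 million/year, and both the 2.5x reduction in uncertainty and the 6x steeper slope land close to the $1m/year threshold described above.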

Discussion

GiveWell has funded deworming programs for over a decade despite significant uncertainty in the cost-effectiveness of deworming. In that time, GiveWell’s researchers have done their best to extrapolate cost-effectiveness estimates from a single 1998 study in Kenya to the wide variety of regions in which GiveWell top charities are now running deworming programs. However, they have not obtained data from additional trials on deworming’s long-run impact. Given the challenging tradeoffs GiveWell’s staff are forced to make in allocating funding across a wide range of highly effective charitable organizations, more accurate information on cost-effectiveness would likely deliver a high expected value by enabling more optimal funding allocation.

It is possible that the reason GiveWell or Open Philanthropy has not funded this kind of replication study is that they are aware of trials on the long-run economic impacts of deworming that are already occurring. However, searches of the American Economic Association’s RCT Registry and ClinicalTrials.gov do not return any currently-active clinical trials of deworming with preregistered hypotheses related to long-term income or consumption. There are, however, several active clinical trials studying the efficacy of differing deworming methods in eliminating parasitic infections and in creating short-run improvements in health and school attendance. It is possible that EA organizations could work with the investigators in these trials to add plans for long-run follow-ups studying economic impacts.

Based on the simplified model discussed in the previous section, the expected return to a replication study on the long-run economic effects of deworming with similar statistical power to the 1998 study analyzed in Miguel and Kremer (2004) and its follow-ups is equivalent to the value of millions of dollars in additional funding per year. Given this high estimate, GiveWell or other EA-affiliated nonprofits working in the global health and development space should strongly consider running such a follow-up study to maximize their impact. Moreover, they may have lost an opportunity for maximizing their positive impact by not running such a study earlier. Had an EA-aligned organization decided to run a replication study during the first iteration of the “worm wars,” 5-year follow-up data would already be available for analysis. A 5-year follow-up would be enough to compare to Miguel and Kremer (2004), which would give GiveWell and other donors much better information to go on than is currently available.

The fact that no EA organization has tried to replicate Miguel and Kremer’s work is potentially suggestive of a more general blind spot in EA thinking. EA organizations may be overall putting too low a value on information relative to action, and should consider whether other kinds of replication studies would also deliver significant returns. For example, similar types of replication trials could help resolve the contradictions between different studies’ efficacy estimates for vitamin A supplementation. Targeted investment in this kind of replication research might deliver benefits even beyond the global health and development space, such as in evaluating the efficacy of interventions to improve animal welfare. Truly doing the most possible good is likely to require active and continued investment in replicating published results, rather than accepting the state of evidence as is.

  1. Information on how GiveWell measures donations that were influenced by its recommendations is in the appendix to GiveWell’s Metrics Report.

  2. See line 106 of the cost-effectiveness analysis spreadsheet for the source of this 7% factor.

  3. See line 7 of the cost-effectiveness analysis spreadsheet for where the average of these estimates is incorporated into the overall model.

  4. See table S3 of the appendix to the paper.

  5. See line 11 of the cost-effectiveness model.

  6. See line 116 for the value of saving a life and line 127 for the value of increasing consumption.

  7. The model code is available as an R Jupyter notebook.

  8. See e.g. this paper for a derivation of these formulas.