In 2011, GiveWell published the blog post Errors in DCP2 cost-effectiveness estimate for deworming, which made me lose a fair bit of confidence in DCP2 estimates (and by extension DCP3):
we now believe that one of the key cost-effectiveness estimates for deworming is flawed, and contains several errors that overstate the cost-effectiveness of deworming by a factor of about 100. This finding has implications not just for deworming, but for cost-effectiveness analysis in general: we are now rethinking how we use published cost-effectiveness estimates for which the full calculations and methods are not public.
The cost-effectiveness estimate in question comes from the Disease Control Priorities in Developing Countries (DCP2), a major report funded by the Gates Foundation. This report provides an estimate of $3.41 per disability-adjusted life-year (DALY) for the cost-effectiveness of soil-transmitted-helminth (STH) treatment, implying that STH treatment is one of the most cost-effective interventions for global health. In investigating this figure, we have corresponded, over a period of months, with six scholars who had been directly or indirectly involved in the production of the estimate. Eventually, we were able to obtain the spreadsheet that was used to generate the $3.41/DALY estimate. That spreadsheet contains five separate errors that, when corrected, shift the estimated cost-effectiveness of deworming from $3.41 to $326.43. We came to this conclusion a year after learning that the DCP2’s published cost-effectiveness estimate for schistosomiasis treatment – another kind of deworming – contained a crucial typo: the published figure was $3.36-$6.92 per DALY, but the correct figure is $336-$692 per DALY. (This figure appears, correctly, on page 46 of the DCP2.) …
I agree with their key takeaways, in particular (emphasis mine):
We’ve previously argued for a limited role for cost-effectiveness estimates; we now think that the appropriate role may be even more limited, at least for opaque estimates (e.g., estimates published without the details necessary for others to independently examine them) like the DCP2’s.
More generally, we see this case as a general argument for expecting transparency, rather than taking recommendations on trust – no matter how pedigreed the people making the recommendations. Note that the DCP2 was published by the Disease Control Priorities Project, a joint enterprise of The World Bank, the National Institutes of Health, the World Health Organization, and the Population Reference Bureau, which was funded primarily by a $3.5 million grant from the Gates Foundation. The DCP2 chapter on helminth infections, which contains the $3.41/DALY estimate, has 18 authors, including many of the world’s foremost experts on soil-transmitted helminths.
That said, my best guess is such spreadsheet errors probably don’t change your bottom-line finding that charity cost-effectiveness really does follow a power law — in fact I expect the worst cases to be actively harmful (e.g. PlayPump International), i.e. negative DALYs/$. My prior essentially comes from 80K’s How much do solutions to social problems differ in their effectiveness? A collection of all the studies we could find, who find:
There appears to be a surprising amount of consistency in the shape of the distributions.
The distributions also appear to be closer to lognormal than normal — i.e. they are heavy-tailed, in agreement with Berger’s findings. However, they may also be some other heavy-tailed distribution (such as a power law), since these are hard to distinguish statistically.
Interventions were rarely negative within health (and the miscellaneous datasets), but often negative within social and education interventions (10–20%) — though not enough to make the mean and median negative. When interventions were negative, they seemed to also be heavy-tailed in negative cost effectiveness.
One way to quantify the interventions’ spread is to look at the ratio between the mean of the top 2.5% and the overall mean or median. Roughly, we can say:
The top 2.5% were around 20–200 times more cost effective than the median.
The top 2.5% were around 8–20 times more cost effective than the mean.
Overall, the patterns found by Ord in the DCP2 seem to hold to a surprising degree in the other areas where we’ve found data.
Regarding your “future work I’d like to see” section, maybe Vasco’s corpus of cost-effectiveness estimates would be a good starting point. His quantitative modelling spans nearly every category of EA interventions, his models are all methodologically aligned (since it’s just him doing them), and they’re all transparent too (unlike the DCP estimates).
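The heavy-tailed spread described in the 80K findings quoted above can be illustrated with a minimal simulation. The lognormal parameters below are purely illustrative assumptions, not fitted to any of the datasets:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical cost-effectiveness values (e.g. DALYs averted per $1,000)
# drawn from a lognormal distribution; sigma = 1.5 is an illustrative guess.
x = rng.lognormal(mean=0.0, sigma=1.5, size=100_000)

top = np.sort(x)[-int(0.025 * x.size):]  # top 2.5% of interventions
ratio_median = top.mean() / np.median(x)
ratio_mean = top.mean() / x.mean()
print(f"top 2.5% mean / overall median: {ratio_median:.0f}x")
print(f"top 2.5% mean / overall mean:   {ratio_mean:.0f}x")
```

With sigma around 1.5 the two ratios fall inside the ranges quoted above (20–200 times the median, 8–20 times the mean); a larger sigma makes the tail heavier and pushes both ratios up, which is why the top interventions dominate.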
Thanks for the suggestion, Mo! More transparent, methodologically aligned estimates:
The Centre for Exploratory Altruism Research (CEARCH) has a sheet with 23 cost-effectiveness estimates across global health and development, global catastrophic risk, and climate change.
They are produced at 3 levels of depth, but they all rely on the same baseline methodology.
You can reach out to @Joel Tan🔸 to learn more.
Ambitious Impact (AIM) has produced hundreds of cost-effectiveness estimates across global health and development, animal advocacy, and “EA meta”.
They are produced at different levels of depth. I collected 44 covering the interventions recommended for their incubation programs up to 2024[1]. However, they have more public estimates concerning interventions which made it to the last stage (in-depth report) but were not recommended, and many more internal ones: not only from in-depth reports of interventions which were not recommended[2], but also from interventions which did not make it to the last stage.
You can reach out to Morgan Fairless, AIM’s research director, to learn more and ask for access to AIM’s internal estimates.
Estimates from Rethink Priorities’ cross-cause cost-effectiveness model are also methodologically aligned within each area, but they are not transparent: no information at all is provided about the inputs.
AIM’s estimates from a given stage of a certain research round[3] will be especially comparable, as AIM often uses them in weighted factor models to inform which interventions to move to the next stage or recommend. So I think you would do better to look into such sets of estimates than into one covering all my estimates.
Meanwhile, they have published more estimates for the interventions recommended for the early 2025 incubation program.
Only in-depth reports of recommended interventions are necessarily published.
There are 3 research rounds per year: 2 on global health and development, and 1 on animal welfare.
Are you talking about this post? Looks like those cost-effectiveness estimates were written by Ambitious Impact, so I don’t know if there are some other estimates written by Vasco.
I’m thinking of all of his cost-effectiveness writings on this forum.
Thanks for the interest, Michael!
I got the cost-effectiveness estimates I analysed in that post about global health and development directly from Ambitious Impact (AIM), and obtained the ones about animal welfare by adjusting AIM’s numbers based on Rethink Priorities’ median welfare ranges[1].
I do not have my cost-effectiveness estimates collected in one place. I would be happy to put something together for you, such as a sheet with the name of the intervention, area, source, date of publication, and cost-effectiveness in DALYs averted per $. However, I wonder whether it would be better for you to look into sets of AIM’s estimates from a given stage of a certain research round. AIM often uses them in weighted factor models to inform which interventions to move to the next stage or recommend, so they are supposed to be especially comparable. In contrast, mine often rely on different assumptions simply because they span a long period of time. For example, I now guess disabling pain is 10 % as intense as I assumed until October.
I could try to quickly adjust all my estimates so that they all reflect my current assumptions, but I suspect it would not be worth it. I believe AIM’s estimates by stage of a particular research round would still be more methodologically aligned, and credible to a wider audience. I am also confident that a set with all my estimates, at least if interpreted at face value, would much more closely follow a Pareto, lognormal or loguniform distribution than a normal or uniform distribution. I estimate broiler welfare and cage-free campaigns are 168 and 462 times as cost-effective as GiveWell’s top charities, and that the Shrimp Welfare Project (SWP) has been 64.3 k times as cost-effective as such charities.
AIM used to assume welfare ranges conditional on sentience equal to 1 before moving to estimating the benefits of animal welfare interventions in suffering-adjusted days (SADs) in 2024. I believe the new system still dramatically underestimates the intensity of excruciating pain, and therefore the cost-effectiveness of interventions decreasing it. I estimate the past cost-effectiveness of SWP to be 639 DALY/$. Using AIM’s pain intensities, and my guess that hurtful pain is as intense as fully healthy life, I get 0.484 DALY/$, which is only 0.0757 % (= 0.484/639) of my estimate. Feel free to ask Vicky Cox, senior animal welfare researcher at AIM, for the sheet with their pain intensities, and the doc with my suggestions for improvement.
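As a quick arithmetic check of the percentage above (both figures are copied from this comment; nothing new is assumed):

```python
# SWP cost-effectiveness under AIM's pain intensities vs the author's
# own estimate, both in DALY/$.
aim_based = 0.484
own = 639.0
fraction = aim_based / own
print(f"{fraction:.4%}")  # → 0.0757%
```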