I work as an engineer, donate 10% of my income, and occasionally enjoy doing independent research. I’m most interested in farmed animal welfare and the nitty-gritty details of global health and development work. In 2022, I was a co-winner of the GiveWell Change Our Mind Contest.
MHR
The Case for Funding New Long-Term Randomized Controlled Trials of Deworming
This is really interesting, and something I hadn’t thought about before. Doing a quick literature search, there is also previously existing evidence that high levels of dietary iron may impart a diabetes risk. So the effects seen in the paper don’t seem crazy, but I did come out of this with a couple of questions/comments.
Are the new lower estimates of anemia rates in India just due to changing cutoffs, or are they also because existing supplementation/fortification programs are working? If programs switched from mass supplementation to screen-and-treat, would they end up still giving a large fraction of the population supplements?
The confidence intervals reported in the preprint seem a tiny bit suspicious to me, given that the lower bounds are all between 1.001 and 1.01. Sometimes that’s just what comes out of the analysis, but it’s also what you’d expect to see if the authors had been p-hacking.
GiveWell’s 2021 metrics report is out! Funding distributed to deworming increased greatly last year, from $15,699,622 to $44,124,942. Rerunning the model with the higher 2021 funding levels, the mean estimate of the value created by a replication study increases to approximately 370,000 GiveWell value units per year. This is equivalent to the value created by $18.5 million/year in donations to organizations with a cost-effectiveness of 6x GiveDirectly’s.
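As a quick arithmetic check on that conversion (the value-per-dollar figure for GiveDirectly here is backed out from the numbers in this comment, not taken from GiveWell's sheets directly):

```python
# Back-of-the-envelope check of the conversion above. The implied
# value-per-dollar figure for GiveDirectly is inferred from the
# figures quoted in the comment, not from GiveWell's own sheets.
value_units_per_year = 370_000
givedirectly_units_per_dollar = 0.00333  # implied by the figures above
multiplier = 6  # organizations at 6x GiveDirectly's cost-effectiveness
equivalent_donations = value_units_per_year / (givedirectly_units_per_dollar * multiplier)
print(f"${equivalent_donations:,.0f}/year")  # roughly $18.5 million/year
```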
In general, as EA-related organizations distribute more money per year, the value of information is naturally going to rise. So this kind of replication work will only get more important.
Thanks so much for taking the time to read the post and for really engaging with it. I very much appreciate your comment, and I think there are some really good points in it. But based on my understanding of what you wrote, I’m not sure I currently agree with your conclusion. In particular, I think that looking in terms of minimum detectable effect can be a helpful shorthand, but it may be misleading more than it’s helping in this case. We don’t really care about getting statistical significance at p < 0.05 in a replication, especially given that the primary effects seen in Hamory et al. (2021) weren’t significant at that level. Rather, we care about the magnitude of the update we’d make in response to new trial data.
To give a sense of why that’s so different, I want to start off with an oversimplified example. Consider two well-calibrated normal priors, one with a mean effect of 10 and a standard deviation of 0.5, and one with a mean effect of 0.2 and the same standard deviation. By the simplified MDE criterion, detecting the effect at p < 0.05 80% of the time would require a trial with a standard error of 3.5 in the first case and a trial with a standard error of 0.07 in the second. But given new trial data with a particular standard error and a particular gap between its point estimate and our prior mean, we would update our estimate of the mean by the same amount in both cases. (The situation for deworming is more complex because the prior distribution is probably truncated at around zero. But I think the basic concept still holds, in that the sample size required to keep the same value of new information wouldn’t grow as fast as the sample size required to keep the same statistical power.)
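That symmetry can be checked directly with the conjugate normal-normal update; here's a minimal sketch (the trial standard error and the gap are made-up illustrative numbers):

```python
# For a normal prior and normal likelihood, the size of the Bayesian
# update depends only on the prior sd, the trial's standard error, and
# the gap between the trial estimate and the prior mean -- not on the
# prior mean itself. Prior means/sd are from the example above; the
# trial standard error and gap are made up for illustration.

def posterior_mean(prior_mean, prior_sd, trial_mean, trial_se):
    """Conjugate normal-normal update."""
    prior_prec = 1 / prior_sd**2
    trial_prec = 1 / trial_se**2
    return (prior_mean * prior_prec + trial_mean * trial_prec) / (prior_prec + trial_prec)

trial_se = 0.4  # hypothetical standard error of the new trial
gap = 0.3       # trial estimate comes in 0.3 above the prior mean

for prior_mean in (10.0, 0.2):
    post = posterior_mean(prior_mean, 0.5, prior_mean + gap, trial_se)
    print(f"prior mean {prior_mean}: update of {post - prior_mean:.4f}")
# Both cases print the same update size.
```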
Therefore, I don’t think the required sample size is likely to be nearly as big as you estimated in order to get a valuable update to GiveWell’s current cost-effectiveness estimate. However, your point is clearly correct in that the sample size will need to increase to handle the worm burden effect. That was something I hadn’t thought about in the original post, so I really appreciate you bringing it up in your comment. According to GiveWell, the highest-worm-burden regions in which Deworm the World operates (Kenya and Ogun State, Nigeria) have a worm burden adjustment of 20.5%. A replication trial would likely need to be substantially larger to account for that lower burden, but I don’t think that increase would be prohibitively large.
Regarding the replicability adjustment, I’m not sure it implies that a larger sample size would be needed to make a substantial update based on new trial data (separate from the larger sample needed to handle the worm burden effect). The replicability adjustment was arrived at by starting with a prior based on short-term effect data and performing a Bayesian update based on the Miguel and Kremer follow-up results. If the follow-up study has the same statistical power as M&K, then the two can be pooled to make the update, and they should be given equal weight.
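The equal-weight point falls out of standard inverse-variance pooling; a quick sketch with made-up effect sizes:

```python
# Under inverse-variance weighting, two studies with the same standard
# error get equal weight, and the pooled estimate is their simple
# average with a smaller standard error. All numbers are made up.

def pool(est1, se1, est2, se2):
    """Inverse-variance (fixed-effect) pooling of two estimates."""
    w1, w2 = 1 / se1**2, 1 / se2**2
    pooled_est = (w1 * est1 + w2 * est2) / (w1 + w2)
    pooled_se = (w1 + w2) ** -0.5
    return pooled_est, pooled_se

est, se = pool(0.10, 0.04, 0.06, 0.04)
print(est, se)  # pooled estimate 0.08, standard error ~0.028
```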
Thinking about it qualitatively, if a replication trial showed a similar or greater effect size than Hamory et al. (2021) after accounting for the difference in worm burden, I would think that would imply a strong update away from GiveWell’s current replicability adjustment of 0.13. In fact, it might even suggest that deworming worked via an alternate mechanism than the ones considered in the analysis underlying GiveWell’s adjustment. On the flip side, I don’t think that GiveWell would be recommending deworming if the Miguel and Kremer follow-ups had found a point estimate of zero for the relevant effect sizes (the entire cost-effectiveness model starts with the Hamory et al. numbers and adjusts them). So if a replication study came in with a negative point estimate for the effect size, GiveWell should probably update noticeably towards zero.
Zooming out, I think that information on deworming’s effectiveness in the presence of current worm burdens and health conditions would be very valuable. GiveWell has done an admirable job of trying to extrapolate from the Miguel and Kremer trial and its follow-ups to a bunch of extremely different environments, but they’re changing the point estimate by a factor of ~66 in doing so. To me, that implies that there’s really tremendous uncertainty here, and that even imperfect evidence in the current environment would be very useful. Since deworming is so cheap, I’m particularly worried about the case where it’s noticeably more effective than GiveWell is currently estimating, in which case EA donors would be leaving a big opportunity to do good on the table.
Thank you again for taking the time to read the post!
Your comment that being off-campus made it feel psychologically farther matches my experience at Princeton. In college, I felt it was much easier to convince people to go somewhere on-campus (except for getting non-engineers to go to the engineering quad) than to convince them to go off-campus for something. That’s going to be very school-specific, though; you wouldn’t have the same problem at NYU.
This is really fantastic work! I really love seeing this kind of basic science being done in animal welfare, since it’s so critical to being able to discuss and evaluate interventions.
This is really cool! I liked your approach to the model and I think your big-picture finding is at least moderately likely to be correct. Marshall alluded to this as well, but I think the upper end of your distribution for the average cost-effectiveness of new interventions is pretty unlikely (at least in global health and development), given that we know lots of interventions have already been tried and only a small number are >6x GiveDirectly’s cost-effectiveness. But the general finding that we may be underinvesting in research seems robust even if you exclude the cases where the average cost-effectiveness of new interventions is anywhere close to the current best causes.
I looked at a similar but much smaller issue recently in evaluating whether it would be a cost-effective use of funding to replicate the 1998 deworming trial in Kenya that’s been used to generate GiveWell’s cost-effectiveness estimates of deworming. My finding there was that it was indeed likely to be a good use of funds, and I concluded that we were probably more generally underinvesting in replication research. I think that finding would fit with this one, in that the underinvestment in replication research may be part of an overall underinvestment in research.
My thought here would be that if climate effects (or other factors) don’t substantially reduce the rate of technological and economic progress over the 21st century, then effects after the 21st century might be likely to be pretty small because our capacity to mitigate them would be enormous. If world real GDP keeps growing at a 3% annual rate, then GDP would be at about a quadrillion dollars/year in 2100 if I’m doing my math right (of course, one might argue it’s likely to be much higher due to AGI, much lower due to slowing technological progress etc.). But that kind of enormous world output would make solutions like scaling up direct air capture to get massively negative carbon emissions feasible. In light of that, it makes a lot more sense to worry about how climate change impacts humanity’s trajectory over the next 80 years than it does to worry about what the impacts will be after 2100.
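The compounding arithmetic does check out; a one-line sanity check, assuming world GDP of roughly $100 trillion/year today (an approximate figure):

```python
# Compounding check for the claim above. The ~$100T starting figure
# for current world GDP is an approximation.
gdp_now = 100e12  # USD/year, rough current world GDP
years = 2100 - 2022
gdp_2100 = gdp_now * 1.03 ** years
print(f"${gdp_2100:.2e}/year")  # on the order of 1e15, i.e. ~$1 quadrillion/year
```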
This is really fascinating and in-depth work, and seems very valuable. You might want to consider working for GiveWell or Open Phil given your skillset and clear passion for this topic!
I did want to comment about one particular item you mentioned. You said:
New Incentives includes an ‘adjustment towards skeptical prior’, while no other charity does
I think this is not actually correct? Deworming charities have a “replicability adjustment for deworming” applied to the cost-effectiveness number (see e.g. line 11 of the Deworm the World sheet), which is arrived at via a similar kind of Bayesian framework.
I’m so grateful to everyone who wrote submissions for the EA criticism and red-teaming contest! I was really blown away by the number and quality of submissions.
In particular, I was super impressed by Froolow’s submission (someone needs to pay him whatever it costs to get him to come work for an EA org full-time!) and the work by the Happier Lives Institute.
Huh this is really interesting! On the funding side, I’d suggest having Aaron reach out to GiveWell (info@givewell.org) and Open Phil (info@openphilanthropy.org) to start a discussion. Both give incubation grants, but they may also be able to suggest other funding sources or provide other helpful information.
(Note that this comment is quick and not super well thought out. I hope to research and think about it more deeply at some point in the future, and maybe write it up in a better form).
As with many articles critical of EA, this article spends a while arguing against the early EA focus on earning to give:
To that end, I heard an EA-sympathetic graduate student explaining to a law student that she shouldn’t be a public defender, because it would be morally more beneficial for her to work at a large corporate law firm and donate most of her salary to an anti-malaria charity. The argument he made was that if she didn’t become a public defender, someone else would fill the post, but if she didn’t take the position as a Wall Street lawyer, the person who did take it probably wouldn’t donate their income to charity, thus by taking the public defender job instead of the Wall Street job she was essentially murdering the people whose lives she could have saved by donating a Wall Street income to charity.1
...
MacAskill wrote a moral philosophy paper arguing that even if we “suppose that the typical petrochemical company harms others by adding to the overall production of CO2 and thereby speeding up anthropogenic climate change” (a thing we do not need to “suppose”), if working for one would be “more lucrative” than any other career, “thereby enabling [a person] to donate more” then “the fact that she would be working for a company that harms others through producing CO2” wouldn’t be “a reason against her pursuing that career” since it “only makes others worse off if more CO2 is produced as a result of her working in that job than as a result of her replacement working in that job.” (You can of course see here the basic outlines of an EA argument in favor of becoming a concentration camp guard, if doing so was lucrative and someone else would take the job if you didn’t. But MacAskill says that concentration camp guards are “reprehensible” while it is merely “morally controversial” to take jobs like working for the fossil fuel industry, the arms industry, or making money “speculating on wheat, thereby increasing price volatility and disrupting the livelihoods of the global poor.” It remains unclear how one draws the line between “reprehensibly” causing other people’s deaths and merely “controversially” causing them.)4
It’s a little frustrating to me that EA orgs and public figures have basically conceded this argument and tend to shy away from actively defending earning to give as a standard EA path. I think the utilitarian argument that the quoted graduate student was making is basically correct (with the need to properly account for one’s career decision marginally impacting salaries in the given field, and for whether one is likely to be a more effective worker than the person one is displacing). On the flip side, I don’t think the deontological argument that NJR is making holds up that well under scrutiny. Current Affairs is a print magazine; printing and mailing thousands of copies of it every month contributes to resource usage and climate change. NJR presumably is okay with this because he thinks that the benefits of educating and informing his readership exceed the harms of his resource usage. In the same way, I think working in a job that produces some negative harms can be okay if the net benefits of donating one’s income substantially outweigh those harms. This gets even more stark when you try to actually think through the human scale of it all. Imagine having to tell ten thousand parents that the reason their kids won’t get anti-malaria pills this year is that your working as a stock trader would violate the categorical imperative. It sounds absurd, but that’s the kind of thing we’re talking about here.
Something that I do think NJR and I would agree on is that it’s really screwed up that the world is in this situation to start with. There’s something deeply unjust about a random American lawyer getting to decide whether people die from malaria based on their career and donation decisions. But we can’t wave a magic wand and change that overnight. And choosing to focus only on efforts to create systemic change means not getting lifesaving medicine to a ton of people who need it right now. I wish critics engaged more deeply with those really hard tradeoffs, and that EAs did a better job of articulating them. Just trying to sidestep the conversation about earning to give really undersells the moral challenge and the stakes we’re dealing with.
I think some of this is fair, but my basic instinct is that it doesn’t add a ton of value to the forum to have people writing very similar posts over and over.
Strong upvoted. This got me to ponder the thought experiment of imagining an EA community that assumed members and interested people were female by default. I do think that EA content would look somewhat different in that world, primarily in addressing questions about kids. I’d expect advice and discussion about whether, when, and how to have and raise kids to be a very prominent topic. I might expect talent-constrained EA orgs to try to differentiate themselves through perks like on-site childcare. I might also expect more weird, out-there stuff targeted at related questions; maybe in that world, you’d see posts arguing that the most impactful career you can have is being a competent and value-aligned nanny for another EA.
Imagining that world gives me a sense of how the current world looks somewhat male by default, and where we might look to change that.
I think there’s a good chance this basic point is right, but I’m not sure your takeaway from the Samotsvety forecast is correct? I think the 3-100 hours lost in expectation is based on the current information about risk. The Samotsvety forecast is that conditional on a nuclear weapon being used in Ukraine, there is a ~2% chance of London being nuked. I think the mean estimate for expected hours of life loss if one stays in London in that case is ~2000. That’s a substantial number of lost hours, and I can see it being rational to get to a safer location if those are the stakes.
Hmm, interesting. I got 2000 by just setting rusiaUsesNuclearWeaponsInUkraine to 1 in the Squiggle model. Looking at it further, the mean moves around between runs if I use just 1,000 samples. Increasing to 1,000,000 samples, it seems to converge on ~1700.
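That run-to-run wobble is what you'd expect when the mean of a heavy-tailed estimate is taken from too few samples; here's a minimal sketch, with a hypothetical lognormal standing in for the actual Squiggle model's hours-lost distribution:

```python
# Illustration of the sampling issue above: with a heavy-tailed
# distribution (a hypothetical lognormal standing in for the Squiggle
# model's hours-lost estimate), the sample mean is noisy at 1,000
# draws and much more stable with many more.
import random
import statistics

random.seed(0)

def sample_mean(n):
    # lognormvariate's mu/sigma parameterize the underlying normal
    return statistics.fmean(random.lognormvariate(6.0, 1.5) for _ in range(n))

for n in (1_000, 100_000):
    runs = [sample_mean(n) for _ in range(3)]
    print(n, [round(m) for m in runs])
# The 1,000-sample means vary noticeably between runs; the
# 100,000-sample means agree much more closely.
```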
I agree that this is a place where forecast aggregation adds a lot of challenges.
That makes sense to me, I agree that’s a good relevant comparison.
Strong upvoted, I found this very interesting and I expect that quite a few people will find it helpful.
I don’t know if it’s been posted here before, but Scott Alexander has a detailed writeup on depression treatment that people may also find useful, including information on the order he often has his patients try medications in.
I made an attempt to estimate the cost-effectiveness of replicating research on deworming in a previous post. There’s especially large uncertainty in deworming’s effect size, so I doubt you’d get as big an effect for SMC. But I think a similar Bayesian modeling approach could work for this!
Yeah that’s a cool idea to have an org that specifically focuses on replication work. I think that if you fleshed out the modeling done here, you could pretty confidently show funders that it would be a cost-effective use of money to do this more widely.