In your post, I think your concerns fall into two categories:
Issue A. Not tracking outcomes for recipients (or, more likely, initially trying to track them, finding no positive statistical effects, and dropping data collection).
Indeed, AMF had a plan to monitor malaria case rates before and after distributions to prove their effectiveness. However, when they actually collected the data they concluded the data was of poor quality and so abandoned this plan...I find this very worrying. Maybe the data was of poor quality, but that is a reason for working harder in this area rather than abandoning it altogether. In general, if we only have poor quality data about malaria in a region, doesn’t that mean we do not know how effective a bednet distribution will be?
Issue B. At the country level (not monitoring recipients of AMF nets but malaria levels in countries), there is no/limited/mixed evidence for malaria reduction:
Taking a step back from the Against Malaria Foundation to look at the malaria problem more generally, there is mixed evidence that bed net distributions reduce malaria case rates. GiveWell has a macro review of the evidence which shows at the nation-level you cannot demonstrate any impact from all malaria control initiatives.
...Malaria rates in Benin, DRC, Ghana, Mali & Sierra Leone increased as net coverage increased, which is more evidence that the malaria data being used is not great. In central Africa malaria was trending downwards before bednet coverage was scaled up, further muddying the waters when trying to measure impact.
“Available data and studies appear to show some cases of apparent malaria control success, and also seem to indicate that the overall burden of malaria in Africa is more likely to be falling than rising. However, in most cases it is difficult to link changes in the burden of malaria to particular malaria control measures, or to malaria control in general, and the data remains quite limited and incomplete, such that we cannot confidently say that the burden of malaria has been falling on average.”
What you wrote is a complete, well-reasoned line of thought from careful study of the AMF website.
However, this is not sufficient evidence for strong updates against AMF.
For me, it’s not even enough evidence to make me investigate the issue further.
The root issue, the crux, is that the “causal identification” available here, the causal information you can extract from the statistics collected, is very weak, and far from a model of impact or a way of finding the Truth.
Some perspectives:
Issue A: For the first issue, where tracking recipients was ineffective (or, as you suggest and I would also find plausible, they found no statistical effect and data collection was then dropped), I don’t know more than what you wrote, but finding no effect is plausible, even common, in highly successful interventions.
The statistical power may be very low. To get intuition for this, remember that a life saved costs $5,000 in expectation and a bednet costs ~$2. In a real statistical sense, you need thousands of bednets to get one “observation” of a death averted. So you may need hundreds of thousands, or really millions, of bednets to get enough observations for statistical power. And that’s just one layer of the difficulty: it assumes perfectly balanced treatment and control groups with matching demographics. For a proper observational study you may need an order of magnitude more observations. Even on generous assumptions, that’s a large fraction of all the bednets distributed in a year. From this problem alone, my prior would be to find no effect, and I would also expect the monitoring to impose large operational costs that many donors (including me) would find unacceptable.
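To make the power problem concrete, here is a back-of-envelope sample-size calculation (a sketch only: the baseline mortality rate is a number I made up for illustration, and the rest follows from the $5,000-per-life and $2-per-net figures above):

```python
# Assumptions (hypothetical, for illustration only):
# - $5,000 per life saved / $2 per net  =>  ~2,500 nets per death averted
# - ~1.8 people sleep under each net    =>  ~4,500 people covered per death averted
# - baseline annual malaria mortality among recipients: 0.4% (made up)
p_control = 0.004                  # assumed death rate without a net
risk_reduction = 1 / (2500 * 1.8)  # absolute risk reduction per covered person
p_treated = p_control - risk_reduction

# Standard two-proportion sample-size formula: 5% significance, 80% power
z_alpha, z_beta = 1.96, 0.8416
variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
n_per_arm = (z_alpha + z_beta) ** 2 * variance / risk_reduction ** 2

print(f"people needed per study arm: {n_per_arm:,.0f}")
```

Under these generous, trial-like assumptions you already need over a million covered people per arm to detect the effect; an observational study with imbalanced groups would need far more.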
The above assumes a pretty clean, controlled test environment: e.g. two villages, one with bednets and one without, or really two children in the same household, where one gets a bednet and the other doesn’t. This isn’t going to happen in the actual program, and the effects are wildly different when nothing is controlled.
Examples of ordinary stories that will mess up inference: a principled bednet distributor might give nets to poorer families, and to families with sicker children and adults. Since everyone probably knows bednets are effective, wealthier families might buy their own (which is good: AMF can give to the really poor), and these wealthier families might buy premium bednets and treatments (e.g. $10 instead of $2), so you don’t have a clean comparison group.
There are even more pathological stories that mess up inference: suppose you were a skilled implementer who had worked in this program on the ground for many years, and you knew you only had 100 bednets for 1,000 people (maybe because the EAs got captured by the AI/futurist memes, which diverted all the billionaire funds). Working on the ground, you might know that who gets the bednets matters a lot, like by a factor of 2 or 4: give the right bednets to the right people and you increase cost-effectiveness by 200–400%. By definition, this skill isn’t legible in a survey. So your very skill in giving bednets to the worst-off families, those most afflicted by malaria, means that someone looking at the data will say “hey, when we collect data on recipients of malaria nets, these families don’t look better off; let’s cancel this.”
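The targeting story above can be simulated in a few lines (all numbers hypothetical): nets genuinely halve each recipient’s risk, but a skilled distributor gives them to the highest-risk families, so a naive comparison makes recipients look worse off:

```python
import random

random.seed(0)

got_malaria = {True: [], False: []}  # outcomes, keyed by "received a net"

for _ in range(100_000):
    base_risk = random.uniform(0.01, 0.20)  # baseline risk varies between families
    gets_net = base_risk > 0.12             # distributor targets highest-risk families
    risk = base_risk * 0.5 if gets_net else base_risk  # nets genuinely halve risk
    got_malaria[gets_net].append(random.random() < risk)

rate_with_net = sum(got_malaria[True]) / len(got_malaria[True])
rate_without = sum(got_malaria[False]) / len(got_malaria[False])
print(f"malaria rate among net recipients: {rate_with_net:.3f}")
print(f"malaria rate among non-recipients: {rate_without:.3f}")
```

Even though every net halved its recipient’s risk, recipients still show a higher malaria rate, purely because they were higher-risk to begin with. Without knowing the targeting rule, recipient-level survey data can’t untangle this.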
Issue B: Cross-country effects
This sort of cross-country examination suffers from all of the issues above, but is even weaker. Climate trends, poverty, and institutional change are all ongoing forces that will mess up the results, and even this description is a crude gesture at the realities on the ground. What about the ways malaria can be contracted besides sleeping in a bednet-eligible bed?
These confounding effects mean that nation-level studies might never find an effect at all, even for very effective interventions. One major crux is how much bednet coverage there is in a country. Again, I don’t know anything about this beyond reading your post, but if bednet coverage is 10% or even 30%, that may not be enough to find an effect even if bednets were 100% effective.
And that’s assuming bednets were 100% effective. If bednets were even 1% effective (which, by the way, would still make them completely worth it and is consistent with the CEA of $5,000 per life at ~$2 per bednet), you may never be able to find an effect in an observational study.
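To put rough numbers on this (all approximate: the net count and the death-estimate range are the figures cited later in this thread, and the nets-per-death ratio follows from the CEA above):

```python
nets_per_year = 145_000_000        # nets distributed in sub-Saharan Africa, 2010
nets_per_death_averted = 5000 / 2  # $5,000 per life saved / $2 per net

deaths_averted = nets_per_year / nets_per_death_averted  # ~58,000

# Width of the WHO annual malaria death-estimate range (537,000 to 907,000)
measurement_uncertainty = 907_000 - 537_000  # 370,000

print(f"deaths plausibly averted per year: {deaths_averted:,.0f}")
print(f"width of the death-estimate range: {measurement_uncertainty:,}")
```

A ~58,000-death annual signal inside a death count whose estimate range is 370,000 wide is simply invisible at the national level, even if the cost-effectiveness estimate is exactly right.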
Basically, cross-country regressions aren’t informative unless they’re embedded in a strong model and context, and this style of analysis is something of an “also-ran” in economics.
Again, what you wrote is a complete, well-reasoned line of thought from careful study of the AMF website.
You said:
we may be ignoring evidence that the world is more complex than we thought, something which effective altruists ignore at their peril.
To be clear, let’s flip the evidence the other way around:
Imagine someone who came to you for money for a new project or new business. This person didn’t understand the intervention, and didn’t understand the country or its people. All they present is an argument they read in papers: country-level observational data, or data from someone they didn’t know, who collected some figures while giving nets to families.
If you were being asked to give money to this person, this information is not enough to trust them (and it might even be wise to distrust them if this were the only argument they could present).
Thanks Charles for your detailed response.
I agree with your central point that it’s very hard to use statistics to prove anything. In particular, you need a huge amount of data and there is lots of noise as the real world is not a clean & tidy place.
For bednets, we do have a huge amount of data. The World Malaria Report 2011, used in GiveWell’s macro review, says 145 million bednets were distributed in sub-Saharan Africa in 2010 alone [1]. That’s theoretical coverage for around 30% of the population [2]. This is a massive level of intervention.
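As a check on the 30% figure, using the numbers from [1] and [2]:

```python
nets = 145_000_000        # nets distributed in 2010 (World Malaria Report 2011)
people_per_net = 1.8      # GiveWell's coverage assumption
population = 869_000_000  # sub-Saharan Africa

coverage = nets * people_per_net / population
print(f"theoretical coverage: {coverage:.0%}")  # ~30%
```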
For malaria, we also have lots of noise. The same World Malaria Report puts annual deaths in the range 537,000-907,000. That’s a pretty wide confidence interval. The Lancet gives 929,000-1,685,000 deaths per year. That’s a wider range than the first and the two ranges don’t even overlap. [3]
I understand GiveWell’s position (and yours?) to be “There is so much noise, the real-world observations don’t really tell you anything. You have to focus on the Randomised Controlled Trials as proving the concept & monitor AMF to ensure competent delivery”. This might well be right. However, it is then unclear what information could ever be supplied to change GiveWell’s mind. How many bednets would we have to distribute with no evidence of impact before we revisit the recommendation? A billion? 100 billion? Put another way, imagine 10 years from now we find out that bednet distributions had much less impact than we expected. What would be the evidence that demonstrates this, and where might we look now for clues that such evidence is emerging?
More generally, if an intervention can’t stand out from the statistical noise then I’m not sure it passes my personal threshold for a top intervention. As a minimum this means the scale of the problem, and so the scale of our impact, is not well understood. An intervention that can’t stand out from statistical noise has no way of providing feedback to providers on when it is going well or badly, and so has no way to avoid mistakes and no way to improve. Finally, there’s also a psychological element about certainty of impact that will be a big deal to some donors, but that’s a topic for another day.
[1] Source: https://www.who.int/malaria/world_malaria_report_2011/WMR2011_factsheet.pdf
[2] Based on GiveWell’s assumption of 1.8 people covered per net & a population of 869m, as per here: https://www.statista.com/statistics/805605/total-population-sub-saharan-africa/
[3] Source: https://blog.givewell.org/2013/01/23/guest-post-from-david-barry-about-deworming-cost-effectiveness/