The data was noisy, so they simply stopped checking whether AMF’s bed net distributions do anything about malaria.
This is an unfair gotcha. What would the point of this be? Of course the data is noisy. Not only is it noisy, it is irrelevant—if it were not, there would never have been any need to run randomized trials in the first place; you would simply dump the bed nets where convenient and check malaria rates. The whole point of randomized trials is the realization that correlational data is extremely weak and cannot give reliable causal inferences. (I can certainly imagine reasons why malaria rates might go up in regions that AMF does bed net distribution in, just as I can imagine reasons why death rates might be greater or increase over time in patients prescribed new drug X as compared to patients not prescribed X...) If they did the followups and malaria rates held stable or increased, you would not then believe that the bednets do not work; if it takes randomized trials to justify spending on bednets, it cannot then take only surveys to justify not spending on bed nets, as the causal question is identical. Since it does not affect any decisions, it is not important to measure. Or, if it did, then what you ought to criticize GiveWell & AMF for, along with everyone else, is ever advocating & spending resources on randomized trials in the first place (trials which would be highly unethical if mere followup surveys could settle the question), rather than criticizing them for not doing some followup surveys.
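To make the confounding concrete, here is a toy simulation of "nets go where malaria is worst". Every number in it is invented for illustration, not taken from AMF or the RCT literature: even though the nets causally cut incidence in every region that receives them, the regions with nets still show more malaria than the regions without.

```python
# Toy simulation (illustrative numbers only): targeted distribution means the
# naive comparison of net vs. non-net regions points the wrong way, even with
# a genuine causal benefit.
import numpy as np

rng = np.random.default_rng(0)
n_regions = 1000

# Baseline annual incidence per 1,000 people, varying widely by region.
baseline = rng.lognormal(mean=np.log(200), sigma=0.8, size=n_regions)

# Targeted distribution: the worst-hit half of the regions get nets.
gets_nets = baseline > np.median(baseline)

# Assume a genuine causal effect: nets cut incidence by 45% (made-up figure).
true_effect = 0.45
observed = baseline * np.where(gets_nets, 1 - true_effect, 1.0)
# Add some reporting/measurement noise on top.
observed = observed * rng.lognormal(mean=0.0, sigma=0.3, size=n_regions)

print("mean incidence, net regions:     %.0f per 1,000" % observed[gets_nets].mean())
print("mean incidence, non-net regions: %.0f per 1,000" % observed[~gets_nets].mean())
# Typical output: the net regions still look roughly twice as bad as the
# non-net regions, i.e. the naive comparison gets the sign of the effect backwards.
```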
(A reasonable critique might be that they are not examining whether the intervention—which has been identified as causally effective and as passing a cost-benefit analysis—is being correctly delivered, with the right people getting the nets and actually using them. But as far as I know, they do track that...)
If they did the followups and malaria rates held stable or increased, you would not then believe that the bednets do not work; if it takes randomized trials to justify spending on bednets, it cannot then take only surveys to justify not spending on bed nets, as the causal question is identical.
It’s hard for me to believe that the effect of bednets is large enough to show an effect in RCTs, but not large enough to show up more often than not as a result of mass distribution of bednets. If absence of this evidence really isn’t strong evidence of no effect, it should be possible to show it with specific numbers and not just handwaving about noise. And I’d expect that to be mentioned in the top-level summary on bed net interventions, not buried in a supplemental page.
It’s hard for me to believe that the effect of bednets is large enough to show an effect in RCTs, but not large enough to show up more often than not as a result of mass distribution of bednets.
You may find it hard to believe, but nevertheless, that is the fact: correlational results can easily be several times the true causal effect, in either direction. If you really want numbers, see, for example, the papers & meta-analyses I've compiled at https://www.gwern.net/Correlation comparing correlational estimates with the causal estimates from simultaneous or later-conducted randomized experiments; they have plenty of numbers. Hence, it is easy for a causal effect to be swamped by time trends or other correlates, and a followup correlation cannot and should not override credible causal results. This is why we need RCTs in the first place. Followups can do useful things, like checking whether the intervention is actually being delivered, or providing correlational data on things not covered by the original randomized experiments (such as unconsidered side effects), but they cannot retry the original case in a kind of double jeopardy.
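To put rough numbers on "swamped" (all of the parameters below are invented for illustration, not estimates from AMF's data): a simple before-vs-after comparison at the region level can easily miss a real benefit or reverse its sign once ordinary year-to-year variation and a background trend are in play.

```python
# Sketch with assumed parameter values: how a naive before/after comparison
# behaves when a real causal effect sits on top of a background trend and noise.
import numpy as np

rng = np.random.default_rng(1)
n_sims = 10_000

true_effect = -0.30        # assumed causal effect: nets cut incidence by 30%
background_trend = 0.10    # assumed background push from resistance/rainfall: +10%/yr
year_to_year_sd = 0.35     # assumed sd of log-incidence from one year to the next

# Observed log-change from the year before the distribution to the year after
# = causal effect + background trend + ordinary noise.
log_change = (np.log1p(true_effect)
              + np.log1p(background_trend)
              + rng.normal(0.0, year_to_year_sd, size=n_sims))

print("share of regions where incidence still went UP after nets: %.0f%%"
      % (100 * (log_change > 0).mean()))
lo, hi = np.exp(np.percentile(log_change, [10, 90])) - 1
print("naive before/after estimate, 10th-90th percentile: %+.0f%% to %+.0f%%"
      % (100 * lo, 100 * hi))
# With these (invented) numbers, about a quarter of regions show malaria rising
# despite a real 30% benefit, and individual estimates range from roughly -50%
# to +20%: anywhere from nearly double the true effect to the wrong sign.
```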
This sort of framing leads to publication bias. We want double jeopardy! This isn’t a criminal trial, where the coercive power of a massive state is being pitted against an individual’s limited ability to defend themselves. This is an intervention people are spending loads of money on, and it’s entirely appropriate to continue checking whether the intervention works as well as we thought.
As I understand the linked page, it's mostly about retrospective rather than prospective observational studies, and usually about individual-level rather than population-level interventions. A plan to initiate mass bednet distribution on a national scale is pretty substantially different from that, and doesn't suffer from the same kind of confounding.
Of course it’s mathematically possible that the data is so noisy relative to the effect size of the supposedly most cost-effective global health intervention out there, that we shouldn’t expect the impact of the intervention to show up. But, I haven’t seen evidence that anyone at GiveWell actually did the relevant calculation to check whether this was the case for bednet distributions.
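For concreteness, the "relevant calculation" would be a short power check along these lines (every number below is a placeholder, not a figure from GiveWell's analysis or AMF's data): given an assumed effect size and an assumed noise level, how likely is the follow-up data to show a decline at all?

```python
# Back-of-envelope check (placeholder inputs): how informative is a simple
# follow-up comparison, as a function of how noisy the data is?
import numpy as np
from scipy.stats import norm

assumed_effect = -0.30                 # placeholder: assumed true proportional drop in incidence
effect_log = np.log1p(assumed_effect)  # work on the log scale

for noise_sd in (0.1, 0.3, 0.6):       # placeholder sds of log(incidence ratio) in the follow-up
    p_decline = norm.cdf(-effect_log / noise_sd)
    print("noise sd %.1f -> P(follow-up shows a decline) = %.2f" % (noise_sd, p_decline))
# At noise sd 0.1 the probability is ~1.00, so a flat follow-up really would be
# evidence against the effect; at 0.6 it is ~0.72, and the follow-up barely moves
# the needle either way. The disagreement turns entirely on which noise level is
# realistic, which is what the calculation would have to establish.
```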