I’m considering writing about “RCTs in NGOs: When (and when not) to implement them”
The post would explore:
Why many new NGOs feel pressured to conduct RCTs primarily due to funder / EA community requirements.
The hidden costs and limitations of RCTs: high expenses, 80% statistical power meaning 20% chance of missing real effects, wide confidence intervals
Why RCTs might not be the best tool for early-stage organizations focused on iterative learning
How academic incentives in RCT design/implementation don’t always align with NGO needs
Alternative evidence-gathering approaches that might be more appropriate for different organizational stages
Suggestions for both funders and NGOs on how to think about evidence generation
This comes from my conversations with several NGO founders. I believe the EA community could benefit from a more nuanced discussion about evidence hierarchies and when different types of evaluation make sense.
This sounds like it could be interesting, though I’d also consider if some of the points are fundamentally to do with RCTs. E.g., “80% statistical power meaning 20% chance of missing real effects”—nothing inherently says an RCT should only be powered at 80% or that the approach should even be one of null hypothesis significance testing.
Good point. It’s worth clarifying that the 80% power standard comes from academic norms, not an inherent RCT requirement. NGOs should choose their statistical thresholds based on their specific needs, budget, and risk tolerance.
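To make that concrete, here’s a minimal sketch (Python, using statsmodels’ power calculator) of how the chosen power level drives the required sample size; the effect size is a hypothetical placeholder, not a figure from any actual program:

```python
# Sketch: required sample size per arm for a two-sample t-test design,
# at several power levels. Effect size (Cohen's d = 0.2) is a placeholder
# assumption, not an estimate from any real intervention.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for power in (0.6, 0.8, 0.9):
    n = analysis.solve_power(effect_size=0.2, power=power, alpha=0.05)
    print(f"power={power:.0%}: ~{n:.0f} participants per arm")
```

An NGO that can tolerate a higher false-negative risk could run a much smaller, cheaper study than the academic default implies, and vice versa.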
I would welcome a blog post about RCTs, and if you decide to write one, I hope you consider the perspective below.
As far as I can tell ~0% of nonprofits are interested in rigorously studying their programs in any way, RCTs or otherwise, and I can’t help but suspect that this is largely because, when we do run RCTs, we mostly find that these cherished programs have ~no effect. It’s not at all surprising to me that most charities that conduct RCTs feel pressured to do so by donors; but on the other hand, basically all charity activities ultimately flow from donor preferences, because donors are the ones with most of the power.
Living Goods is one interesting example, where they ran an RCT because a donor demanded it, got an unexpected (positive) result, and basically pivoted the whole charity based on that. I view that as a success story.
I am certainly not claiming that RCTs are appropriate for all kinds of programs, or some kind of silver bullet. It’s more like, if you ask charities “would you like more or less accountability for results”, the answer is almost always going to be, “less, thanks”.
This is a great point. There’s an important distinction, though, between evaluating new programs led by early-stage NGOs (like those coming from Charity Entrepreneurship) and established programs directing millions in funding. I think RCTs make sense for the latter group.
“As far as I can tell ~0% of nonprofits are interested in rigorously studying their programs in any way, RCTs or otherwise”
There’s also a difference between typical NGOs and EA-founded ones. In my experience, EA founders actively want to rigorously evaluate their programs; they don’t want to work on ineffective interventions.
Would also love this. I think a useful contrast would be A/B testing in big tech firms. My amateur understanding is that big tech firms can and should run hundreds of “RCTs” because:
No need to acquire subjects.
Minimal disruption to business since you only need to siphon off a minuscule portion of your user base.
Tech experiments can finish in days while field experiments need at least a few weeks and sometimes years.
If we assume treatment effects are heavy-tailed, then a big tech firm running hundreds of A/B tests is more likely to learn of a weird trick that grows the business than an NGO that may only get one shot (sketched below).
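A toy simulation of that last point; this is a minimal sketch where the lognormal distribution and every parameter are assumptions chosen purely for illustration, not estimates from any real firm or NGO:

```python
# Sketch of the heavy-tail argument: if per-experiment effects are drawn from
# a heavy-tailed distribution (lognormal here, an assumption for illustration),
# the best result across many cheap tests tends to dwarf a single draw.
import numpy as np

rng = np.random.default_rng(0)

def avg_best_effect(n_tests, n_sims=10_000):
    # Draw n_tests effects per simulated run, keep the best one,
    # and average that maximum across simulations.
    effects = rng.lognormal(mean=0.0, sigma=1.5, size=(n_sims, n_tests))
    return effects.max(axis=1).mean()

print("1 test:    avg best effect =", round(avg_best_effect(1), 2))
print("200 tests: avg best effect =", round(avg_best_effect(200), 2))
```

Under these assumed parameters, the best outcome across two hundred tests is far larger than the expected outcome of a single test, which is the core of the case for cheap, repeated experimentation.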
Yes, exactly. The marginal cost of an A/B test in tech is incredibly low, while for NGOs an RCT represents a significant portion of their budget and operational capacity.
This difference in costs explains why tech can use A/B tests for iterative learning, trying hundreds of small variations, while NGOs need to be much more selective about what to test.
And despite A/B testing being nearly free, most decisions at big tech firms aren’t driven by experimental evidence.
I would love to see this. Not a take I’ve seen before (that I remember).