I work for a nonprofit focused on building $1B+ philanthropic initiatives/megaprojects. I previously ran some RCTs in East Africa.
Rory Fenton
Thanks Chris, that’s a cool idea. I will give it a go (in a few days, I have an EAG to recover from...)
One thing I should note is that other comments on this post suggest this is well known and applied, which doesn't knock the idea but would reduce the value of doing more promotion. Conversely, my super quick, low-N look into cash RCTs (in my reply below to David Reinstein) suggests it is not so common. Since the approach you suggest would partly involve listing a bunch of RCTs and their treatment/control sizes (so we can see whether they are cost-optimised), it could also serve as a nice check of just how often this adjustment is/isn't applied in RCTs.
For bio, that's way outside my field; I defer to Joshua's comment here on limited participant numbers, which makes sense. Though in a situation like the early COVID vaccine trials, where you perhaps had limited treatment doses and lots of willing volunteers, it might be more applicable? I guess pharma companies are heavily incentivised to optimise trial costs, though; if they don't do it, there'll be a reason!
As a quick data point, I just checked the 6 RCTs GiveDirectly list on their website. I figure cash is pretty expensive, so it's the kind of intervention where this makes sense.
It looks like most cash studies, certainly with just 1 treatment arm, aren’t optimising for cost:
| Study | Control | Treatment |
|---|---|---|
| The short-term impact of unconditional cash transfers to the poor: experimental evidence from Kenya | 432 | 503 |
| Benchmarking a Child Nutrition Program Against Cash: Evidence from Rwanda | 74 villages | 74 villages (nutrition program), 100 (cash) |
| Cash crop: evaluating large cash transfers to coffee farming communities in Uganda | 1,894 | 1,894 |
| Using Household Grants to Benchmark the Cost Effectiveness of a USAID Workforce Readiness Program | 488 | 485 (NGO program), 762 (cash), 203 (cash + NGO) |
| General equilibrium effects of cash transfers: experimental evidence from Kenya | 325 villages | 328 villages |
| Effects of a Universal Basic Income during the pandemic | 100 villages | 44 (long-term UBI), 80 (short-term UBI), 71 (lump sum) |

This suggests either 1) there's some value in sharing this idea more, or 2) there's a good reason these economists aren't making this adjustment. Someone on Twitter suggested "problems caused by unbalanced samples and heteroskedasticity", but that was beyond my poor epidemiologist's understanding and they didn't clarify further.
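For context on what "cost-optimised" would look like here: for a two-arm trial with equal outcome variance, minimising cost at fixed statistical power gives the standard square-root rule, n_treatment / n_control = sqrt(cost_control / cost_treatment). A minimal sketch of the arithmetic, using hypothetical per-participant costs rather than numbers from any of the studies above:

```python
import math

def optimal_allocation(n_equal, cost_treatment, cost_control):
    """Re-split a balanced two-arm design (n_equal per arm) to minimise cost
    while holding the variance of the estimated effect, and hence power,
    fixed. Assumes equal outcome variance across arms (square-root rule:
    n_t / n_c = sqrt(c_c / c_t))."""
    ratio = math.sqrt(cost_control / cost_treatment)  # n_t / n_c, < 1 when treatment is dearer
    n_t = n_equal * (1 + ratio) / 2                   # keeps 1/n_t + 1/n_c = 2/n_equal
    n_c = n_t / ratio
    return round(n_t), round(n_c)

# Hypothetical: $1,000 per treated household (transfer + delivery) vs
# $50 per control household (surveys only), starting from 500 per arm.
n_t, n_c = optimal_allocation(500, 1000, 50)
print(n_t, n_c)  # 306 treatment, 1368 control: same power,
                 # ~$374k total cost vs ~$525k for the balanced design
```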
Hi Christian—agreed, but my argument here is really for fewer treatment participants, not smaller treatment doses.
Ah, that’s helpful data. My experience in RCTs mostly comes from One Acre Fund, where we ran lots of RCTs internally on experimental programs, or just A/B tests, but that might not be very typical!
Hey Aidan—that’s a good point. I think it will probably apply to different extents for different cases, but probably not to all cases. Some scenarios I can imagine:
1) A charity uses its own funds to run an RCT of a program it already runs at scale:
In this case, you are right that treatment is happening “anyway” and in a sense the $ saved in having a smaller treatment group will just end up being spent on more “treatment”, just not in the RCT.
Even in this case I think the charity would prefer to fund its intervention in a non-RCT context: providing an intervention in an RCT context is inherently costlier than doing it under more normal circumstances; for example, if you are delivering assets, your trucks have to drive past control villages to get to treatment ones, increasing delivery costs.
That's pretty small though; I agree that otherwise the intervention is basically "already happening" and the effective savings are smaller than implied in my post
That said, if the charity has good reason to think their intervention works and so spending more on treatment is “good”, the value of the RCT in the first place seems lower to me
2) A charity uses its own funds to run an RCT of a trial program it doesn’t operate at scale:
In this case, the charity is running the RCT because it isn’t sure the intervention is a good one
Reducing the RCT treatment group frees up funds for the charity to spend on the programs that it does know work, with overall higher EV
3) A donor wants to fund RCTs to generate more evidence:
The donor is funding the RCT because they aren’t sure the intervention works
Keeping RCT costs lower means they can fund more RCTs, or more proven interventions
4) A charity applies for donor funds for an RCT of a new program:
In this case, the cheaper study is more likely to get funded, so the larger control/smaller treatment is a better option for the charity
Overall, I think cases 2/3/4 benefit from the cheaper study. Scenario 1 seems more like what you have in mind and is a good point; I just think there will be enough scenarios where the cheaper trial is useful, and in those cases the charity might consider this treatment/control optimisation.
Make RCTs cheaper: smaller treatment, bigger control groups
Hi Nick—thanks for the thoughtful post!
I think cash arms make a lot of intuitive sense, my main pushback would be a practical one: cash and intervention X will likely have different impact timelines (e.g. psychotherapy takes a few months to work but delivers sustained benefits, perhaps cash has massive welfare benefits immediately but they diminish quickly over time). This makes the timing of your endline study super important, to the point that when you run the endline is really what determines which intervention comes out on top, rather than the actual differences in the interventions. I have a post on this here with a bit more detail.
Your point on the ethics here is an interesting one, I agree that medical ethics might suggest “control” groups should still receive some kind of intervention. Part of the distinction could be that medical trials give sick patients placebos, which control patients accurately believe might be medicine, which feels perhaps deceptive, whereas control groups in development RCTs are well aware that they aren’t receiving any intervention (i.e. they know they haven’t received psychotherapy or cash), which feels more honest?
The downside is that this changes the research question from "What is the impact of X?" to "How much better is X than cash?", and there are lots of cases where the counterfactual really would be inaction. A way around this might be to give control groups an intervention that we know to be "good" but that doesn't affect the specific outcome of interest, e.g. I've worked on an agriculture RCT that gave control groups water/sanitation products that had no plausible way to affect their maize yield but at least meant they weren't losing out. This might not apply to broad measures like WELLBYs.
I’m honestly not sure about the ethical side here though, interested to explore further.
I really appreciated this short, clear post. Thank you!
Hey!
LEEP is indeed working on this—I mentioned them in my original comment but I have no connection to them. I was thinking of a campaign on the $100M/year scale, comparable to Bloomberg's work on tobacco. That could definitely be LEEP, though my sense (from quick Googling, and based purely on the small size of their reported team) is that they would have to grow a lot to take on that kind of funding; so there could also be a place for a large existing advocacy org pivoting to lead elimination. I have not at all thought through the implementation side of things here.
How does the time and monetary cost of buying these products compare to the time and monetary cost of giving cash?
The total value of the bundle ($120) includes all staffing (modelled at scale with 100k recipients), including procurement staff, shipping, etc. This trial was a part of a very large nonprofit, which has very accurate costs for those kinds of things.
But obviously the researchers didn’t know beforehand that the programs would fail. So this isn’t an argument against cash benchmarking.
That's true; I don't think I made my point clearly with that paragraph. I was trying to say something like, "The Vox article points to how useful the cash comparison study had been, but the usefulness (learning that USAID shouldn't fund the program) wasn't actually due to the cash arm". That wasn't really an important point and didn't add much to the post.
I really like the idea of asking people what assets they would like. We did do a version of this to determine which products to offer, using qualitative interviews where people ranked ~30 products in order of preference. This caused us to add more chickens and to only offer maize inputs to people who already grew maize. But participants had to choose from a narrow list of products (those with RCT evidence that we could procure); I'd love to have given them the freedom to suggest anything.
We did also consider letting households determine which products they received within a fixed budget (rather than every household getting the same thing), but the logistics got too difficult. Interestingly, people had zero interest in deworming pills, oral rehydration salts or Vitamin A supplements, as they were not aware of needing them—I could see tensions arising between households not valuing these kinds of products and donors wanting to give them based on cost-effectiveness models. This "what do you want" approach might work best with products that recipients already have reasonably accurate mental models of, or that can be easily and accurately explained.
At a very basic intuitive level, hearing "participants indicated a strong preference for receiving our assets over twice as much cash" feels more persuasive than comparing some measured outcome between the two groups (at least for this kind of asset transfer program, where it seems reasonable to defer to participants about what they need/want).
Very interesting suggestion: we did try something like this, but didn't consider it as an outcome measure and so didn't put proper thought/resources into it. We asked people, "How much would you be willing to pay for product X?", with the goal of saying something like "Participants valued our $120 bundle at $200". Unfortunately the question generally caused confusion: participants would think we were asking them to pay for the product they'd received for free, and either understandably got upset or just tried lowballing us with their answer, expecting it to be a negotiation.
If we had thought of it in advance, perhaps this would have worked as a way to generate real value estimates:
We randomise participants into groups
The first group is offered either our bundle (worth $120) or $120 cash
If >50% take the bundle, we then adjust our cash offer upwards and offer it to another group (or the opposite if <50% take the bundle)
We repeat this process of adjusting our price offer until we have ~50% of participants accepting our cash offer: that equilibrium price is then the “value” of the bundle
I'm not sure what the sample size implications would be, but a big advantage would be the timeline: this could be done in a few weeks, not years.
I bet proper economists have a way to do this, but it's interesting to brainstorm on.
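To make the mechanics concrete, here's a minimal simulation of that adjustment process. Everything in it is hypothetical: the group size, the bisection bounds, and households whose private valuations of the bundle average $200.

```python
import random

def fraction_taking_bundle(cash_offer):
    """Stand-in for one field round: offer a fresh group of 100 households
    the bundle vs `cash_offer` in cash, and return the share choosing the
    bundle. Valuations are simulated as normal around a hypothetical $200."""
    valuations = [random.gauss(200, 40) for _ in range(100)]
    return sum(v > cash_offer for v in valuations) / len(valuations)

low, high = 120.0, 400.0  # assumed bounds on the bundle's cash-equivalent value
for _ in range(10):       # each iteration = one new randomised group
    offer = (low + high) / 2
    if fraction_taking_bundle(offer) > 0.5:
        low = offer   # most still prefer the bundle: raise the cash offer
    else:
        high = offer  # most take the cash: lower the offer
print(f"Estimated cash-equivalent value of the bundle: ~${(low + high) / 2:.0f}")
```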
I can see a few issues with this:
We still have to assume people "know what's best for them": that's a really patronising statement, but as above, people need reliable mental models to make these decisions, and won't have them for novel products
We need to have donors who will accept this measure: the data only really matters when it informs decisions
I'm very open to other thoughts here; I really like the concept of a cash benchmark and would love to find a way to resurrect the idea.
Thanks for the interesting reflections.
I agree that longer-term data collection can help here in principle, if the initial differences in impact timing wash out over the years. One reason we didn't do that was statistical power: we expected our impact to decrease over time, so longer-term surveys would require a larger sample to detect this smaller impact. I think we were powered to measure something like a $12/month difference in household consumption. I think I'd still call a program that cost $120 and increased consumption by, say, $3/month 10 years later a "success", but cutting the detectable effect to a quarter takes 16x the sample size. Throw in a cash arm, and that's a 32x bigger sample (64,000 households in our case). We could get a decent sense of whether our program had worked vs control over a shorter (smaller-sample) timeline, and so we went with that.
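To make the scaling explicit: required sample size goes roughly with 1/MDE² (minimum detectable effect), so the multipliers above can be checked in a few lines. The 2,000-household baseline is my inference from the stated 64,000 and 32x, not a figure from the trial:

```python
base_n, base_mde = 2_000, 12.0        # inferred baseline: powered for a $12/month effect
target_mde = 3.0                      # hypothetical $3/month effect 10 years on

scale = (base_mde / target_mde) ** 2  # n scales with 1/MDE^2: (12/3)^2 = 16x
n_two_arm = int(base_n * scale)       # 32,000 households
n_with_cash = n_two_arm * 2           # the comment's 2x for adding a cash arm
print(n_two_arm, n_with_cash)         # 32000 64000
```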
If the concern is about which measure of impact to use—you cite issues with people remembering their spending—then the (I think) obvious response is to measure individuals' subjective wellbeing, e.g. a 0-10 "how satisfied are you with your life nowadays?", which allows them to integrate all the background information of their life when answering the question.
The subjective wellbeing idea is interesting (and I will read your study, I only skimmed it for now but I was impressed). It isn't obvious to me that subjective wellbeing isn't also just a snapshot of a person's welfare, and so prone to similar issues as consumption: e.g. you might see immediate subjective welfare gains in the cash arm while the program arm won't start feeling better until they harvest their crops. I'm not really familiar with the measure, though, so I might be missing something there.
I agree with you that you don’t need a cash arm to prove your alternative didn’t work. But, if you already knew in advance your alternative would be worse, then it raises questions as to why you’d do it at all.
Agreed—I'm sure they expected their program to work; I just don't think adding a cash arm really helped them determine whether it did.
Against cash benchmarking for global development RCTs
Thanks for sharing!
My initial sense is that China's method is focused on controlling rainfall, which might mitigate some of the effects of climate change (e.g. reduce drought in some areas, reduce hurricane strength) but not actually prevent it. The ideas I had in mind were more emergency approaches to actually stopping climate change, either by rapidly removing carbon (e.g. algae in oceans) or by reducing the solar radiation absorbed at the Earth's surface (making clouds/oceans more reflective, space mirrors).
Will all funding applications be made public? If so, is it possible to ask for a specific application not to be made public? No problem if the actual funding will be publicized; I'm just wondering about the applications themselves. Thanks!
Eliminate all mosquito-borne viruses by permanently immunizing mosquitoes
Biorisk and Recovery from Catastrophe
Billions of people are at risk from mosquito-borne viruses, including the threat of new viruses emerging. Over a century of large-scale attempts to eradicate mosquitoes as virus vectors has changed little: there could be significant value in demonstrating large-scale, permanent vector control, both for general deployment and for rapid response to novel viruses. Recent research has shown that infecting mosquitoes with Wolbachia, a bacterium, prevents viruses (including dengue, yellow fever and Zika) from replicating within the insect, essentially immunizing it. The bacterium passes to future generations by infecting mosquito eggs, allowing a small release of immunized mosquitoes to gradually and permanently immunize an entire population of mosquitoes. We are interested in proposals for taking this technology to massive scale, with a particular focus on rapid deployment in the case of novel mosquito-borne viruses.
Epistemic status: Wolbachia impact on dengue fever has been demonstrated in a large RCT and about 10 city-level pilots. Impact on other viruses only shown in labs. The approach is likely to protect against novel viruses but that has not been demonstrated.
Conflict of interest: I work for a small, new nonprofit focused on $B giving. I have had conversations with potential Wolbachia implementers to understand their work but have no direct commercial interest.
Campaign to eliminate lead globally
Economic Growth
Lead exposure lowers IQ, takes over 1M lives every year, and costs Africa alone $130B annually (4% of GDP): an extraordinary limit on human potential. Most lead exposure is through paint in buildings and on toys. The US banned lead paint in 1978, but 60% of countries still permit it. We would like to see ideas for a global policy campaign, perhaps similar to Bloomberg's $1B tobacco advocacy campaign (estimated to have saved ~30M lives), to push for regulations and industry monitoring.
Epistemic status: The “prize” feels very large but I am not aware of proven interventions for lead regulations. 30 minutes of Googling suggests the only existing implementer (www.leadelimination.org) might be too small for this level of funding so there may not be many applicants.
Conflict of interest: I work for a small, new non-profit focused on $B giving. We are generally focused on projects with large, existing implementers so have not pursued lead elimination policy beyond initial light research.
Pilot emergency geoengineering solutions for catastrophic climate change
Research That Can Help Us Improve
Toby Ord puts the risk of runaway climate change causing the extinction of humanity by 2100 at 1/1000, a staggering expected loss. Emergency solutions, such as seeding oceans with carbon-absorbing algae or creating more reflective clouds, may be our last chance to prevent catastrophic warming but are extraordinarily operationally complex and may have unforeseen negative side-effects. Governments are highly unlikely to invest in massive geoengineering solutions until the last minute, at which point they may be rushed in execution and cause significant collateral damage. We’d like to fund people who can:
Identify and pilot, at large scale, top geoengineering initiatives over the next 5-10 years to develop operational lessons, e.g. promote algae growth in a large private lake, or launch a small cluster of mirrors into space
Develop advanced supercomputer models, potentially with input from the above pilots, of the potential negative side-effects of geoengineering solutions
Identify and pilot harm-mitigation responses for geoengineering solutions
Epistemic status: there seems to be reasonable expert agreement on the kinds of geoengineering solutions that might work. I have no idea how much funding geoengineering pilots might need.
Conflict of interest: I work for a small, new nonprofit focused on $B giving. We are generally focused on projects that already have large implementers so have not pursued geoengineering beyond initial light research.
Love the clarity of the post but I agree with Geoffrey that the $ impact/household seems extremely low and I also don’t follow how you get to $1k+/HH (which would be like doubling household income).
Back calculating to estimate benefits/household:
$1.5m national savings over 5 years = $300k/year
Number of adopters:
50m people in Uganda
5 people/household means 10m households
1⁄3 of households use charcoal: 10m/3 = ~3m households use charcoal
1% adopt: 3m * 1% = 30k adopting households
Benefits/household: $300k/year over 30k adopting households = $10/household/year (or just ~$2/person/year), which seems super low to me
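For anyone who wants to poke at the numbers, here is the same back-calculation as a short script; the inputs are the rounded figures from the list above, and the unrounded answer lands nearer $9/household because 1% of 3.33m households is ~33k, not 30k:

```python
national_savings = 1_500_000 / 5      # $1.5m over 5 years -> $300k/year
households = 50_000_000 / 5           # 50m people, 5 per household -> 10m households
charcoal_hh = households / 3          # 1/3 use charcoal -> ~3.33m households
adopters = charcoal_hh * 0.01         # 1% adoption -> ~33k households
per_hh = national_savings / adopters  # ~$9/household/year
print(f"~${per_hh:.0f}/household/year, ~${per_hh / 5:.1f}/person/year")
```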
I'd guess that's at least part of why you don't see more bean soaking already: the savings are just so modest, unless I've missed something in my calculation.
As you note, behaviour change around cooking practices is also super hard. When I worked at One Acre Fund Tanzania, our 2 biggest failures were introducing clean cookstoves and high-iron beans, both of which people just didn't want to use because of how they conflicted with existing norms, e.g. the color of the new bean variety "bled" into ugali, making it look dirty.
So the $ benefits would make me skeptical of this as promising but I’m hoping I missed something big in my calculation!