[Cause Exploration Prizes] Social and Behavioral Science R&D

This essay was submitted to Open Philanthropy’s Cause Exploration Prizes contest.


We appreciate Open Philanthropy’s call for essays on cause prioritization. We think there is a strong case for Open Phil to fund (at a much higher level than it currently does) rigorous social and behavioral science R&D on interventions with the potential to save lives and increase incomes at scale. We believe this kind of targeted R&D has the potential to generate social returns that exceed Open Phil’s threshold of a 1,000x return on investment. Social and behavioral science R&D is neglected, tractable, and important.[1]

Social/Behavioral Science R&D is Neglected

EAs write compelling articles about why RCTs are a great way to understand the causal impact of a policy or treatment. And GiveWell’s claim to fame is that it has led to many millions of dollars of donations to “several charities focusing on RCT-backed interventions as the ‘most effective’ ones around the world.”

But we wonder if the EA movement is allocating nearly enough money to new RCTs and program evaluations, or to social and behavioral science R&D more broadly, so as to build out new evidence in a strategic way.

After all, the agreed-upon list of the “best” evidence-based interventions seems a bit stagnant.

  • When Stuart spoke at the EA Global conference in 2016, GiveWell’s best ideas for global giving involved malaria, deworming, and cash transfers.

  • When we look at GiveWell’s current list of the top charities, they still are mostly focused on malaria, deworming, and cash transfers (albeit with the addition of Vitamin A supplements, a vaccine program operating in northwest Nigeria, and clean water dispensers).

Such a tiny set of interventions doesn’t seem at all proportionate to the scale of the myriad challenges posed by global poverty and disease.

And how do we even know that this handful of interventions is the best set of ideas to fund? Because at some point in the past, someone thought to fund rigorous RCTs on anti-malaria efforts, deworming, cash transfers, Vitamin A supplementation, childhood immunization, and water dispensers.

But why would this handful of isolated ideas be the best we can possibly do?

To be a bit provocative:

The EA community has mostly [albeit not entirely] taken the world’s supply of research as a given—with all of its oversights, poorly-aligned academic incentives, and irreproducibility—and then picked the best-supported interventions it could find there.

Doesn’t EA Already Fund R&D?

There are a number of cases where EA does indeed fund academic research on the effectiveness of interventions, such as GiveWell’s recent funding of this Michael Kremer et al. meta-analysis finding that water chlorination is a highly cost-effective way of reducing child mortality. GiveWell has written recently of its commitment to research on malnutrition and lead exposure, while Open Phil has recently funded research on air quality sensors, Covid vaccines, a potential syphilis vaccine, etc. We’re sure there are other examples we’ve missed.

But on a closer look, with few exceptions, not much of this research is squarely within the realm of what we’re talking about: directly funding RCTs and program evaluations themselves as part of a broader and well-designed agenda.

For example, the Kremer et al. meta-analysis of water treatment hinged on 15 main program evaluations (see Table 1). As far as we can tell, none of them were funded by major EA initiatives or donors:

  • The Haushofer et al. 2021 paper was funded by NIH, the Dioraphte Foundation, and Sint Antonius Stichting.

  • The Dupas et al. 2021 paper was funded by Stichting Dioraphte and the Stanford Center for Innovation in Global Health.

  • The Humphrey et al. 2019 paper was funded by Gates Foundation, UK Department for International Development, Wellcome Trust, Swiss Development Cooperation, UNICEF, and NIH.

  • The Kirby et al. 2019 paper was funded by DelAgua Health Limited.

  • The Null et al. 2018 paper was funded by the US Agency for International Development and the Gates Foundation.

  • The Luby et al. 2018 paper was funded by the Gates Foundation.

  • The Boisson et al. 2013 paper was funded by the Program for Appropriate Technology in Health (PATH); United States Agency for International Development (USAID); Medentech, Ltd.; and Chemical Chlorine Association.

  • The Peletz et al. 2012 paper was funded by Vestergaard-Frandsen SA and the United States National Science Foundation.

  • The Kremer et al. 2011 paper was funded by the Hewlett Foundation, USDA/Foreign Agricultural Service, International Child Support, the Swedish International Development Agency, the Finnish Fund for Local Cooperation in Kenya, google.org, the Bill and Melinda Gates Foundation, and the Sustainability Science Initiative at the Harvard Center for International Development.

  • The other studies are from 2006 and before, when the EA movement didn’t really exist yet.

Instead, “research” in this case consisted of summarizing other people’s research and inserting the resulting effect sizes into cost-effectiveness models.

Which is a fine and valuable activity! We agree that rigorous meta-analysis is one of the best things to do and to fund (e.g., Rachael Meager’s groundbreaking work).

Again, though, it’s derivative of what everyone else chooses to fund. If EAs don’t fund enough underlying RCTs and evaluations, then they are assuming that they can mostly sit back and hope that non-EA funders will support studies conducted with the right amount of rigor and the right amount of focus on cost and quality.

We think it’s likely that this isn’t happening very often. Indeed, an EA leader told us in conversation that there are at least two serious problems with relying on the existing academic literature.

  • First, there is an imperfect overlap between the questions that academic researchers want to study, and the questions that EAs want to answer.

  • Second, even when there’s overlap, the academic journal system usually doesn’t ask researchers to collect cost data, which means that if EAs want to know anything about cost-effectiveness, they are left with trying to reconstruct those numbers after the fact.

This implies that there are likely many opportunities for the EA community to exploit weaknesses in the current academic system by funding a large number of R&D projects with an eye towards cost-effectiveness and scale.

Doesn’t the Federal Government Already Fund R&D?

Unfortunately, not nearly enough or as effectively as it should.

One problem with federal funding for social and behavioral science is that awards are simply too small. For example, the average award from the National Science Foundation’s Division of Social, Behavioral, and Economic Sciences, the primary funder of social and behavioral science in the U.S., is only $150,000 (with roughly one-third of NSF awards typically allocated to institutional overhead).

In order to manage their budget constraints, researchers often conduct small trials with unrepresentative study populations (often in locations chosen to generate the largest treatment effects). The evidence from these small federally-funded studies can be worse than uninformative: an unrepresentative study says that X works; everyone decides that X works; whole organizations spin up to do X and government agencies decide to do X; much money is spent; and then it turns out that X doesn’t replicate at scale.

Some of the potential harm posed by small federally-funded studies could be alleviated if competitions for federal dollars prioritized predictably scalable interventions.

For example, interventions based on human capital might be deprioritized for funding. Small studies of human capital-based interventions are often run with the best teachers/workers/etc. that someone could recruit, but scaling up inevitably means that you get less-qualified people delivering the program. That’s just reality, no matter what organization or government is involved.

A classic example is what happened with class size reduction. One fairly small experiment showed that reducing class size had amazingly positive effects, but when class size reduction was adopted by the state of California, a study showed that the “increase in the share of teachers with neither prior experience nor full certification dampened the benefits of smaller classes, particularly in schools with high shares of economically disadvantaged, minority students.”

In other words, putting students in small classes was a great idea in a small study, but when you try to reduce class size statewide (never mind nationwide), you end up having to hire so many new teachers that their quality and experience go down, mostly or fully offsetting the benefits of smaller classes.

Interventions that likely have offsetting general equilibrium effects could also be deprioritized for funding. For example, job training programs often have positive effects in isolation. If you pick 100 or 200 people to receive better training, they might easily do better than otherwise.

But can such programs work when scaled up? After all, if there are 100 welding jobs in a community (just to make up a hypothetical example), and if you train 100 people for those jobs, they might do well, but if you train 500 people for the 100 welding jobs, the training effect will necessarily dissipate.

Unfortunately, that’s what one study found in France, where the study team had the (rare!) opportunity to randomize not only who got access to the job training program, but also how many people the program actually served across different communities. They found that the job training successes came mostly at the expense of the control group.

By contrast, interventions leveraging peer effects could be prioritized for funding. Perhaps if you offer a drug treatment program or a high school graduation program to just 10% of the local students, they get distracted by the other 90% of kids not in the program, but if you put everyone in the program, they would all reinforce each other’s decisions. In this case, that would mean that a small study in fact underestimates the program’s effect at a larger scale. If we dismissed this type of program on the basis of small studies, we might be missing the boat.

There is plenty of work on scalability (See John List’s new book, his recent A16z interview, or his scholarly work on the issue here, here, and here, or this article by Mary Ann Bates and Rachel Glennerster). It would be great if federal funders prioritized scalability as a funding criterion. But, to the best of our knowledge, they don’t.

As a result, existing studies funded by federal dollars (again, the largest source of funding for social and behavioral science R&D) have likely failed to identify the full set of interventions effective at saving lives and increasing incomes at scale. The inadequacy of the existing evidence base implies that investments in social and behavioral science R&D likely have high positive expected value.

Social/Behavioral Science R&D is Tractable

Could R&D funded by Open Phil plausibly identify new interventions that clear its 1,000x cost-effectiveness threshold? We think the answer to this question is yes.

One potential source of highly effective interventions lies in increasing take-up of life-saving technologies. R&D on the margin of increased take-up likely has large positive expected value, given the low take-up of many effective technologies and the current underinvestment in R&D explicitly designed to elicit the most cost-effective and scalable strategies to increase take-up.

Consider the case of Covid-19 vaccination. Recent estimates of excess mortality due to Covid have revealed staggeringly high death tolls in low-income countries: India, for example, is estimated to have suffered 4 million excess deaths. The virus also produces high rates of post-infection complications. Effective Covid-19 vaccines are widely available free of cost in low-income countries. Yet vaccination rates in low-income countries (LICs) remain stubbornly low: only 20% of LIC residents had received at least one dose of a Covid vaccine as of July 20, 2022.

With Covid prevalent and rapidly evolving everywhere, there is a pressing need to identify interventions with the potential to increase vaccination take-up. Pandemic preparedness requires R&D not only on vaccine development, but also on vaccine acceptance. Effective vaccines are miraculous technological and scientific achievements, but their potential is unfulfilled if they are left on the shelf at a local clinic or pharmacy.

In order for an intervention that increases Covid vaccination rates to have greater than 1,000x ROI, we need each $100K spent on vaccine take-up to yield greater than $100M in net benefits. Using the Open Philanthropy/GiveWell benchmarks of 32 DALYs per adult death and $100,000 per DALY, averting an adult death is worth approximately $3.2M (Cell E45 here).

Assuming that reducing adult mortality is the only benefit of Covid vaccination, for each $100K in the costs of an intervention to increase Covid vaccination take-up, we would need to avert at least 31.25 adult deaths ($100M/$3.2M). Recent estimates of excess mortality rates due to Covid in LICs range as high as 0.007. A population-wide Covid mortality rate of even 0.004 implies that we would see 31.25 Covid-related adult deaths in a population of 7,813 unvaccinated adults. With vaccine efficacy of 0.75 (the efficacy of an inexpensive vaccine widely available in LIC settings), we would need to fully vaccinate about 10,417 adults to avert at least 31.25 deaths.

For an investment of $100,000, we therefore need each full vaccination (as a result of a take-up increasing intervention) to cost no more than about $9.60. Is that within reach? Maybe!
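To make this arithmetic easy to check, here is a minimal sketch in Python. Every input comes from the discussion above (32 DALYs per adult death, $100,000 per DALY, a 0.004 mortality rate among unvaccinated adults, 0.75 vaccine efficacy); nothing else is assumed.

```python
# A back-of-the-envelope sketch of the break-even arithmetic above.
# All inputs come from the text; SPEND is the hypothetical $100K budget.

DALYS_PER_ADULT_DEATH = 32      # Open Phil/GiveWell benchmark
DOLLARS_PER_DALY = 100_000      # Open Phil/GiveWell benchmark
ROI_THRESHOLD = 1_000           # target social return on investment
SPEND = 100_000                 # hypothetical intervention budget ($)
MORTALITY_RATE = 0.004          # assumed Covid mortality among unvaccinated adults
VACCINE_EFFICACY = 0.75         # inexpensive, widely available vaccine

value_per_death_averted = DALYS_PER_ADULT_DEATH * DOLLARS_PER_DALY  # $3.2M
required_benefit = SPEND * ROI_THRESHOLD                            # $100M
deaths_to_avert = required_benefit / value_per_death_averted        # 31.25
unvaccinated_adults = deaths_to_avert / MORTALITY_RATE              # ~7,813
full_vaccinations = unvaccinated_adults / VACCINE_EFFICACY          # ~10,417
max_cost_per_vaccination = SPEND / full_vaccinations                # ~$9.60

print(f"Deaths to avert: {deaths_to_avert:.2f}")
print(f"Full vaccinations needed: {full_vaccinations:,.0f}")
print(f"Break-even cost per full vaccination: ${max_cost_per_vaccination:.2f}")
```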

One potential intervention is to deploy scaled SMS-based messaging campaigns. Several large-scale RCTs have demonstrated the effectiveness of SMS-based messages from health providers at increasing Covid and flu vaccination rates in the United States (here, here, and here). Of particular note, in a large-scale RCT, SMS-based messages from a U.S.-based health provider deploying ownership language (e.g., “Your vaccine is waiting for you”) increased two-dose Covid vaccination take-up at a cost of $2.50 per full vaccination. If this intervention cost held in an LIC setting with high excess Covid-related mortality, $100,000 in costs could purchase about 40,000 vaccinations, averting 120 adult deaths (at a vaccine efficacy of 0.75 and a mortality rate of 0.004), for an ROI of 3,840x. We don’t know if this kind of scaled SMS-based intervention could increase Covid vaccination rates in LIC settings. We should find out!

Another potentially effective way to increase Covid vaccination take-up is to leverage (scalable) peer effects. There is a growing literature on how social pressure can cost-effectively motivate individuals to take costly pro-social actions, including in contexts such as voter turnout and energy conservation. In the context of vaccination, an RCT of a social norm intervention in Sierra Leone increased full childhood vaccination rates at a cost of less than $1 per child. If this intervention cost held for adult Covid vaccination in LIC settings, we could vaccinate about 100,000 adults for $100,000, averting potentially around 300 deaths at a whopping ROI of 9,600x. We don’t know if a social norm intervention could increase Covid vaccination rates in LIC settings. We should find out!

Other vaccination incentive costs reported in the RCT literature also fall below the $9.60 threshold required for at least 1,000x ROI. An RCT in Kenya saw increased childhood immunization rates from SMS messages and financial incentives at a cost of $8 per full immunization. An RCT in India saw increased childhood immunization rates from offering 1 kg of raw lentils for each vaccine and a set of metal plates upon completion of the full series (with total incentive costs of $6.64). In short, even ignoring all other potential benefits of Covid vaccination beyond reductions in adult mortality, it appears highly likely that there are interventions to increase Covid vaccination rates that are above the 1,000x ROI threshold.
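Running the same model in reverse makes the comparison concrete. This sketch, under the same benchmark assumptions as above, computes the implied ROI for each per-vaccination cost reported in these RCTs, on the (untested) hypothesis that those costs would carry over to adult Covid vaccination in a high-mortality LIC setting:

```python
# The same model run in reverse: given a cost per full vaccination,
# what ROI would a $100K take-up intervention produce? Benchmark values
# are as in the sketch above; the per-vaccination costs are the figures
# reported in the RCTs cited in the text (whether they carry over to
# adult Covid vaccination in LICs is exactly the open question).

def roi(cost_per_vaccination, spend=100_000, mortality_rate=0.004,
        efficacy=0.75, value_per_death=3_200_000):
    """Social return multiple for a take-up intervention at a given cost."""
    vaccinations = spend / cost_per_vaccination
    deaths_averted = vaccinations * mortality_rate * efficacy
    return deaths_averted * value_per_death / spend

for label, cost in [("SMS ownership language (US)", 2.50),
                    ("Social norms (Sierra Leone)", 1.00),
                    ("SMS + financial incentives (Kenya)", 8.00),
                    ("Lentils + metal plates (India)", 6.64)]:
    print(f"{label}: {roi(cost):,.0f}x")
# -> 3,840x, 9,600x, 1,200x, and ~1,446x respectively
```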

GiveWell did fund at least one RCT (of which we are aware) of a vaccination take-up intervention, finding that New Incentives, an NGO offering $11 in cash incentives per full childhood immunization in Northwest Nigeria, was successful in boosting immunization rates and reducing child mortality. We think there is much more R&D on vaccination take-up that could and should be funded.

There are also effective technologies beyond vaccination that suffer from low take-up. For example, utility-scale solar and wind power installations, necessary to significantly reduce CO2-equivalent emissions, face fierce local resistance in many communities, with renewable energy projects comprising approximately 50% of obstructed energy projects. Identifying interventions with the capacity to cost-effectively increase take-up of utility-scale renewable energy installations also has high positive expected value. High-yielding crop varieties have led to dramatic increases in global incomes, but adoption of yield-enhancing technologies remains low in many sub-Saharan regions. Identifying interventions with the capacity to cost-effectively increase take-up of yield-increasing technologies likewise has high expected value.

Even long-termist EA goals could arguably benefit from investment in more serious applied R&D. For example, one long-termist goal is higher-functioning epistemic institutions, enabling us to better “sort truth from falsehood,” to “make appropriate decisions in the face of uncertainty,” and to direct our attention to “what’s most important.” Investing in R&D evaluating alternative epistemic interventions on the largest digital platforms could help achieve this goal. For example, a large-scale RCT revealed that Facebook users in low-quality epistemic environments can be cost-effectively encouraged through Facebook ads to follow higher-quality news outlets, leading to reductions in both epistemic and political polarization. Another RCT revealed that accuracy nudges sent (virtually costlessly) by Twitter DM can reduce the sharing of inaccurate information by Twitter users. Yet another RCT has revealed that apps allowing users to restrict time spent on social media platforms can be made more effective by instituting delays before platform access is resumed, again leading to reductions in epistemic and political polarization.

RCTs evaluating alternative epistemic interventions on digital platforms can also reveal ineffective interventions. For example, a recent RCT of Twitter’s content moderation practices revealed that, if anything, content moderation increased the time spent on the platform by those who had posted slurs about disability or who had denied the existence of the Holocaust. Much policy and advocacy effort is devoted to lobbying for increased content moderation on the platforms. Perhaps that effort could be more productively directed elsewhere.

Social/Behavioral Science R&D is Important

If R&D did in fact reveal interventions with greater than 1,000x return, would that justify the investment in R&D? We think the answer to this question is also yes.

Consider that GiveWell expects to allocate about $500M–$600M in philanthropy in 2022 to causes that it believes produce about a 1,000x social return on investment.

Assume for the sake of simplicity that GiveWell’s portfolio is $510M for the next few years (it could be more; it could be less), and that $10M of these funds is allocated to R&D in each year. Imagine further that, if this R&D identified any interventions with greater than 1,000x return on investment, GiveWell would shift its giving to NGOs deploying these new interventions. A shift to 1,200x interventions, for example, would produce an additional $100B in returns over GiveWell’s baseline expected ROI in any given year. This implies that, even if the R&D had only a 10% chance of finding a 1,200x intervention, the $10M in R&D would itself have an expected ROI of 1,000x ($100B × 0.10 = $10B = 1,000 × $10M), clearing GiveWell’s current ROI threshold. Although these numbers are of course somewhat arbitrary, they nonetheless illustrate that the large returns to finding new cost-effective interventions can make the R&D itself cost-effective even at the 1,000x threshold.
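For readers who want to check the arithmetic, here is a minimal sketch using these same illustrative (and admittedly arbitrary) numbers:

```python
# A sketch of the portfolio thought experiment above. All figures are
# the illustrative numbers from the text, not actual GiveWell plans.

PORTFOLIO = 510_000_000   # hypothetical annual GiveWell portfolio ($)
RD_BUDGET = 10_000_000    # carved out for R&D each year ($)
BASELINE_ROI = 1_000      # ROI of current top interventions
NEW_ROI = 1_200           # ROI of a hypothetical newly discovered intervention
P_SUCCESS = 0.10          # assumed chance the R&D finds such an intervention

giving = PORTFOLIO - RD_BUDGET                      # $500M deployed to interventions
extra_if_found = giving * (NEW_ROI - BASELINE_ROI)  # $100B in additional returns
expected_extra = P_SUCCESS * extra_if_found         # $10B in expectation
rd_expected_roi = expected_extra / RD_BUDGET        # 1,000x

print(f"Extra return if a 1,200x intervention is found: ${extra_if_found / 1e9:.0f}B")
print(f"Expected extra return: ${expected_extra / 1e9:.0f}B")
print(f"Expected ROI of the R&D itself: {rd_expected_roi:,.0f}x")
```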

And perhaps this sort of benefit-cost ratio shouldn’t be a surprise. For example, as this Open Phil blog post observes, the Rockefeller Foundation saved “over a billion people from starvation” by employing Norman Borlaug as a plant scientist.

We are also possibly understating the potential benefits from R&D beyond the returns generated by shifts in EA giving. Evidence of successful interventions may lead to widespread adoption by other NGOs and/or governments, generating substantial additional returns. Evidence of unsuccessful interventions may also, perhaps counterintuitively, generate substantial expected value if this evidence causes NGOs and/or governments to shift funding away from unproductive interventions/policies to more productive ones. Federal funding agencies may be incentivized to increase federal funding for applied research more closely aligned with the goal of increasing take-up of welfare-enhancing technologies, leading to additional discoveries. R&D investments may induce additional follow-on research as others seek to further explore both successful and unsuccessful interventions, and this follow-on research may itself lead to increased take-up and improvements in downstream outcomes. Investments in research projects involving NGO and governmental partners may induce these partners to place more emphasis on rigorous evidence of cost-effectiveness, leading to changes in practices that may also improve downstream outcomes.

Conclusion

As Holden Karnofsky has pointed out, there’s a strong case for funding high-risk, high-reward projects that are up to 90+% likely to fail: “hits-based giving.”

Investing in RCTs, evaluations, and other forms of R&D that have perhaps only a small chance of finding a cost-effective treatment/program/innovation would be a hits-based way of furthering Open Phil’s aims. Indeed, funding social and behavioral science R&D might be the best example of “hits-based giving.” The result would be a much broader set of interventions, programs, and policies that could feed into Open Phil’s and GiveWell’s investments.

We recommend that Open Philanthropy invest in a robust, thoughtful, and thorough social and behavioral science R&D agenda with three stages: pilots of plausibly scalable interventions, mid-sized replications, and full-blown evaluations at scale. In doing so, we join other voices calling on the EA funder community to increase investments in targeted R&D (see here and here). Open Phil has an opportunity to lead both public and private funders in the direction of producing more rigorous, more scalable, and more relevant evidence about how to cost-effectively increase global well-being. We hope it seizes this opportunity.

  1. ^

    Note: this post has been adapted from Stuart’s earlier EA Forum post on EA prioritization.