Do Prof Eva Vivalt’s results show ‘evidence-based’ development isn’t all it’s cracked up to be?

I recently interviewed a member of the EA community, Prof Eva Vivalt, at length about her research into the value of using trials in medical and social science to inform development work. The benefits of ‘evidence-based giving’ have been a core message of both GiveWell and Giving What We Can since they started.

Vivalt’s findings somewhat challenge this, and are not as well known as I think they should be. The bottom line is that results from existing studies only weakly predict the results of similar future studies. They appear to have poor ‘external validity’: they don’t reliably indicate the effect an intervention will appear to have when it is tried again in a new context. This means that developing an evidence base to figure out how well projects will work is more expensive than it otherwise would be.

Perversely, in some cases this can make running further studies more informative: because past results generalise poorly, we currently know less about any new context than we would if they generalised well, so a new study there resolves more uncertainty.

Note that Eva discussed an earlier version of this research at EAG 2015.

Another result that conflicts with messages 80,000 Hours has used before is that experts are, on average, fairly good at predicting the results of trials (though you need to average over many guesses). Aggregating these guesses may be a cheaper alternative to running studies, though the guesses may become worse without trial results to inform them.
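
To see why averaging over many guesses matters, here’s a toy simulation (my own illustrative numbers, not Eva’s data or method): if each expert’s guess is the true effect plus independent noise, the idiosyncratic errors largely cancel when you average, so the aggregated forecast lands much closer to the truth than a typical individual guess. (Averaging can’t remove any bias the experts share, of course.)

```python
import random

# Toy simulation (illustrative numbers only, not Vivalt's data):
# each expert guesses the true effect plus independent, mean-zero noise.
random.seed(0)

true_effect = 0.15   # hypothetical true effect of some intervention
n_experts = 50       # hypothetical number of forecasters
noise_sd = 0.10      # hypothetical spread of individual guesses

guesses = [true_effect + random.gauss(0, noise_sd) for _ in range(n_experts)]

# Typical error of a single expert vs. error of the averaged forecast.
individual_error = sum(abs(g - true_effect) for g in guesses) / n_experts
aggregate_error = abs(sum(guesses) / n_experts - true_effect)

print(f"Average error of one expert's guess: {individual_error:.3f}")
print(f"Error of the averaged forecast:      {aggregate_error:.3f}")
```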

Eva’s view is that there isn’t much alternative to collecting evidence like this: if it turns out to be less useful than hoped, we should accept that, but continue to run and use studies of this kind.

I’m more inclined to say this should shift our approach. Here’s one division of the sources of information that inform our beliefs:

  1. Foundational priors

  2. Trials in published papers

  3. Everything else (e.g. our model of how things work based on everyday experience).

Inasmuch as 2 looks less informative, we should rely more on the alternatives (1 and 3).
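
To make that concrete, here’s a minimal sketch of precision-weighted averaging with made-up numbers (nothing from the paper): each source of information is weighted by how precise it is, so if trial evidence is effectively noisier because it generalises poorly, a sensible aggregator shifts weight from 2 onto 1 and 3.

```python
def combine(estimates, variances):
    """Precision-weighted average: each source is weighted by 1/variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, estimates)) / total
    shares = [w / total for w in weights]
    return mean, shares

# Hypothetical effect estimates from (1) foundational priors,
# (2) published trials, and (3) everything else. Numbers are made up.
estimates = [0.00, 0.30, 0.10]

# If trials generalise well, treat source 2 as precise (low variance)...
_, shares_good = combine(estimates, [0.20, 0.02, 0.10])
# ...but if external validity is poor, source 2 is effectively much noisier.
_, shares_poor = combine(estimates, [0.20, 0.15, 0.10])

print("Weight on trial evidence if it generalises well:  ", round(shares_good[1], 2))
print("Weight on trial evidence if it generalises poorly:", round(shares_poor[1], 2))
```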

Of course, Eva’s results may imply that 3 won’t generalise between different situations either. In that case we have even more reason to work within our local environment. It should nudge us towards thinking that useful knowledge is more local and tacit, and less universal and codified. We would then have greater reason to become intimately familiar with a particular organisation or problem, and to try to have most of our impact through the areas we personally understand well.

It also suggests that problems which can be tackled with published social science may not be as tractable, relative to alternative problems we could work on, as they first seem.

Later in the episode you can hear me struggle to figure out how much these results actually challenge conventional wisdom in the EA community, and I’m still unsure.

For an alternative perspective from another economist in the community, Rachel Glennerster, you can read this article: Measurement & Evaluation: The Generalizability Puzzle. Glennerster believes that generalisability is much less of a problem than Vivalt does, and is not convinced by the way Vivalt has tried to measure it.

There are more useful links and a full transcript on the blog post associated with the podcast episode.