How Much Can We Generalize from Impact Evaluations? (link)
Tyler Cowen posted a link to this paper (PDF), which examines how effective programs are when transported to new contexts or scaled up by governments.
Two key quotes:
The program implementer is the main source of heterogeneity in results,
with government-implemented programs faring worse than and being poorly predicted
by the smaller studies typically implemented by academic/NGO research teams, even
controlling for sample size
The average intervention-outcome combination is comprised 37% of positive, significant studies;
58% of insignificant studies; and 5% of negative, significant studies. If a particular result is positive
and significant, there is a 61% chance the next result will be insignificant and a 7% chance the
next result will be significant and negative, leaving only about a 32% chance the next result will
again be positive and significant.
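As a side note, the conditional probabilities in that quote are just tallies of which category the next study lands in, given the category of the current one. Here is a minimal sketch of that tabulation on invented study sequences; the topic names, category labels, and data are mine, not the paper's:

```python
# A minimal sketch (hypothetical data, not the paper's) of how "next result"
# probabilities can be tabulated: for each intervention-outcome combination,
# count which category the next study falls into given the current one.
from collections import Counter, defaultdict

# Categories: "pos" = positive & significant, "ins" = insignificant,
# "neg" = negative & significant. Sequences below are invented.
results = {
    "cash_transfers/consumption": ["pos", "pos", "ins", "pos", "ins"],
    "deworming/school_attendance": ["pos", "ins", "neg", "ins"],
}

transitions = defaultdict(Counter)
for sequence in results.values():
    for current, nxt in zip(sequence, sequence[1:]):
        transitions[current][nxt] += 1

for current, counts in sorted(transitions.items()):
    total = sum(counts.values())
    shares = {cat: f"{n / total:.0%}" for cat, n in counts.items()}
    print(f"after a {current} result: {shares}")
```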
What a great study! I read through it, and if I understand the PRESS statistic correctly, the average effect size from a meta-analysis of impact evaluations predicts about 33% of the variation in the next study run on the topic. So that's not as great as I expected, but given the huge variation between countries and charities, it makes sense.
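For intuition on what a PRESS-style statistic is doing, here is a rough leave-one-out sketch on invented effect sizes. This is my reading of the idea, not the paper's actual implementation, and all the topic names and numbers are made up:

```python
# A rough sketch of the leave-one-out idea behind a PRESS-style R^2:
# predict each study's effect size from the mean of the *other* studies
# on its topic, then compare prediction error to total variation.
import numpy as np

topics = {  # hypothetical effect sizes per intervention-outcome topic
    "cash_transfers/consumption": [0.30, 0.25, 0.35, 0.28],
    "deworming/attendance": [0.05, 0.10, -0.02],
    "microcredit/income": [0.02, 0.08, 0.04, 0.00],
}

observed, predicted = [], []
for effects in topics.values():
    effects = np.asarray(effects, dtype=float)
    for i in range(len(effects)):
        observed.append(effects[i])
        predicted.append(np.delete(effects, i).mean())  # leave-one-out mean

observed, predicted = np.asarray(observed), np.asarray(predicted)
press = np.sum((observed - predicted) ** 2)       # predicted residual SS
total_ss = np.sum((observed - observed.mean()) ** 2)
print(f"predictive R^2 = {1 - press / total_ss:.2f}")
```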
My takeaway is that since studies aren't as generalizable as I would like, I should weight a study done on the very charity I'm supporting much more heavily. This makes GiveDirectly a much stronger choice, because it has a study run on the organization itself.
Another takeaway: studies done on studies are EA badass.
Note: the author is Eva Vivalt from AidGrade. They have a new blog post here saying they have new data and papers coming out over the next few weeks.
From the post:
It’ll be interesting to see the working papers on each of the topics (especially the one on microhealth insurance).
Thanks for sharing, this is important data. My experience working in non-profits throughout my career (and the contact with the charity world it gave me) has steadily moved me towards the conclusion this paper points to: namely, that having a diligent program implementer, one sincerely focused on valuable end goals, is comparably important to having a program that seems effective on the face of it. It's interesting that "the smaller studies typically implemented by academic/NGO research teams" typically had better results; it suggests those results might be skewed towards being unrepresentatively positive.
The document and summary make a good read and a good point; Figures 1 and 2 alone are thought-provoking stuff. Personally I am not surprised by the findings; they simply show we need more people to adopt an EA approach towards and within such programs and organisations. Less is more, for sure. Less is even more when enough of it is done and learnt from! I'm with Tom!
This sort of assessment, while different, is not a million miles from the experience of most private, let alone public, sector spending and investment. The third and fourth sectors are no different, so we should not see this as something to bash aid or ODA for; rather, we should encourage their focus and effectiveness while recognising that more is needed. Which "more", and how, is not always clear. There are few silver bullets in any sector.
The comparison I make is with "above and below the line" analysis in business, keeping the Pareto principle in mind. Whether it's manufacturing or commercial activity, it's well known that a significant minority of activity/resource delivers the majority of value, and/or a significant minority of issues delivers the majority of waste. The 80:20 rule works from both the positive and the negative perspective, and both expose the inefficiency or ineffectiveness of operations. It's also true that this is often unpredictable or undetectable/unknowable (especially below the line, i.e. in advertising, see below), but things have changed in the last 30 years, particularly with QC, QA and things like Kanban, i.e. better planning and near-real-time allocation.
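To make the 80:20 point concrete, here is a toy calculation on made-up numbers showing a top fifth of activities carrying most of the value:

```python
# A toy illustration of the 80:20 point: sort activities by value delivered
# and check what share of the total the top fifth accounts for.
# The numbers are invented purely for illustration.
values = sorted([120, 95, 40, 18, 12, 8, 5, 4, 3, 2], reverse=True)

top_fifth = values[: max(1, len(values) // 5)]
share = sum(top_fifth) / sum(values)
print(f"top 20% of activities deliver {share:.0%} of total value")
```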
However, as in advertising, most companies know that 70-80% of their spend is wasted but that the 20% that works covers the cost; not acceptable, but a working reality in a world that changes daily. The same applies to R&D budgets and M&A budgets. Too often this is used as an excuse by ineffectual execs, so benchmarking, intra- and inter-organisationally and across sectors, is key.
Perhaps EA, as it advances with its analysis and due diligence, should also flag up the benchmarking that works. (In ODA, peer-to-peer pressure across African nations has in some instances made the MDGs a powerful pressure tool, not just a source of aid, forging action by leaders instead of letting them hide behind complacency.) Personally I'd avoid any preference or bias towards size: both big and small are important, but it is usually the small that make the paradigm-shifting breakthroughs. That's why R&D budgets in large organisations end up morphing into M&A budgets as they get big, old, and out of touch: unable to attract the best and brightest of the new young people, they have to buy them, their ideas, and their solutions through acquisition. Of course, that doesn't happen in the charity and NGO world. Sadly, those organisations are often just left to a long, slow death or a withering away, while the performers hopefully get more access to their breakfast!
Agree, I think this raises the idea that EAs getting directly involved with potentially effective charities could be as effective as earning to give.