External evaluation of GiveWell’s research


Aaron’s note: I stumbled over this today and got lost in the links; I hadn’t realized how much effort GiveWell spent (at least in the early years) hiring people to evaluate their research. Here’s a full list of their external reviews, which goes into a lot more depth than this post.

Also, this post was originally written in 2013; I just recently discovered it and decided to repost.


We’ve long been interested in the idea of subjecting our research to formal external evaluation. We publish the full details of our analysis so that anyone may critique it, but we also recognize that it can take a lot of work to digest and critique our analysis, and we want to be subjecting ourselves to constant critical scrutiny (not just to the theoretical possibility of it).

A couple of years ago, we developed a formal process for external evaluations, and had several such evaluations conducted and published. However, we haven’t had any such evaluations conducted recently. This post discusses why.

In brief:

  • The challenges of external evaluation are significant. Because our work does not fall cleanly into a particular discipline or category, it can be difficult to identify an appropriate reviewer (particularly one free of conflicts of interest) and provide enough structure for their work to be both meaningful and efficient. We put a substantial amount of capacity into structuring and soliciting external evaluations in 2010, and if we wanted more external evaluations now, we’d again have to invest a lot of our capacity in this goal.

  • The level of in-depth scrutiny of our work has increased greatly since 2010. While we would still like to have external evaluations, all else equal, we also feel that we are now getting much more value than previously from the kinds of evaluations that we ultimately would guess are most useful – interested donors and other audience members scrutinizing the parts of our research that matter most to them.

Between these two factors, we aren’t currently planning to conduct more external evaluations in the near future. However, we remain interested in external evaluation and hope eventually to make frequent use of it again. And if someone volunteered to do (or facilitate) formal external evaluation, we’d welcome this and would be happy to prominently post or link to criticism.

The challenges of external evaluation

The challenges of external evaluation are significant:

  • There is a question around who counts as a “qualified” individual for conducting such an evaluation, since we believe that there are no other organizations whose work is highly similar to GiveWell’s. Our work is a blend of evaluating research and evaluating organizations, and it involves both in-depth scrutiny of details and holistic assessments of the often “fuzzy” and heterogeneous evidence around a question.

On the “evaluating research” front, one plausible candidate for “qualified evaluator” would be an accomplished development economist. However, in practice many accomplished development economists (a) are extremely constrained in terms of the time they have available; and (b) have affiliations of their own (the more interested in practical implications for aid, the more likely a scholar is to be directly involved with a particular organization or intervention), which may bias evaluation.

  • Based on past work on external evaluation, we’ve found that it is very important for us to provide a substantial amount of structure for an evaluator to work within. It isn’t practical for someone to go over all of our work with a fine-toothed comb, and the higher-status the person, the more of an issue this becomes. Our current set of evaluations is based on old research, and to have new evaluations conducted, we’d need to create new structures based on current research. This would take trial-and-error in terms of finding an evaluation type that produces meaningful results.

  • There is also the question of how to compensate people for their time: we don’t want to create a pro-GiveWell bias by paying, but not paying further limits how much time we can ask.

I felt that we found a good balance with a 2011 evaluation by Prof. Tobias Pfutze, a development economist. Prof. Pfutze took ten hours to choose a charity to give to – using GiveWell’s research as well as whatever other resources he found useful – and we “paid” him by donating funds to the charity he chose. However, developing this assignment, finding someone who was both qualified and willing to do it, and providing support as the evaluation was conducted involved significant capacity.

Given the time investment these sorts of activities require on our part, we’re hesitant to go forward with one until we feel confident that we are working with the right person in the right way and that the research they’re evaluating will be representative of our work for some time to come.

Improvements in informal evaluation

Over the last year, we feel that we’ve seen substantially more deep engagement with our research than ever before, even as our investments in formal external evaluation have fallen off.

Where we stand

We continue to believe that it is important to ensure that our work is subjected to in-depth scrutiny. However, at this time, the scrutiny we’re naturally receiving – combined with the high costs and limited capacity for formal external evaluation – makes us inclined to postpone major effort on external evaluation for the time being.

That said:

  • If someone volunteered to do (or facilitate) formal external evaluation, we’d welcome this and would be happy to prominently post or link to criticism.

  • We do intend eventually to re-institute formal external evaluation.