Rethink Grants: an evaluation of Donational’s Corporate Ambassador Program

Executive Summary

Rethink Grants

Rethink Grants (RG) is an analysis-driven grant evaluation experiment by Rethink Priorities and Rethink Charity. In addition to estimating the expected costs and impacts of the proposed project, RG assists with planning, sourcing funding, facilitating networking opportunities, and other as-needed efforts traditionally subsumed under project incubation. We do not yet fund grants ourselves, but refer grants to other grantmakers within our networks who we have reason to believe would be interested.

Donational's Corporate Ambassador Program

This report is our first published evaluation – an assessment of a new project proposed by Donational. The Donational platform more efficiently processes donations made through its partner organizations, and allows users to set up donation portfolios informed by the expertise of charity evaluators endorsed by the effective altruism community. Donational requested $100,000 to establish a Corporate Ambassador Program (CAP), which would recruit advocates for effective giving in US workplaces. These 'ambassadors' would encourage their colleagues to donate through the platform, thereby raising money for highly effective charities.

Methods

We evaluated CAP with reference to five criteria: a formal cost-effectiveness estimate (based on an Excel model), team strength, indirect benefits, indirect harms, and robustness to moral uncertainty. Each was given a qualitative assessment of Low, Medium, or High, corresponding to a numerical score of 1–3. The weighted average of these scores constituted the overall Project Potential Score, which formed the basis of our final decision.

Results

Cost-effectiveness Estimate: Low (1)

The base case donation-cost ratio of around 2:1 is below the 3x return that we consider the approximate minimum for the project to be worthwhile, and far from the 10x or higher reported by comparable organizations. The results are sensitive to the number and size of pledges (recurring donations), and CAP's ability to retain both ambassadors and pledgers. Because of the high uncertainty, very rough value of information calculations suggest that the benefits of running a pilot study to further understand the impact of CAP would outweigh the costs by a large margin.

Team Strength: Medium (2)

Donational's founder, Ian Yamey, is very capable and falls on the high end of this score. His track record suggests above-average competency in several dimensions of project implementation. While the planning process for the CAP presented some potential gaps in awareness, Yamey demonstrates an eagerness to take corrections and steadfast commitment to iterating his plans in search of the most effective version of the program.

Indirect Benefits: High (3)

We think there is a small-to-moderate chance that CAP would generate several very impactful indirect benefits. For example, the additional donations going to animal-focused charities may reduce the risk of global pandemics caused by antibiotic resistance, and the program may help create a broader culture of effective giving at US workplaces.

Indirect Harms: High (1)

We also think there is a small-to-moderate chance that CAP would indirectly cause or exacerbate a number of problems. For instance, charities that reduce poverty and disease may cause economic growth, which is likely to increase the number of animals raised in factory farms and could contribute to climate change and existential risks.

Robustness to Moral Uncertainty: Medium (2)

CAP is compatible with most worldviews, but there may be exceptions. For example, some socialists believe charity often does more harm than good by promoting individualistic norms.

Project Potential Score: Low (1.27)

RG team members gave the vast majority of the weight to the cost-effectiveness estimate, leading to an overall Project Potential Score of 1.27. This falls clearly into the Low category.

Decision

After concluding the evaluation, we have decided not to recommend funding for a full-scale CAP at this time. This is based heavily on our cost-effectiveness analysis, which suggests the program is unlikely to be worthwhile at any reasonable cost-effectiveness threshold, at least in its proposed form. However, we have recommended funding of up to $40,000 to run a pilot study based on three primary considerations: (i) concern that the cost-effectiveness analysis may underestimate CAP's promisingness relative to comparable programs; (ii) the potentially high value of information from running a pilot; and (iii) the relatively low risk involved, given that we expect the pilot's costs to be lower than the volume of donations generated.

The future of Rethink Grants

Any further evaluations conducted by RG will not necessarily involve the same process or level of rigor as seen in this report. This evaluation was an experiment that involved frontloading many one-time costs, such as creating the evaluation framework. However, we also recognise important shortcomings of our methodology, and are aware that evaluations of early-stage projects in this depth may not always be advisable, depending on factors such as the grant size, the feasibility of running a low-cost pilot study, and the availability of relevant data. Should RG continue, future evaluations will incorporate a number of important lessons learned from this experience.

Introducing Rethink Grants

Rethink Grants (RG) is a collaboration between the Rethink Priorities research team and the Rethink Charity senior leadership. RG works individually with project leads to produce tailored evaluations of grant proposals, and refers those projects to grantmakers within our networks when we believe they merit funding. In addition, we assist in early-stage planning, facilitating networking opportunities, and other as-needed efforts traditionally subsumed under project incubation. RG's single most important value add is our uniquely thorough and personalized approach.

Our principal aim is to help raise the quality of funded projects within effective altruism and adjacent world-improvement domains. The RG process signal-boosts the potential value of promising projects through formal recommendations made on the strength of our in-depth analysis.

Below, we discuss our principles and process in more detail, and then report on our first grant evaluation – an assessment of a new workplace giving program proposed by Donational.

Our principles

  • We conduct and publish a detailed, transparent evaluation of every potential grant recommendation.

  • We base decisions primarily on cost-effectiveness, while recognizing that formal expected value estimates should not be taken literally.

  • We take a growth approach to evaluating projects, considering not just the impact of a marginal dollar but the potential cost-effectiveness at scale.

  • We see grants as experiments. Following our thorough assessment, we err on the side of cautiously testing promising ideas. We then help set up criteria for success and failure, and renew the grant recommendation if the approach shows promise (or there is a promising pivot).

  • Beyond just evaluating grant opportunities, we want to help prospective grantees improve. To do this, we offer them access to our research, general support, and detailed feedback.

  • Because we have to make granting decisions under significant moral uncertainty, we aim to practice worldview diversification.

Our process

Rethink Grants will begin with an in-network approach to sourcing projects, relying on trusted referrals to help us reach out to promising individuals and organizations. If RG continues to conduct evaluations, we will then consider projects on a rolling basis. A project that seems potentially cost-effective, is run by a high-quality team, and has room for more funding moves forward through our evaluation process.

The potential of a given project is normally assessed using both quantitative and qualitative estimates of cost-effectiveness and overall impact. To a large extent, evaluations are tailored to the individual proposal, but the following criteria are used in most cases:

Quantitative

  • Cost-effectiveness Estimate – Based on a formal cost-effectiveness analysis, how much impact do we expect it to create per dollar spent?

Qualitative

  • Team Strength – How effectively will the organization be able to implement and grow the project?

  • Indirect Benefits – How large might the indirect benefits of the project be?

  • Indirect Harms – How large might the indirect harms of the project be?

  • Robustness to Moral Uncertainty – Might the project cause terrible harms from the perspective of different worldviews?

To understand how a particular project fares against these criteria, we aim to spend between 6 and 10 weeks gathering information relevant to the project through literature review, along with conversations with the organization's team, subject-area experts, and potential funders.

Our team

The RG team that conducted this evaluation comprised Derek Foster, Luisa Rodriguez, Tee Barnett, Marcus A. Davis, Peter Hurford, and David Moss. In future evaluations, contributing team members may vary.

Introducing Donational

The platform

Launched in 2018 by Ian Yamey, Donational is a user-friendly online platform that aims to empower individuals to make their donation dollars do as much good as possible. It does this in two main ways: by more efficiently processing donations obtained through partner organizations, and by guiding donors towards more effective giving opportunities.

Donation processing

About 75% of Donational's users to date have come through its partner One for the World, which encourages students to commit to donating 1% of their pre-tax income upon graduation. OFTW's portfolio of recommended charities currently comprises 16 of GiveWell's recommended and "standout" charities, five of which are designated "Top Picks" by OFTW. For those users, OFTW has presumably already done the work of convincing them to change their donation habits, but some useful features available on the Donational platform multiply those benefits:

  • It charges a lower fee (2%) than any comparable organization for disbursing the donations to the charities.

  • It provides donation pages tailored to each school, which allows OFTW to set the default pledge size to 1% of the average graduate income for that particular institution.

  • It enables pledges more than one year in advance.

  • It automatically updates the users' credit card information when a new card is issued. According to OFTW, this keeps recurring donors from lapsing in cases where they might have otherwise forgotten to update their details on the platform.

Philanthropic advising

About a quarter of users find Donational through other means, such as web searches. The platform uses a chatbot to ask these visitors a set of basic questions about their values, giving history, and giving goals, and the answers are used to design a customized giving portfolio, normally based around charities suggested by Donational. Users can then learn more about the recommended charities, and adjust the allocation of their portfolios accordingly. During the process, Donational also 'nudges' users to set up recurring (rather than one-off) donations, and to give a higher percentage of their income.

The recommended charities currently encompass a broad range of causes, including global health, developing world poverty, US social justice issues, animal welfare, environmental protection, and climate change – and users can add any other charity that is registered in the US as a 501(c)(3) organization. However, to maximise impact, Yamey has agreed in future to limit the selection to those recommended by GiveWell and Animal Charity Evaluators, plus one US criminal justice organization suggested by relevant staff at the Open Philanthropy Project.

Example portfolio

Costs and revenue

In addition to the opportunity costs associated with the two days that Yamey spends on Donational each week, the project operated on a budget of $32,909 in 2018. However, it also earns revenue by charging a 2% fee on all donations processed through the platform. Once the students who have taken the OFTW pledge begin to graduate, Yamey predicts around $1.5 million per year will be disbursed, making the project roughly cost-neutral (a 2% fee on $1.5 million comes to about $30,000 per year, roughly matching the 2018 budget).

The Corporate Ambassador Program

While continuing its partnerships and remaining available for individuals who find the website independently, Donational is considering launching a Corporate Ambassador Program (CAP). The plan is to recruit volunteers at companies, who would then encourage their colleagues to donate through Donational. These 'ambassadors' would be given resources and training, enabling them to more effectively pitch the idea of high-impact giving to their coworkers. If the program is successful, its direct impacts would be threefold:

  1. Donational hopes its users will donate more regularly than they otherwise would have.

  2. Donational hopes its users will donate larger sums each time than they otherwise would have.

  3. Donational hopes people will donate to more impactful charities than they otherwise would have.

If the program is successful at a large scale, there may be additional benefits that are less direct. In particular, Donational hopes to contribute to a culture shift in workplaces, helping effective giving to become the norm rather than the exception.

The remainder of this report evaluates CAP against our five criteria, beginning with a cost-effectiveness estimate.

Cost-effectiveness Analysis

Our process

Rethink Grants explicitly models the costs and consequences of the proposed project to generate a cost-effectiveness estimate (CEE). Our approach to cost-effectiveness analysis has a number of notable features:

  • The analysis is as transparent as possible without sacrificing too much precision. Depending on the nature of the project, the time available, and the type of analysis required, the model may be created in Guesstimate, Google Sheets, Excel, or R. The model structure and individual parameters are clearly described and justified, and its limitations are highlighted and discussed.

  • In line with the growth approach, we aim to assess the project's long-run cost-effectiveness, not just the impact of a marginal dollar. This normally involves estimating the costs and consequences of the project over different time periods and at different scales, taking into account the probability of reaching (or duration at) each scale.

  • Potential biases, such as the planning fallacy, are explicitly considered and factored into the analysis where possible.

  • Model parameters often involve considerable subjective judgement. Where the CEE is likely to be sensitive to this, we may take a weighted average of probability distributions elicited from multiple RG team members and/or relevant experts. The model is also designed so that users can easily replace the team's inputs with their own assumptions.

  • Where feasible, the primary analysis is probabilistic, taking into account uncertainty around all the parameters. This usually produces a more accurate CEE than a deterministic model, and enables more informative analyses of decision uncertainty (Claxton, 2008).

  • A range of methods are used to characterise uncertainty. These may include confidence intervals, cost-effectiveness planes, cost-effectiveness acceptability curves and frontiers, one-way and multi-way deterministic sensitivity analyses, and assessments of heterogeneity (see Briggs et al., 2012 for an overview). We may also estimate the value of gathering additional information, to determine whether it would be worth conducting further research before making a final decision (Wilson et al., 2014).

  • Discount rates may be applied to both costs and outcomes. The appropriate figures will vary among projects but – in line with the philosophical consensus – the rate for outcomes does not give any weight to pure time preference (the idea that benefits are worth less in the future, just because they're in the future).

While assessing cost-effectiveness is the primary goal of the evaluation, we do not take expected value estimates literally. To avoid the illusion of high precision, we therefore also rate cost-effectiveness more subjectively as Low, Medium, or High. This is done with reference to 'best buys' in the same cause area, rather than by trying to use one outcome metric for a broad range of interventions. As well as enabling comparison with other criteria, this approach is in line with our efforts towards worldview diversification.

The application of our methods to Donational's Corporate Ambassador Program is described in detail below.

Methods

We constructed a mathematical model to estimate the expected costs (in US dollars), consequences, and cost-effectiveness of CAP. We excluded indirect effects, moral uncertainty, and team strength from the final model due to the difficulty of making meaningful quantitative estimates; these are addressed more subjectively as separate criteria. This section outlines some key methodological choices, the model structure, our methods for estimating parameter inputs, and the sensitivity analyses we carried out on the results.

Software

We created the model in Microsoft Excel as it seemed like the best compromise between transparency and functionality. Google Sheets is a little more accessible for end users, but complex modelling can be trickier in Sheets, especially when macros are needed. R is powerful but the primary modeller was not proficient in it, and the calculations can be harder to examine for those unfamiliar with the language. Guesstimate is more convenient for some kinds of straightforward probabilistic estimates, but it lacks some key features (such as charts) that are necessary for important sensitivity analyses, and can run into difficulty when the distribution of costs and/or effects includes negative numbers.[1]

Outcome measures

While it is the effect on overall wellbeing that ultimately matters, it was not feasible in the time available to convert the outcomes of a diverse group of charities into one common metric. Instead, the measure of benefit was donations to CAP-recommended charities, adjusted for counterfactual impact.

The primary outcome measure, which constitutes our CEE, was the donation-cost ratio (DCR), i.e. the number of (time-discounted) dollars donated per (time-discounted) dollar spent on the program. The CEE could also be expressed as a cost-donation ratio (CDR), which is more similar in form to widely-used cost-effectiveness ratios (such as dollars per life saved) in that it divides the costs by the outcomes. However, the CDR's interpretation is less intuitive for this kind of project, e.g. a return of $7 million for an expenditure of $1 million implies a DCR of 7 but a CDR of roughly 0.14.

Note that the DCR is not the same as the benefit-cost ratio widely used in economics, which puts costs and effects in the same (monetary) units. Unlike in a benefit-cost ratio, a dollar spent on the program may have a different opportunity cost (benefit foregone) than a dollar donated – expenditure may generate either more or less value than equivalent donations. Nor is it quite the same as a return on investment, which is typically based on absolute revenue rather than the (discounted) net present value of investments. It is equivalent to what One for the World, The Life You Can Save (TLYCS), and Giving What We Can (GWWC) call their "leverage ratio", although there are some differences in the methodology used to estimate it.
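
To make the arithmetic concrete, here is a minimal Python sketch of a donation-cost ratio calculation. The yearly figures and the discount rate are illustrative placeholders, not outputs of our model.

```python
# Minimal sketch of the donation-cost ratio (DCR) calculation.
# The yearly figures below are illustrative placeholders, not model outputs.

def present_value(cash_flows, rate):
    """Discount a list of yearly cash flows (year 0 = now) at a constant annual rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

donations_by_year = [20_000, 80_000, 150_000]   # impact-adjusted donations attributed to each year
costs_by_year = [15_000, 40_000, 60_000]        # program expenditure net of processing-fee revenue
discount_rate = 0.04                            # assumed annual discount rate

dcr = present_value(donations_by_year, discount_rate) / present_value(costs_by_year, discount_rate)
cdr = 1 / dcr  # the cost-donation ratio is simply the reciprocal

print(f"DCR = {dcr:.2f}, CDR = {cdr:.3f}")
```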

For the base case (primary) analysis, cost-effectiveness was assessed by program year (see the C-E by Program Year worksheet). In other words, all costs and donations were attributed to the year in which the ambassadors who generated them were recruited, rather than the year in which they took place. For example, the present-discounted lifetime value of a pledge (Value of Pledge worksheet) taken in year 1 was considered a year 1 outcome, even though a large proportion of the donations would not be received until much later. This seemed to provide the most relevant information, since we assume funders would be more interested in how much value would be created by funding the program for a certain period, rather than when the impact would be realized. However, we also calculated the absolute volume of donations processed in each year, primarily to help Yamey with planning, and the cost-effectiveness by year of disbursement (Disbursements worksheet).

Comparator

The comparator for the sake of this analysis is implicitly 'Do Nothing', which we assume has no costs or consequences. Ideally, we would have compared CAP directly to one or more alternative projects and calculated the incremental DCR (the difference in donations divided by the difference in costs). For example, if CAP has expected donations of $10 million and costs of $2 million, and the alternative has donations of $8 million and costs of $1 million, the DCR for CAP would be ($10M / $2M) = 5, but the incremental DCR would be ($10M - $8M) / ($2M - $1M) = $2M / $1M = 2. So long as the alternative was a viable option, the relevant figure would have been 2x, which reflects the return achieved compared to what would otherwise happen (the counterfactual). However, after discussions with relevant organizations, it was unclear whether another similar program would be run, when it would begin, whether CAP would displace that program (rather than run alongside it), who would fund it, and how costly and effective it would be in comparison. We therefore decided to disregard potential alternatives in the main analysis – though they have influenced our cost-effectiveness threshold and are discussed later in this report.

Cost-effectiveness threshold

We evaluated CAP with reference to the minimum acceptable donation-cost ratio (minDCR). Cost-effectiveness thresholds like this should ideally be based on the opportunity cost of carrying out the program, which depends on how those resources would otherwise be spent. For instance, marginal governmental health spending in India averts a disability-adjusted life-year for around $300 (Peasgood, Foster, & Dolan, 2019, p. 35), so spending more than this from a fixed government budget is likely to cause more harm than good. However, the opportunity cost is very hard to estimate in the case of CAP, since we are not certain of who the funder would be, or how the funds would otherwise be used. We therefore compared the outcomes to three potential thresholds:

  • 1x (meaning a minDCR of 1, i.e. a dollar donated for every dollar spent on the program) may be considered an absolute lower bound, since a lower ratio implies that it would be better to donate the money directly to the charities.

  • 3x is approximately the return that both Yamey and the Rethink Grants team consider the minimum to make the project worthwhile, and is therefore the primary reference point.

  • 10x is roughly in line with cost-effectiveness estimates by the most comparable existing programs, OFTW (which kindly provided access to its internal CEA) and TLYCS. GWWC claims a "realistic" leverage ratio of more than 100:1, but a recent analysis by Rethink Priorities casts doubt on the estimate. GWWC is also of a substantially different nature from CAP, TLYCS, and OFTW in that it primarily targets a small number of committed effective altruists rather than larger numbers of 'ordinary' donors.

Time horizon

Results are presented for three time horizons: 3, 10, and 20 years. We chose 10 years for the base case because it seemed like a reasonable compromise between recognizing long-term potential and being able to make meaningful predictions. Results for other horizons can easily be obtained using the "User input" cell next to the Horizon parameter (#42 in the Parameters worksheet).

By default, the model includes a pilot year in the main results. Preliminary 'back of the envelope' calculations early in the evaluation process suggested that the DCR would not be high or certain enough to justify large-scale funding from the start, so we switched to considering funding for a pilot study. The pilot period is considered 'year 0', so a 3-year horizon actually covers 4 years (years 0, 1, 2, and 3), a 10-year horizon 11 years, and so on. We felt this was appropriate as the pilot costs and outcomes contribute to its expected value. However, the pilot year can easily be excluded from the analysis using the switch on the right of the Main Results worksheet.

Similarly, the probability of CAP progressing beyond the pilot study (parameter #41) can be switched off. It does not affect the DCR, as the costs and donations are multiplied by the same number, but it does affect the total expected costs and impact. These switches also make the model easier to update after the pilot study (should one take place).

Structure

The structure of the model was based heavily on Yamey's description of the intended program, but also took into account information from team members and effective altruists with relevant experience, such as those who had engaged in workplace fundraising. The parameters can be grouped into ones relating to impact, costs, and both concurrently.

Impact

In the broadest terms, the effectiveness of the project was considered a function of the number of ambassadors, the number of donors each ambassador managed to recruit, and the size of the donations.

Ambassadors

To reflect different potential growth trajectories, ambassador numbers were only directly estimated for the pilot study (Parameter #1) and year 1 of the full program (#2), with subsequent years' numbers obtained using an annual (linear) growth rate (#3) and a maximum scale (#4), both measured in terms of the number of ambassadors recruited. The program was assumed to remain at that scale indefinitely once reached. This structure was informed by Yamey's belief – which we shared – that, at some point, there would be diminishing returns to scale. This could occur because recruiting ambassadors would become more difficult once the 'low-hanging fruit' (such as personal contacts in companies with a culture receptive to effective giving) have been exhausted, and because organizations may be more difficult to manage beyond a certain size. Ambassadors were assumed to remain active for a maximum of two years, after which we thought most donor recruitment opportunities would have already been taken. A composite parameter, the number of ambassador-years (i.e. average years of active donor recruitment by one ambassador), was calculated based on the ambassador "churn" (non-participation) in each year (#5 and #6).
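
As a rough illustration of this structure – not a reproduction of the Excel formulas – the sketch below grows ambassador numbers linearly up to a cap and converts them into ambassador-years using assumed churn rates. All values, and the way churn is turned into ambassador-years, are placeholder assumptions.

```python
# Illustrative sketch of the ambassador structure described above.
# Parameter values are placeholders; the real model elicits distributions for each.

year1_ambassadors = 20      # ambassadors recruited in year 1 of the full program (#2)
annual_growth = 15          # additional ambassadors recruited per year (linear growth, #3)
max_scale = 100             # maximum number recruited in any year (#4)
churn_year1 = 0.4           # assumed share of ambassadors inactive in their first year (#5)
churn_year2 = 0.7           # assumed share inactive in their second (final) year (#6)

# One possible interpretation: expected years of active recruitment per ambassador,
# given a two-year maximum of activity.
ambassador_years_each = (1 - churn_year1) + (1 - churn_year2)

for year in range(1, 6):
    recruited = min(year1_ambassadors + annual_growth * (year - 1), max_scale)
    print(f"Year {year}: {recruited} ambassadors recruited, "
          f"~{recruited * ambassador_years_each:.0f} ambassador-years generated")
```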

Donors

Donors were divided into "pledgers", who commit to recurring payments, and "one-time donors". The number of each type recruited per ambassador-year was estimated directly (#7 and #8), along with the donor churn – the proportion of the value of pledged donations that are not received, due primarily to pledgers cancelling payments. Churn was estimated separately for the first year after taking the pledge (#9) and subsequent years (#10), as evidence from TLYCS (provided by email) and OFTW suggests attrition would be highest soon after the pledge becomes active.

Donations

The size of donations was also estimated separately for the one-off (#11) and recurring (#12) donations, since they are likely to be different. They were then adjusted for 'funging' – roughly speaking, the displacement of funds from other sources. "EA funging" (#13) is the proportion of the value of the donations that would have been received by EA charities in the absence of CAP. This includes 'direct' funging: any donations made by people recruited through CAP that they would have made to effective altruist causes anyway, e.g. because they would have taken another EA pledge (TLYCS, OFTW, GWWC) or found EA organisations through other channels. It also covers 'indirect funging': the proportion of the impact of CAP donations that would have occurred anyway, e.g. because large funders would have (partially) filled the funding gap of CAP-recommended charities, and with a smaller opportunity cost than CAP donations. In particular, Good Ventures regularly grants to GiveWell top charities based in part on the size of their funding gaps, which would be smaller in a world with CAP. There are reasons to believe Good Ventures' giving is primarily constrained by the availability of research into giving opportunities, so the main effect of their giving less to GiveWell charities would be to hold on to the money for several years or decades, at which point we might expect there to be less impact per dollar. However, Good Ventures does not fill the funding gaps entirely, and giving later would presumably still have some benefit, so much less than 100% of a CAP donation is 'funged'.

"Non-EA funging" (#14) is the proportion of the remaining donations that would have gone to charities not currently recommended by EA organizations had they not been given via CAP. For example, a donor may cancel their monthly donations to Oxfam and give (some of) it to the Against Malaria Foundation instead; or they might have started giving money they wouldn't have otherwise donated to Oxfam had they not been exposed to CAP.

With the assistance of a further parameter, the value of non-EA donations relative to EA donations (#15), these are used to construct two "adjusted mean donation" composite parameters – one each for one-time and recurring donations – that are used in the rest of the model. These represent the counterfactual impact of donations better than the absolute donation size.
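
One plausible way to combine these parameters into an "adjusted mean donation" is sketched below; the exact formula used in the spreadsheet may differ, and all numbers are placeholders.

```python
# Hypothetical funging adjustment, assuming the adjustments are applied multiplicatively.
mean_donation = 1_000        # raw mean recurring donation (#12), placeholder value
ea_funging = 0.3             # share that would have reached EA charities anyway (#13)
non_ea_funging = 0.2         # share of the remainder displaced from non-EA charities (#14)
relative_value_non_ea = 0.1  # value of a non-EA dollar relative to an EA dollar (#15)

# Remove the portion EA charities would have received anyway, then subtract the
# counterfactual value lost by displacing non-EA giving.
not_funged_by_ea = mean_donation * (1 - ea_funging)
adjusted_mean_donation = not_funged_by_ea * (1 - non_ea_funging * relative_value_non_ea)

print(f"Adjusted mean donation: ${adjusted_mean_donation:,.0f}")
```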

Costs

The net cost of a program is a function of expenditure and revenue.

Expenses

Labor dominated the cost parameters. First, above a certain number of ambassadors (#18), ambassador managers would be required. In addition to their annual salary (#16), the number of managers was estimated based on the number of ambassadors the team believed each manager could handle (including recruitment, training, and support while active) (#17). Second, we estimated the scale above which a part-time (#20) and a full-time (#21) chief operating officer (COO) would be needed to lead the day-to-day activities of the program, along with their salary (#19). Third, the cost of hiring a software developer to process donations was based on an expected hourly rate (#22) and the volume of donations processed by the platform (#23). (The software developer would not process the donations directly, but Yamey believes that the work required – such as supporting user accounts – scales roughly with donation volume.) Fourth, Yamey thinks that a second software developer (#24), to work on infrastructure such as tools for ambassadors to communicate with each other, would be needed above a certain scale (#25). Fifth, Yamey also thought a part-time marketer (#26) would be needed after reaching a certain threshold (#27) – though not during a pilot study of any size – and a full-time marketer (#28) at a larger scale. Sixth, after the pilot year, Yamey would also like to hire a contractor to do graphic design (#29), with the number of hours depending on the number of ambassadors (#30 and #31).

There are three non-labor expense parameters. Beyond the pilot, CAP will have to pay for web hosting and information technology-related expenses (#32). Based on his previous experience, Yamey thinks a multiple of the square root of donations is the best way to capture the returns to scale for this item. Ambassadors may also incur marketing and travel costs (#33), though we assumed there would be a discount above 400 ambassadors (#34) – a fairly arbitrary figure provided by Yamey. Finally, we assumed miscellaneous costs (#35) as a proportion of all other costs combined, as this seems to be standard in program budgeting.
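
The sketch below illustrates the general shape of this cost structure (staff thresholds, a square-root term for hosting, and a miscellaneous percentage). The parameter values and the simplifications are ours, not the model's.

```python
# Rough sketch of the shape of the cost model. Placeholder values throughout;
# the actual worksheet uses elicited distributions and additional parameters.
import math

def annual_costs(ambassadors, donations_processed):
    manager_salary = 50_000          # (#16), placeholder
    ambassadors_per_manager = 40     # (#17)
    manager_threshold = 20           # managers only needed above this many ambassadors (#18)
    coo_salary = 80_000              # (#19)
    dev_hourly_rate = 75             # (#22)
    dev_hours_per_dollar = 0.0005    # hours of donation-processing work per dollar (#23)
    hosting_multiplier = 5           # applied to the square root of donations (#32)
    misc_share = 0.1                 # miscellaneous costs as a share of everything else (#35)

    managers = 0 if ambassadors <= manager_threshold else math.ceil(ambassadors / ambassadors_per_manager)
    labor = (managers * manager_salary
             + coo_salary
             + dev_hourly_rate * dev_hours_per_dollar * donations_processed)
    hosting = hosting_multiplier * math.sqrt(donations_processed)
    return (labor + hosting) * (1 + misc_share)

print(f"Estimated annual cost at 100 ambassadors: ${annual_costs(100, 400_000):,.0f}")
```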

Revenue

Donational currently charges a fee of 2% to process donations. Yamey intends to apply this to donations through CAP as well (#36), in order to partly offset the costs of running the program, though we did not assume the rate would necessarily remain at 2%.

Inflation and discounting

An annual inflation rate (#37) was applied to both costs and donations. This captures the tendency for costs to rise over time. It is also applied to donations in this model, since we might expect the salaries of donors, and therefore donation size, to increase at roughly the same rate.

The choice of discount rates was much more complicated, and different rates were required for different parts of the model. Beyond the exclusion of pure time preference noted above, the team provided separate rates for three categories of reasons for discounting, based in part on analyses by GiveWell staff (James Snowden, Caitlin McGugan, Emma Trefethen, and Josh Rosenberg).

  • Improving circumstances and reinvestment (#38). It is widely believed that spending now generally creates more benefit than spending the same amount later (diminishing marginal utility of consumption), as people tend to be getting richer, healthier, etc. Moreover, beneficiaries can make capital investments that grow over time, which makes donations more valuable sooner than later. While these two processes are independent, they were combined into one rate as they both reflect the 'real' value (opportunity cost) of a given cost or donation, as distinct from the probability of that cost or donation occurring.

  • Background uncertainty (#39). Roughly speaking, this represents the risk of the program closing down due to a catastrophic event, such as a natural disaster, rapid technological advance, or economic collapse. More precisely, it is the annual expected proportion of costs and outcomes that are not counterfactually caused by CAP due to factors not directly related to the program.

  • Program uncertainty (#40). This covers the probability of CAP closing down due to factors other than the 'background' risks mentioned above. Example reasons include poor outcomes, failure to obtain funding, legal issues, and internal conflict. Note that this does not include the additional uncertainty of the pilot year, which has its own parameter, i.e. this is the probability of CAP failing each year given that it has progressed beyond the pilot.

These were combined into three composite parameters for use in different parts of the model:

  • (#38 + #39): Annual discount rate for the value of a pledge. The lifetime value of a pledge (see the "Value of pledge" worksheet) would be affected by changes in the value of money over time and extreme events that disrupt (or render obsolete) the donations, but we assume that there would remain a mechanism for collecting and disbursing the pledged funds even if the program shut down for internal reasons.

  • (#38 + #39 + #40): Annual discount rate for the value of the program. The expected value of the CAP program in any given year, as estimated in the C-E by Program Year worksheet, is influenced by all three factors.

  • (#39 + #40): Annual discount rate for the value of disbursements. When predicting the absolute number of dollars processed by Donational due to CAP, which is useful for planning purposes, only the probability of those payments happening is relevant. However, note that the Disbursements worksheet also provides estimates of the value of disbursements each year, which uses the discount rate for the value of the program, and those figures are the basis for the cost-effectiveness estimates by year of disbursement given in the Main Results worksheet. (A minimal sketch of how these composite rates might be applied follows this list.)
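
As referenced above, here is a minimal sketch of how the three components combine into the composite rates, and of a pledge's discounted lifetime value under the assumption that donor churn compounds annually. Rates and donation values are placeholders.

```python
# Sketch of the composite discount rates and a pledge's discounted lifetime value.
# All rates and values are placeholders; churn compounding annually is our assumption.

improving_circumstances = 0.03   # (#38)
background_uncertainty = 0.01    # (#39)
program_uncertainty = 0.10       # (#40)

rate_pledge_value = improving_circumstances + background_uncertainty                          # (#38 + #39)
rate_program_value = improving_circumstances + background_uncertainty + program_uncertainty   # (#38 + #39 + #40)
rate_disbursements = background_uncertainty + program_uncertainty                             # (#39 + #40)

def pledge_present_value(annual_donation, churn_year1, churn_later, rate, years=20):
    """Expected discounted value of a recurring pledge, assuming churn compounds annually."""
    value, surviving = 0.0, 1.0
    for t in range(years):
        surviving *= (1 - churn_year1) if t == 0 else (1 - churn_later)
        value += annual_donation * surviving / (1 + rate) ** t
    return value

pv = pledge_present_value(1_000, 0.4, 0.15, rate_pledge_value)
print(f"Present value of a $1,000/year pledge: ${pv:,.0f}")
```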

Parameter estimates

The procedure developed for obtaining parameter values tried to strike a balance between rigor and practicality. As well as getting estimates (usually a best guess plus a 90% confidence interval [CI]) from Yamey, we requested informal guesstimates on key parameters from several anonymous individuals with relevant knowledge and experience. We also examined relevant information, such as OFTW's 6-month update and additional data kindly provided by the OFTW team; TLYCS's annual reports and email discussions with its senior staff; GWWC's impact assessment; and general information found online.

However, parameter values were ultimately obtained by eliciting and aggregating probability distributions from the six Rethink Grants team members (TMs). The two exceptions to this were the number of ambassadors in the pilot study (#1), which we decided by consensus after taking into account preliminary results, and the time horizon (#42), as we had decided it would be more informative to present results for multiple horizons. The process was based loosely on the Delphi method but also drew heavily on materials and software developed for use with the Sheffield Elicitation Framework (SHELF). Detailed protocols for these (plus a more Bayesian approach called Cooke's method) are provided in a report for the European Food Safety Authority (EFSA, 2014). However, with 40 parameters to estimate, conflicting schedules, and many other obligations, it was not possible for our team to follow either of them in full. For example, SHELF requires all 'experts' (in this case TMs) to undergo calibration training and then gather together in a multi-hour (often multi-day) workshop to produce a 'consensus distribution' for each parameter.

Elicitation

The elicitation process obtained five values from each of the six TMs for each parameter, in the following order:

  1. Lower plausible limit (L). The TM was almost certain the true value lay above this quantity (less than a 1 in 1,000 chance it was lower).

  2. Upper plausible limit (U). The TM was almost certain the true value lay below this quantity (less than a 1 in 1,000 chance it was higher).

  3. Median (M). The TM thought there was an equal chance the true value was higher or lower than this value.

  4. 5th percentile (pc5). The TM believed there was a 1 in 20 chance it was lower.

  5. 95th percentile (pc95). The TM believed there was a 1 in 20 chance it was higher.

The process had three rounds:

  1. Make initial guesstimates. The first round relied entirely on TMs' existing knowledge to obtain preliminary figures. In order to minimize bias, they placed the values in their own spreadsheet, without discussion, viewing others' inputs, or doing further reading. They were allowed to skip this round for any parameters about which they felt it was impossible to make meaningful guesstimates, or if they were very short on time.

  2. Consider additional information. In this round, TMs were given additional relevant information about each parameter. This included information gathered by the primary cost-effectiveness analyst, such as Yamey's own estimates, data from similar programs like OFTW, and comments from third parties with relevant experience. Before updating their estimates, they were also encouraged to do their own research and to use the SHELF-single or MATCH web apps to fit a probability distribution to their inputs. They were then asked to record a "confidence" score between 0 and 10, representing how much they trusted their inputs, plus a rationale for their responses.

  3. Consider other team members' inputs. After everyone had completed Rounds 1 and 2 for all parameters, they looked at other TMs' estimates and comments, and used that new information to update their own if they wished.

All TMs were provided detailed instructions for each round, including ways to improve the accuracy of their estimates and minimize bias. Each parameter was also given a priority score from 1 to 5, based loosely on the results of a preliminary sensitivity analysis, which TMs could use to guide how much time to spend on considering their inputs.

Fitting and aggregation

Inputs from all TMs were combined in an Excel spreadsheet (see Team Inputs). Jeremy Oakley, developer of the SHELF R package and Shiny apps, kindly wrote an R script to fit a distribution to each one. Parameters with a hard lower bound but no theoretical maximum (such as ambassador numbers, which could not be negative) were fitted to lognormal or gamma distributions. Those with hard upper and lower bounds (such as probabilities) were fitted to beta distributions. The normal distribution was used for the remainder.

A 'linear pool' (weighted average) of distributions for each parameter was then generated within Excel, with weights determined by TMs' self-reported confidence levels. Specifically, the formula (see the "Pooled sample" column) chose a sample from one of the six distributions ("Sample" column), where the probability of each distribution being chosen was proportional to its weight ("Weight" column). All parameter inputs were derived from these pooled distributions. There are many other ways of mathematically aggregating distributions (e.g. log-linear pooling and fully Bayesian methods), but the evidence suggests linear pooling tends to be comparably accurate as well as being much more straightforward (e.g. see O'Hagan et al., 2006, chapter 9).
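
A minimal sketch of linear opinion pooling as described above: each draw comes from one team member's fitted distribution, chosen with probability proportional to a confidence weight. The distributions and weights below are placeholders, not the elicited ones.

```python
# Minimal sketch of linear opinion pooling for one parameter.
# Each Monte Carlo draw samples one team member's fitted distribution, chosen with
# probability proportional to that member's weight. Placeholder distributions/weights.
import random

team_distributions = [
    # Default arguments pin each (mu, sigma) pair to its own lambda.
    (lambda mu=mu, sigma=sigma: random.lognormvariate(mu, sigma))
    for mu, sigma in [(6.5, 0.4), (6.9, 0.6), (6.2, 0.3)]
]
weights = [2, 5, 3]   # self-reported confidence scores used as weights

def pooled_sample():
    dist = random.choices(team_distributions, weights=weights, k=1)[0]
    return dist()

samples = [pooled_sample() for _ in range(5_000)]
print(f"Pooled mean: {sum(samples) / len(samples):,.0f}")
```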

We encountered some difficulties during this process, such as missing inputs and poorly-fitting distributions. Appendix 1 outlines these challenges, the steps taken to address them, and some potential ways of improving the procedure in future evaluations.

Sensitivity analysis

We assessed uncertainty around the results using both probabilistic and deterministic sensitivity analyses (PSA and DSA, respectively).

Probabilistic analysis

The probabilistic analysis used 5,000 Monte Carlo simulations to generate expected costs and outcomes. This method takes a random sample from each of the input distributions, records the results, and repeats the process many times. The DCR calculated from the probabilistic point estimates (means) of the costs and donations is considered the base case CEE.
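
A highly simplified sketch of this kind of probabilistic analysis, with a toy model and placeholder input distributions standing in for the 40-odd elicited parameters:

```python
# Simplified Monte Carlo sketch of the probabilistic analysis: sample inputs,
# compute costs and impact-adjusted donations for each simulation, then take the
# ratio of the means as the base-case DCR. All distributions are placeholders.
import random

def simulate_once():
    ambassadors = max(random.gauss(60, 25), 0)
    pledges_per_ambassador = max(random.gauss(2, 1), 0)
    adjusted_pledge_value = random.lognormvariate(6.5, 0.8)   # discounted, funging-adjusted
    cost_per_ambassador = max(random.gauss(1_500, 600), 0)
    donations = ambassadors * pledges_per_ambassador * adjusted_pledge_value
    costs = ambassadors * cost_per_ambassador
    return costs, donations

results = [simulate_once() for _ in range(5_000)]
mean_costs = sum(c for c, _ in results) / len(results)
mean_donations = sum(d for _, d in results) / len(results)
print(f"Mean costs ${mean_costs:,.0f}, mean donations ${mean_donations:,.0f}, "
      f"base-case DCR {mean_donations / mean_costs:.2f}")
```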

The simulations can be used to characterise overall uncertainty much better than the DSA. We calculated 90% CIs for the net costs and impact-adjusted donations, but it was not possible to provide a meaningful CI for the DCR because in some simulations the costs, donations, or both were negative. This can happen when the revenue earned by charging a processing fee is greater than the expenditure, or when CAP diverts donations from more effective charities. Uncertainty around the DCR was therefore represented in other ways.

  • A cost-effectiveness plane (a special kind of scatterplot) illustrated the spread of values by plotting the results of the simulations and the cost-effectiveness thresholds.

  • Cost-effectiveness acceptability curves (CEACs) showed the probability of each alternative (CAP or Do Nothing) being cost-effective – having the highest net benefit – at different thresholds. Net benefit is the value of donations minus the cost of the program, similar to the concept of net present value used in other fields.[2]

  • A cost-effectiveness acceptability frontier (CEAF) showed the probability of the most cost-effective option at any given threshold being optimal – having the highest expected net benefit – which is normally the most relevant criterion for decision-making. (In most cases, the option with the highest probability of being cost-effective, as indicated by the CEACs, is also the optimal choice, but there are exceptions.)

  • We also calculated the expected value of perfect information (EVPI). This is the theoretical maximum that should be spent to remove all uncertainty in the model, which can help guide decisions such as how much (if anything) to invest in a pilot study. (A rough sketch of the EVPI calculation is given below.)

Cost-effectiveness planes are introduced in Black (1990). CEACs, CEAFs, and EVPI are explained in more detail in Barton, Briggs, & Fenwick (2008), and the steps for calculating them in this case are detailed in Appendix 2.
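
As referenced above, EVPI can be computed directly from simulation output. The rough sketch below covers a two-option decision (CAP vs. Do Nothing), taking net benefit at a threshold λ as donations minus λ times costs (one convention; it reduces to donations minus costs at the 1x threshold) and using placeholder simulations rather than the model's.

```python
# Rough EVPI sketch for a two-option decision (CAP vs. Do Nothing).
# Net benefit at threshold lambda is taken as donations - lambda * costs (an assumption),
# and Do Nothing is assigned a net benefit of zero. Simulated values are placeholders.
import random

random.seed(0)
simulations = [(max(random.gauss(500_000, 300_000), 0),       # costs
                max(random.gauss(1_000_000, 900_000), 0))      # impact-adjusted donations
               for _ in range(5_000)]

def evpi(threshold):
    net_benefits = [donations - threshold * costs for costs, donations in simulations]
    value_with_perfect_info = sum(max(nb, 0) for nb in net_benefits) / len(net_benefits)
    value_with_current_info = max(sum(net_benefits) / len(net_benefits), 0)
    return value_with_perfect_info - value_with_current_info

for threshold in (1, 3, 10):
    print(f"EVPI at {threshold}x threshold: ${evpi(threshold):,.0f}")
```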

Since removing all uncertainty is infeasible, it is important to consider the value of the information that could realistically be obtained in a pilot study. There are established methods for doing this, but they are relatively complex and, in Excel, would require a macro that runs for dozens of hours (see e.g. Briggs, Claxton, & Sculpher, 2006, chapter 7; Wilson et al., 2014, https://link.springer.com/article/10.1007/s40273-014-0219-x; Strong, Oakley, Brennan, & Breeze, 2015). We therefore made extremely rough estimates, as follows.

  • We guesstimated the proportion of the remaining decision uncertainty that would be resolved by the pilot study.

  • We multiplied that by the EVPI to get the expected value of information obtained in the pilot.

  • We subtracted the estimated cost of the pilot, which gave us the expected net benefit of the pilot. A positive figure indicates that the pilot would cost less than the value of the information obtained, suggesting it would be worthwhile.

We did this for all three thresholds, and four potential pilot types (a rough worked example follows the list below):

  • Yamey alone. Yamey thinks he could recruit and manage about 5 ambassadors without incurring significant costs or requiring external assistance.

  • Volunteer. Yamey thinks a part-time volunteer could manage up to 10 ambassadors. It is unclear whether they would require a stipend, but to be conservative we have assumed a total cost of $10,000.

  • PT COO. Yamey's preference is to hire a chief operating officer. He thinks a part-time COO, on a salary of about $40,000, could run up to 20 ambassadors (though we suspect this is at the lower end of the feasible range) while also working on overall strategy.

  • FT COO. For about $80,000, Yamey believes a full-time COO could handle up to 50 ambassadors alongside other tasks.
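
Here is a worked version of the rough calculation described above. The pilot costs are those listed in this section; the EVPI figure and the shares of uncertainty resolved are illustrative placeholders, not the report's actual inputs.

```python
# Rough pilot value-of-information estimate: expected net benefit of information
# = (share of decision uncertainty resolved) * EVPI - pilot cost. Placeholder values.
evpi_at_threshold = 170_000        # EVPI at the chosen threshold, placeholder
share_resolved = {                 # guesstimated share of uncertainty each pilot resolves
    "Yamey alone": 0.2,
    "Volunteer": 0.3,
    "PT COO": 0.5,
    "FT COO": 0.7,
}
pilot_cost = {"Yamey alone": 0, "Volunteer": 10_000, "PT COO": 40_000, "FT COO": 80_000}

for pilot, share in share_resolved.items():
    expected_net_benefit = share * evpi_at_threshold - pilot_cost[pilot]
    print(f"{pilot}: expected net benefit of information ~ ${expected_net_benefit:,.0f}")
```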

Deterministic analysis

The deterministic CEE was obtained using the means of the pooled distributions used in the PSA (described above). A DSA then identified the main sources of uncertainty in order to guide information-gathering priorities, both during this evaluation and potentially in future studies, such as the pilot study.

  • In a one-way sensitivity analysis, each parameter was set to the 5th and 95th percentiles, and the resulting DCRs presented in a tornado chart (a minimal sketch of this step follows the list).

  • A threshold analysis determined the value that each of the 10 most sensitive parameters would need to attain in order for the DCR to reach our potential minDCR thresholds.

  • A two-way sensitivity analysis recorded the DCRs resulting from changing any two of the 10 most sensitive parameters at once.
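
A minimal sketch of the one-way step, using a toy deterministic model: each parameter is swung to its 5th and 95th percentiles while the others stay at their means, and the resulting DCR range is recorded (these ranges would form the bars of a tornado chart). All values are placeholders.

```python
# One-way sensitivity sketch on a toy deterministic model. Placeholder values only.

def dcr(params):
    donations = params["ambassadors"] * params["pledges_per_ambassador"] * params["pledge_value"]
    costs = params["ambassadors"] * params["cost_per_ambassador"]
    return donations / costs

base = {"ambassadors": 60, "pledges_per_ambassador": 2, "pledge_value": 900, "cost_per_ambassador": 1_500}
percentiles = {  # assumed (5th, 95th) percentile values for each parameter
    "pledges_per_ambassador": (0.5, 5),
    "pledge_value": (200, 2_500),
    "cost_per_ambassador": (600, 3_500),
}

for name, (low, high) in percentiles.items():
    results = [dcr({**base, name: value}) for value in (low, high)]
    print(f"{name}: DCR ranges from {min(results):.2f} to {max(results):.2f}")
```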

Model verification

A number of measures were taken to minimise the risk of error in the model, including the following.

  • Named ranges reduced the risk of erroneous cell references.

  • The effects of variations in inputs on outputs were observed, to ensure they made sense (e.g. higher costs → lower cost-effectiveness).

  • Consistency across different parts of the model was checked, e.g. higher EVPI when the CEAF is lower.

  • The percentiles of fitted distributions were compared to each TM's inputs for each parameter, and any substantial disparities were investigated further.

  • Samples from the pooled distributions were generated in R and compared to the Excel-based results.

  • Parameter values, and final results, were compared to our preliminary estimates, and any major disparities investigated.

  • Macros included an explanation for every line of code, and key inputs (such as the number of samples) were displayed in the worksheet.

  • The model was thoroughly checked by one RG team member, and less thoroughly by other TMs, Yamey, and two experienced health economists.

Results

Parameter estimates

Each TM's estimates and fitted distributions are shown in the Team Inputs worksheet, and the means and 90% confidence intervals of the pooled distributions are in the Parameters worksheet. Overall, the inputs were much more pessimistic than those entered into preliminary models, which were based heavily on Yamey's guesstimates. The main exception was the average pledge size of nearly $1,000, which the team thought would be close to those reported by OFTW for undergraduates, although our 90% CI was wide (approximately $100–$3,000). Even after adjusting our estimates in light of each other's inputs and comments, there was considerable divergence of opinion among TMs for many parameters; for example, the median estimates for 1st year donor churn ranged from 22% to 70%, EA funging from 10% to 67%, and the number of ambassadors per manager from 15 to 80. Notably, the three individuals who were most closely involved in the evaluation tended to give more optimistic inputs than the three more detached TMs. The average confidence score of 2.3/10 – which indicates how much we trusted our estimates beyond the uncertainty reflected in the confidence intervals – also reflects the highly speculative nature of most parameters.

Probabilistic analysis

Table 1: Base case probabilistic results


In the base case, CAP is expected to cost around $500,000 over a 10-year horizon, but with a very wide 90% confidence interval (approximately $50,000–$1.5 million). Expected impact-adjusted donations are about $1 million, with even greater uncertainty ($7,000–$4 million). The base case cost-effectiveness estimate is a donation-cost ratio of 1.94, meaning just under $2 is donated to CAP-recommended charities for every dollar spent on the program. This is higher than our lowest potential cost-effectiveness threshold of 1x, but well below the primary reference point of 3x, and even further off the CEEs reported by other EA fundraising organizations.

Figure 1: Cost-effectiveness plane


The cost-effectiveness plane (Figure 1) shows a tight cluster of estimates with less than $2 million in donations and costs, although a non-trivial proportion of each surpass this figure. Donations in particular are positively skewed, with a handful of the 5,000 scenarios reaching $20 million and beyond (not shown on the plane for presentational reasons). The markers in the north-west (top left) quadrant represent scenarios where Do Nothing strictly dominates CAP (i.e. CAP causes negative effects with a positive cost), likely reflecting the small risk that CAP would displace donations to more effective charities. Conversely, the few estimates just inside the south-east (bottom right) quadrant suggest a very small chance that CAP would dominate Do Nothing (i.e. cost less – by generating revenue greater than its expenditure – and cause more benefit).

Figure 2: Cost-effectiveness acceptability curves and frontier


The cost-effectiveness acceptability curves (Figure 2) suggest there is just a 36% chance of CAP being cost-effective (having higher net benefit than Do Nothing) at a minDCR of 1x. The cost-effectiveness acceptability frontier nevertheless indicates that CAP would be the optimal choice at that threshold (i.e. have the highest expected net benefit). This is because the distribution of net benefits at that threshold is positively skewed, with a mean higher than the median. Beyond a minDCR of 1.94, however, Do Nothing becomes optimal; there is just a 15% chance of CAP being optimal at 3x, and 4% at 10x. In other words, this analysis suggests that, for a risk-neutral donor, paying upfront for the full 10-year program would only make sense if their minimum acceptable DCR was below about 2x.

Figure 3: Expected value of perfect information (EVPI) at different cost-effectiveness thresholds


Table 2: Very approximate estimates of the value of information that could be obtained from various sizes of pilot study.


The expected value of perfect information (Figure 3) at the 1x, 3x, and 10x thresholds is around $230,000, $170,000, and $30,000, respectively. Our crude estimates of the value of a pilot study are shown in Table 2, with green cells indicating that the pilot is probably worthwhile. At our primary threshold of 3x, hiring a full-time COO to run a pilot with about 50 ambassadors could be justified, but the expected net benefit – the difference between the value of information obtained and the costs incurred – is a little higher for smaller pilots led by a part-time COO, a volunteer on a stipend, or Yamey alone. With a minDCR of 10x, only a small pilot run by Yamey himself (perhaps with the assistance of unpaid volunteers) seems warranted. Note that these estimates disregard the donations resulting from the pilot, which are expected to be at least as high as the costs, so they may be considered conservative. However, it is also worth highlighting that the value of information is very sensitive to the time horizon: a program of shorter expected duration would generate less total value, so it would not be worth spending as much on finding out whether to support it, and the converse would be true of a longer one.

Deter­minis­tic analysis

Table 3: Deter­minis­tic re­sults by pro­gram year

Deterministic results by program year

The deterministic analysis (based on the means of the parameter inputs) gave 10-year expected costs of about $640,000. As indicated by the pie chart (Figure 4), ambassador managers account for about half of expenditure, followed by the chief operating officer's salary. The marketer and miscellaneous costs make up most of the remainder.

Figure 4: Break­down of costs

Costs breakdown

Ac­cord­ing to our model, the av­er­age am­bas­sador would gen­er­ate about $4,000 for CAP char­i­ties ($2,800 af­ter ad­just­ing for fung­ing, and $2,000 af­ter dis­count­ing). While around 72% of donors would give a one-off (rather than re­cur­ring) dona­tion, pledges ac­count for 89% of the ex­pected dona­tion vol­ume. Each pledge is es­ti­mated to have an im­pact-ad­justed dis­counted value of nearly $900 over a 20-year pe­riod; as in­di­cated by Figure 5, al­most all of this value is re­al­ized within the first five years af­ter tak­ing the pledge.
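
As an illustration of the kind of calculation behind the pledge figure, the sketch below discounts an annual gift over 20 years, applying a yearly churn (lapse) probability and a funging adjustment. All parameter values are hypothetical stand-ins rather than the model's actual inputs, though with these particular values the result happens to land near $900.

```python
def pledge_value(annual_donation=480.0,   # hypothetical average yearly gift from a pledger
                 churn=0.35,              # hypothetical probability a pledger lapses each year
                 funging_adjustment=0.7,  # hypothetical share of donations judged counterfactual
                 discount_rate=0.04,
                 years=20):
    """Discounted, impact-adjusted value of one pledge over a fixed horizon."""
    total = 0.0
    retention = 1.0
    for t in range(years):
        expected_gift = annual_donation * retention * funging_adjustment
        total += expected_gift / (1 + discount_rate) ** t
        retention *= (1 - churn)  # survival probability of the pledge into the next year
    return total

print(f"Value of a pledge: ${pledge_value():,.0f}")
```

With a high yearly churn rate, most of the discounted value accrues in the first few years, which is the pattern Figure 5 describes.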

Figure 5: Value of a pledge over time

Value of a pledge

The 10-year de­ter­minis­tic dona­tion-cost ra­tio is 1.52, sig­nifi­cantly lower than the prob­a­bil­is­tic one. Cost-effec­tive­ness is worse (1.21) in the first three years, when am­bas­sador num­bers are high enough to in­cur con­sid­er­able labour costs but not high enough to gen­er­ate a lot of dona­tions; yet the model pre­dicts only mod­est re­turns to scale, with a DCR of just 1.56 at the 20-year mark. Ex­clud­ing the pi­lot study from the to­tals does not sig­nifi­cantly af­fect the cost-effec­tive­ness.

Figures by year of dis­burse­ment (Table 4) are lower due to the lag in re­ceiv­ing pledged dona­tions. Even over a 20-year hori­zon, the DCR is not ex­pected to reach 1. Note that these es­ti­mates as­sume the pro­gram (e.g. the re­cruit­ment of new am­bas­sadors) con­tinues at least as long as the given time hori­zon. If the pro­gram (and there­fore ex­pen­di­ture) stops, but the dona­tions from out­stand­ing pledges are still re­ceived in later years, the DCR at those later hori­zons will be higher.

Table 4: Deter­minis­tic re­sults by year of disbursement

Deterministic results by year of disbursement

One-way sen­si­tivity analysis

The tor­nado chart (Figure 6) gives some in­di­ca­tion of which pa­ram­e­ters con­tribute the most un­cer­tainty, though it can­not ac­count for in­ter­ac­tions among pa­ram­e­ters. Op­ti­mistic as­sump­tions for any one of three pa­ram­e­ters – mean pledge size, num­ber of pledges, and donor churn be­yond the first year – cause the DCR to com­fortably sur­pass the 3x thresh­old. The pes­simistic con­fi­dence limit for any of the top eight pa­ram­e­ters brings the DCR be­low 1. In­ter­est­ingly, the ‘pes­simistic’ 5th per­centile value for max­i­mum scale ac­tu­ally raises the DCR more than the ‘op­ti­mistic’ 95th per­centile, be­cause with a very low num­ber of am­bas­sadors the ma­jor costs are not yet in­curred. Some­thing similar hap­pens with 1st year am­bas­sador churn. This should not be taken to im­ply that a smaller pro­gram is prefer­able (the over­all im­pact is far lower), but it is one of sev­eral in­di­ca­tions that the pro­gram has limited re­turns to scale.
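
The sketch below shows how the numbers behind a tornado chart might be generated: each parameter is swung to hypothetical 5th and 95th percentile values in turn while the others are held at their base case values, and the resulting DCR ranges are sorted by width. The toy DCR function and all values are illustrative assumptions, not the actual Excel model.

```python
# A toy stand-in for the full model: DCR as a function of a few parameters (all values hypothetical).
def dcr(params):
    ambassadors = params["n_ambassadors"]
    donations = (ambassadors
                 * params["pledges_per_ambassador"]
                 * params["mean_pledge_value"])
    costs = params["fixed_costs"] + ambassadors * params["cost_per_ambassador"]
    return donations / costs

base = {"n_ambassadors": 150, "pledges_per_ambassador": 3.0,
        "mean_pledge_value": 900.0, "fixed_costs": 150_000.0,
        "cost_per_ambassador": 1_500.0}

# Hypothetical 5th/95th percentile values for each parameter.
ranges = {"pledges_per_ambassador": (1.0, 7.0),
          "mean_pledge_value": (300.0, 2_200.0),
          "n_ambassadors": (30, 400),
          "cost_per_ambassador": (800.0, 3_500.0)}

# One-way sensitivity: swing each parameter to its limits while holding the rest at base case.
results = []
for name, (low, high) in ranges.items():
    dcrs = [dcr(dict(base, **{name: value})) for value in (low, high)]
    results.append((name, min(dcrs), max(dcrs)))

# Sort by the width of the swing, as a tornado chart does (widest bar on top).
for name, lo, hi in sorted(results, key=lambda r: r[2] - r[1], reverse=True):
    print(f"{name:<24} DCR range: {lo:.2f} to {hi:.2f}")
```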

Figure 6: Tor­nado chart illus­trat­ing the re­sults of a one-way sen­si­tivity anal­y­sis on the 20 most sen­si­tive parameters

OWSA

Thresh­old analysis

The thresh­old anal­y­sis (Table 5) re­vealed that achiev­ing 3x would re­quire a mean dona­tion over $2,000, seven pledges per am­bas­sador, donor churn be­yond the first year of 12%, or (very un­re­al­is­ti­cally) 2nd year am­bas­sador churn of just 8% – as­sum­ing all other pa­ram­e­ters re­main un­changed. A 10x re­turn would re­quire about three pledges of $7,000 per am­bas­sador, or 23 pledges at the base case mean of just un­der $1,000, both of which seem fairly im­plau­si­ble. No change to any one of the other seven pa­ram­e­ters would en­able ei­ther 3x or 10x.
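
A threshold analysis of this kind can be run mechanically by searching for the parameter value at which the model's DCR crosses a target. The sketch below does this by bisection for one parameter of the same toy model used above; a numerical search like this is one generic way to do it, and all values remain hypothetical.

```python
def dcr_given_pledges(pledges_per_ambassador,
                      n_ambassadors=150, mean_pledge_value=900.0,
                      fixed_costs=150_000.0, cost_per_ambassador=1_500.0):
    """Toy donation-cost ratio as a function of one parameter (all default values hypothetical)."""
    donations = n_ambassadors * pledges_per_ambassador * mean_pledge_value
    costs = fixed_costs + n_ambassadors * cost_per_ambassador
    return donations / costs

def threshold_value(target_dcr, low=0.0, high=50.0, tol=1e-6):
    """Bisection: smallest pledges-per-ambassador value at which the toy DCR reaches target_dcr."""
    while high - low > tol:
        mid = (low + high) / 2
        if dcr_given_pledges(mid) < target_dcr:
            low = mid
        else:
            high = mid
    return high

for target in (1, 3, 10):
    print(f"Pledges per ambassador needed for {target}x: {threshold_value(target):.1f}")
```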

Table 5: Thresh­old anal­y­sis on the 10 most sen­si­tive parameters

Threshold analysis

Two-way sen­si­tivity analysis

The two-way anal­y­sis helps to cap­ture in­ter­ac­tions be­tween pairs of pa­ram­e­ters, which can lead to fluc­tu­a­tions in the cost-effec­tive­ness ra­tio greater than the sum of the changes caused by vary­ing them in­di­vi­d­u­ally. As shown in Figure 7, op­ti­mistic val­ues for any two of the three most sen­si­tive in­di­vi­d­ual pa­ram­e­ters – av­er­age pledge size (#12), av­er­age num­ber of pledgers per am­bas­sador-year (#8), and donor churn af­ter the first year (#10) – would en­able CAP to reach 10x. A fur­ther 25 com­bi­na­tions push the DCR past 3x, while pes­simistic con­fi­dence limits for al­most any two pa­ram­e­ters bring the DCR be­low 1.
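
Mechanically, a two-way analysis simply evaluates the model over every combination of the pessimistic and optimistic limits for each pair of parameters, holding the rest at base case. A compact sketch using the same kind of toy DCR function (all values hypothetical):

```python
from itertools import combinations, product

def dcr(n_ambassadors=150, pledges=3.0, pledge_value=900.0,
        fixed_costs=150_000.0, cost_per_ambassador=1_500.0):
    """Toy donation-cost ratio (all default values hypothetical)."""
    donations = n_ambassadors * pledges * pledge_value
    costs = fixed_costs + n_ambassadors * cost_per_ambassador
    return donations / costs

# Hypothetical (pessimistic, optimistic) confidence limits for three parameters.
limits = {"pledges": (1.0, 7.0),
          "pledge_value": (300.0, 2_200.0),
          "n_ambassadors": (30, 400)}

# Two-way sensitivity: vary every pair of parameters jointly, holding the others at base case.
for p1, p2 in combinations(limits, 2):
    for v1, v2 in product(limits[p1], limits[p2]):
        value = dcr(**{p1: v1, p2: v2})
        print(f"{p1}={v1}, {p2}={v2}: DCR = {value:.2f}")
```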

Figure 7: Two-way sen­si­tivity anal­y­sis on the 10 most sen­si­tive parameters

TWSA

Qual­i­ta­tive assessment

Our over­all sub­jec­tive score for cost-effec­tive­ness is de­cided with refer­ence to ‘best buys’ of a similar na­ture. In this case, One for the World and The Life You Can Save are the most rele­vant com­para­tors, as they so­licit both pledges and one-off dona­tions from in­di­vi­d­u­als who do not nec­es­sar­ily con­sider them­selves ‘effec­tive al­tru­ists’. Of course, there are sig­nifi­cant differ­ences: OFTW pri­mar­ily op­er­ates in uni­ver­si­ties, and TLYCS ap­peals to a broad range of de­mo­graph­ics. But for a CAP-style pro­gram to war­rant fund­ing over these al­ter­na­tives, it should ar­guably demon­strate com­pa­rable cost-effec­tive­ness.

OFTW and TLYCS both re­port dona­tion-cost ra­tios of at least 10:1. The sub­jec­tive score was there­fore given us­ing the fol­low­ing crite­ria:

CE thresholds

The base case DCR of 1.94 is equiv­a­lent to 0.194x the best buy in our sub­jec­tive scor­ing frame­work. This falls clearly into the Low cat­e­gory.

Discussion

Our anal­y­sis sug­gests CAP is un­likely to be cost-effec­tive. The base case es­ti­mate of around $2 donated per dol­lar spent is be­low the 3x re­turn that both Yamey and the Re­think Grants team con­sider an ap­prox­i­mate lower bound for the pro­ject to be worth­while, and far from the 10x or higher re­ported by One for the World and The Life You Can Save. It con­se­quently re­ceives an over­all score of Low in our sub­jec­tive frame­work. With only a 15% chance of be­ing cost-effec­tive at the 3x thresh­old, it would be un­wise to in­vest in a full-scale pro­gram at this stage.

Nev­er­the­less, the anal­y­sis pro­vides a strong case for run­ning a pi­lot study. Our very rough es­ti­mates sug­gest that, as­sum­ing a to­tal pro­gram du­ra­tion of at least sev­eral years, a pi­lot of any rea­son­able size would cost far less than the value of the in­for­ma­tion it would gen­er­ate. A small or medium-sized study (5–30 am­bas­sadors) run by a part-time chief op­er­at­ing officer, a vol­un­teer, or Yamey alone seems to offer the most fa­vor­able trade-off be­tween in­for­ma­tion gain and cost.

Our sen­si­tivity analy­ses can be used to guide fur­ther re­search and pro­gram de­vel­op­ment. The pri­mary sources of un­cer­tainty ap­pear to be the num­ber and size of re­cur­ring dona­tions, and CAP’s abil­ity to re­tain both am­bas­sadors and donors, so these should be the fo­cus of the pi­lot study. It may also be worth putting some ad­di­tional re­sources into de­ter­min­ing the coun­ter­fac­tual im­pact of a dona­tion, tak­ing into ac­count fung­ing from both EA and non-EA sources. Since am­bas­sador man­ager salaries are the ma­jor cost, al­ter­na­tive pro­gram struc­tures that do not in­volve so much over­sight of vol­un­teers, or that use vol­un­teers to sup­port am­bas­sadors, should per­haps be con­sid­ered as well.

This anal­y­sis has many limi­ta­tions, only a few of which can be dis­cussed here. Over­all, it seems likely to have un­der­es­ti­mated the promis­ing­ness of CAP rel­a­tive to the al­ter­na­tives, for sev­eral rea­sons.

  • Eval­u­a­tions of com­para­tor pro­grams use differ­ent method­ol­ogy. We have not closely ex­am­ined the calcu­la­tions be­hind lev­er­age ra­tios re­ported by other or­ga­ni­za­tions, but it may not be ap­pro­pri­ate to make di­rect com­par­i­sons. For ex­am­ple, OFTW uses lower dis­count rates and does not ap­pear to take into ac­count fung­ing from large donors such as Good Ven­tures, while TLYCS does not ad­just for fung­ing at all (though it gives a helpful dis­cus­sion of coun­ter­fac­tual is­sues in its 2017 an­nual re­port). We sus­pect 10x is there­fore an un­rea­son­ably high bar.

  • More gen­er­ally, com­par­ing to ‘best buys’ of a similar na­ture could be mis­lead­ing. In par­tic­u­lar, we con­sid­ered any DCR un­der 5x Low, yet a DCR over 1x – as in our base case – would im­ply that sup­port­ing CAP would be bet­ter than donat­ing di­rectly to some of the most cost-effec­tive char­i­ties in the world. This sug­gests that fun­ders who would be happy to di­rectly sup­port GiveWell- or ACE-recom­mended char­i­ties ought to con­sider CAP a com­pet­i­tive op­por­tu­nity. Depend­ing on how the fun­der would oth­er­wise use the money, it is even pos­si­ble that a sub-1x re­turn would be cost-effec­tive (al­though Yamey has stated that he would likely not con­sider CAP worth­while in such cir­cum­stances).

  • It does not ac­count for in­di­rect benefits, which may well have higher ex­pected value than the di­rect im­pact. The po­ten­tial for cre­at­ing a cul­ture of effec­tive giv­ing in work­places, for ex­am­ple, could be more im­por­tant than the di­rect im­pact of the dona­tions. This is ad­dressed fur­ther in the Indi­rect Benefits sec­tion be­low.

  • It as­sumes a static pro­gram struc­ture. The model nec­es­sar­ily makes a num­ber of as­sump­tions about the na­ture of CAP, and im­plic­itly as­sumes these would re­main con­stant over the years. In re­al­ity, a well-run pro­gram would evolve in re­sponse to in­for­ma­tion, op­por­tu­ni­ties, and con­straints. For ex­am­ple, Dona­tional could look into pay­roll giv­ing, so­licit pledges from am­bas­sadors’ friends and fam­ily, or pur­sue a smaller num­ber of high-value dona­tions from se­nior man­agers at large firms. As dis­cussed in the Team Strength sec­tion, there are signs that Dona­tional would be ca­pa­ble of adapt­ing over time – and if the pro­gram still did not seem very promis­ing, Yamey has de­clared an in­ten­tion to close it down rather than con­tinue in­definitely, thereby min­i­miz­ing any losses.

There are also ways in which the anal­y­sis may fa­vor CAP.

  • It does not ac­count for in­di­rect harms. CAP could back­fire or have nega­tive ‘spillover’ effects that make it less cost-effec­tive or even harm­ful over­all. This is dis­cussed in the rele­vant sec­tion be­low.

  • It does not ac­count for moral un­cer­tainty. Un­der some wor­ld­views, CAP it­self, or some of the recom­mended char­i­ties, may do ac­tive harm. This is also ad­dressed in a sep­a­rate sec­tion.

  • It is vulnerable to cognitive biases. We tried to take into account optimism bias when providing the parameter inputs, but it is a pervasive phenomenon and we cannot be sure that we entirely escaped its influence, particularly since we did not undergo formal calibration training. Because the evaluation took a considerable amount of both Yamey's and RG team members' time, there is also a danger of being influenced by reciprocity – a sense of obligation to offer something in return – and something akin to the sunk cost fallacy – the feeling that, having invested so much in the evaluation, it would be a shame not to recommend at least a little funding. We consciously tried to resist these pressures, but we may not have been entirely successful.

  • It re­lied heav­ily on Yamey’s in­puts. Many of the RG team’s pa­ram­e­ter es­ti­mates, as well as the model struc­ture, were quite heav­ily in­fluenced by Yamey’s own pre­dic­tions. This is not nec­es­sar­ily ir­ra­tional as he was in a bet­ter po­si­tion to es­ti­mate some pa­ram­e­ters, and we did not get a strong sense that he was con­sciously try­ing to ex­ag­ger­ate the likely suc­cess of the pro­ject. How­ever, pro­ject founders are per­haps es­pe­cially vuln­er­a­ble to op­ti­mism bias (Cas­sar 2009), so it is pos­si­ble we gave too much weight to his guessti­mates, par­tic­u­larly given that he him­self was highly un­cer­tain about many of them.

  • It only con­sid­ers fi­nan­cial costs of per­son­nel time. Yamey and any staff, con­trac­tors, or vol­un­teers en­gaged in the pro­ject may oth­er­wise be do­ing things that are worth more than the costs used in this anal­y­sis. For ex­am­ple, a good COO might in­stead earn a high salary and donate a large pro­por­tion to char­ity, or work on an­other high-im­pact startup. How­ever, there are sev­eral rea­sons for dis­re­gard­ing these ‘op­por­tu­nity costs’, in­clud­ing the fol­low­ing.

    • It is very hard to make mean­ingful es­ti­mates of these, es­pe­cially be­fore the pro­gram has be­gun.

    • They may be at least partly accounted for in the cost-effectiveness thresholds. Part of the reason we don't think, say, a 2x return would be worthwhile is that the personnel (not just the funders) could have more impact through other activities.

    • Most other rele­vant CEAs also use fi­nan­cial costs, so de­part­ing from this ten­dency may hin­der com­par­i­sons across pro­jects.

    It nev­er­the­less re­mains a con­cern, and users may wish to put their own cost as­sump­tions in the model.

Other limi­ta­tions of our meth­ods may sub­stan­tially af­fect the re­sults, but in an un­clear di­rec­tion.

  • Elic­ited pa­ram­e­ter in­puts are highly un­cer­tain. Our team mem­bers did not put a great deal of trust in their pa­ram­e­ter es­ti­mates. In many cases, this re­flected both un­cer­tainty about the quan­tity be­ing es­ti­mated, and difficulty iden­ti­fy­ing a stan­dard dis­tri­bu­tion that matched their be­liefs. This is un­der­stand­able, since there was no good in­for­ma­tion on the vast ma­jor­ity of them, and no op­por­tu­nity for proper cal­ibra­tion train­ing that could have im­proved our abil­ity to make good es­ti­mates. But it should be em­pha­sised that the con­fi­dence in­ter­vals do not cap­ture all of the un­cer­tainty around these crit­i­cal in­puts. Some ad­di­tional is­sues with our elic­i­ta­tion and ag­gre­ga­tion meth­ods are dis­cussed in Ap­pendix 1, along with some po­ten­tial ways of im­prov­ing the pro­cess.

  • Pa­ram­e­ter cor­re­la­tion is not fully cap­tured. A com­mon crit­i­cism of prob­a­bil­is­tic analy­ses is that they im­plic­itly treat all pa­ram­e­ters as in­de­pen­dent, which is of­ten un­re­al­is­tic. We par­tially ad­dressed this con­cern by mak­ing the costs de­pen­dent on in­di­ca­tors of scale (such as the num­ber of am­bas­sadors), but some re­la­tion­ships re­main un­mod­el­led. For ex­am­ple, pledge size may be higher when the num­ber of pledges is lower (nega­tive cor­re­la­tion), since it would likely re­flect a strat­egy of tar­get­ing a smaller num­ber of high earn­ers; and high first-year donor churn would be a good pre­dic­tor of high sec­ond-year donor churn (pos­i­tive cor­re­la­tion), since they are likely to be driven by similar fac­tors. The Monte Carlo simu­la­tions on which the base case re­sults de­pend con­se­quently in­clude some fairly im­plau­si­ble sce­nar­ios. The benefits of prob­a­bil­is­tic anal­y­sis al­most cer­tainly out­weigh these draw­backs, and users are free to re­place the in­puts with their own as­sump­tions for any and all pa­ram­e­ters, but it does add an ad­di­tional el­e­ment of un­cer­tainty.

  • More gen­er­ally, ex­pected value es­ti­mates should not (usu­ally) be taken liter­ally. The use of rel­a­tively so­phis­ti­cated meth­ods can give the illu­sion of greater cer­tainty than is war­ranted given the var­i­ous limi­ta­tions of both the model struc­ture and in­puts. Com­plex mod­els are also more prone to er­rors, and harder to repli­cate, than sim­ple ones. We have greater con­fi­dence in this anal­y­sis than we do in a ‘back of the en­velope’ calcu­la­tion, or in most mod­els used for eval­u­at­ing prospec­tive in­vest­ments, but we would be some­what sur­prised if it turned out to be a highly ac­cu­rate rep­re­sen­ta­tion of re­al­ity.

Given the short­com­ings of our cost-effec­tive­ness anal­y­sis, it is im­por­tant to also con­sider our other crite­ria – Team Strength, Indi­rect Benefits and Harms, and Ro­bust­ness to Mo­ral Uncer­tainty. Th­ese are dis­cussed in the fol­low­ing sec­tions.

Team Strength

Most public-facing sources with the authority to comment on what it takes to create a successful project stress the importance of founding team members.[3] Interestingly, our review of the evaluation frameworks used by grantmakers in effective altruism and adjacent spaces suggests they have comparatively little to contribute, potentially because assessing teams is notoriously difficult and revealing information about people is understandably a very sensitive endeavor. This reluctance is likely compounded by uncertainty about the appropriate methodological approaches, let alone about how to parse the highly subjective world of human capability and potential.

Our claim is that the successes and failures of a project rest in part upon the vision, coordination and implementation of the team. By extension, our evaluation places nontrivial weight on the founder and team evaluation. The fact that navigating the dynamics between team members within a project is tricky and enormously subjective is not, we believe, reason enough to shy away from making sincere efforts to identify the ways in which the plans, competencies and interpersonal fit of team members might affect project success.

Our team assessment criteria, kept internal largely to guard against influencing the way applicants present themselves, were constructed to supplement the industry wisdom of evaluating teams based upon 'track record', preferred credential signals, and in-network litmus tests. The additional considerations we find important in evaluating founders and teams are derived from firsthand and secondhand sources, official and unofficial conversations, and our own experience, reasoning, and intuitions. The result is a checklist that helps us identify disqualifying criteria, core qualifications, and strengths and weaknesses, and that is designed to span every detectable aspect of a team's ability to carry out their project plans. This includes affordances for future skill growth and personal considerations. Rather than being affixed to a rigid framework, or attempting to over-quantify subjective considerations, our criteria primarily seek to uncover deviations in any direction from what could be best described as baseline competency and commitment to launching and scaling up effective projects.

Spe­cific to the RG eval­u­a­tion pro­cess, the Team Strength sec­tion also ac­counts for what we dis­cover through­out the course of the afore­men­tioned ‘in­cu­ba­tion’ por­tion of our pro­cess. Early-stage pro­jects nec­es­sar­ily have com­po­nents of their plans that will need re­fine­ment, and part of the value-add of RG is to offer as-needed plan­ning as­sis­tance. While the so­phis­ti­ca­tion of a pro­ject’s plan and our per­cep­tion of the team’s abil­ity to ex­e­cute are key com­po­nents of the eval­u­a­tion, RG also fac­tors in the iden­ti­fi­able com­pe­ten­cies of the team re­quired for sur­vival and later flour­ish­ing, which can be broadly con­strued as the abil­ity of the team to up­date and act upon re­vised plans.

Dona­tional overview

Donational's progression – beginning as a personal donation automation system and later evolving into a platform that processes hundreds of thousands of dollars for effective charities – displays ample evidence of Yamey's fitness for leading a project of this sort.

Yamey serv­ing as the sole tech­ni­cal pres­ence on the pro­ject is tes­ta­ment to his skill in craft­ing a high-qual­ity plat­form that serves the needs of One for the World, and pre­sents an at­trac­tive op­por­tu­nity to test a cor­po­rate out­reach model. As we al­luded to above, our team eval­u­a­tion crite­ria, in this in­stance eval­u­at­ing Dona­tional as a one-man team for the time be­ing, seeks to stack Yamey’s abil­ities against baseline com­pe­tency in the co­or­di­na­tion and ex­e­cu­tion of the pro­ject.

For the post-assessment write-up, this means that the following will not only briefly touch on easily identifiable indicators of fitness to lead the project (e.g. track record, education, formal skills acquired), but also emphasize less obvious qualities needed to bring a successful project to fruition. Much of this will be done by introducing selected criteria that we feel have revealed these considerations.

Defeater check

Yamey cleanly passes all of the 'defeater' (disqualifier) criteria, defined as discoverable things that would be critically bad for the project. This section comprises a narrow set of indicators that would disqualify the project outright as an option for funders to consider. The 'defeaters' section breaks down along the following lines:

  • Dis­po­si­tion, traits, and beliefs

  • Abil­ities and skills

  • Life plan considerations

  • Other

  • Lack of sev­eral core qualifiers

A conventional example of a subcriterion that would fall under "dispositions, traits, and beliefs" is interpersonal presentation. Being 'tone-deaf' in this domain could include making repeated poor choices in public contexts or having a bad reputation within a given community, either of which would severely inhibit the success of the project moving forward. The method for assessing this criterion ranges from documenting general interpersonal impressions to confirming reputational reads with trusted community members.

Core qual­ifier check

Yamey ap­pears to pass all of the ‘core qual­ifier’ crite­ria, defined as dis­cov­er­able things per­ceived to be cru­cial for the sur­vival and even­tual scal­ing of the pro­ject. This sec­tion com­prises a broad ar­ray of in­di­ca­tors that break down along the fol­low­ing lines:

  • Dis­po­si­tion, traits, and beliefs

  • Abil­ities and skills

  • Life plan considerations

  • Other

  • Lack­ing defeaters

A conventional example of a Core Qualifier subcriterion that would fall under "abilities and skills" is a balanced approach to deferring judgement. Because many projects encompass a vast number of domains, many of which the founder will need assistance in navigating, knowing when to defer is crucial. Deferring too much risks making the founder dependent upon others for the enormous number of decisions that need to be made, and undermines the judgement of the person with the most access to decision-relevant information. Deferring too little could naively set the project up to incur critical failures or miss out on important opportunities.

In the case of Donational, Yamey's track record suggests appropriate and constructive patterns of deferral. For example, Yamey largely outsourced decisions about which charities to include on the platform to those recommended by GiveWell, Animal Charity Evaluators, and the Open Philanthropy Project. In direct discussions with the RG team as well, Yamey often updated his outlook when factoring in the input of others, appearing to weigh their contributions sensibly. His self-directedness was also clearly evident, having pushed back on the RG evaluation team where it made sense. Although Donational scores well against the Core Qualifier criteria, we did uncover some potential flags within the subcriterion "competency across project dimensions", outlined below.

High­lighted crite­rion #1: Life plan considerations

Life plan con­sid­er­a­tions in this con­text are in refer­ence to one’s life plans in re­la­tion to the pro­posed pro­ject. Life plan con­sid­er­a­tions are crit­i­cal to the suc­cess of early-stage pro­jects be­cause as­surances must be made that the pro­ject founder(s) and core team mem­bers in­tend to treat the pro­ject as an im­por­tant part of their lives.[4] If the founder plans on ini­ti­at­ing a multi-year plan to carry out an im­pact­ful pro­ject, there must be ev­i­dence that the founder is com­mit­ted to the effort, or that other po­ten­tial fu­tures will not ob­vi­ously dis­rupt that com­mit­ment.

A founder's perceived values are also crucial to accurately mapping out life plan considerations. As an example, if a founder is explicitly dissatisfied with their role in the project because actually being 'on the ground' in some sense makes them happier, this does not bode well for their prospects of remaining on the project. Another example could be impending life circumstances that do not seem to square with the realities of playing a founding role in a project. Tension would arise, for example, where a startup founder actually feels more validated by institutional credential signals (e.g. degrees accrued, institutions attended). Latent anxiety may rest within their life plans that years are passing without any 'progress' toward a desired career in which traditionally recognized credentials are critical to acquire. Plan tensions of this sort are hardly uncommon, especially within modern labor markets.

En­courag­ing signs from Donational

Since early 2018, Donational has been a stably running platform that shows few signs of winding down any time soon. Yamey created Donational in the hopes of leading a more impactful life and sees the project as an important part of his life plans for the foreseeable future. This, in addition to his flexibility to forgo two days per week valued at ~$80,000 per annum, displays a well-above-average commitment to integrating Donational into his life plans.

Nothing has suggested that Yamey would abruptly divest effort from Donational. Donational was originally created to satisfy an array of intrinsic needs for Yamey. Notable here is his stated intrinsic motivation to create unique solutions for the world that contribute to the greatest good, rather than filling an existing role or serving as part of a team to amplify the impact of others. This desire to make a unique contribution towards doing the most good generally reflects well on life plan considerations related to this kind of project.

New projects often benefit from founders with the desire and drive to create unique value for the world. This stated desire appears consistent with Yamey's work history and area of study leading up to the founding of Donational. Many of the other relatively minor signs of aligned life plans, including indefinite plans to continue living in a place that is generally regarded as well-suited for EA outreach (New York City), check out – aside from the potential issues listed below.

Given the ease of maintaining an automated platform, much of this suggests Yamey could be a sensible bet for experimenting with outreach approaches that tie into Donational. The partnership with OFTW, a GiveWell-incubated organization that works with Donational to process incoming donations, serves as proof of concept in this respect.

Po­ten­tial issues

Yamey clears many of the life plan con­sid­er­a­tions checks. The com­fortable slot Dona­tional holds in Yamey’s life plans, how­ever, may also work against the pro­ject’s po­ten­tial up­side in sev­eral dis­tinct ways. By care­fully trac­ing his plans over sev­eral con­ver­sa­tions, we con­cluded that it is un­likely Yamey would go on to lead the pro­ject full-time due to three pri­mary con­straints re­gard­ing the pro­ject bud­get:

  • Yamey’s in­come ex­pec­ta­tions are largely de­rived from the pri­vate sec­tor.

  • Yamey would like the pro­ject to be self-sus­tain­ing such that fees for pro­cess­ing will cover staff salary re­quire­ments. Ac­cord­ing to our model, this would only be­come vi­able af­ter sev­eral mil­lion dol­lars in pro­cessed pay­ments de­pend­ing on salary re­quire­ments – well above cur­rent lev­els.

  • Per­sonal flour­ish­ing through pro­fes­sional de­vel­op­ment is quite im­por­tant to him, and he be­lieves this is less likely to oc­cur work­ing by him­self or even with too small of a team.

To sum this up, in or­der to have Yamey go full-time on the pro­ject, the op­er­at­ing bud­get would need to cover Yamey’s per­sonal salary ex­pec­ta­tions and those of a small team, at least some of whom would be in­cur­ring a rel­a­tively high cost of liv­ing in New York City. All of this sug­gests that im­pact re­turns from the Dona­tional plat­form and out­reach efforts will need to be rel­a­tively high in or­der to jus­tifi­ably cover a po­ten­tial fu­ture where Yamey is work­ing full-time on the pro­ject. Th­ese re­al­ities have a very real effect on how plans should be ap­proached and built through the Dona­tional pro­ject.

The con­straints keep­ing Yamey from run­ning the pro­ject full-time are not straight­for­wardly pro­hibitive. Suc­cess­ful en­trepreneurs ca­pa­ble of run­ning mul­ti­ple pro­jects ex­ist. Nonethe­less, on two days a week run­ning the tech­ni­cal side of things, real trade-offs will ex­ist re­gard­ing Yamey’s band­width to con­duct var­i­ous ex­per­i­ments or up­skill in or­der to ap­proach new do­mains that could be cru­cial to the pro­ject (e.g. fundrais­ing and re­cruit­ment). We fac­tor these re­al­ities into our fund­ing recom­men­da­tion for the CAP pi­lot.

Yamey's life plans create the need to recruit other individuals to cover the resulting skill and bandwidth gaps. For example, Yamey's current plan involves recruiting a co-founder and COO-type to execute the CAP pilot and other vital parts of the project. Some focus of our assessment then shifts to whether bringing in others to cover these gaps is possible and desirable. Important to consider as well is Yamey's ability to recruit and stably maintain a team of other individuals in carrying out these plans. His track record and current responsibilities leading a team for his day job suggest high competency here, particularly having played a vital role in scaling up three organizations previously. In the following section, we present our read on Yamey's capability in this regard.

Takeaways

  • Much sug­gests this pro­ject will sta­bly re­main in good hands for the fore­see­able fu­ture. Yamey’s life plans, stated val­ues and core com­pe­ten­cies ap­pear to match quite well with the con­tinu­a­tion of the pro­ject.

  • Dona­tional’s po­si­tion in Yamey’s plans, along with the na­ture of the pro­ject it­self, pre­sent a po­ten­tially good op­por­tu­nity for out­reach and mar­ket­ing ex­per­i­men­ta­tion.

  • Var­i­ous speci­fied con­straints make it very un­likely that Yamey works full-time on the pro­ject and also put pres­sure on the CAP plans to yield rel­a­tively high re­turns to jus­tify the in­vest­ment were a full-time tran­si­tion to oc­cur.

  • Much de­pends on ex­e­cu­tion of plans that com­pen­sate for life plan con­straints.

High­lighted crite­rion #2: Com­pe­tency across pro­ject dimensions

Bridging abstract plans all the way to implementation and adoption requires regularly working at different 'levels' of a project, mostly the object and meta levels. Skillfully navigating these levels demonstrates fluency and command along several dimensions relevant to the project. The object level comprises the concrete dimensions of a project, such as the completion of tasks; the meta level could be characterized as the tactical or strategic dimensions. Ideally, a founding team will possess awareness at these different levels, remaining responsive to various pressures and incoming evidence streams. It is rare to find people who can fluidly move between object- and meta-level dimensions, so in this respect Yamey stands out.

One example at the meta level is the ability to identify, diagnose, and address bottlenecks. As a project evolves, bottlenecks – points of congestion in the functioning of a system – will arise that need to be identified and remedied. Diagnosing bottlenecks alone requires intimately understanding a variety of factors pertaining to the project and its goals, some of which reside beyond simply carrying out intended tasks. These points of congestion can surface globally for the project (e.g. the team isn't aware of the importance of external marketing, which is plausibly slowing revenue generation), within and between departments, or at the level of individuals. Planning and action are also needed to get bottlenecks addressed, however. A bottleneck diagnosis is useless unless an individual can persuade others on the team of its importance and the need to take action. This influences the attentional direction of the team, affecting the priorities of the project in turn. The ability to cause the resolution of bottlenecks, whether as a sole individual or as part of a team, is crucial. We outline below what the "competency across project dimensions" criterion indicated about the strength of Donational as a project.

To have created a solution that fits squarely as a preferred donation processing option for OFTW and TLYCS, Yamey displayed unusually high competence in understanding and executing on problem/solution fit considerations. This set of considerations anticipates how customers will behave in the marketplace when solving problems they encounter. In this case, Yamey anticipated that his donation processing project could be modified and scaled to provide a vital solution for at least two organizations. This entailed not only the conceptual legwork of calibrating on problem/solution fit, but also applying his technical knowledge and a broad array of interpersonal skills to persuade the organizations to adopt his solution. All of this must be carried out in the right doses and with precise timing.

En­courag­ing signs

As men­tioned above, Yamey demon­strated a de­tailed and ac­cu­rate un­der­stand­ing of how his abil­ities gen­er­ate real-world value, what he can offer in re­la­tion to his pro­ject, and what his pro­ject can offer in re­la­tion to the needs of ex­ist­ing or­ga­ni­za­tions. There is also con­sid­er­able leg­work en­tailed in de­liv­ery, in­clud­ing skil­lfully con­duct­ing co­or­di­na­tion efforts in or­der to com­pel or­ga­ni­za­tions to test his solu­tion.

Po­ten­tial issues

While the Team Strength por­tion of the as­sess­ment does fac­tor in a wide va­ri­ety of sig­nals meant to eval­u­ate fit­ness to lead the pro­ject, the fo­cus of the over­all as­sess­ment, in­clud­ing the cost-effec­tive­ness anal­y­sis, is the pre­sent-day po­ten­tial of the CAP pro­gram. Through­out the pro­cess, it has not been en­tirely clear that Yamey has been able to get con­cep­tual clar­ity on whether the CAP pro­gram is worth pur­su­ing. For ex­am­ple, RG pro­duced a rel­a­tively sim­ple back-of-the-en­velope calcu­la­tion (BOTEC) pro­ject­ing the po­ten­tial im­pact of a cor­po­rate am­bas­sador pro­gram that re­sulted in large up­dates to the CAP plan. The per­ceived po­ten­tial of the CAP as origi­nally con­ceived was re­vised down con­sid­er­ably, and al­ter­na­tive plans were as­sessed in or­der to fur­ther ex­plore the vi­a­bil­ity of the pro­gram. Mov­ing for­ward, Re­think Grants will likely pro­duce toy mod­els like these ear­lier in our eval­u­a­tion pro­cess. We ex­pect this will al­low us to iden­tify key ques­tions and po­ten­tial shifts be­fore sink­ing sig­nifi­cant re­sources into deeper anal­y­sis.

In the Life Plan Considerations section, we touch on the importance of teams being able to recognize and subsequently cover inevitable skill gaps. Ideally, Yamey would have worked through these preliminary calculations sooner, with the intention of gathering input from others afterward. This could conceivably have led to improving his BOTEC or, upon realizing the need for more skilled assistance, drawing on someone else's quantitative modeling skill entirely. Unfortunately this hadn't yet happened, for a variety of stated reasons, all of which were plausible but addressable. To Yamey's credit, taking part in the RG process is a (somewhat belated) attempt at doing this. It should also be recognized that the current iteration of the CAP plans is the result of plan changes made once it was realized that the CAP wasn't as cost-effective as originally thought.

Outstanding questions remain, however, regarding this potential meta-level indecision. Not taking adequate steps to model the CAP at an earlier stage could constitute a red flag in certain dimensions of project management, where issues may exist regarding (i) his awareness of, and ability to cover, skill gaps in the project, and (ii) his weighing of the importance of domains where he does not have technical proficiency. To put it concisely, there is a worry that the importance of basic quantitative modeling wasn't properly appreciated. One would expect crude quantitative estimates of proposed projects to be prioritized, as they are quite important regardless of whether the founder knows exactly how to conduct them.

Un­re­lated to Dona­tional, an ex­am­ple that illus­trates the im­por­tance of this meta-level aware­ness would be le­gal con­sid­er­a­tions for a pro­ject. Le­gal con­sid­er­a­tions may seem like a black box when ini­ti­at­ing a pro­ject in un­charted ter­ri­tory, but this does not negate the im­por­tance of tak­ing ac­tion to em­ploy do­main ex­perts and weigh­ing (the po­ten­tial im­por­tance of) le­gal con­sid­er­a­tions ap­pro­pri­ately.

Relatedly, we would like to have seen more attention paid to the initial exploration of potential partnerships with nearby organizations. Yamey has a demonstrated track record of working with other organizations through the Donational platform, and because the CAP is an outreach effort rather than a straightforwardly technical project, it would have been beneficial to seek collaboration with relevant groups earlier. There is reason to believe, however, that approaching them earlier, when the CAP plans were less crystallized, would have been less prudent. To his credit once again, OFTW agreed to provide assistance with the CAP pilot toward the end of the RG evaluation process once we made the suggestion.

Takeaways

  • Yamey’s track record demon­strates high fa­cil­ity in nu­mer­ous di­men­sions rele­vant to cre­at­ing a suc­cess­ful pro­ject.

  • We uncovered some potential indications of blind spots, and of a lack of awareness of their importance, including not conducting early calculations on the CAP's viability and not exploring nearby partnerships.

  • Yamey is good at addressing blind spots and issues once they have been identified as such.

  • RG ob­served many in­stances of Yamey’s will­ing­ness to im­ple­ment ma­jor plan changes based on ev­i­dence.

  • Yamey is reasonably well-suited to run a project of this type. An existing organization moving into the CAP space would plausibly be more attractive, though none of the obvious players plan to pursue this imminently.

Conclusion

In this section, we did not write out an exhaustive overview of how Donational scores on our Team Strength criteria, electing instead to highlight the considerations that most plausibly affected our evaluation. Our subjective sense after checking the CAP program against our criteria is that Donational presents an opportunity to test outreach methods that haven't yet been adequately explored, namely workplace outreach and fundraising. The constellation of criteria we have been tracking has led us to score Team Strength as Medium, though we believe Yamey is on the high end of that category. Informing our recommendation for funding a CAP pilot is our belief in Yamey's overall capability as a founder, along with our reservations about how certain constraints uncovered in the "life plan considerations" section and potential flags within the "competency across project dimensions" section bear on the viability of the program.

To re­cap, it was ob­served that Yamey’s life plans as they per­tain to the pro­ject were very promis­ing, with the ex­cep­tion that cer­tain con­straints would re­quire the CAP model to yield rel­a­tively high re­turns in or­der to be con­sid­ered suffi­ciently im­pact­ful. This re­sulted in a sub­stan­tial plan change, whereby a smaller-scale test of the CAP be­came far more sen­si­ble. Rather than recom­mend fully fund­ing a pro­gram that must hit op­ti­mistic tar­gets in or­der to be im­pact­ful enough, to Yamey’s credit once again, the de­ci­sion was made to get more in­for­ma­tion on the vi­a­bil­ity of this out­reach model via a pi­lot study.

Yamey’s track record sug­gests above-av­er­age fluency in sev­eral di­men­sions of what it takes to bring a pro­ject to fruition. The plan­ning pro­cess for CAP pre­sented some po­ten­tial gaps in aware­ness, how­ever, that would have been quite costly had the pro­ject gone ahead with­out our in­volve­ment. Most en­courag­ing in this re­spect is that Yamey demon­strates an ea­ger­ness to take cor­rec­tions and stead­fast com­mit­ment to iter­at­ing his plans in search of the most effec­tive ver­sion of the CAP pos­si­ble. This dis­po­si­tion and the as­so­ci­ated meta-skills re­quired to course-cor­rect con­sis­tently should be weighted more heav­ily than re­vealed skill gaps. No founder will have ev­ery skill or have ac­cess to all of the rele­vant forms of aware­ness, but we be­lieve Yamey dis­plays an im­pres­sive will­ing­ness to up­date and in­ten­tion to­ward build­ing fur­ther com­pe­tency.

Indi­rect Benefits

Our process

In ad­di­tion to the out­comes in­ten­tion­ally gen­er­ated by a pro­ject, we take into ac­count po­ten­tial in­di­rect effects (in­clud­ing spillover effects and flow-through effects). Indi­rect Benefits are out­comes that are not the cen­tral aim of a pro­ject, but are a pos­i­tive con­se­quence of its im­ple­men­ta­tion. For ex­am­ple, dis­tribut­ing mosquito nets to com­bat malaria may in­crease in­come, or found­ing a new char­ity may give effec­tive al­tru­ists skills that can be used on other pro­jects as well. There is no clear bound­ary be­tween di­rect and in­di­rect effects, and in­di­rect benefits are not in­trin­si­cally less im­por­tant than di­rect ones. How­ever, this prag­matic dis­tinc­tion al­lows us to treat more spec­u­la­tive con­se­quences sep­a­rately, in a way that takes into ac­count their high un­cer­tainty.

To do this, we first work to identify all of the possible indirect benefits of a given project, and then consider how good the benefits would be. Specifically, we quantify each plausible benefit by considering the number of individuals that benefit (scale), the magnitude of the benefit (how much each individual benefits, or effect size), and the likelihood that the project generates that benefit (probability). We then calculate expected benefit points for each indirect benefit by multiplying the scale, effect size and probability scores. Finally, we sum these points to arrive at an overall Indirect Benefits score. Taking the sum accounts for the fact that causing multiple indirect benefits is better than causing just one.
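
The arithmetic behind the score is simple enough to show in a few lines. The benefits and the scores below are hypothetical illustrations; this report does not specify the scoring scale or the individual scores used, only the multiply-and-sum procedure.

```python
# Each indirect benefit is scored on scale, effect size, and probability;
# expected benefit points are their product, summed across benefits.
# The specific benefits and scores below are hypothetical illustrations.
benefits = [
    {"name": "reduced pandemic risk from antibiotic resistance", "scale": 3, "effect": 3, "probability": 1},
    {"name": "broader culture of effective giving in workplaces", "scale": 2, "effect": 2, "probability": 2},
    {"name": "macroeconomic gains from better health", "scale": 2, "effect": 1, "probability": 2},
]

points = sum(b["scale"] * b["effect"] * b["probability"] for b in benefits)
print(f"Total expected benefit points: {points}")  # compared against the Low/Medium/High thresholds
```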

The number of expected benefit points associated with each of the three qualitative categories is shown in the table below. We set these thresholds to reflect the fact that a project that has a high probability of causing at least one very impactful indirect benefit at a large scale should earn a score of High. We believe a project that has a moderate probability of generating a moderately good indirect benefit for a moderate number of people should get a score of at least Medium – higher if there are several indirect benefits. And we believe that many indirect benefits with smaller expected effects can also earn a project a Medium or High score.

IB thresholds

Our findings

In the case of CAP, we ex­pect that the most sig­nifi­cant in­di­rect benefits will be gen­er­ated as a re­sult of in­creased dona­tions to effec­tive an­i­mal welfare char­i­ties. Speci­fi­cally, we think there’s a chance that ad­di­tional dona­tions to effec­tive an­i­mal welfare char­i­ties con­tribute to re­duc­ing our re­li­ance on fac­tory farm­ing, which in turn would likely re­duce the sever­ity of cli­mate change as fac­tory farm­ing is re­spon­si­ble for a large share of green­house gases emit­ted by mid­dle- and high-in­come coun­tries. Miti­gat­ing some of the worst effects of cli­mate change could sub­stan­tially im­prove the lives of many peo­ple. While it’s quite un­likely that CAP scales to a point where the marginal dona­tions of CAP par­ti­ci­pants end up mak­ing a tan­gible differ­ence in this area, the benefits are large enough that the ex­pected benefit is sig­nifi­cant.

Similarly, reducing our reliance on factory farming has the indirect benefit of reducing the risk that antibiotic resistance, driven by the use of antibiotics in animal agriculture, could lead to a superbug that causes a severe pandemic or epidemic affecting many people. Again, while the chance that the CAP program makes much of a difference here is quite small, the expected value may still be non-trivial.

In­creas­ing dona­tions to global health char­i­ties may have in­di­rect benefits as well. There’s some ev­i­dence that re­duc­ing the in­ci­dence of malaria, for ex­am­ple, can have tan­gible macro-eco­nomic benefits as well as boost­ing the in­comes of those di­rectly af­fected. This is con­sis­tent with other global health in­ter­ven­tions, such as vac­ci­na­tions, which have been found to con­tribute to eco­nomic growth of the im­pacted re­gion (Jit et al., 2015; Ozawa et al., 2016), lead­ing to ad­di­tional peo­ple be­com­ing some­what less im­pov­er­ished.

There’s also a small prob­a­bil­ity that dona­tions to crim­i­nal jus­tice re­form char­i­ties might have pos­i­tive in­di­rect effects. The mass in­car­cer­a­tion of peo­ple of color likely con­tributes to per­sis­tent racial in­equal­ity in the United States. Crim­i­nal jus­tice re­form could miti­gate some of this.

In ad­di­tion, we think there’s a chance that we get some in­di­rect benefits not from the dona­tions them­selves, but from shift­ing CAP par­ti­ci­pants’ per­spec­tives on giv­ing and al­tru­ism. Speci­fi­cally, we think there’s a mod­er­ate chance that some of the par­ti­ci­pants adopt a more effec­tive­ness-ori­ented ap­proach to giv­ing go­ing for­ward, mul­ti­ply­ing the im­pact of their dona­tions. We also think there’s a chance that CAP ex­pands, reaches a de­cent num­ber of peo­ple, and that all of those peo­ple con­tribute to a more wide­spread cul­ture of giv­ing. Fur­ther­more, there is some chance that a few of those par­ti­ci­pants buy into effec­tive al­tru­ism more deeply and go on to have a sub­stan­tial im­pact, per­haps earn­ing-to-give or us­ing their ca­reer to achieve a lot of good.

Fi­nally, along similar lines, we ex­pect that there’s a small chance that some com­pa­nies par­ti­ci­pat­ing in CAP will for­mal­ize CAP-like pro­grams in their com­pany cul­ture and sys­tem, for ex­am­ple, by cre­at­ing a cor­po­rate char­ity de­duc­tion-type pro­gram. This could in­crease the amount of money go­ing to char­ity over­all, though un­less the com­pa­nies em­pha­size effec­tive char­i­ties, it’s not clear that this would have much of an im­pact.

Our find­ings are sum­marised in the table be­low. Fol­low­ing our scor­ing pro­cess, we end up with a to­tal of 30 ex­pected benefit points, which is just in­side the High cat­e­gory.

IB scores

Indi­rect Harms

Our process

Indi­rect Harms are un­in­tended nega­tive con­se­quences of the pro­ject. For ex­am­ple, a pro­ject that in­creased the in­comes of poor peo­ple may lead to greater con­sump­tion of an­i­mal prod­ucts, which causes non­hu­man an­i­mals to suffer; or an AI re­search or­ga­ni­za­tion might cause harm­ful AI by in­creas­ing the num­ber of peo­ple with rele­vant skills.

We believe it's important to take these potential harms into account, and do so using a similar approach to the one used to account for Indirect Benefits. We assess each potential indirect harm by evaluating the scale of the harm, its effect size, and the probability that the harm occurs, multiplying these to obtain expected harm points.

As with Indirect Benefits, we chose Indirect Harms thresholds that penalize projects that are quite likely to have a large negative impact on a lot of people. Only projects with a few relatively small indirect harms, most of which are unlikely and small in scale, earn a Low score.

IH thresholds

Our findings

We ex­pect that the biggest in­di­rect harm gen­er­ated by CAP will come from the fact that many of the dona­tions will likely go to global poverty char­i­ties. There’s a high prob­a­bil­ity that re­duc­ing global poverty leads to a mod­er­ate in­crease in con­sump­tion of fac­tory-farmed an­i­mals, at least in the short- to medium-term. If those an­i­mals gen­er­ally lead very bad lives, this could lead to a lot of ad­di­tional suffer­ing – though some have ar­gued that this prob­lem may be ex­ag­ger­ated.

Similarly, it’s pos­si­ble, though un­likely, that alle­vi­at­ing poverty would ac­cel­er­ate eco­nomic growth enough to in­crease the pace of tech­nolog­i­cal progress and re­duce the amount of time we have to make sure those tech­nolo­gies are safe. If we’re un­able to en­sure the safety of new tech­nolo­gies, those tech­nolo­gies could pose an ex­is­ten­tial threat to an enor­mous num­ber of sen­tient be­ings. This growth is also likely to ex­ac­er­bate global warm­ing, po­ten­tially af­fect­ing a large num­ber of peo­ple – though not, we sus­pect, to a high de­gree in most cases.

Another pos­si­ble effect of poverty alle­vi­a­tion may be to in­crease to­tal pop­u­la­tion size, which may con­tribute to cli­mate change and ex­ac­er­bate prob­lems like poverty, rather than solv­ing them. We think it’s quite un­likely that this is the case, as birth rate gen­er­ally (though not always) de­creases as poverty goes down.


It’s also pos­si­ble that by in­clud­ing mod­er­ately less effec­tive char­i­ties on the Dona­tional giv­ing plat­form, a small num­ber of dona­tions may get di­verted away from the most effec­tive char­i­ties. We think the prob­a­bil­ity of this is fairly low for two rea­sons: first, the Dona­tional plat­form has recom­mended char­i­ties that ‘nudge’ peo­ple into donat­ing where they can do the most good. Yamey has agreed to limit these in fu­ture to char­i­ties recom­mended by GiveWell and An­i­mal Char­ity Eval­u­a­tors, plus one US-fo­cused crim­i­nal jus­tice or­ga­ni­za­tion (most likely the Texas Or­ga­niz­ing Pro­ject). We are not con­fi­dent that the crim­i­nal jus­tice char­ity will be as cost-effec­tive as the oth­ers, but dona­tions through the plat­form so far sug­gest it will re­ceive a rel­a­tively small minor­ity of the dona­tions. More im­por­tantly, we don’t ex­pect that most peo­ple would have been giv­ing to effec­tive char­i­ties at all prior to CAP. We there­fore think it’s un­likely that CAP will cause a sub­stan­tial vol­ume of dona­tions to be di­verted from more to less effec­tive char­i­ties.

Finally, it's possible that some CAP participants would eventually have heard about the principles of effective altruism from a more compelling source. It would be a loss to the movement if those individuals would otherwise have become more deeply involved with effective altruism. However, we think the likelihood of this is quite low, primarily because effective altruism is such a small movement that very few CAP donors would have become highly engaged in it anyway.

In to­tal, we have as­signed CAP 30 ex­pected harm points, plac­ing it in the High cat­e­gory.

IH scores

Ro­bust­ness to Mo­ral Uncertainty

Our process

Be­cause we have some moral un­cer­tainty, we want to min­i­mize the prob­a­bil­ity that we fund pro­jects that would be con­sid­ered morally wrong were we to hold a differ­ent wor­ld­view. To ac­count for this, we look for moral as­sump­tions we could make un­der which a given pro­ject would have the po­ten­tial to cause harm, and fa­vor pro­jects that are ro­bust to this con­sid­er­a­tion. In other words, we pre­fer pro­jects that don’t ap­pear sig­nifi­cantly wrong un­der other moral frame­works. To ac­count for the fact that we see some moral po­si­tions as more plau­si­ble than oth­ers, we give differ­ent po­si­tions differ­ent weights. Note that we are us­ing terms like ‘wor­ld­view’ and ‘moral po­si­tion’ loosely and in­ter­change­ably: they can in­clude broad em­piri­cal be­liefs, such as about the effects of free mar­kets ver­sus gov­ern­ment in­ter­ven­tion, as well as nor­ma­tive ones.

As a first step in this pro­cess, we brain­storm moral po­si­tions that, if true, would cause a given pro­gram to look morally wrong. We then con­sider how many in­di­vi­d­u­als would be nega­tively im­pacted if we held that eth­i­cal po­si­tion, how badly they would be af­fected, and the chance that the eth­i­cal po­si­tion is ‘cor­rect’. The scale, effect size, and prob­a­bil­ity of each harm are mul­ti­plied to ob­tain ex­pected harm points, which are summed.

We are quite averse to recom­mend­ing a grant sup­port­ing a pro­ject that would cause a lot of harm to a lot of peo­ple un­der even just one eth­i­cal frame­work that we find highly plau­si­ble, and set the thresh­olds ac­cord­ingly.

RMU thresholds

Our findings

We thought of sev­eral eth­i­cal po­si­tions un­der which CAP could look morally wrong to vary­ing de­grees.

Some the­o­ries, es­pe­cially some forms of con­se­quen­tial­ism, con­sider it wrong to ex­tend lives that are ‘net-nega­tive’ (con­tain more suffer­ing than hap­piness, roughly speak­ing). Opinion differs in the RG team about what pro­por­tion of lives ‘saved’ by the rele­vant char­i­ties (such as the Against Malaria Foun­da­tion and Malaria Con­sor­tium) fall into this cat­e­gory, how bad those lives are, and how plau­si­ble are the rele­vant moral the­o­ries. For con­se­quen­tial­ists with a ‘to­tal’ view of pop­u­la­tion ethics, one im­por­tant con­sid­er­a­tion is that avert­ing a death might not lead to more peo­ple ex­ist­ing in the long run, in which case the main effect of, for ex­am­ple, anti-malaria in­ter­ven­tions would be im­prov­ing qual­ity rather than quan­tity of life. We have ten­ta­tively as­signed this po­ten­tial harm Medium scores for scale and effect size but Low for prob­a­bil­ity. This re­flects our view that these char­i­ties are likely to have a net-pos­i­tive im­pact over­all, even if they do con­sid­er­able harm to some in­di­vi­d­u­als.

Ad­di­tion­ally, a util­i­tar­ian with an av­er­age view of pop­u­la­tion ethics would con­sider it morally wrong to donate to char­i­ties that ex­tend the lives of peo­ple liv­ing in poverty if those peo­ple are liv­ing be­low-av­er­age lives. Ac­cord­ing to this view, CAP would cause sub­stan­tial harm to a mod­er­ate num­ber of peo­ple. We think it’s likely that peo­ple liv­ing in ab­ject poverty live ‘be­low-av­er­age’ lives, but con­sider the av­er­age view of pop­u­la­tion ethics to be im­plau­si­ble.

Utilitarians with a total view of population ethics may also view ending factory farming as net-negative if it turns out that most factory-farmed animals – animals who wouldn't exist without factory farming – live net-positive lives (lives worth living). From this perspective, the relatively small proportion of CAP donations that we expect to go to animal welfare charities would cause a small amount of harm to a moderate number of individuals, since farmed animals' lives are presumed to be, at best, only weakly positive. While we find the total view of population ethics to be plausible, we believe the probability that most factory-farmed animals live net-positive lives is quite low. Moreover, interventions that improve animal welfare without reducing the number of animals farmed are not vulnerable to this objection.

There are also some socialist worldviews under which CAP looks actively wrong. Many argue that private philanthropy will necessarily be inefficacious if it does not lead to systemic political change (Kuper, 2002). But some also argue that private philanthropy is actively harmful, acting as a smokescreen for the system that ultimately causes the problems private charity purports to fix (Thorup, 2015; Eikenberry & Mirabella, 2018), or promoting individualistic norms that undermine more radical collectivist action (Syme, 2019). We grant that there is some plausibility to the view that collective political solutions of some kind would be preferable to private philanthropy, and that the existence of private philanthropy as a whole may reduce the extent of political action in certain areas (although the evidence for this claim is very limited). Nevertheless, we consider it very unlikely that CAP's philanthropic efforts on the margin would cause harmful effects in this way (e.g. by promoting private philanthropy as a norm).

Similarly, oth­ers be­lieve that char­i­ties that make top-down de­ci­sions about what the global poor ‘need’ are deny­ing the poor the agency to iden­tify their own needs. Again, this could be a con­cern of both de­on­tol­o­gists and con­se­quen­tial­ists. Ac­cord­ing to this view, which we find some­what plau­si­ble, CAP might be caus­ing a lit­tle bit of harm to a mod­er­ate num­ber of peo­ple by pro­mot­ing dona­tions to global health and poverty char­i­ties (note that dona­tions to GiveDirectly would prob­a­bly be an ex­cep­tion).

Some peo­ple see pun­ish­ment as a moral im­per­a­tive, ei­ther be­cause of a de­on­tolog­i­cal com­mit­ment to re­tribu­tive jus­tice or an em­piri­cal be­lief in its effi­cacy. From this per­spec­tive, CAP may cause harm by pro­mot­ing dona­tions to crim­i­nal jus­tice re­form char­i­ties that aim to re­duce rates of in­car­cer­a­tion. We ex­pect the harm here would be quite small, for sev­eral rea­sons: we don’t ex­pect the crim­i­nal jus­tice re­form char­i­ties to greatly de­crease the use of re­tribu­tive pun­ish­ments; there is ar­guably lit­tle ev­i­dence that the kind of re­forms pro­moted by the rele­vant char­i­ties would in­crease crime or cause other prob­lems; and even from a re­tribu­tive per­spec­tive, many pun­ish­ments in the US jus­tice sys­tem are prob­a­bly dis­pro­por­tionate.

Fi­nally, some peo­ple with rights-based nor­ma­tive views main­tain that kil­ling and eat­ing an­i­mals is a hu­man right. By pro­mot­ing dona­tions to an­i­mal welfare char­i­ties that ex­plic­itly aim to end fac­tory farm­ing, CAP would cause a rel­a­tively mod­est amount of harm. We find the per­spec­tive that kil­ling and eat­ing an­i­mals is a hu­man right ir­re­spec­tive of the suffer­ing caused by this prac­tice to be pretty im­plau­si­ble. Even if there is a right to kill an­i­mals, a shift from fac­tory farm­ing to more hu­mane forms of an­i­mal agri­cul­ture seems un­likely to vi­o­late it.

In to­tal, we as­signed 25 ex­pected harm points, which is in the Medium cat­e­gory.

[Table: RMU scores]

Pro­ject Po­ten­tial Score

To assess the overall project potential, we aggregate our 'qualitative' scores for all criteria. Each team member assigns a weight to each criterion, based on factors such as how important they think the criterion itself is and how well they think our score captures the criterion in this particular evaluation. The average of team members' weights determines each criterion's weighted score, and the weighted scores are summed to create the final Project Potential Score (PPS). The project potential can be described as Low, Medium, or High, as shown in the table below.

[Table: PPS thresholds]

Our team mem­bers all gave a large ma­jor­ity of the weight to the cost-effec­tive­ness es­ti­mate. Notwith­stand­ing its many short­com­ings, it was based on a more rigor­ous anal­y­sis than our scores for other crite­ria. This re­sulted in a fi­nal Pro­ject Po­ten­tial Score of 1.27, which is in the Low cat­e­gory.
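The aggregation itself is just a weighted sum. The sketch below shows the mechanics with placeholder criterion scores and hypothetical weights; the weights are not those the team actually assigned, so the result will not match the 1.27 reported above.

```python
# Sketch of the PPS aggregation. Criterion scores and weights below are
# placeholders chosen to show the mechanics; they are not the team's actual
# inputs, so the result will not equal the reported 1.27.

scores = {                      # 1 = Low, 2 = Medium, 3 = High
    "cost_effectiveness": 1,
    "team_strength": 2,
    "indirect_benefits": 3,
    "indirect_harms": 1,
    "robustness_to_moral_uncertainty": 2,
}

weights = {                     # averaged team weights, summing to 1
    "cost_effectiveness": 0.70,
    "team_strength": 0.10,
    "indirect_benefits": 0.08,
    "indirect_harms": 0.07,
    "robustness_to_moral_uncertainty": 0.05,
}

pps = sum(scores[c] * weights[c] for c in scores)
print(round(pps, 2))            # weighted sum of scores -> Project Potential Score
```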

[Table: PPS scores]

Grant Recommendation

Our process

The Rethink Grants team votes on every grant after all team members have reviewed the grant evaluation and accompanying analyses. The number of votes needed to recommend a grant depends on the Project Potential Score.

[Table: PPS votes]

Pro­pos­als re­ceiv­ing the req­ui­site num­ber of votes are recom­mended to grant­mak­ers in our net­work. Grants that are not recom­mended for fund­ing are given de­tailed feed­back, in­clud­ing:

  1. The grant eval­u­a­tion re­port, in­clud­ing the cost-effec­tive­ness anal­y­sis (with sen­si­tivity analy­ses high­light­ing the main sources of un­cer­tainty) and scores for each of the crite­ria.

  2. A sum­mary of the pro­ject’s main strengths.

  3. The considerations that weighed against the project. These include both:

    • A summary of the criteria on which the project scored poorly and to which the Project Potential Score was highly sensitive.

    • A summary of the criteria that the RG team members who voted against the grant identified as most important to their decision.

  4. A set of key recom­men­da­tions to im­prove the pro­ject based on its most sub­stan­tial short­com­ings.

Our decision

The Re­think Grants team has unan­i­mously de­cided not to recom­mend fund­ing for a full-scale Cor­po­rate Am­bas­sador Pro­gram at this time. This is based heav­ily on our cost-effec­tive­ness anal­y­sis, which sug­gests it is un­likely to be worth­while at any rea­son­able cost-effec­tive­ness thresh­old, at least in its pro­posed form.

How­ever, we have also de­cided by con­sen­sus to recom­mend fund­ing of up to $40,000 to run a pi­lot study. This is pri­mar­ily based on three con­sid­er­a­tions:

  • Our value of in­for­ma­tion anal­y­sis sug­gests the pi­lot would re­solve more than enough un­cer­tainty to jus­tify its cost.

  • There are rea­sons for think­ing our cost-effec­tive­ness es­ti­mate may be con­ser­va­tive, es­pe­cially com­pared to analy­ses of similar pro­grams.

  • There is a good chance that the num­ber of dol­lars donated as a re­sult of the pi­lot would be at least as high as the num­ber spent to run it, which for some donors could make it a low-risk op­por­tu­nity.

Fol­low-up

If Re­think Grants con­tinues as a pro­ject, we will eval­u­ate the re­sults of each grant we make us­ing a Grant Fol­low-Up Plan. This plan out­lines the grant timeline, along with key time­points when we’ll check in with the grantee.

Each of those time­points has an as­so­ci­ated set of met­rics of suc­cess. Th­ese met­rics in­clude in­terim in­di­ca­tors – things like growth in team size – as well as out­come mea­sures like dona­tions moved to effec­tive char­i­ties.

We work with the grantee to set rea­son­able goals for each of those met­rics, and then com­pare those goals with re­al­ity dur­ing the sched­uled check-ins. This helps us un­der­stand the im­pact of our grants, and also helps us iden­tify grantees that would benefit from ad­di­tional sup­port.

Separately, we make a set of pub­lic fore­casts, es­ti­mat­ing the like­li­hood that the grantee achieves the goals out­lined in the Grant Fol­low-Up Plan. This gives us the chance to eval­u­ate our grant eval­u­a­tion pro­cess and judg­ment. Again, this is con­di­tional on the con­tinu­a­tion of RG; due to time con­straints, we will not be do­ing it for this eval­u­a­tion.

Ap­pendix 1: Challenges of elic­it­ing and ag­gre­gat­ing prob­a­bil­ity distributions

This ap­pendix out­lines some of the challenges we en­coun­tered when elic­it­ing, fit­ting, and ag­gre­gat­ing pa­ram­e­ter es­ti­mates, and sug­gests some ways of im­prov­ing the pro­cess.

There were a cou­ple of tech­ni­cal is­sues with fit­ting. First, one of the Re­think Grants team mem­bers (TMs) de­clined to give in­puts for four pa­ram­e­ters but blank cells were not per­mit­ted by the script. We filled those cells with an­other TM’s val­ues, but gave a con­fi­dence score of 0, which effec­tively ex­cludes them from the anal­y­sis. Se­cond, SHELF re­quires pc5 (the 5th per­centile) to be higher than L (the lower plau­si­ble limit), M (me­dian) to be higher than pc5, and so on, but some TMs used the same value for two in­puts. For ex­am­ple, some thought there was a greater than 5% chance of ob­tain­ing no pledges, so they in­put 0 for both L and pc5. The proper way to deal with this is to first elicit the prob­a­bil­ity it is zero, then elicit the in­puts given that it is not zero, but this would have added con­sid­er­able time and com­plex­ity to the pro­cess. In­stead, we sim­ply changed the higher per­centile to a num­ber slightly higher than the lower in­put, such as 0.000001.
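For illustration, the workaround amounts to something like the following sketch, where the epsilon of 0.000001 mirrors the manual adjustment described above; the five-quantile input format (L, pc5, M, pc95, U) is our assumption about how the inputs were arranged.

```python
def enforce_strict_increase(quantiles, eps=1e-6):
    """Nudge tied quantiles upward so the fitting software accepts them.

    `quantiles` is an ordered list such as [L, pc5, M, pc95, U]. If two
    adjacent values are equal (e.g. L = pc5 = 0 because a team member put
    more than 5% probability on zero pledges), the later one is raised by
    `eps`, mirroring the manual fix described above.
    """
    out = [quantiles[0]]
    for q in quantiles[1:]:
        out.append(q if q > out[-1] else out[-1] + eps)
    return out

print(enforce_strict_increase([0, 0, 2, 10, 50]))  # -> [0, 1e-06, 2, 10, 50]
```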

More con­cern­ingly, spot-checks re­vealed a con­sid­er­able dis­par­ity be­tween many of the in­puts and the fit­ted per­centiles. In most cases, there was a close match with ei­ther pc5 and M, or M and pc95, but not all three. For ex­am­ple, pre­dicted pledge num­bers of [1, 2, 10] were fit to a log­nor­mal dis­tri­bu­tion with 5th, 50th, and 95th per­centiles of roughly [1, 2, 4], and 1st-year donor churn of [0.125, 0.7, 0.8] be­came a beta with [0.6, 0.7, 0.8]. This was not due to some tech­ni­cal er­ror, but sim­ply be­cause no stan­dard dis­tri­bu­tion would fit all the in­puts.
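To illustrate why this happens, the sketch below fits a lognormal to the elicited pledge-number percentiles by least squares on the log scale. This is not SHELF's fitting routine, so the exact output will differ from the report's figures, but it shows that a two-parameter distribution cannot honour three arbitrary percentiles.

```python
import numpy as np
from scipy import stats, optimize

# Elicited 5th, 50th and 95th percentiles for the number of pledges (from the text).
probs = np.array([0.05, 0.50, 0.95])
elicited = np.array([1.0, 2.0, 10.0])

def loss(params):
    # params = (mu, log_sigma) of a lognormal; compare fitted vs elicited percentiles
    mu, log_sigma = params
    fitted = stats.lognorm.ppf(probs, s=np.exp(log_sigma), scale=np.exp(mu))
    return np.sum((np.log(fitted) - np.log(elicited)) ** 2)

result = optimize.minimize(loss, x0=[np.log(2.0), 0.0])
mu, sigma = result.x[0], np.exp(result.x[1])
fitted = stats.lognorm.ppf(probs, s=sigma, scale=np.exp(mu))
print(np.round(fitted, 2))  # two free parameters cannot match all three targets exactly
```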

In or­der to pri­ori­tise fur­ther in­ves­ti­ga­tions, we added columns in­di­cat­ing the size of the great­est dis­par­ity for each pa­ram­e­ter, i.e. the high­est per­centage differ­ence be­tween the in­puts and fit­ted dis­tri­bu­tions for the 5th, 50th, and 95th per­centiles. A sub­stan­tial ma­jor­ity of the dis­par­i­ties fa­vored Do Noth­ing (no in­ter­ven­tion); that is, re­duc­ing the dis­par­ity would in­crease the es­ti­mated cost-effec­tive­ness of CAP, sug­gest­ing the CEA may be ‘bi­ased’ against the pro­gram. To get some idea of the mag­ni­tude of this effect, we cre­ated a copy of the in­puts and (very im­pre­cisely) mod­ified all the ones with a dis­par­ity of greater than 25% for those with a pri­or­ity of 5, and greater than 50% for the rest, so that they were roughly ‘neu­tral’ or fa­vored CAP, e.g. the [1, 2, 10] pa­ram­e­ter men­tioned above was fit to a gamma with per­centiles of [2, 5, 10]. (The pa­ram­e­ters fa­vor­ing Do Noth­ing were left un­changed.) This caused the dona­tion-cost ra­tio and ex­pected value of perfect in­for­ma­tion to in­crease dra­mat­i­cally, sug­gest­ing the re­sults were sen­si­tive to un­cer­tain­ties around the elic­i­ta­tion and fit­ting of in­puts.
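The disparity measure we used can be sketched as follows; the treatment of zero-valued inputs here is a simplifying assumption for illustration.

```python
def max_disparity(elicited, fitted):
    """Largest relative difference between elicited and fitted percentiles.

    Both arguments are [5th, 50th, 95th] percentile lists. Differences are
    taken relative to the elicited value; zero-valued inputs are skipped to
    avoid division by zero (a simplifying assumption).
    """
    diffs = [abs(f - e) / abs(e) for e, f in zip(elicited, fitted) if e != 0]
    return max(diffs) if diffs else 0.0

# Pledge-number example from the previous paragraph: [1, 2, 10] vs roughly [1, 2, 4]
print(max_disparity([1, 2, 10], [1, 2, 4]))  # -> 0.6, i.e. a 60% disparity
```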

We there­fore de­cided to re-es­ti­mate some in­puts. The TMs re­con­sid­ered any in­puts that met the fol­low­ing crite­ria:

  • Pri­or­ity 5 and dis­par­ity >50%

  • Pri­or­ity 4 and dis­par­ity >100%

  • Pri­or­ity 3 and dis­par­ity >200%

  • Pri­or­ity <3 and dis­par­ity >300%

Those who had enough time also did the fol­low­ing:

  • Pri­or­ity 5 and dis­par­ity >25%

  • Pri­or­ity 4 and dis­par­ity >50%

  • Pri­or­ity <4 and dis­par­ity >100%

This time, the TMs cre­ated the dis­tri­bu­tions in SHELF and ad­justed the in­puts so that, where pos­si­ble, the fit­ted dis­tri­bu­tions closely matched their be­liefs. Th­ese are used in the base case anal­y­sis, though it is pos­si­ble to run the anal­y­sis with the origi­nal dis­tri­bu­tions (which are gen­er­ally more pes­simistic) by chang­ing the “Fit­ting switch” cell in the Team In­puts sheet. TM_5 did not have time to redo their in­puts so all those meet­ing the crite­ria above were ex­cluded from the anal­y­sis in the base case, though they can be in­cluded us­ing the “Non-re­fits” switch.

After all of this, we still found a small number of extreme outliers in the total net costs, as high as several billion dollars. We tracked these down to some implausible inputs by three TMs. In particular, they gave a lower plausible limit of 0 or 1 for the number of ambassadors per manager (#17); this was intended to reflect the number of volunteers that one manager could realistically handle, but was interpreted as an indication of the success of the program. The total cost of ambassador manager salaries is calculated as the number of ambassadors divided by the number of ambassadors per manager, so occasionally the simulations would produce a scenario like the following (a sanity-check sketch appears after the list):

  • am­bas­sadors = 500

  • am­bas­sador man­ager salary = 80,000

  • am­bas­sadors per man­ager = 0.05

  • to­tal am­bas­sador man­ager cost = (500/​0.05)*80,000 = $800,000,000
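A simple validation step along the following lines might have caught these draws before they reached the cost calculation. The threshold of one ambassador per manager is our assumption, and the function is illustrative rather than part of the actual Excel model.

```python
def manager_cost(n_ambassadors, manager_salary, ambassadors_per_manager):
    """Total ambassador-manager salary cost, flagging implausible inputs.

    A value below 1 ambassador per manager (as in the 0.05 draw above) implies
    more than one manager per volunteer, which explodes the cost; raising an
    error here surfaces the misinterpreted parameter instead of silently
    producing multi-billion-dollar scenarios.
    """
    if ambassadors_per_manager < 1:
        raise ValueError(
            f"Implausible ambassadors_per_manager={ambassadors_per_manager}; "
            "did the elicited value reflect programme success rather than span of control?"
        )
    n_managers = n_ambassadors / ambassadors_per_manager
    return n_managers * manager_salary

print(manager_cost(500, 80_000, 25))  # e.g. 20 managers -> 1,600,000
```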

Some­thing similar hap­pened with the vol­ume of dona­tions pro­cessed per hour of de­vel­oper time (#23), which is used to de­ter­mine de­vel­oper costs. TM_4 up­dated their in­puts, but TM_5 and TM_6 did not have time, so theirs were ex­cluded from the base case anal­y­sis. Those in­puts can be in­cluded us­ing the “Im­plau­si­ble” switch in the Team In­puts sheet.

Clearly, the pa­ram­e­ter elic­i­ta­tion and ag­gre­ga­tion pro­cess did not go as smoothly as hoped. This was largely due to the lack of cal­ibra­tion train­ing, lack of time to gather and fully con­sider rele­vant in­for­ma­tion, lack of fa­mil­iar­ity with the SHELF soft­ware, and per­haps lack of clar­ity in some pa­ram­e­ter de­scrip­tions. Below is a small se­lec­tion of al­ter­na­tives that we may con­sider for fu­ture eval­u­a­tions.

  1. Re­duce the num­ber of es­ti­mates. The lead an­a­lyst, and per­haps one or two other TMs, could cre­ate the dis­tri­bu­tions alone. Th­ese could be mod­ified in­for­mally in re­sponse to feed­back from the rest of the team. Or per­haps the whole team could provide dis­tri­bu­tions for the most im­por­tant few pa­ram­e­ters, leav­ing the rest to the lead an­a­lyst.

  2. Divide the pa­ram­e­ters among the team. Much as GiveWell has “pa­ram­e­ter own­ers”, each TM could be put in charge of gath­er­ing rele­vant in­for­ma­tion for a sub­set of the in­puts. Tak­ing this fur­ther, pairs of TMs – one closely in­volved in the eval­u­a­tion and one more de­tached – could be solely in charge of pro­vid­ing the prob­a­bil­ity dis­tri­bu­tions for the pa­ram­e­ters they ‘own’.

  3. Out­source es­ti­mates to cal­ibrated fore­cast­ers. In­di­vi­d­u­als with proven abil­ity to make ac­cu­rate pre­dic­tions may come up with bet­ter in­puts for some pa­ram­e­ters than TMs can.

  4. Use differ­ent soft­ware. Fore­told, an on­go­ing pro­ject by Ozzie Gooen, will provide a more user-friendly in­ter­face for cre­at­ing and com­bin­ing prob­a­bil­ity dis­tri­bu­tions.

  5. In­vest more time. If Re­think Grants con­tinues, it may be worth all those in­volved un­der­go­ing cal­ibra­tion train­ing, be­com­ing com­fortable with the rele­vant soft­ware, and spend­ing con­sid­er­ably longer gath­er­ing rele­vant in­for­ma­tion and cre­at­ing dis­tri­bu­tions, at least for the most sen­si­tive pa­ram­e­ters. Ideally, we would fol­low the full SHELF pro­to­col, which in­volves im­me­di­ate feed­back, dis­cus­sion, and con­struc­tion of a con­sen­sus dis­tri­bu­tion in a work­shop en­vi­ron­ment.

There are ma­jor draw­backs to each of these, and it will take some trial and er­ror to de­ter­mine the best ap­proach.

Ap­pendix 2: Calcu­lat­ing the CEACs, CEAF, and EVPI

This ap­pendix out­lines the steps fol­lowed in this model for calcu­lat­ing the cost-effec­tive­ness ac­cept­abil­ity curves and fron­tier, and the ex­pected value of perfect in­for­ma­tion. Th­ese are eas­ier to un­der­stand while look­ing at the rele­vant sec­tions of the Prob­a­bil­is­tic Anal­y­sis work­sheet.

CEACs

  • A CEAC rep­re­sents the prob­a­bil­ity of an in­ter­ven­tion be­ing cost-effec­tive at a range of cost-effec­tive­ness thresh­olds (in this case min­i­mum ac­cept­able dona­tion-cost ra­tios). The most cost-effec­tive op­tion is defined as the one with the high­est net benefit, which can be ex­pressed ei­ther in terms of the costs (net mon­e­tary benefit) or out­comes (such as net health benefit).

  • We first calcu­lated the net mon­e­tary benefit (NMB) for each simu­la­tion (each row of PSA sam­ples). The NMB is the value of the out­comes – in this case dona­tions – con­verted into the same units as the costs, minus the costs. The value of a unit of out­comes is de­ter­mined by the cost-effec­tive­ness thresh­old, which in prin­ci­ple rep­re­sents the op­por­tu­nity cost, e.g. a minDCR of 3x im­plies that $3 of dona­tions is worth $1 of ex­pen­di­ture. So the for­mula for NMB is [dona­tions]/​[minDCR]-[costs], e.g. at a minDCR of 3x, $1,000 donated at a cost of $500 would be (1000/​3)-500 = -$167; at 1x, it would be (1000/​1)-500 = $500; and at 10x, (1000/​10)-500 = -$400.

  • For each simu­la­tion, we recorded a 1 if the NMB for CAP was pos­i­tive (i.e. cost-effec­tive) and 0 if nega­tive (not cost-effec­tive). The av­er­age of those val­ues across all simu­la­tions rep­re­sents the prob­a­bil­ity that it is cost-effec­tive at the speci­fied minDCR (the “prob.ce” cell).

  • We then used a macro (the “Draw CEAC + CEAF” button) to generate a list of probabilities that CAP is cost-effective at different thresholds, from 0.1x to 10x. This was plotted on a graph, alongside the CEAC of Do Nothing (which is just the mirror image of the CAP CEAC, since the CAP figures are all relative to no intervention). These steps are sketched in code after the list.
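The sketch below mirrors these steps, using placeholder arrays of simulated donations and costs for CAP and treating Do Nothing as a zero-cost, zero-donation baseline (as in the model); the numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder PSA samples: simulated donations and costs for CAP (Do Nothing
# is the zero baseline, so its NMB is always 0).
donations = rng.lognormal(mean=np.log(100_000), sigma=1.0, size=10_000)
costs = rng.normal(loc=100_000, scale=20_000, size=10_000)

def nmb(donations, costs, min_dcr):
    # Net monetary benefit: donations converted into expenditure-equivalent
    # dollars via the threshold, minus costs (e.g. 1000/3 - 500 = -167 at 3x).
    return donations / min_dcr - costs

def prob_cost_effective(donations, costs, min_dcr):
    # Share of simulations in which CAP's NMB is positive (the "prob.ce" cell).
    return float(np.mean(nmb(donations, costs, min_dcr) > 0))

# CEAC: probability CAP is cost-effective at thresholds from 0.1x to 10x.
thresholds = np.arange(0.1, 10.1, 0.1)
ceac_cap = np.array([prob_cost_effective(donations, costs, t) for t in thresholds])
ceac_do_nothing = 1 - ceac_cap  # mirror image, as noted above
```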

CEAF

  • A CEAF rep­re­sents the prob­a­bil­ity that the op­tion with the high­est prob­a­bil­ity of be­ing cost-effec­tive (as in­di­cated by the CEACs) is op­ti­mal (has the high­est ex­pected net benefit) at var­i­ous cost-effec­tive­ness thresh­olds. In most cases, the in­ter­ven­tion that is most likely to be cost-effec­tive will also max­i­mize ex­pected net benefit, but this is not the case when the dis­tri­bu­tion of net benefit is skewed, with a mean differ­ent from the me­dian, so it is usu­ally worth do­ing both.

  • For the current threshold, if CAP had the highest mean NMB, we recorded the probability that CAP is cost-effective; and if Do Nothing had the highest mean NMB, we recorded the probability that Do Nothing was cost-effective (the “live.ceaf” cell). A sketch of this step follows the list.

  • We used a macro (“Draw CEAC + CEAF” but­ton) to re­peat this for all thresh­olds be­tween 0.1x and 10x, and those val­ues were plot­ted on a graph.

  • For clar­ity, we also recorded the op­ti­mal op­tion at each thresh­old, and the er­ror prob­a­bil­ity – the chance the op­ti­mal op­tion was not the most cost-effec­tive.
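Continuing the previous sketch (and reusing its `nmb` function, `donations`, `costs`, and `thresholds` arrays), the frontier at each threshold records the probability of cost-effectiveness for whichever option has the higher expected NMB.

```python
def ceaf_value(donations, costs, min_dcr):
    # NMB of Do Nothing is always 0 in this model, so compare CAP's mean NMB to 0.
    cap_nmb = nmb(donations, costs, min_dcr)
    p_cap = float(np.mean(cap_nmb > 0))
    if cap_nmb.mean() > 0:   # CAP has the highest expected NMB
        return p_cap         # record the probability that CAP is cost-effective
    return 1 - p_cap         # otherwise record Do Nothing's probability

ceaf = np.array([ceaf_value(donations, costs, t) for t in thresholds])
```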

EVPI

  • The EVPI is the ex­pected value of re­mov­ing all un­cer­tainty. It can be thought of as the cost of be­ing wrong, which is the differ­ence be­tween the value of always mak­ing the right choice, and the value of mak­ing the choice im­plied by cur­rent in­for­ma­tion.

  • First, the NMB for Do Noth­ing and CAP were calcu­lated in the same way as for the CEAC. To re­it­er­ate, the NMB de­pends on the minDCR.

  • For clar­ity, the op­ti­mal in­ter­ven­tion (the one with the high­est NMB out of Do Noth­ing and CAP) was recorded for each simu­la­tion, and for the mean NMB.

  • The NMB of the op­ti­mal in­ter­ven­tion was also recorded for each simu­la­tion. The av­er­age of these val­ues (the “max.nb” cell) is the ex­pected value of always mak­ing the right choice of in­ter­ven­tion.

  • The EVPI (the “evpi” cell) was then calculated as the NMB of always being right (“max.nb”) minus the NMB of the intervention that we would choose given current information (the one with the highest expected NMB). If CAP is not cost-effective (expected NMB < 0), the highest expected NMB is 0 (Do Nothing), in which case the EVPI and “max.nb” are the same. This calculation is sketched after the list.

  • We used a macro (“Draw EVPI”) to gen­er­ate the EVPI at differ­ent thresh­olds, from 0.1x to 10x, and plot­ted these val­ues on a graph.
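Again reusing the placeholder arrays and `nmb` function from the CEAC sketch above, the EVPI calculation looks roughly like this:

```python
def evpi(donations, costs, min_dcr):
    cap_nmb = nmb(donations, costs, min_dcr)
    do_nothing_nmb = np.zeros_like(cap_nmb)
    # Value of always choosing the better option in each simulation ("max.nb")
    max_nb = np.maximum(cap_nmb, do_nothing_nmb).mean()
    # Value of the option we would choose under current information
    current_best = max(cap_nmb.mean(), do_nothing_nmb.mean())
    return max_nb - current_best  # equals max_nb when CAP's expected NMB is negative

evpi_curve = np.array([evpi(donations, costs, t) for t in thresholds])
```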

References

Barton, G. R., Briggs, A. H., & Fenwick, E. A. L. (2008). Optimal Cost-Effectiveness Decisions: The Role of the Cost-Effectiveness Acceptability Curve (CEAC), the Cost-Effectiveness Acceptability Frontier (CEAF), and the Expected Value of Perfect Information (EVPI). Value in Health, 11(5), 886–897. https://doi.org/10.1111/j.1524-4733.2008.00358.x

Black, W. C. (1990). The CE Plane: A Graphic Rep­re­sen­ta­tion of Cost-Effec­tive­ness. Med­i­cal De­ci­sion Mak­ing, 10(3), 212–214. https://​​doi.org/​​10.1177/​​0272989X9001000308

Briggs, A., Sculpher, M., & Clax­ton, K. (2006). De­ci­sion Model­ling for Health Eco­nomic Eval­u­a­tion. OUP Oxford.

Briggs, A. H., We­in­stein, M. C., Fen­wick, E. A. L., Karnon, J., Sculpher, M. J., & Paltiel, A. D. (2012). Model Pa­ram­e­ter Es­ti­ma­tion and Uncer­tainty Anal­y­sis: A Re­port of the ISPOR-SMDM Model­ing Good Re­search Prac­tices Task Force Work­ing Group–6. Med­i­cal De­ci­sion Mak­ing, 32(5), 722–732. https://​​doi.org/​​10.1177/​​0272989X12458348

Cas­sar, G. (2010). Are in­di­vi­d­u­als en­ter­ing self-em­ploy­ment overly op­ti­mistic? an em­piri­cal test of plans and pro­jec­tions on nascent en­trepreneur ex­pec­ta­tions. Strate­gic Man­age­ment Jour­nal, 31(8), 822–840. https://​​doi.org/​​10.1002/​​smj.833

Claxton, K. (2008). Exploring Uncertainty in Cost-Effectiveness Analysis. PharmacoEconomics, 26(9), 781–798. https://doi.org/10.2165/00019053-200826090-00008

Eiken­berry, A. M., & Mira­bella, R. M. (2018). Ex­treme Philan­thropy: Philan­thro­cap­i­tal­ism, Effec­tive Altru­ism, and the Dis­course of Ne­oliber­al­ism. PS: Poli­ti­cal Science & Poli­tics, 51(1), 43–47. https://​​doi.org/​​10.1017/​​S1049096517001378

European Food Safety Authority (EFSA). (2014). Guidance on Expert Knowledge Elicitation in Food and Feed Safety Risk Assessment. EFSA Journal, 12(6). https://doi.org/10.2903/j.efsa.2014.3734

O'Hagan, A., Buck, C. E., Daneshkhah, A., Eiser, J. R., Garthwaite, P. H., Jenkinson, D. J., Oakley, J. E., & Rakow, T. (2006). Uncertain Judgements: Eliciting Experts' Probabilities. London; Hoboken, NJ: John Wiley & Sons.

Jit, M., Hu­tubessy, R., Png, M. E., Sun­daram, N., Audi­mu­lam, J., Salim, S., & Yoong, J. (2015). The broader eco­nomic im­pact of vac­ci­na­tion: Re­view­ing and ap­prais­ing the strength of ev­i­dence. BMC Medicine, 13(1). https://​​doi.org/​​10.1186/​​s12916-015-0446-9

Ku­per, A. (2002). Global Poverty Relief–More Than Char­ity: Cos­mopoli­tan Alter­na­tives to the Singer Solu­tion. Ethics and In­ter­na­tional Af­fairs, 16(1), 107–120. https://​​doi.org/​​10.1111/​​j.1747-7093.2002.tb00378.x

Ozawa, S., Clark, S., Port­noy, A., Gre­wal, S., Bren­zel, L., & Walker, D. G. (2016). Re­turn On In­vest­ment From Child­hood Im­mu­niza­tion In Low- And Mid­dle-In­come Coun­tries, 2011–20. Health Af­fairs, 35(2), 199-207. https://​​doi.org/​​10.1377/​​hlthaff.2015.1086

Peas­good, T., Foster, D., & Dolan, P. (2019). Pri­or­ity Set­ting in Health­care Through the Lens of Hap­piness. In Global Hap­piness and Wel­lbe­ing Policy Re­port 2019 (pp. 28–51).

Strong, M., Oak­ley, J. E., Bren­nan, A., & Breeze, P. (2015). Es­ti­mat­ing the Ex­pected Value of Sam­ple In­for­ma­tion Us­ing the Prob­a­bil­is­tic Sen­si­tivity Anal­y­sis Sam­ple. Med­i­cal De­ci­sion Mak­ing, 35(5), 570–583. https://​​doi.org/​​10.1177/​​0272989X15575286

Syme, T. (2019). Char­ity vs. Revolu­tion: Effec­tive Altru­ism and the Sys­temic Change Ob­jec­tion. Eth­i­cal The­ory and Mo­ral Prac­tice, 22(1), 93–120. https://​​doi.org/​​10.1007/​​s10677-019-09979-5

Tho­rup, M. (2015). Pro Bono? Winch­ester, UK ; Wash­ing­ton, USA: Zero Books.

Wil­son, E. C. F. (2015). A Prac­ti­cal Guide to Value of In­for­ma­tion Anal­y­sis. Phar­ma­coE­co­nomics, 33(2), 105–121. https://​​doi.org/​​10.1007/​​s40273-014-0219-x

Credits

This re­port is a joint pro­ject of Re­think Pri­ori­ties and Re­think Char­ity. It was writ­ten by Derek Foster, Luisa Ro­driguez, and Tee Bar­nett. Spe­cial thanks to Ian Yamey of Dona­tional for pa­tiently work­ing through the en­tire RG eval­u­a­tion pro­cess; and to Wael Mo­hammed for writ­ing or im­prov­ing most of the Ex­cel macros. Thanks also to Mar­cus A. Davis, David Moss, Peter Hur­ford, Ozzie Gooen, Rossa O’Keeffe-O’Dono­van, Rob Struck, Jon Be­har, Jeremy Oak­ley, Matt Steven­son, An­drew Metry, and sev­eral anony­mous in­di­vi­d­u­als for pro­vid­ing valuable in­for­ma­tion, tech­ni­cal as­sis­tance, and feed­back.

If you like our work, please con­sider sub­scribing to the Re­think Pri­ori­ties newslet­ter. You can see all our pub­li­ca­tions to date here.


  1. For ex­am­ple, −500/​10 and 500/​-10 both equal −50, so sav­ing 10 lives with sav­ings of $500 (a very good situ­a­tion) gives the same cost-effec­tive­ness ra­tio as caus­ing 10 deaths at a cost of $500 (a ter­rible situ­a­tion). Even a pos­i­tive CEE can be mis­lead­ing: a ra­tio of two nega­tive num­bers gives a pos­i­tive, so 10 deaths caused with sav­ings of $500 would give the same CEE ($50) as 10 lives saved at a cost of $500. There is no ob­vi­ous way of avoid­ing this prob­lem in Guessti­mate, where the CEE would have to be a prob­a­bil­ity dis­tri­bu­tion that it­self is a ra­tio of dis­tri­bu­tions for costs and effects (ei­ther or both of which could in­clude val­ues be­low zero, even if the means were pos­i­tive). In spread­sheets like Ex­cel, we can mea­sure un­cer­tainty in other ways, as ex­plained later in this anal­y­sis. ↩︎

  2. Note that, to put the costs and effects in the same units, the value of dona­tions must be con­verted into equiv­a­lent dol­lars of ex­pen­di­ture. The ‘ex­change rate’ de­pends on the cost-effec­tive­ness thresh­old: a minDCR of 3 im­plies $3 donated is only ‘worth’ the same as $1 in ex­pen­di­ture, so to calcu­late the net benefit, dona­tions are di­vided by 3 be­fore the costs are sub­tracted. This is ex­plained fur­ther in Ap­pendix 2. ↩︎

  3. For ex­am­ple, see here, here, and here ↩︎

  4. Even and es­pe­cially in cases where a lead­er­ship tran­si­tion re­quires a smooth hand­off. ↩︎