Uncertainty and sensitivity analyses of GiveWell’s cost-effectiveness analyses

(The same content is broken up into three posts and given a very slightly different presentation on my blog.)

Overview

GiveWell models the cost-effectiveness of its top charities. Because the input parameters are uncertain (How much moral weight should we give to increasing consumption? What is the current income of a typical GiveDirectly recipient?), the resulting cost-effectiveness estimates are also fundamentally uncertain. By performing uncertainty analysis, we get a better sense of just how uncertain the results are. Uncertainty analysis is also the first step on the route to sensitivity analysis. Sensitivity analysis reveals which input parameters each charity’s cost-effectiveness estimate is most sensitive to. That kind of information helps us target future investigations (i.e. uncertainty reduction). The final step is to combine the individual charity cost-effectiveness estimates into one giant model. By performing uncertainty and sensitivity analysis on this giant model, we get a better sense of which input parameters have the most influence on the relative cost-effectiveness of GiveWell’s top charities—i.e. how the charities rank against each other.

A key feature of the analysis outlined above and performed below is that it requires the analyst to specify their uncertainty over each input parameter. Because I didn’t want all of the results here to reflect my idiosyncratic beliefs, I instead pretended that each input parameter is equally uncertain. This makes the results “neutral” in a certain sense, but it also means that they don’t reveal much about the real world. To achieve real insight, you need to adjust the input parameters to match your beliefs. You can do that by heading over to the Jupyter notebook, editing the parameters in the second cell, and clicking “Runtime > Run all”. This limitation means that all the ensuing discussion is more akin to an analysis template than a true analysis.

Uncertainty analysis of GiveWell’s cost-effectiveness estimates

Section overview

GiveWell produces cost-effectiveness models of its top charities. These models take as inputs many uncertain parameters. Instead of representing those uncertain parameters with point estimates—as the cost-effectiveness analysis spreadsheet does—we can (should) represent them with probability distributions. Feeding probability distributions into the models allows us to output explicit probability distributions on the cost-effectiveness of each charity.

GiveWell’s cost-effectiveness analysis

GiveWell, an in-depth charity evaluator, makes their detailed spreadsheet models available for public review. These spreadsheets estimate the value per dollar of donations to their 8 top charities: GiveDirectly, Deworm the World, Schistosomiasis Control Initiative, Sightsavers, Against Malaria Foundation, Malaria Consortium, Helen Keller International, and the END Fund. For each charity, a model is constructed taking input values to an estimated value per dollar of donation to that charity. The inputs to these models vary from parameters like “malaria prevalence in areas where AMF operates” to “value assigned to averting the death of an individual under 5”.

Helpfully, GiveWell isolates the input parameters it deems most uncertain. These can be found in the “User inputs” and “Moral weights” tabs of their spreadsheet. Outsiders interested in the top charities can reuse GiveWell’s model but supply their own perspective by adjusting the values of the parameters in these tabs.

For example, if I go to the “Moral weights” tab and run the calculation with a 0.1 value for doubling consumption for one person for one year—instead of the default value of 1—I see the effect of this modification on the final results: deworming charities look much less effective since their primary effect is on income.

Uncertain inputs

GiveWell provides the ability to adjust these input parameters and observe altered output because the inputs are fundamentally uncertain. But our uncertainty means that picking any particular value as input for the calculation misrepresents our state of knowledge. From a subjective Bayesian point of view, the best way to represent our state of knowledge on the input parameters is with a probability distribution over the values the parameter could take. For example, I could say that a negative value for increasing consumption seems very improbable to me but that a wide range of positive values seem about equally plausible. Once we specify a probability distribution, we can feed these distributions into the model and, in principle, we’ll end up with a probability distribution over our results. This probability distribution on the results helps us understand the uncertainty contained in our estimates and how literally we should take them.

Is this really necessary?

Perhaps that sounds complicated. How are we supposed to multiply, add and otherwise manipulate arbitrary probability distributions in the way our models require? Can we somehow reduce our uncertain beliefs about the input parameters to point estimates and run the calculation on those? One candidate is to take the single most likely value of each input and use that value in our calculations. This is the approach the current cost-effectiveness analysis takes (assuming you provide input values selected in this way). Unfortunately, the output of running the model on these inputs is necessarily a point value and gives no information about the uncertainty of the results. Because the results are probably highly uncertain, losing this information and being unable to talk about the uncertainty of the results is a major loss. A second possibility is to take lower bounds on the input parameters and run the calculation on these values, and to take the upper bounds on the input parameters and run the calculation on these values. This will produce two bounding values on our results, but it’s hard to give them a useful meaning. If the lower and upper bounds on our inputs describe, for example, a 95% confidence interval, the lower and upper bounds on the result don’t (usually) describe a 95% confidence interval.
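To make that last point concrete, here is a minimal sketch using a toy two-input model (not GiveWell’s): propagating the inputs’ 95% bounds through the model produces an interval noticeably wider than the output’s true 95% interval.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# Two toy log-normal inputs and a toy model: y = a * b.
a = rng.lognormal(mean=0.0, sigma=0.2, size=n)
b = rng.lognormal(mean=0.0, sigma=0.2, size=n)
y = a * b

# Running the model on the inputs' 2.5th and 97.5th percentiles...
naive_lo = np.quantile(a, 0.025) * np.quantile(b, 0.025)
naive_hi = np.quantile(a, 0.975) * np.quantile(b, 0.975)

# ...gives a wider interval than the output's actual 95% interval.
print(naive_lo, naive_hi)                            # ~0.46, ~2.19
print(np.quantile(y, 0.025), np.quantile(y, 0.975))  # ~0.57, ~1.74
```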

Computers are nice

If we had to proceed analytically, working with probability distributions throughout, the model would indeed be troublesome and we might have to settle for one of the above approaches. But we live in the future. We can use computers and Monte Carlo methods to numerically approximate the results of working with probability distributions while leaving our models clean and unconcerned with these probabilistic details. Guesstimate is a tool you may have heard of that works along these lines and bills itself as “A spreadsheet for things that aren’t certain”.

Analysis

We have the beginnings of a plan then. We can implement GiveWell’s cost-effectiveness models in a Monte Carlo framework (PyMC3 in this case), specify probability distributions over the input parameters, and finally run the calculation and look at the uncertainty that’s been propagated to the results.
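As a sketch of what this looks like in PyMC3 (with made-up stand-in parameters and a toy combining formula rather than GiveWell’s actual model):

```python
import numpy as np
import pymc3 as pm

with pm.Model():
    # Hypothetical uncertain inputs, each log-normal (stand-ins only).
    transfer_pct = pm.Lognormal("transfer_pct", mu=np.log(0.83), sd=0.05)
    consumption_value = pm.Lognormal("consumption_value", mu=np.log(1.0), sd=0.1)
    # A toy "model" combining the inputs into a value-per-dollar figure.
    value_per_dollar = pm.Deterministic(
        "value_per_dollar", 0.004 * transfer_pct * consumption_value
    )
    # Forward Monte Carlo: sample the inputs and propagate through the model.
    draws = pm.sample_prior_predictive(samples=10_000)

print(np.quantile(draws["value_per_dollar"], [0.05, 0.5, 0.95]))
```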

Model

The Python source code implementing GiveWell’s models can be found on GitHub[1]. The core models can be found in cash.py, nets.py, smc.py, worms.py and vas.py.

Inputs

For the purposes of the uncertainty analysis that follows, it doesn’t make much sense to infect the results with my own idiosyncratic views on the appropriate value of the input parameters. Instead, what I have done is uniformly taken GiveWell’s best guess and added and subtracted 20%. These upper and lower bounds then become the 90% confidence interval of a log-normal distribution[2]. For example, if GiveWell’s best guess for a parameter is 0.1, I used a log-normal with a 90% CI from 0.08 to 0.12.
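Concretely, the conversion from a 90% CI to log-normal parameters looks something like this (a sketch; `lognormal_params` is my own helper, not something from the analysis code):

```python
import numpy as np
from scipy import stats

def lognormal_params(lower, upper, ci=0.90):
    """Convert a central `ci` interval on a log-normal into (mu, sigma)."""
    z = stats.norm.ppf(0.5 + ci / 2)  # ~1.645 for a 90% interval
    mu = (np.log(lower) + np.log(upper)) / 2
    sigma = (np.log(upper) - np.log(lower)) / (2 * z)
    return mu, sigma

# GiveWell best guess of 0.1, +/-20% -> 90% CI of (0.08, 0.12)
mu, sigma = lognormal_params(0.08, 0.12)
draws = np.random.lognormal(mean=mu, sigma=sigma, size=100_000)
print(np.quantile(draws, [0.05, 0.95]))  # ~[0.08, 0.12]
```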

While this approach screens off my influence, it also means that the results of the analysis will primarily tell us about the structure of the computation rather than informing us about the world. Fortunately, there’s a remedy for this problem too. I have set up a Jupyter notebook[3] with all the input parameters to the calculation, which you can manipulate before rerunning the analysis. That is, if you think the moral weight given to increasing consumption ought to range from 0.8 to 1.5 instead of 0.8 to 1.2, you can make that edit and see the corresponding results. Making these modifications is essential for a realistic analysis because we are not, in fact, equally uncertain about every input parameter.

It’s also worth noting that I have considerably expanded the set of input parameters receiving special scrutiny. The GiveWell cost-effectiveness analysis is (with good reason—it keeps things manageable for outside users) fairly conservative about which parameters it highlights as eligible for user manipulation. In this analysis, I include any input parameter which is not tautologically certain. For example, “Reduction in malaria incidence for children under 5 (from Lengeler 2004 meta-analysis)” shows up in the analysis which follows but is not highlighted in GiveWell’s “User inputs” or “Moral weights” tab. Even though we don’t have much information with which to second guess the meta-analysis, the value it reports is still uncertain and our calculation ought to reflect that.

Results

Finally, we get to the part that you actually care about, dear reader: the results. Given input parameters which are each distributed log-normally with a 90% confidence interval spanning ±20% of GiveWell’s best estimate, here are the resulting uncertainties in the cost-effectiveness estimates:

Probability distributions of value per dollar for GiveWell’s top charities

For reference, here are the point estimates of value per dollar using GiveWell’s values for the charities:

GiveWell’s cost-effectiveness estimates for its top charities

| Charity | Value per dollar |
|---|---|
| GiveDirectly | 0.0038 |
| The END Fund | 0.0222 |
| Deworm the World | 0.0738 |
| Schistosomiasis Control Initiative | 0.0378 |
| Sightsavers | 0.0394 |
| Malaria Consortium | 0.0326 |
| Helen Keller International | 0.0223 |
| Against Malaria Foundation | 0.0247 |

I’ve also plotted a version in which the results are normalized—I divided the results for each charity by that charity’s expected value per dollar. Instead of showing the probability distribution on the value per dollar for each charity, this normalized version shows the probability distribution on the percentage of that charity’s expected value that it achieves. This version of the plot abstracts from the actual value per dollar and emphasizes the spread of uncertainty. It also reëmphasizes the earlier point that—because we use the same spread of uncertainty for each input parameter—the current results are telling us more about the structure of the model than about the world. For real results, go try the Jupyter notebook!

Probability distributions for percentage of expected value obtained with each of GiveWell’s top charities

Section recap

Our preliminary conclusion is that the cost-effectiveness estimates for all of GiveWell’s top charities have similar uncertainty, with GiveDirectly being a bit more certain than the rest. However, this is mostly an artifact of pretending that we are exactly equally uncertain about each input parameter.

Sensitivity analysis of GiveWell’s cost-effectiveness estimates

Section overview

In the previous section, we introduced GiveWell’s cost-effectiveness analysis which uses a spreadsheet model to take point estimates of uncertain input parameters to point estimates of uncertain results. We adjusted this approach to take probability distributions on the input parameters and in exchange got probability distributions on the resulting cost-effectiveness estimates. But this machinery lets us do more. Now that we’ve completed an uncertainty analysis, we can move on to sensitivity analysis.

The basic idea of sensitivity analysis is, when working with uncertain values, to see which input values most affect the output when they vary. For example, if you have the equation $y = x_1 + 10 x_2$ and each of $x_1$ and $x_2$ varies uniformly over the range from 5 to 10, $y$ is much more sensitive to $x_2$ than $x_1$. A sensitivity analysis is practically useful in that it can offer you guidance as to which parameters in your model it would be most useful to investigate further (i.e. to narrow their uncertainty).
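A minimal sketch of that toy example, using correlation with the output as a crude sensitivity indicator:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.uniform(5, 10, n)
x2 = rng.uniform(5, 10, n)
y = x1 + 10 * x2  # the toy equation from above

# y moves almost entirely with x2, and barely with x1.
print(np.corrcoef(x1, y)[0, 1])  # ~0.10
print(np.corrcoef(x2, y)[0, 1])  # ~0.995
```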

Visual (scatter plot) and delta moment-independent sensitivity analysis on GiveWell’s cost-effectiveness models show which input parameters the cost-effectiveness estimates are most sensitive to. Preliminary results (given our input uncertainty) show that some input parameters are much more influential on the final cost-effectiveness estimates for each charity than others.

Visual sensitivity analysis

The first kind of sensitivity analysis we’ll run is just to look at scatter plots comparing each input parameter to the final cost-effectiveness estimates. We can imagine these scatter plots as the result of running the following procedure many times[4]: sample a single value from the probability distribution for each input parameter and run the calculation on these values to determine a result value. If we repeat this procedure enough times, it starts to approximate the true values of the probability distributions.

(One nice feature of this sort of analysis is that we see how the output depends on a particular input even in the face of variations in all the other inputs—we don’t hold everything else constant. In other words, this is a global sensitivity analysis.)

(Caveat: We are again pretending that we are equally uncertain about each input parameter and the results reflect this limitation. To see the analysis result for different input uncertainties, edit and run the Jupyter notebook.)
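For a sense of how plots like the ones below are generated, here’s a sketch with a toy two-input stand-in model (the parameter names and combining formula are illustrative, not GiveWell’s):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 5_000
# Stand-ins for two uncertain input parameters.
inputs = {
    "transfer as percent of total cost": rng.lognormal(np.log(0.83), 0.05, n),
    "return on investment": rng.lognormal(np.log(0.10), 0.10, n),
}
# A toy model in place of the real cost-effectiveness calculation.
value_per_dollar = (
    0.004
    * inputs["transfer as percent of total cost"]
    * (1 + inputs["return on investment"])
)

# One scatter subplot per input parameter, output on the shared y-axis.
fig, axes = plt.subplots(1, len(inputs), figsize=(10, 4), sharey=True)
for ax, (name, values) in zip(axes, inputs.items()):
    ax.scatter(values, value_per_dollar, s=2, alpha=0.2)
    ax.set_xlabel(name)
axes[0].set_ylabel("value per dollar")
plt.tight_layout()
plt.show()
```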

Direct cash transfers

GiveDirectly

Scatter plots showing sensitivity of GiveDirectly’s cost-effectiveness to each input parameter

The scatter plots show that, given our choice of input uncertainty, the output is most sensitive (i.e. the scatter plot for these parameters shows the greatest directionality) to the input parameters:

Highlighted input factors to which result is highly sensitive

| Input | Type of uncertainty | Meaning/importance |
|---|---|---|
| value of increasing ln consumption per capita per annum | Moral | Determines final conversion between empirical outcomes and value |
| transfer as percent of total cost | Operational | Determines cost of results |
| return on investment | Opportunities available to recipients | Determines stream of consumption over time |
| baseline consumption per capita | Empirical | Diminishing marginal returns to consumption mean that baseline consumption matters |

Deworming

Some useful and non-obvious context for the following is that the primary putative benefit of deworming is increased income later in life.

The END Fund

Scatter plots showing sensitivity of the END Fund’s cost-effectiveness to each input parameter

Here, it’s a little harder to identify certain factors as more important. It seems that the final estimate is (given our input uncertainty) the result of many factors of medium effect. It does seem plausible that the output is somewhat less sensitive to these factors:

Highlighted input factors to which result is minimally sensitive

| Input | Type of uncertainty | Meaning/(un)importance |
|---|---|---|
| num yrs between deworming and benefits | Forecast | Affects how much discounting of future income streams must be done |
| duration of long-term benefits | Forecast | The length of time for which a person works and earns income |
| expected value from leverage and funging | Game theoretic | How much does money donated to the END Fund shift around other money |

Deworm the World

Scatter plots showing sensitivity of Deworm the World’s cost-effectiveness to each input parameter

Again, it’s a little harder to identify certain factors as more important. It seems that the final estimate is (given our input uncertainty) the result of many factors of medium effect. It does seem plausible that the output is somewhat less sensitive to these factors:

Highlighted input factors to which result is minimally sensitive

| Input | Type of uncertainty | Meaning/(un)importance |
|---|---|---|
| num yrs between deworming and benefits | Forecast | Affects how much discounting of future income streams must be done |
| duration of long-term benefits | Forecast | The length of time for which a person works and earns income |
| expected value from leverage and funging | Game theoretic | How much does money donated to Deworm the World shift around other money |

Schistosomiasis Control Initiative

Scatter plots showing sensitivity of Schistosomiasis Control Initiative’s cost-effectiveness to each input parameter

Again, it’s a little harder to identify certain factors as more important. It seems that the final estimate is (given our input uncertainty) the result of many factors of medium effect. It does seem plausible that the output is somewhat less sensitive to these factors:

Highlighted input factors to which result is minimally sensitive

| Input | Type of uncertainty | Meaning/(un)importance |
|---|---|---|
| num yrs between deworming and benefits | Forecast | Affects how much discounting of future income streams must be done |
| duration of long-term benefits | Forecast | The length of time for which a person works and earns income |
| expected value from leverage and funging | Game theoretic | How much does money donated to Schistosomiasis Control Initiative shift around other money |

Sightsavers

Scatter plots showing sensitivity of Sightsavers’ cost-effectiveness to each input parameter

Again, it’s a little harder to identify certain factors as more important. It seems that the final estimate is (given our input uncertainty) the result of many factors of medium effect. It does seem plausible that the output is somewhat less sensitive to these factors:

Highlighted input factors to which result is minimally sensitive

| Input | Type of uncertainty | Meaning/(un)importance |
|---|---|---|
| num yrs between deworming and benefits | Forecast | Affects how much discounting of future income streams must be done |
| duration of long-term benefits | Forecast | The length of time for which a person works and earns income |
| expected value from leverage and funging | Game theoretic | How much does money donated to Sightsavers shift around other money |

Seasonal malaria chemoprevention

Malaria Consortium

Scatter plots showing sensitivity of Malaria Consortium’s cost-effectiveness to each input parameter

The scatter plots show that, given our choice of input uncertainty, the output is most sensitive (i.e. the scatter plot for these parameters shows the greatest directionality) to the input parameters:

Highlighted input factors to which result is highly sensitive

| Input | Type of uncertainty | Meaning/importance |
|---|---|---|
| direct mortality in high transmission season | Empirical | Fraction of overall malaria mortality during the peak transmission season and amenable to SMC |
| internal validity adjustment | Methodological | How much do we trust the results of the underlying SMC studies |
| external validity adjustment | Methodological | How much do the results of the underlying SMC studies transfer to new settings |
| coverage in trials in meta-analysis | Historical/methodological | Determines how much coverage an SMC program needs to achieve to match studies |
| value of averting death of a young child | Moral | Determines final conversion between empirical outcomes and value |
| cost per child targeted | Operational | Affects cost of results |

Vitamin A supplementation

Helen Keller International

Scatter plots showing sensitivity of Helen Keller International’s cost-effectiveness to each input parameter

The scatter plots show that, given our choice of input uncertainty, the output is most sensitive to the input parameters:

Highlighted input factors to which result is highly sensitive

| Input | Type of uncertainty | Meaning/importance |
|---|---|---|
| relative risk of all-cause mortality for young children in programs | Causal | How much do VAS programs affect mortality |
| cost per child per round | Operational | Affects cost of results |
| rounds per year | Operational | Affects cost of results |

Bednets

Against Malaria Foundation

Scatter plots showing sensitivity of Against Malaria Foundation’s cost-effectiveness to each input parameter

The scatter plots show that, given our choice of input uncertainty, the output is most sensitive (i.e. the scatter plot for these parameters shows the greatest directionality) to the input parameters:

Highlighted input factors to which result is highly sensitive

| Input | Type of uncertainty | Meaning/importance |
|---|---|---|
| num LLINs distributed per person | Operational | Affects cost of results |
| cost per LLIN | Operational | Affects cost of results |
| deaths averted per protected child under 5 | Causal | How effective is the core activity |
| lifespan of an LLIN | Empirical | Determines how many years of benefit accrue to each distribution |
| net use adjustment | Empirical | Determines benefits from LLIN as mediated by proper and improper use |
| internal validity adjustment | Methodological | How much do we trust the results of the underlying studies |
| percent of mortality due to malaria in AMF areas vs trials | Empirical/historical | Affects size of the problem |
| percent of pop. under 5 | Empirical | Affects size of the problem |

Delta moment-independent sensitivity analysis

If eyeballing plots seems a bit unsatisfying to you as a method for judging sensitivity, not to worry. We also have the results of a more formal sensitivity analysis. This method is called delta moment-independent sensitivity analysis.

$\delta_i$ (the delta moment-independent sensitivity indicator of parameter $X_i$) “represents the normalized expected shift in the distribution of [the output] provoked by [that input]”. To make this meaning more explicit, we’ll start with some notation/definitions. Let:

  1. $X_1, \ldots, X_k$ be the random variables used as input parameters

  2. $Y = g(X_1, \ldots, X_k)$ so that $g$ is a function from $(X_1, \ldots, X_k)$ to $Y$ describing the relationship between inputs and outputs—i.e. GiveWell’s cost-effectiveness model

  3. $f_Y(y)$ be the density function of the result $Y$—i.e. the probability distributions we’ve already seen showing the cost-effectiveness for each charity

  4. $f_{Y|X_i}(y)$ be the conditional density of $Y$ with one of the parameters $X_i$ fixed—i.e. a probability distribution for the cost-effectiveness of a charity while pretending that we know one of the input values precisely

With these in place, we can define $\delta_i$. It is:

$$\delta_i = \frac{1}{2} \mathbb{E}_{X_i}\left[\int \left|f_Y(y) - f_{Y|X_i}(y)\right| \, dy\right].$$

The inner $\int \left|f_Y(y) - f_{Y|X_i}(y)\right| \, dy$ can be interpreted as the total area between probability density function $f_Y$ and probability density function $f_{Y|X_i}$. This is the “shift in the distribution of $Y$ provoked by $X_i$” we mentioned earlier. Overall then, $\delta_i$ says:

  • pick one value for $X_i$ and measure the shift in the output distribution from the “default” output distribution

  • do that for each possible value of $X_i$ and take the expectation

Some useful properties to point out (a sketch of estimating these indicators numerically follows this list):

  • $\delta_i$ ranges from 0 to 1

  • If the output is independent of the input, $\delta_i$ for that input is 0

  • The sum of $\delta_i$ for each input considered separately isn’t necessarily 1 because there can be interaction effects
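In practice, we don’t have to evaluate that integral by hand; SALib’s delta module estimates $\delta_i$ from plain Monte Carlo samples. A minimal sketch with a toy three-input stand-in model (the parameter names, bounds and model are illustrative only, not GiveWell’s):

```python
import numpy as np
from SALib.analyze import delta

problem = {
    "num_vars": 3,
    "names": ["consumption value", "transfer percent", "roi"],
    "bounds": [[0.8, 1.2], [0.7, 0.9], [0.05, 0.15]],
}

rng = np.random.default_rng(0)
X = rng.uniform(
    low=[b[0] for b in problem["bounds"]],
    high=[b[1] for b in problem["bounds"]],
    size=(10_000, problem["num_vars"]),
)
Y = X[:, 0] * X[:, 1] * (1 + X[:, 2])  # toy model, not GiveWell's

result = delta.analyze(problem, X, Y)
print(result["delta"])       # delta indicator for each input
print(result["delta_conf"])  # confidence bounds on those estimates
```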

In the plots below, for each charity, we visualize the delta sensitivity (and our uncertainty about that sensitivity) for each input parameter.

Direct cash transfers

GiveDirectly

Delta sensitivities for each input parameter in the GiveDirectly cost-effectiveness calculation

Comfortingly, this agrees with the results of our scatter plot sensitivity analysis. For convenience, I have copied the table from the scatter plot analysis describing the most influential inputs:

Highlighted input factors to which result is highly sensitive

| Input | Type of uncertainty | Meaning/importance |
|---|---|---|
| value of increasing ln consumption per capita per annum | Moral | Determines final conversion between outcomes and value |
| transfer as percent of total cost | Operational | Affects cost of results |
| return on investment | Opportunities available to recipients | Determines stream of consumption over time |
| baseline consumption per capita | Empirical | Diminishing marginal returns to consumption mean that baseline consumption matters |

Deworming

The END Fund

Delta sensitivities for each input parameter in the END Fund cost-effectiveness calculation

Comfortingly, this again agrees with the results of our scatter plot sensitivity analysis[5]. For convenience, I have copied the table from the scatter plot analysis describing the least influential inputs:

Highlighted input factors to which result is minimally sensitive

| Input | Type of uncertainty | Meaning/(un)importance |
|---|---|---|
| num yrs between deworming and benefits | Forecast | Affects how much discounting of future income streams must be done |
| duration of long-term benefits | Forecast | The length of time for which a person works and earns income |
| expected value from leverage and funging | Game theoretic | How much does money donated to the END Fund shift around other money |

Deworm the World

Delta sensitivities for each input parameter in the Deworm the World cost-effectiveness calculation

For convenience, I have copied the table from the scatter plot analysis describing the least influential inputs:

Highlighted input factors to which result is minimally sensitive

| Input | Type of uncertainty | Meaning/(un)importance |
|---|---|---|
| num yrs between deworming and benefits | Forecast | Affects how much discounting of future income streams must be done |
| duration of long-term benefits | Forecast | The length of time for which a person works and earns income |
| expected value from leverage and funging | Game theoretic | How much does money donated to Deworm the World shift around other money |

Schistosomiasis Control Initiative

Delta sensitivities for each input parameter in the Schistosomiasis Control Initiative cost-effectiveness calculation

For convenience, I have copied the table from the scatter plot analysis describing the least influential inputs:

Highlighted input factors to which result is minimally sensitive

| Input | Type of uncertainty | Meaning/(un)importance |
|---|---|---|
| num yrs between deworming and benefits | Forecast | Affects how much discounting of future income streams must be done |
| duration of long-term benefits | Forecast | The length of time for which a person works and earns income |
| expected value from leverage and funging | Game theoretic | How much does money donated to Schistosomiasis Control Initiative shift around other money |

Sightsavers

Delta sensitivities for each input parameter in the Sightsavers cost-effectiveness calculation

For convenience, I have copied the table from the scatter plot analysis describing the least influential inputs:

Highlighted input factors to which result is minimally sensitive

| Input | Type of uncertainty | Meaning/(un)importance |
|---|---|---|
| num yrs between deworming and benefits | Forecast | Affects how much discounting of future income streams must be done |
| duration of long-term benefits | Forecast | The length of time for which a person works and earns income |
| expected value from leverage and funging | Game theoretic | How much does money donated to Sightsavers shift around other money |

Deworming comment

That we get substantially identical results in terms of delta sensitivities for each deworming charity is not surprising: The structure of each calculation is the same and (for the sake of not tainting the analysis with my idiosyncratic perspective) the uncertainty on each input parameter is the same.

Seasonal malaria chemoprevention

Malaria Consortium

Delta sensitivities for each input parameter in the Malaria Consortium cost-effectiveness calculation

Again, there seems to be good agreement between the delta sensitivity analysis and the scatter plot sensitivity analysis, though there is perhaps a bit of reordering among the top factors. For convenience, I have copied the table from the scatter plot analysis describing the most influential inputs:

Highlighted input factors to which result is highly sensitive

| Input | Type of uncertainty | Meaning/importance |
|---|---|---|
| internal validity adjustment | Methodological | How much do we trust the results of the underlying SMC studies |
| direct mortality in high transmission season | Empirical | Fraction of overall malaria mortality during the peak transmission season and amenable to SMC |
| cost per child targeted | Operational | Affects cost of results |
| external validity adjustment | Methodological | How much do the results of the underlying SMC studies transfer to new settings |
| coverage in trials in meta-analysis | Historical/methodological | Determines how much coverage an SMC program needs to achieve to match studies |
| value of averting death of a young child | Moral | Determines final conversion between outcomes and value |

Vitamin A supplementation

Helen Keller International

Delta sensitivities for each input parameter in the Helen Keller International cost-effectiveness calculation

Again, there’s broad agreement between the scatter plot analysis and this one. This analysis perhaps makes the crucial importance of the relative risk of all-cause mortality for young children in VAS programs even more obvious. For convenience, I have copied the table from the scatter plot analysis describing the most influential inputs:

Highlighted input factors to which result is highly sensitive

| Input | Type of uncertainty | Meaning/importance |
|---|---|---|
| relative risk of all-cause mortality for young children in programs | Causal | How much do VAS programs affect mortality |
| cost per child per round | Operational | Affects the total cost required to achieve effect |
| rounds per year | Operational | Affects the total cost required to achieve effect |

Bednets

Against Malaria Foundation

Delta sensitivities for each input parameter in the Against Malaria Foundation cost-effectiveness calculation

Again, there’s broad agreement between the scatter plot analysis and this one. For convenience, I have copied the table from the scatter plot analysis describing the most influential inputs:

Highlighted input factors to which result is highly sensitive

| Input | Type of uncertainty | Meaning/importance |
|---|---|---|
| num LLINs distributed per person | Operational | Affects the total cost required to achieve effect |
| cost per LLIN | Operational | Affects the total cost required to achieve effect |
| deaths averted per protected child under 5 | Causal | How effective is the core activity |
| lifespan of an LLIN | Empirical | Determines how many years of benefit accrue to each distribution |
| net use adjustment | Empirical | Affects benefits from LLIN as mediated by proper and improper use |
| internal validity adjustment | Methodological | How much do we trust the results of the underlying studies |
| percent of mortality due to malaria in AMF areas vs trials | Empirical/historical | Affects size of the problem |
| percent of pop. under 5 | Empirical | Affects size of the problem |

Section recap

We performed visual (scatter plot) sensitivity analyses and delta moment-independent sensitivity analyses on GiveWell’s top charities. Conveniently, these two methods generally agreed as to which input factors had the biggest influence on the output. For each charity, we found that there were clear differences in the sensitivity indicators for different inputs.

This suggests that certain inputs are better targets than others for uncertainty reduction. For example, the overall estimate of the cost-effectiveness of Helen Keller International’s vitamin A supplementation program depends much more on the relative risk of all-cause mortality for children in VAS programs than it does on the expected value from leverage and funging. If the cost of investigating each were the same, it would be better to spend time on the former.

An important caveat to remember is that these results still reflect my fairly arbitrary (but scrupulously neutral) decision to pretend that we are equally uncertain about each input parameter. To remedy this flaw, head over to the Jupyter notebook and tweak the input distributions.

Uncertainty and sensitivity analysis of GiveWell’s ranking

Section overview

In the last two sections, we performed uncertainty and sensitivity analyses on GiveWell’s charity cost-effectiveness estimates. Our outputs were, respectively:

  • probability distributions describing our uncertainty about the value per dollar obtained for each charity and

  • estimates of how sensitive each charity’s cost-effectiveness is to each of its input parameters

One problem with this is that we are not supposed to take the cost-effectiveness estimates literally. Arguably, the real purpose of GiveWell’s analysis is not to produce exact numbers but to assess the relative quality of each charity evaluated.

Another issue is that by treating each cost-effectiveness estimate as independent we underweight parameters which are shared across many models. For example, the moral weight that ought to be assigned to increasing consumption shows up in many models. If we consider all the charity-specific models together, this input seems to become more important.

Our solution to these problems will be to use distance metrics on the overall charity rankings. By using distance metrics across these multidimensional outputs, we can perform uncertainty and sensitivity analysis to answer questions about:

  • how uncertain we are about the overall relative cost-effectiveness of the charities

  • which input parameters this overall relative cost-effectiveness is most sensitive to

Metrics on rankings

Our first step on the path to a solution is to abstract away from particular values in the cost-effectiveness analysis and look at the overall rankings returned. That is, we want to transform:

GiveWell’s cost-effectiveness estimates for its top charities

| Charity | Value per $10,000 donated |
|---|---|
| GiveDirectly | 38 |
| The END Fund | 222 |
| Deworm the World | 738 |
| Schistosomiasis Control Initiative | 378 |
| Sightsavers | 394 |
| Malaria Consortium | 326 |
| Against Malaria Foundation | 247 |
| Helen Keller International | 223 |

into:

GiveWell’s top charities ranked from most cost-effective to least

  • Deworm the World

  • Sightsavers

  • Schistosomiasis Control Initiative

  • Malaria Consortium

  • Against Malaria Foundation

  • Helen Keller International

  • The END Fund

  • GiveDirectly

But how do we usefully express probabilities over rankings[6] (rather than probabilities over simple cost-effectiveness numbers)? The approach we’ll follow below is to characterize a ranking produced by a run of the model by computing its distance from the reference ranking listed above (i.e. GiveWell’s current best estimate). Our output probability distribution will then express how far we expect to be from the reference ranking—how much we might learn about the ranking with more information on the inputs. For example, if the distribution is narrow and near 0, that means our uncertain input parameters mostly produce results similar to the reference ranking. If the distribution is wide and far from 0, that means our uncertain input parameters produce results that are highly uncertain and not necessarily similar to the reference ranking.

Spearman’s footrule

What is this mysterious distance metric between rankings that enables the above approach? One such metric is called Spearman’s footrule distance. It’s defined as:

$$F(\sigma, \tau) = \sum_i \left|\sigma(i) - \tau(i)\right|$$

where:

  • $\sigma$ and $\tau$ are rankings,

  • $i$ varies over all the elements of the rankings and

  • $\sigma(i)$ returns the integer position of item $i$ in ranking $\sigma$.

In other words, the footrule distance between two rankings is the sum over all items of the (absolute) difference in positions for each item. (We also add a normalization factor so that the distance ranges from 0 to 1 but omit that trivia here.)

So the distance between A, B, C and A, B, C is 0; the (unnormalized) distance between A, B, C and C, B, A is 4; and the (unnormalized) distance between A, B, C and B, A, C is 2.
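A sketch of the (unnormalized) footrule distance in Python, reproducing the examples above:

```python
def footrule_distance(ranking_a, ranking_b):
    """Sum over items of the absolute difference in positions."""
    position_b = {item: i for i, item in enumerate(ranking_b)}
    return sum(abs(i - position_b[item]) for i, item in enumerate(ranking_a))

print(footrule_distance(["A", "B", "C"], ["A", "B", "C"]))  # 0
print(footrule_distance(["A", "B", "C"], ["C", "B", "A"]))  # 4
print(footrule_distance(["A", "B", "C"], ["B", "A", "C"]))  # 2
```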

Kendall’s tau

Another common distance metric between rankings is Kendall’s tau. It’s defined as:

$$K(\sigma, \tau) = \sum_{\{i, j\}} \bar{K}_{i,j}(\sigma, \tau)$$

where:

  • $\sigma$ and $\tau$ are again rankings,

  • $i$ and $j$ are items in the set of unordered pairs of distinct elements in $\sigma$ and $\tau$ and

  • $\bar{K}_{i,j}(\sigma, \tau) = 0$ if $i$ and $j$ are in the same order (concordant) in $\sigma$ and $\tau$ and $\bar{K}_{i,j}(\sigma, \tau) = 1$ otherwise (discordant)

In other words, the Kendall tau distance looks at all possible pairs across items in the rankings and counts up the ones where the two rankings disagree on the ordering of these items. (There’s also a normalization factor that we’ve again omitted so that the distance ranges from 0 to 1.)

So the distance between A, B, C and A, B, C is 0; the (unnormalized) distance between A, B, C and C, B, A is 3; and the (unnormalized) distance between A, B, C and B, A, C is 1.
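And a sketch of the (unnormalized) Kendall tau distance, again reproducing the examples above:

```python
from itertools import combinations

def kendall_tau_distance(ranking_a, ranking_b):
    """Count pairs of items whose order the two rankings disagree on."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    return sum(
        (pos_a[i] - pos_a[j]) * (pos_b[i] - pos_b[j]) < 0
        for i, j in combinations(ranking_a, 2)
    )

print(kendall_tau_distance(["A", "B", "C"], ["A", "B", "C"]))  # 0
print(kendall_tau_distance(["A", "B", "C"], ["C", "B", "A"]))  # 3
print(kendall_tau_distance(["A", "B", "C"], ["B", "A", "C"]))  # 1
```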

Angular distance

One drawback of the above metrics is that they throw away information in going from the table with cost-effectiveness estimates to a simple ranking. What would be ideal is to keep that information and find some other distance metric that still emphasizes the relationship between the various numbers rather than their precise values.

Angular distance is a metric which satisfies these criteria. We can regard the table of charities and cost-effectiveness values as an 8-dimensional vector. When our output produces another vector of cost-effectiveness estimates (one for each charity), we can compare this to our reference vector by finding the angle between the two[7].
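A sketch of the angular distance, using GiveWell’s reference estimates from the table above as the 8-dimensional vector. Note that scaling every charity by the same factor leaves the distance at 0, which is exactly the property footnote 7 asks for:

```python
import numpy as np

def angular_distance(u, v):
    """Angle (radians) between two vectors of cost-effectiveness estimates."""
    cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cosine, -1.0, 1.0))

# Value per $10,000 donated, in the order of the table above.
reference = np.array([38, 222, 738, 378, 394, 326, 247, 223])

print(angular_distance(reference, reference))      # 0.0
print(angular_distance(reference, 2 * reference))  # 0.0: relative values unchanged
```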

Results

Uncertainties

To recap, what we’re about to see next is the result of running our model many times with different sampled input values. In each run, we compute the cost-effectiveness estimates for each charity and compare those estimates to the reference ranking (GiveWell’s best estimate) using each of the tau, footrule and angular distance metrics. Again, the plots below are from running the analysis while pretending that we’re equally uncertain about each input parameter. To avoid this limitation, go to the Jupyter notebook and adjust the input distributions.

Probability distributions of value per dollar for each of GiveWell’s top charities and probability distributions for the distance between model results and the reference results

We see that our input uncertainty does matter even for these highest level results—there are some input values which cause the ordering of best charities to change. If the gaps between the cost-effectiveness estimates had been very large or our input uncertainty had been very small, we would have expected essentially all of the probability mass to be concentrated at 0 because no change in inputs would have been enough to meaningfully change the relative cost-effectiveness of the charities.

Visual sensitivity analysis

We can now repeat our visual sensitivity analysis but using our distance metrics from the reference as our outcome of interest instead of individual cost-effectiveness estimates. What these plots show is how sensitive the relative cost-effectiveness of the different charities is to each of the input parameters used in any of the cost-effectiveness models (so, yes, there are a lot of parameters/plots). We have three big plots, one for each distance metric—footrule, tau and angle. In each plot, there’s a subplot corresponding to each input factor used anywhere in GiveWell’s cost-effectiveness analysis.

Scatter plots showing sensitivity of the footrule distance with respect to each input parameter

Scatter plots showing sensitivity of the tau distance with respect to each input parameter

Scatter plots showing sensitivity of the angular distance with respect to each input parameter

(The banding in the tau and footrule plots is just an artifact of those distance metrics returning integers (before normalization) rather than reals.)

These results might be a bit surprising at first. Why are there so many charity-specific factors with apparently high sensitivity indicators? Shouldn’t input parameters which affect all models have the biggest influence on the overall result? Also, why do so few of the factors that showed up as most influential in the charity-specific sensitivity analyses from last section make it to the top?

However, after reflecting for a bit, this makes sense. Because we’re interested in the relative performance of the charities, any factor which affects them all equally is of little importance here. Instead, we want factors that have a strong influence on only a few charities. When we go back to the earlier charity-by-charity sensitivity analysis, we see that many of the input parameters we identified as most influential were shared across charities (especially across the deworming charities). Non-shared factors that made it to the top of the charity-by-charity lists—like the relative risk of all-cause mortality for young children in VAS programs—show up somewhat high here too.

But it’s hard to eyeball the sensitivity when there are so many factors and most are of small effect. So let’s quickly move on to the delta analysis.

Delta moment-independent sensitivity analysis

Again, we’ll have three big plots, one for each distance metric—footrule, tau and angle. In each plot, there’s an estimate of the delta moment-independent sensitivity for each input factor used anywhere in GiveWell’s cost-effectiveness analysis (and an indication of how confident that sensitivity estimate is).

Delta sensitivities for each input parameter in the footrule distance analysis

Delta sensitivities for each input parameter in the tau distance analysis

Delta sensitivities for each input parameter in the angular distance analysis

So these delta sensitivities corroborate the suspicion that arose during the visual sensitivity analysis—charity-specific input parameters have the highest sensitivity indicators.

The other noteworthy result is that which charity-specific factors are the most influential depends somewhat on which distance metric we use. The two rank-based metrics—tau and footrule distance—both suggest that the final charity ranking (given these inputs) is most sensitive to the worm intensity adjustment and cost per capita per annum of Sightsavers and the END Fund. These input parameters are a bit further down (though still fairly high) in the list according to the angular distance metric.

Needs more meta

It would be nice to check that our distance metrics don’t produce totally contradictory results. How can we accomplish this? Well, the plots above already order the input factors according to their sensitivity indicators… That means we have rankings of the sensitivities of the input factors and we can compare the rankings using Kendall’s tau and Spearman’s footrule distance. If that sounds confusing, hopefully the table clears things up:

Using Kendall’s tau and Spearman’s footrule distance to assess the similarity of sensitivity rankings generated under different distance metrics

| Delta sensitivity rankings compared | Tau distance | Footrule distance |
|---|---|---|
| Tau and footrule | 0.358 | 0.469 |
| Tau and angle | 0.365 | 0.516 |
| Angle and footrule | 0.430 | 0.596 |

So it looks like the three rankings have middling agreement. Sensitivities according to tau and footrule agree the most while sensitivities according to angle and footrule agree the least. The disagreement probably also reflects random noise since the confidence intervals for many of the variables’ sensitivity indicators overlap. We could presumably shrink these confidence intervals and reduce the noise by increasing the number of samples used during our analysis.

To the extent that the disagreement isn’t just noise, it’s not entirely surprising—part of the point of using different distance metrics is to capture different notions of distance, each of which might be more or less suitable for a given purpose. But the divergence does mean that we’ll need to carefully pick which metric to pay attention to depending on the precise questions we’re trying to answer. For example, if we just want to pick the single top charity and donate all our money to that, factors with high sensitivity indicators according to footrule distance might be the most important to pin down. On the other hand, if we want to distribute our money in proportion to each charity’s estimated cost-effectiveness, angular distance is perhaps a better metric to guide our investigations.

Section recap

We started with a couple of problems with our previous analysis: we were taking cost-effectiveness estimates literally and looking at them independently instead of as parts of a cohesive analysis. We addressed these problems by redoing our analysis while looking at distance from the current best cost-effectiveness estimates. We found that our input uncertainty is consequential even when looking only at the relative cost-effectiveness of the charities. We also found that input parameters which are important but unique to a particular charity often affect the final relative cost-effectiveness substantially.

Finally, we have the same caveat as last time: these results still reflect my fairly arbitrary (but scrupulously neutral) decision to pretend that we are equally uncertain about each input parameter. To remedy this flaw and get results which are actually meaningful, head over to the Jupyter notebook and tweak the input distributions.

Recap

GiveWell models the cost-effectiveness of its top charities with point estimates in a spreadsheet. We insisted that working with probability distributions instead of point estimates more fully reflects our state of knowledge. By performing uncertainty analysis, we got a better sense of how uncertain the results are (e.g. GiveDirectly is the most certain given our inputs). After uncertainty analysis, we proceeded to sensitivity analysis and found that indeed there were some input parameters that were more influential than others. The most influential parameters are likely targets for further investigation and refinement. The final step we took was to combine the individual charity cost-effectiveness estimates into one giant model. By looking at how far (using three different distance metrics) these results deviated from the current overall cost-effectiveness analysis, we accomplished two things. First, we confirmed that our input uncertainty is indeed consequential—there are some plausible input values which might reorder the top charities in terms of cost-effectiveness. Second, we identified which input parameters (given our uncertainty) have the highest sensitivity indicators and therefore are the best targets for further scrutiny. We also found that this final sensitivity analysis was fairly sensitive to which distance metric we use, so it’s important to pick a distance metric tailored to the question of interest.

Finally, throughout, I reminded you that this is more of a template for an analysis than an actual analysis because we pretended to be equally uncertain about each input parameter. To get a more useful analysis, you’ll have to edit the input uncertainty to reflect your actual beliefs and run the Jupyter notebook.

Appendix

Sobol indices for per-charity cost-effectiveness

I also did a variance-based sensitivity analysis with Sobol indices. Those plots follow, after a brief sketch of how such indices can be computed.
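As a sketch of how Sobol indices can be computed with SALib, reusing the toy three-input stand-in model from the delta sketch earlier (again illustrative only, not GiveWell’s model):

```python
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {
    "num_vars": 3,
    "names": ["consumption value", "transfer percent", "roi"],
    "bounds": [[0.8, 1.2], [0.7, 0.9], [0.05, 0.15]],
}

# Sobol analysis needs structured (Saltelli) samples, not plain random ones.
X = saltelli.sample(problem, 1024)
Y = X[:, 0] * X[:, 1] * (1 + X[:, 2])  # toy model, not GiveWell's

Si = sobol.analyze(problem, Y)
print(Si["S1"])  # first-order indices: variance explained by each input alone
print(Si["ST"])  # total-order indices: including interaction effects
```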

The variable order in each plot is from the input parameter with the highest sensitivity to the input parameter with the lowest sensitivity. That makes it straightforward to compare the ordering of sensitivities according to the delta moment-independent method and according to the Sobol method. We see that there is broad—but not perfect—agreement between the methods.

Sobol sensitivities for each input parameter in the GiveDirectly cost-effectiveness calculation

Sobol sensitivities for each input parameter in the END Fund cost-effectiveness calculation

Sobol sensitivities for each input parameter in the Deworm the World cost-effectiveness calculation

Sobol sensitivities for each input parameter in the Schistosomiasis Control Initiative cost-effectiveness calculation

Sobol sensitivities for each input parameter in the Sightsavers cost-effectiveness calculation

Sobol sensitivities for each input parameter in the Malaria Consortium cost-effectiveness calculation

Sobol sensitivities for each input parameter in the Helen Keller International cost-effectiveness calculation

Sobol sensitivities for each input parameter in the Against Malaria Foundation cost-effectiveness calculation

Sobol indices for relative cost-effectiveness of charities

The variable order in each plot is from the input parameter with the highest sensitivity to the input parameter with the lowest sensitivity. That makes it straightforward to compare the ordering of sensitivities according to the delta moment-independent method and according to the Sobol method. We see that there is broad—but not perfect—agreement between the different methods.

Sobol sensitivities for each input parameter in the footrule distance analysis

Sobol sensitivities for each input parameter in the tau distance analysis

Sobol sensitivities for each input parameter in the angular distance analysis


  1. Unfortunately, the code implements the 2019 V4 cost-effectiveness analysis instead of the most recent V5 because I just worked off the V4 tab I’d had lurking in my browser for months and didn’t think to check for a new version until too late. I also deviated from the spreadsheet in one place because I think there’s an error (Update: The error will be fixed in GiveWell’s next publicly-released version). ↩︎

  2. Log-normal strikes me as a reasonable default distribution for this task: because its support is (0, +∞) which fits many of our parameters well (they’re all positive but some are actually bounded above by 1); and because “A log-normal process is the statistical realization of the multiplicative product of many independent random variables” which also seems reasonable here. ↩︎

  3. When you follow the link, you should see a Jupyter notebook with three “cells”. The first is a preamble setting things up. The second has all the parameters with lower and upper bounds. This is the part you want to edit. Once you’ve edited it, find and click “Runtime > Run all” in the menu. You should eventually see the notebook produce a series of plots. ↩︎

  4. This is, in fact, approximately what Monte Carlo methods do so this is a very convenient analysis to run. ↩︎

  5. I swear I didn’t cheat by just picking the results on the scatter plot that match the delta sensitivities! ↩︎

  6. If we just look at the probability for each possible ranking independently, we’ll be overwhelmed by the number of permutations and it will be hard to find any useful structure in our results. ↩︎

  7. The angle between the vectors is a better metric here than the distance between the vectors’ endpoints because we’re interested in the relative cost-effectiveness of the charities and how those change. If our results show that each charity is twice as effective as in the reference vector, our metric should return a distance of 0 because nothing has changed in the relative cost-effectiveness of each charity. ↩︎