Expected value estimates we (cautiously) took literally—Oxford Prioritisation Project

By Tom Sittler

Cross-posted from the Oxford Prioritisation Project blog. We're centralising all discussion on the Effective Altruism forum. To discuss this post, please comment here.

Summary: This post describes how we turned the cost-effectiveness estimates of our four shortlisted organisations into a final decision. In order to give more weight to more robust estimates, we use the four model outputs to update a prior distribution over the impacts of grantee organisations.

Introduction

Inspired by Michael Dickens' example, we decided to formalise the intuition that more robust estimates should get higher weights. We did this by treating the outputs of our models as providing an evidence distribution, which we use to update a prior distribution over the cost-effectiveness of potential grantee organisations.

Code

The code we used to compute the Bayesian posterior estimates is here.

The prior distribution

This is the unconditional (prior) probability distribution of the cost-effectiveness of potential grantees. We use a lognormal distribution. A theoretical justification for this is that we expect cost-effectiveness to be the result of a multiplicative rather than additive process. A possible empirical justification could be the distribution of the cost-effectiveness of DCP-2 interventions. Again, this has been discussed at length elsewhere.
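To illustrate the multiplicative argument, here is a minimal simulation (ours, not part of the project code): the product of many independent positive factors is approximately lognormal, because its logarithm is a sum of independent terms.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Cost-effectiveness modelled as a product of many independent positive
# factors (e.g. probability of success, scale, neglectedness). The log of
# such a product is a sum, which the central limit theorem makes
# approximately normal, so the product itself is approximately lognormal.
n_samples, n_factors = 100_000, 10
factors = rng.uniform(0.5, 2.0, size=(n_samples, n_factors))
products = factors.prod(axis=1)

print(f"skewness of log(products): {stats.skew(np.log(products)):.3f}")  # ~0
```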

The parameters of a lognormal distribution are its scale and location. The scale is equal to the standard deviation of the natural logarithm of the values, and the location is equal to the mean of the natural logarithm of the values. The median of a lognormal distribution is the exponential of its location.
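For example, in scipy's parametrisation (an illustration; not necessarily the library the project code uses), the scale and location map onto the `s` and `scale` arguments as follows:

```python
import numpy as np
from scipy import stats

location, scale = 1.0, 0.5  # mean and sd of the log-values

# scipy's lognorm takes s = scale (sd of logs) and scale = exp(location)
dist = stats.lognorm(s=scale, scale=np.exp(location))

print(dist.median(), np.exp(location))  # equal: the median is exp(location)
```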

We choose the location parameter such that the median of the distribution is as cost-effective as highly effective global health interventions such as those recommended by GiveWell, which we estimate to provide a QALY for $50. Intuitively, this means that the set of organisations we were considering funding at the start of the project had a median cost-effectiveness of 0.02 QALYs/$.

We set the scale parameter to 0.5; that is, the standard deviation of the natural logarithm of our prior is 0.5. This is a relatively poorly informed guess, which we arrived at mostly by looking at the choices of Michael Dickens and checking that they did not intuitively seem absurd to team members.
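Concretely, the two parameter choices amount to the following (a sketch in Python; variable names are ours, not necessarily those in the project code):

```python
import numpy as np

prior_location = np.log(0.02)  # median of 0.02 QALYs/$, i.e. $50 per QALY
prior_scale = 0.5              # sd of the log cost-effectiveness

print(np.exp(prior_location))  # 0.02: the prior median
lo, hi = np.exp(prior_location + np.array([-1.645, 1.645]) * prior_scale)
print(lo, hi)  # central 90% interval: roughly 0.009 to 0.045 QALYs/$
```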

Had we chosen a scale parameter more than about 2.2 times as large, the Machine Intelligence Research Institute would have had the highest posterior cost-effectiveness estimate.

The evidence distribution

We fit the outputs of our models, which are lists of numbers, to a lognormal probability distribution. The fit is excellent, as you can see from the graphs below. On the log scale, the probability density function of our original data appears in black, and the probability density function of data randomly generated from the fitted lognormal appears in red.

This is the graph for StrongMinds:

And the graph for MIRI:

The other graphs look very similar, so I'm not including them here. You can generate them using the code I provide.
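If you want to reproduce this kind of comparison without the project code, here is a minimal sketch; `model_output` is a made-up stand-in for one model's list of numbers:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def fit_lognormal(samples):
    """Fit a lognormal by matching the mean and sd of the log-samples."""
    logs = np.log(samples)
    return logs.mean(), logs.std()  # (location, scale)

# Hypothetical stand-in for one model's output (a list of positive numbers)
rng = np.random.default_rng(1)
model_output = rng.lognormal(mean=1.0, sigma=0.8, size=10_000)

location, scale = fit_lognormal(model_output)
refit = rng.lognormal(mean=location, sigma=scale, size=10_000)

# Compare densities on the log scale: original in black, fitted in red
for data, colour in [(model_output, "black"), (refit, "red")]:
    kde = stats.gaussian_kde(np.log(data))
    xs = np.linspace(np.log(data).min(), np.log(data).max(), 200)
    plt.plot(xs, kde(xs), color=colour)
plt.xlabel("log(HEWALYs per $10,000)")
plt.show()
```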

What about negative values?

The models for Animal Charity Evaluators and the Machine Intelligence Research Institute contain negative values, so their outputs cannot be fitted to a lognormal distribution.

Instead, we split the data into a positive and a negative lognormal, which we update separately on a positive and a negative prior.

Intuitively, we think that both interventions that do a large amount of good (in the tail of the positive prior) and interventions that do a large amount of harm (in the tail of the negative prior) are a priori unlikely.
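A minimal sketch of the split described above (the function name and the sign-probability weight are our illustration; the source only specifies that the two parts are updated separately):

```python
import numpy as np

def split_signed_samples(samples):
    """Split mixed-sign model output into a positive part and a (negated)
    negative part, each of which can be fitted to its own lognormal."""
    samples = np.asarray(samples, dtype=float)
    positive = samples[samples > 0]
    negative = -samples[samples < 0]  # negate so a lognormal can be fitted
    p_positive = len(positive) / len(samples)  # weight of the positive part
    return positive, negative, p_positive

pos, neg, p = split_signed_samples([3.0, -1.5, 10.0, 0.4, -0.2, 7.0])
print(pos, neg, p)  # [ 3.  10.   0.4  7. ] [1.5 0.2] 0.666...
```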

Updating when distributions are lognormal

In my other post, I derive a closed-form solution to the problem of updating a lognormal prior using a lognormal evidence distribution.
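The closed form amounts to the familiar normal-normal conjugate update applied on the log scale; here is a sketch (see the linked post for the derivation):

```python
def lognormal_update(prior_loc, prior_scale, ev_loc, ev_scale):
    """Posterior (location, scale) from a lognormal prior and a lognormal
    evidence distribution: the normal-normal conjugate update on log values."""
    prior_var, ev_var = prior_scale**2, ev_scale**2
    post_loc = (prior_loc * ev_var + ev_loc * prior_var) / (prior_var + ev_var)
    post_var = prior_var * ev_var / (prior_var + ev_var)
    return post_loc, post_var**0.5
```

The posterior location is a precision-weighted average of the prior and evidence locations, which is exactly the sense in which noisier (higher-scale) evidence receives less weight.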

Units

A word on units: inside each of the four models, we convert all estimates to "Human-equivalent well-being-adjusted life-years" (HEWALYs). One HEWALY is a QALY, or a year of life as a fully healthy, modern-day human. If an action produces zero HEWALYs, we are indifferent between doing it and not doing it. Negative HEWALYs correspond to lives not worth living, and −1 HEWALY is as bad as 1 HEWALY is good. In other words, we are indifferent between causing 0 HEWALYs and causing both 1 HEWALY and −1 HEWALY.

A being can accrue more than 1 HEWALY per year, because life can be better than life as a fully healthy modern-day human. Symmetrically, a being can accrue less than −1 HEWALY per year.

Results

You can view the results and code here. If you disagree with our prior parameters, we encourage you to try your own values and see what you come up with, in the style of GiveWell, who provide their parameters as estimated by each staff member. We also include commented-out code to visualise how the posterior estimates depend on the prior parameters.

Interesting phenomena

Our prior strongly punishes MIRI. While the mean of its evidence distribution is 2,053,690,000 HEWALYs/$10,000, the posterior mean is only 180.8 HEWALYs/$10,000. If we set the prior scale parameter to larger than about 1.09, the posterior estimate for MIRI is greater than 1,038 HEWALYs/$10,000, thus beating 80,000 Hours.
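To see the mechanism at work, here is an illustration with made-up evidence numbers (loosely in the spirit of the MIRI estimate, not the real figures), reusing the update sketch from above: as the prior scale grows, the posterior mean of an extreme, high-variance estimate grows very quickly.

```python
import numpy as np

def lognormal_update(prior_loc, prior_scale, ev_loc, ev_scale):
    prior_var, ev_var = prior_scale**2, ev_scale**2
    post_loc = (prior_loc * ev_var + ev_loc * prior_var) / (prior_var + ev_var)
    post_var = prior_var * ev_var / (prior_var + ev_var)
    return post_loc, post_var**0.5

prior_loc = np.log(200)              # prior median: 200 HEWALYs/$10,000 ($50/QALY)
ev_loc, ev_scale = np.log(1e6), 5.0  # hypothetical extreme, noisy evidence

for prior_scale in [0.5, 1.0, 1.5, 2.0]:
    loc, scale = lognormal_update(prior_loc, prior_scale, ev_loc, ev_scale)
    print(prior_scale, round(np.exp(loc + scale**2 / 2)))  # posterior mean
```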

Our estimate of StrongMinds is lower than our prior. The StrongMinds evidence distribution had a mean of 17.9 HEWALYs/$10,000, which is lower than the posterior mean of 18.5 HEWALYs/$10,000. We can interpret this in the following way: we found evidence that StrongMinds has surprisingly low cost-effectiveness (relative to our prior), so taking the prior into account leads us to increase our estimate of StrongMinds relative to the evidence distribution alone.
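As a worked illustration of this pull towards the prior (the evidence scale below is made up, since only the means are quoted above): an evidence distribution centred below the prior median yields a posterior between the two.

```python
import numpy as np

def lognormal_update(prior_loc, prior_scale, ev_loc, ev_scale):
    prior_var, ev_var = prior_scale**2, ev_scale**2
    post_loc = (prior_loc * ev_var + ev_loc * prior_var) / (prior_var + ev_var)
    post_var = prior_var * ev_var / (prior_var + ev_var)
    return post_loc, post_var**0.5

prior_loc, prior_scale = np.log(200), 0.5  # prior median: 200 HEWALYs/$10,000
ev_loc, ev_scale = np.log(15), 1.0         # hypothetical low, noisy evidence

loc, scale = lognormal_update(prior_loc, prior_scale, ev_loc, ev_scale)
print(np.exp(loc))  # posterior median ~119: above the evidence, below the prior
```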