Hi folks! I am entering this into the GiveWell Change Our Mind contest and am looking for any critique of the essay above. I would love to hear your thoughts, suggestions, etc., and will happily credit you.
As a quick summary, GiveWell’s decision rules are based on whether their “best guess” about the cost-effectiveness of an intervention reaches some threshold, ignoring how much uncertainty there is in that estimate. Far from being risk-neutral, this strategy makes their decisions risk-seeking and biased toward worse interventions overall.
In this essay, I use a combination of interactive tools and examples to demonstrate that:
1. Without an uncertainty framework, GW’s models are:
   - Overconfident
   - Biased toward more uncertain interventions
   - Biased toward weaker interventions
2. The uncertainty problem is large, serious, and cannot be ignored.
3. A probabilistic sensitivity analysis-based decision process addresses these issues.
4. A probabilistic sensitivity analysis is very doable within GiveWell’s workflow, without enormous additional burden or changes to model-building infrastructure. As part of this essay, much of that infrastructure work has already been done.
READ ME FIRST:
This document is being made public for the purposes of openness and to gather general comments, critique, and discussion. It is unfinished at this time. If you leave a substantive comment, have suggestions, etc., please leave your name in the comments so you can be credited for your contribution.
To make this fair (i.e., not “cheating” by eliciting comments) and to encourage others to collaborate without reservation, I am forgoing my portion of any prize money this essay may receive (save for a reasonable reimbursement for working time). All contributors and collaborators will have a say in which charities the remaining portion of the prize money goes to. The goal is simply to make the best possible essay addressing these issues by encouraging many contributors, regardless of whether it wins any prize money. The important part is making change.
General request for comments: if you suggest an addition, please also suggest twice that amount that I should cut or trim from somewhere else.
Good entry. I am worried that the bite of the quantitative conclusions comes from the normal distribution assumption: that the true cost-effectiveness of programs is normally distributed around 1. This creates an enormous mass of programs near 1 with only small differences between them, so that measurement error can have a substantial influence.
However, this assumption is implausible:
- It implies that GiveDirectly is the mean of the cost-effectiveness distribution, yet we know GiveDirectly is one of the best giving opportunities and only looks bad compared to the very best.
- Cost-effectiveness is generated by the ratio of two random variables (benefits/costs), and the ratio of random variables tends to follow a heavy-tailed distribution (see the sketch below).
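As a quick illustration of that second point (all parameters here are made up, nothing from GiveWell’s models):

```python
# Even when benefits and costs are each individually well-behaved (normal),
# their ratio is heavy-tailed. All parameters here are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

benefits = rng.normal(2.0, 0.5, n)        # hypothetical benefit estimates
costs = np.abs(rng.normal(1.0, 0.5, n))   # hypothetical costs, kept positive
ce = benefits / costs                     # cost-effectiveness ratio

for q in (0.50, 0.95, 0.99, 0.999):
    print(f"q={q:>5}: benefits {np.quantile(benefits, q):7.2f}   "
          f"cost-effectiveness {np.quantile(ce, q):8.2f}")
# The benefit quantiles barely move; the ratio's upper quantiles explode.
```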
Why does this matter? Because with heavy-tailed distributions, the gap between the true cost-effectiveness of Program X and Program Y can be much larger than under a normal distribution. For example, suppose you simulate 2000 programs from a normal distribution with a mean cost-effectiveness of 1 and an SD of 1.5. Then the 95th and 99th percentiles are 3.47 vs. 4.48, a difference of about 1 unit.
However, suppose we used the same parameters for a lognormal distribution. Then the 95th and 99th percentiles are 20.38 vs. 46.95, a whopping 26 units of difference! That is an unreasonable distribution, so instead consider a more conservative lognormal that may fit the data better: mean = 0.01, sd = 1.5. Then the 95th and 99th percentiles are 7.5 vs. 17.5, which is still an enormous 10 units of difference.
In the first case, measurement error can plausibly make the worse program look better. In the second case, the gap in true cost-effectiveness is so large that normally-distributed measurement error can’t possibly drive those differences.
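For anyone who wants to replicate the exercise, here is a rough sketch. I am assuming numpy’s meanlog/sdlog parametrization for the lognormal, and since only 2000 programs are drawn, the exact percentiles will vary with the seed (and with the parametrization), so the figures above will not reproduce exactly:

```python
# Sketch of the percentile-gap exercise described above. The meanlog/sdlog
# parametrization is my assumption; exact values vary by seed with n=2000.
import numpy as np

rng = np.random.default_rng(42)
n_programs = 2000

cases = {
    "normal(mean=1, sd=1.5)": rng.normal(1.0, 1.5, n_programs),
    "lognormal(1, 1.5)": rng.lognormal(mean=1.0, sigma=1.5, size=n_programs),
    "lognormal(0.01, 1.5)": rng.lognormal(mean=0.01, sigma=1.5, size=n_programs),
}

for name, draws in cases.items():
    p95, p99 = np.quantile(draws, [0.95, 0.99])
    print(f"{name:>24}: p95={p95:7.2f}  p99={p99:7.2f}  gap={p99 - p95:7.2f}")
```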
So while it’s conceptually true that measurement error biases us to pick uncertain interventions over certain interventions, the actual harm it could cause is probably small if true cost-effectiveness follows a heavy-tailed distribution.
Edit: this point is made more formally and demonstrated in this paper.
Hi Karthik, thanks! The selection of distributions is really important here. To be clear, the demonstration section uses normal distributions for convenience and ease of understanding.
The actual implemented PSA, however, has no such restriction. In the mini PSA exercise, you can see that the resulting uncertainty distributions are not normal, and are fat-tailed as you say. In the full PSA, if implemented, I strongly suspect that the resulting distributions will mostly be very fat-tailed and highly right-skewed. The decision rules I suggested specifically combat this, because they are not sensitive to how heavy the right tail is, but are very sensitive to the left tail.
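As a hypothetical sketch of what a left-tail-sensitive rule looks like (an illustration only, not the exact rule from the essay; the threshold and quantile here are made up):

```python
# Hypothetical left-tail-sensitive decision rule (illustration only; the
# threshold and quantile below are made up, not the essay's exact rule).
import numpy as np

def fund_decision(ce_draws: np.ndarray, threshold: float = 1.0,
                  quantile: float = 0.05) -> bool:
    """Fund only if a low quantile of the PSA cost-effectiveness draws clears the bar."""
    return np.quantile(ce_draws, quantile) >= threshold

rng = np.random.default_rng(1)
# Two programs with the same median cost-effectiveness, different spread:
steady = rng.lognormal(mean=1.0, sigma=0.3, size=10_000)  # thin tails
risky = rng.lognormal(mean=1.0, sigma=1.5, size=10_000)   # fat tails on both sides

print(fund_decision(steady))  # True: 5th percentile ~ exp(1 - 1.645*0.3) ~= 1.66
print(fund_decision(risky))   # False: 5th percentile ~ exp(1 - 1.645*1.5) ~= 0.23
```

Making the right tail of `risky` even heavier would not change the decision; thinning its left tail would.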
I will see if I can add a note to clarify this, thanks!
I like this sort of initiative, thanks for sharing your draft!
To encourage more reviews, you may want to briefly summarise your claims on your post, and note that:
Good idea, thanks! I’ll edit it in in a moment.