Good entry. I am worried that the bite of the quantitative conclusions is coming from the normal distribution assumption—that the true cost-effectiveness of programs is normally distributed around 1. This creates an enormous mass of programs around 1 with a small difference between them, so that measurement error can have a substantial influence.
However, this assumption is implausible: it implies that GiveDirectly is the mean of the cost-effectiveness distribution, when we know GiveDirectly is one of the best giving opportunities and only looks bad compared to the very best.
Cost-effectiveness is generated by the ratio of two random variables (benefits/costs), and the ratio of random variables tends to follow a heavy-tailed distribution.
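To see this in a toy example (my own illustration, using made-up gamma distributions rather than anything fitted to real programs): draw benefits and costs from light-tailed distributions and compare the upper quantiles of the ratio to those of the components.

```python
# Toy illustration (not from the original comment): even when benefits and
# costs are each light-tailed, their ratio can be heavy-tailed.
# The gamma distributions below are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

benefits = rng.gamma(shape=2.0, scale=1.0, size=n)  # light-tailed
costs = rng.gamma(shape=2.0, scale=1.0, size=n)     # light-tailed
ratio = benefits / costs                            # cost-effectiveness

for q in (0.5, 0.95, 0.99, 0.999):
    print(f"{q:.3f} quantile of benefits: {np.quantile(benefits, q):6.2f}   "
          f"of ratio: {np.quantile(ratio, q):8.2f}")
# The ratio's upper quantiles grow much faster than the components',
# i.e. the ratio is far more right-skewed and heavy-tailed.
```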
Why does this matter? Because with heavy-tailed distributions, the gap between the true cost-effectiveness of Program X and Program Y can be much larger than it would be under a normal distribution. For example, suppose you simulate 2000 programs from a normal distribution with a mean cost-effectiveness of 1 and an SD of 1.5. Then the 95th and 99th percentiles are 3.47 and 4.48, a difference of about 1 unit. Now use the same parameters for a log-normal distribution: the 95th and 99th percentiles become 20.38 and 46.95, a whopping 26 units of difference! That distribution is unreasonable, so instead consider a more conservative log-normal that maybe fits the data better, with mean = 0.01 and sd = 1.5: the 95th and 99th percentiles are then 7.5 and 17.5, still an enormous 10-unit gap.
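For anyone who wants to reproduce the comparison, here is a rough sketch in Python. It treats "mean = 1, sd = 1.5" as the log-normal's log-scale parameters, which may not be exactly how the quoted figures were generated, so the numbers need not match the ones above; the qualitative point about the 95th-to-99th percentile gap is what matters.

```python
# Minimal sketch of the percentile comparison (my own reconstruction,
# not the original simulation).
import numpy as np

rng = np.random.default_rng(1)
n = 2000

normal_draws = rng.normal(loc=1.0, scale=1.5, size=n)
lognormal_draws = rng.lognormal(mean=1.0, sigma=1.5, size=n)  # log-scale params

for name, draws in [("normal", normal_draws), ("log-normal", lognormal_draws)]:
    p95, p99 = np.quantile(draws, [0.95, 0.99])
    print(f"{name:>10}: 95th = {p95:6.2f}, 99th = {p99:6.2f}, gap = {p99 - p95:6.2f}")
# With the normal, the 95th-to-99th gap is about 1 unit; with the log-normal
# it is many times larger, which is the heavy-tail point being made.
```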
In the first case, measurement error can plausibly make the worse program look better. In the second case, the gap in true cost-effectiveness is so large that normally-distributed measurement error can't possibly drive those differences.
So while it's conceptually true that measurement error biases us toward picking uncertain interventions over certain ones, the actual harm it could cause is probably small if true cost-effectiveness follows a heavy-tailed distribution.
Edit: this point is made more formally and demonstrated in this paper.
Hi Karthik, thanks! The selection of distributions is really important here. To be clear, the demonstration section uses normal distributions for convenience and ease of understanding.
The actual implemented PSA, however, has no such restriction. In the mini PSA exercise, you can see that the resulting uncertainty distributions are not normal, and are fat-tailed as you say. In the full PSA, if implemented, I would strongly suspect that the resulting distributions will mostly be very fat-tailed and highly skewed rightward. The decision rules I suggested are designed to handle this: they are not sensitive to how heavy the right tail is, but are very sensitive to the left tail.
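To make that concrete, here is a hypothetical sketch (not the post's actual decision rules) of a rule that scores programs by a left-tail quantile of their simulated cost-effectiveness, so the decision is unaffected by how heavy the right tail is. The two example programs and their distributions are made up for illustration.

```python
# Hypothetical illustration: rank programs by a left-tail quantile of their
# simulated cost-effectiveness draws, so a heavier right tail does not change
# the decision, but a heavier left tail does.
import numpy as np

def left_tail_score(ce_draws: np.ndarray, q: float = 0.05) -> float:
    """Return the q-th quantile of simulated cost-effectiveness draws."""
    return float(np.quantile(ce_draws, q))

rng = np.random.default_rng(2)

# Two made-up programs: B has a fatter right tail but a worse left tail.
program_a = rng.lognormal(mean=1.0, sigma=0.5, size=10_000)
program_b = rng.lognormal(mean=1.2, sigma=1.5, size=10_000)

for name, draws in [("A", program_a), ("B", program_b)]:
    print(f"Program {name}: mean = {draws.mean():5.2f}, "
          f"5th pct = {left_tail_score(draws):5.2f}")
# A mean-based rule would favour B (driven by its right tail); the
# left-tail rule favours A, whose downside risk is smaller.
```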
I will see if I can add a note to clarify this, thanks!