I like this kind of observation. :) I think the claim that, broadly speaking, the few most cost-effective interventions are much, much more cost-effective than the bulk of ‘typical’ interventions—also in contexts other than global health—is an important and sometimes underappreciated ‘foundational assumption’ of EA. See also my notes here.
One question I have: why should I believe that the distribution you’re describing “is” an exponential distribution? Indeed, in your introductory paragraph (“However, what also struck me …”) one could swap out ‘exponential distribution’ with ‘log-normal distribution’ or ‘Pareto distribution’/‘power law’ everywhere, and it would still be true! :)
In fact, I think a more typical EA reaction to the DCP2 and other data is: look, a log-normal distribution! Or even: look, a power law!
These distributions would suggest even stronger returns to identifying top interventions: as you say, due to the memoryless property, an exponential distribution says that, no matter the ‘cutoff’, shifting from the median to the 75th percentile is as good as shifting from the 75th percentile to the 87.5th one. But for a log-normal distribution the second shift would have even larger returns than the first one. And that difference would be even more pronounced for a power law. (Or at least I think so—I haven’t actually done the maths.)
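For what it’s worth, a quick numerical sketch seems to bear this out. The distributions and parameters below are arbitrary choices for illustration (unit-rate exponential, log-normal with sigma = 1, Pareto with tail index 1.5); only the pattern of the gaps matters:

```python
# Compare the absolute size of two successive quantile shifts (median -> 75th
# percentile, and 75th -> 87.5th percentile) under three candidate distributions.
# All parameters are arbitrary, chosen only to illustrate the pattern.
from scipy import stats

candidates = {
    "exponential": stats.expon(scale=1.0),   # rate 1 (scale = 1/rate)
    "log-normal":  stats.lognorm(s=1.0),     # sigma = 1 on the log scale
    "Pareto":      stats.pareto(b=1.5),      # tail index alpha = 1.5
}

for name, dist in candidates.items():
    q50, q75, q875 = dist.ppf([0.5, 0.75, 0.875])
    print(f"{name:12s} gain from 50th->75th: {q75 - q50:6.3f}   "
          f"gain from 75th->87.5th: {q875 - q75:6.3f}")

# Exponential: the two gains are identical (memorylessness).
# Log-normal: the second gain is larger; Pareto: larger still.
```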
(There are also some subtle complications in how to interpret exponential distributions when modeling only the ‘tail’ of some distribution beyond some positive cutoff. Again, unless I messed up the maths, which I wasn’t super careful about …)
FWIW, my current impression is:
For data that looks reasonably ‘heavy-tailed’, we can get a good fit using a variety of different distributions: exponential, Weibull, log-normal, Pareto/power law, …
We need to do somewhat sophisticated statistics to ‘distinguish’ between these distributions—and often such tests will simply be inconclusive based on ‘brute empiricism’/curve fitting alone.
Therefore, in the absence of a mechanistic theory of the causal origins of the phenomenon we’re examining (from which we might be able to derive a particular distribution), it is actually somewhat unclear what we mean when we say “X has an exponential distribution”. Often we could just as well say “X looks like a log-normal” etc. If it’s just about describing data we’ve seen, we might be better off saying something like “this looks quite heavy-tailed”. Claims involving specific distributions have the downside that they’re claims both about the data we’ve seen and about what we should expect when we extrapolate beyond the range of observed data; but often we can’t know, based on observations alone, how to extrapolate (since it is precisely here that the differences between e.g. exponentials and log-normals become highly relevant), and so we shouldn’t imply that we can. (The toy sketch below makes this concrete.)
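As a toy illustration of that extrapolation point: fit two heavy-tailed candidates to the same simulated sample and compare what each says about the observed range versus the far tail. Everything here is synthetic, and the specific numbers (sample size, the threshold of 50, etc.) are arbitrary:

```python
# Fit two different heavy-tailed candidates to the same simulated sample and
# compare (a) how well each fits the observed range and (b) what each implies
# far beyond it. Everything is synthetic; only the qualitative contrast matters.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=200)   # pretend this is our data

# Maximum-likelihood fits, with the location parameter pinned to zero.
exp_loc, exp_scale = stats.expon.fit(sample, floc=0)
ln_s, ln_loc, ln_scale = stats.lognorm.fit(sample, floc=0)
exp_fit = stats.expon(exp_loc, exp_scale)
ln_fit = stats.lognorm(ln_s, ln_loc, ln_scale)

# Over the observed range, both fits can look passable (KS distances of a few
# percent on a sample of this size)...
ks_exp = stats.kstest(sample, exp_fit.cdf).statistic
ks_ln = stats.kstest(sample, ln_fit.cdf).statistic
print(f"KS distance   exponential: {ks_exp:.3f}   log-normal: {ks_ln:.3f}")

# ...yet they disagree by many orders of magnitude about the far tail, i.e.
# about exactly the kind of extrapolation a claim like "X is exponential" licenses.
print(f"P(X > 50)     exponential: {exp_fit.sf(50):.1e}   log-normal: {ln_fit.sf(50):.1e}")
```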
(Some related discussion in this appendix of the research Ben Todd and I discuss here.)
Thanks, Max. You make a good point about differentiating between exponentials, Paretos, and log-normals. It does seem like log-normals are the norm among these skewed distributions, especially for things like world income. Still, keeping an open mind about which skewed distribution best fits the data should hopefully make these models more robust.
You mentioned the challenges of distinguishing between these heavy-tailed distributions, and I would only add that this challenge increases when viewing these outcomes as intervals rather than points. For the sake of creating graphable data here, I only used the midpoints of the ranges listed in DCP3, but some of the intervals spanned orders of magnitude.
Finally, I’m not sure exactly what you mean about the complications of interpreting exponential distributions beyond some cutoff, but if the question was about applying the memoryless property to exponentials (and the tail only depending on the rate parameter), there’s a short derivation on Wikipedia. Again, not sure if that’s what you were getting at, but maybe it’ll clear things up.
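In case it helps, the key step is short. Writing $\lambda$ for the rate parameter, for any cutoff $s$ and any $t > 0$:

$$P(X > s + t \mid X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X > t).$$

So conditioning an exponential on exceeding any cutoff just gives the same exponential shifted by that cutoff, which is why the shape of the tail depends only on the rate parameter and not on where the cutoff sits.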
Thanks for the comments!
Good observation, and I would add one other implication: this distribution of returns supports “hits-based evaluation” (in the spirit of hits-based giving) for interventions in global health and development. For example, consider two strategies for spending $10 million on RCTs to evaluate candidate interventions:
1. We can evaluate 10 interventions for $1 million each, developing large samples with high statistical power.
2. We can evaluate 100 interventions for $100,000 each, developing smaller samples with lower statistical power.
The traditional scientific/economic approach favors strategy 1, since it values precisely estimating the effects of an intervention. But we care about identifying the most effective interventions, and if the returns follow an exponential distribution, the biggest effects are an order of magnitude larger than the average effects, so they will be detected even with low statistical power. Instead, we would rather maximize coverage of interventions to maximize our probability of evaluating those high-impact interventions. If the cost is that 60 interventions with modest effects go undetected, so be it.
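Here is a toy simulation of that trade-off. To be clear, everything in it is an assumption picked for illustration (the exponential effect-size distribution, the sample sizes, and the rule of simply backing the intervention with the best-looking estimate); it is a sketch of the logic, not a calibrated model:

```python
# Toy comparison of "few big trials" vs "many small trials" when true effects
# are exponentially distributed and we back whichever intervention looks best.
# All parameters are illustrative assumptions, not calibrated to real RCTs.
import numpy as np

rng = np.random.default_rng(0)
MEAN_EFFECT = 0.05   # assumed mean standardized effect across candidate interventions
N_SIMS = 5000

def true_effect_of_pick(n_interventions, n_per_arm):
    """Average true effect of the intervention each strategy ends up backing."""
    true = rng.exponential(MEAN_EFFECT, size=(N_SIMS, n_interventions))
    se = np.sqrt(2.0 / n_per_arm)                    # s.e. of a two-arm mean difference
    estimate = true + rng.normal(0.0, se, size=true.shape)
    picked = estimate.argmax(axis=1)                 # back the best-looking estimate
    return true[np.arange(N_SIMS), picked].mean()

# Strategy 1: 10 large, well-powered trials.  Strategy 2: 100 small, noisy trials.
print("strategy 1 (10 trials, n=2500/arm):", round(true_effect_of_pick(10, 2500), 3))
print("strategy 2 (100 trials, n=250/arm):", round(true_effect_of_pick(100, 250), 3))
```

With numbers like these, the broad-but-noisy strategy tends to end up backing a substantially better intervention on average, essentially because the top of an exponential sits far enough above both the typical effect and the sampling noise that even an under-powered trial can spot it.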
Really interesting, thanks for pointing this out!