I like this kind of observation. :) I think the claim that, broadly speaking, the few most cost-effective interventions are much, much more cost-effective than the bulk of ‘typical’ interventions—also in contexts other than global health—is an important and sometimes underappreciated ‘foundational assumption’ of EA. See also my notes here.
One question I have: why should I believe that the distribution you’re describing “is” an exponential distribution? Indeed, in your introductory paragraph (“However, what also struck me …”) one could swap ‘exponential distribution’ for ‘log-normal distribution’ or ‘Pareto distribution’/‘power law’ everywhere, and it would still be true! :)
In fact, I think a more typical EA reaction to the DCP2 and other data is: look, a log-normal distribution! Or even: look, a power law!
These distributions would suggest even stronger returns to identifying top interventions: as you say, due to the memoryless property, an exponential distribution says that, no matter the ‘cutoff’, shifting from the median to the 75th percentile is as good as shifting from the 75th percentile to the 87.5th one. But for a log-normal distribution with a sufficiently wide spread, the second shift would have even larger returns than the first one. And for a power law the second shift has larger returns whatever the exponent. (Or at least I think so; I haven’t actually done the maths.)
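The percentile comparison above can be checked directly from the quantile functions. The following is a minimal sketch, not a fit to the DCP data: the parameter values (rate 1, sigma 1.0 and 0.3, alpha 2) and all function names are assumptions chosen only for illustration.

```python
import math
from statistics import NormalDist


def exponential_quantile(p, rate=1.0):
    """Quantile function of an exponential distribution."""
    return -math.log(1.0 - p) / rate


def lognormal_quantile(p, mu=0.0, sigma=1.0):
    """Quantile function of a log-normal distribution."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def pareto_quantile(p, x_min=1.0, alpha=2.0):
    """Quantile function of a Pareto (power-law) distribution."""
    return x_min * (1.0 - p) ** (-1.0 / alpha)


def shift_ratio(quantile):
    """Gain from the 75th to the 87.5th percentile, divided by
    the gain from the median to the 75th percentile."""
    first = quantile(0.75) - quantile(0.5)
    second = quantile(0.875) - quantile(0.75)
    return second / first


# Illustrative parameter choices (assumptions, not estimates).
ratios = {
    "exponential": shift_ratio(exponential_quantile),
    "log-normal (sigma=1.0)": shift_ratio(lognormal_quantile),
    "log-normal (sigma=0.3)": shift_ratio(
        lambda p: lognormal_quantile(p, sigma=0.3)
    ),
    "Pareto (alpha=2)": shift_ratio(pareto_quantile),
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```

A ratio of 1 means the two shifts are worth the same, which is what the exponential gives. The Pareto ratio works out to 2^(1/alpha), so it exceeds 1 for any exponent. The log-normal ratio depends on sigma: it is above 1 for sigma = 1.0 but below 1 for sigma = 0.3, so the log-normal claim holds only when the spread is wide enough.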
(There are also some subtle complications in how to interpret exponential distributions when modeling only the ‘tail’ of some distribution beyond some positive cutoff. Again, unless I messed up the maths, which I wasn’t super careful about …)
FWIW, my current impression is:
For data that looks reasonably ‘heavy-tailed’, we can get a good fit using a variety of different distributions: exponential, Weibull, log-normal, Pareto/power law, …
We need to do somewhat sophisticated statistics to ‘distinguish’ between these distributions—and often such tests will simply be inconclusive based on ‘brute empiricism’/curve fitting alone.
Therefore, in the absence of a mechanistic theory of the causal origins of the phenomenon we’re examining (from which we might be able to derive a particular distribution), it is actually somewhat unclear what we mean when we say “X has an exponential distribution”. Often we could just as well say “X looks like a log-normal” etc. If it’s just about describing data we’ve seen, we might therefore be better off saying something like “this looks quite heavy-tailed”. Claims involving specific distributions have the downside that they’re claims both about the data we’ve seen and about what we should expect when we extrapolate beyond the range of observed data; but often we can’t know based on observations how to extrapolate (since it is precisely there that the differences between e.g. exponentials and log-normals become highly relevant). And so we shouldn’t imply that we can.
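One simple way to see why curve fitting alone can be inconclusive is to fit several candidate distributions by maximum likelihood and compare their log-likelihoods on the same data. The sketch below uses only the Python standard library. The sample is synthetic (drawn from a log-normal with an arbitrary seed and size), the function names are hypothetical, and a raw log-likelihood comparison is a rough heuristic rather than a formal test.

```python
import math
import random
from statistics import fmean, pstdev


def exponential_loglik(xs):
    """Log-likelihood at the MLE rate (1 / sample mean)."""
    rate = 1.0 / fmean(xs)
    return sum(math.log(rate) - rate * x for x in xs)


def lognormal_loglik(xs):
    """Log-likelihood at the MLE (mean and population std of log x)."""
    logs = [math.log(x) for x in xs]
    mu, sigma = fmean(logs), pstdev(logs)
    const = -math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    # Normal log-density of log x, minus log x for the change of variable.
    return sum(const - 0.5 * ((l - mu) / sigma) ** 2 - l for l in logs)


def pareto_loglik(xs):
    """Log-likelihood at the MLE, taking x_min as the sample minimum."""
    x_min = min(xs)
    alpha = len(xs) / sum(math.log(x / x_min) for x in xs)
    return sum(
        math.log(alpha) + alpha * math.log(x_min) - (alpha + 1.0) * math.log(x)
        for x in xs
    )


# Synthetic data: seed, sample size, and parameters are assumptions.
random.seed(0)
sample = [random.lognormvariate(0.0, 1.0) for _ in range(40)]

for name, loglik in [
    ("exponential", exponential_loglik),
    ("log-normal", lognormal_loglik),
    ("Pareto", pareto_loglik),
]:
    print(f"{name}: {loglik(sample):.2f}")
```

How far apart the three numbers land depends on the sample size and the seed. Whatever the gap, all three fits are driven by the observed range and say little about behaviour beyond it.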
(Some related discussion in this appendix of the research Ben Todd and I discuss here.)
Thanks Max, you make a good point about differentiating between exponentials, Paretos, and log-normals. It does seem like log-normals are the norm when it comes to these skewed distributions, especially with things like world income. Still, keeping an open mind as to which of the skewed distributions best fits the data can hopefully make these models more robust.
You mentioned the challenges of distinguishing between these heavy-tailed distributions, and I would only add that this challenge increases when viewing these outcomes as intervals rather than points. For the sake of creating graphable data here, I only used the midpoint of the ranges listed on DCP3, but some of the intervals did span orders of magnitude.
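To illustrate how much the choice of point estimate can matter for a wide interval, here is a small sketch comparing the arithmetic midpoint with the geometric midpoint. The range used is hypothetical and is not taken from DCP3.

```python
import math


def arithmetic_midpoint(low, high):
    """Ordinary midpoint of an interval."""
    return (low + high) / 2.0


def geometric_midpoint(low, high):
    """Midpoint on a log scale (geometric mean of the endpoints)."""
    return math.sqrt(low * high)


# Hypothetical range spanning two orders of magnitude.
low, high = 10.0, 1000.0
print(arithmetic_midpoint(low, high))  # 505.0
print(geometric_midpoint(low, high))   # 100.0
```

For an interval this wide the two summaries differ by a factor of about five, so the shape of the plotted distribution can depend noticeably on which one is used.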
Finally, I’m not sure exactly what you mean about the complications of interpreting exponential distributions beyond some cutoff, but if the question was about applying the memoryless property to exponentials (and the tail only depending on the rate parameter), there’s a short derivation on Wikipedia. Again, not sure if that’s what you were getting at, but maybe it’ll clear things up.
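For reference, the memoryless derivation is short enough to write out. Here $X$ is exponential with rate $\lambda$, and $s, t \ge 0$ are arbitrary (the symbols are my notation, not taken from the post):

$$
P(X > s + t \mid X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X > t).
$$

So, conditional on exceeding a cutoff $s$, the excess $X - s$ is again exponential with the same rate $\lambda$. Note that this concerns the excess over the cutoff rather than the raw values above it.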
Thanks for the comments!