Here is one speculation of what could explain the prevalence of heavy-tailed distributions in what you call ‘hierarchical classification schemes’:
Suppose that the causal mechanism generating the relevant classes has the property that new members are more likely to be added to a class the more members it already has (e.g. if people were more likely to move to a city the higher its population). Under some such conditions it’s then a mathematical consequence that the size distribution (i.e. number of members across classes) will converge to a power law.
This has been popularized as “preferential attachment” by the ‘network science’ people around Barabasi since the late 1990s to explain various observations about the Internet and other networks (though the related academic cottage industry has recently received some good criticism). But the maths and potential real-world cases are much older; indeed, perhaps the first major case for which this explanation was proposed was one of your examples, biological taxa, by Yule in the 1920s.
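To make the mechanism concrete, here is a minimal simulation sketch. It is my own toy version (a Simon-style process), not a reproduction of Yule's or Barabasi's exact models; the function name, the `p_new` parameter, and the number of steps are all assumptions chosen for illustration. Each new member either founds a new class or joins an existing class with probability proportional to that class's current size.

```python
import random

def simulate_preferential_attachment(n_steps, p_new=0.1, seed=0):
    """Toy Simon-style process (illustrative assumption, not a fitted model).

    Each new member founds a new class with probability p_new; otherwise it
    joins an existing class chosen with probability proportional to its size.
    """
    rng = random.Random(seed)
    sizes = [1]       # sizes[i] = number of members currently in class i
    membership = [0]  # one entry per member, holding that member's class index
    for _ in range(n_steps):
        if rng.random() < p_new:
            sizes.append(1)
            membership.append(len(sizes) - 1)
        else:
            # Picking a uniformly random existing member and joining its class
            # selects a class with probability proportional to its size.
            cls = rng.choice(membership)
            sizes[cls] += 1
            membership.append(cls)
    return sizes

sizes = simulate_preferential_attachment(50_000)
sizes.sort(reverse=True)
total = sum(sizes)
median = sizes[len(sizes) // 2]
top_1pct = sizes[: max(1, len(sizes) // 100)]
print(f"classes: {len(sizes)}, largest: {sizes[0]}, median: {median}")
print(f"share of members in the largest 1% of classes: {sum(top_1pct) / total:.2f}")
```

In runs like this the largest classes end up far bigger than the typical class, which is the qualitative signature of a heavy-tailed size distribution. The sketch does not by itself show that the tail is specifically a power law rather than some other heavy-tailed shape.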
For a survey of other generating mechanisms of power laws, see Newman (2005) and this blog post by Terence Tao.
More minor nitpick:
You point to skewness as the key mathematical concept. However, I think that most of what you discuss (e.g. “outliers which drive up the mean”) is better captured by kurtosis or other measures of ‘heavy-tailedness’ (and sometimes we may care more about the more familiar concept of variance).
In practice these will often be similar, so whether we look for e.g. “highly skewed” or “heavy-tailed” distributions often won’t make a difference.
However, conceptually they are quite different. Skewness is related to the third central moment, i.e. the third power of deviations from the mean. This means it largely measures whether we have more outliers toward the right or toward the left of the mean, and thus is a measure of ‘asymmetry’.
But arguably we often care more about how much weight outliers in either direction have compared to ‘typical’ values, or by how much future outliers might exceed the maximum value we’ve seen so far. That is, we care about ‘absolute’ measures of heavy-tailedness that are not defined by way of comparison to the tail on the other end of the distribution. Then we may be more interested in a measure of heavy-tailedness that is applied to just one tail of the distribution; or in kurtosis, which is related to the fourth central moment, i.e. the fourth power of deviations from the mean, and is thus a measure of how much weight outliers on either side have compared to typical values.
Concretely, skewness can be unintuitive for distributions that are multimodal (which can drive up skewness for a quite different reason than the distribution having a heavy or long tail on one side), or for distributions that have heavy/long tails on both sides but to different degrees.
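A small numerical sketch of the distinction, under assumptions of my own choosing: a symmetric Laplace sample (tails heavier than a normal distribution's) and a two-component normal mixture (bimodal, but with Gaussian tails). The helper names and mixture weights are illustrative only. Skewness and excess kurtosis are computed directly from the third and fourth central moments.

```python
import random

def central_moment(xs, k):
    """k-th central moment: the mean of (x - mean)**k."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    # Third central moment, standardized by variance**1.5.
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    # Fourth central moment, standardized by variance**2, minus 3 so that
    # a normal distribution scores 0.
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0

rng = random.Random(0)
n = 100_000

# Symmetric with heavier-than-normal tails: a Laplace sample, generated as
# the difference of two independent exponentials.
heavy_symmetric = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]

# Bimodal with Gaussian tails: 70% of points near 0, 30% near 4.
bimodal = [rng.gauss(0, 1) if rng.random() < 0.7 else rng.gauss(4, 1)
           for _ in range(n)]

for name, xs in [("heavy_symmetric", heavy_symmetric), ("bimodal", bimodal)]:
    print(f"{name}: skewness={skewness(xs):.2f}, "
          f"excess kurtosis={excess_kurtosis(xs):.2f}")
```

With these choices the Laplace sample should show skewness near zero but clearly positive excess kurtosis, while the bimodal mixture should show positive skewness together with negative excess kurtosis. So a "highly skewed" filter and a "heavy-tailed" filter would rank these two samples in opposite order.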
Thanks for this post, I’m glad to see more attention on what I believe is an important topic (see also here).
I’m actually about to publish a related piece I wrote with Ben Todd specifically on whether the distribution of impact-related metrics across people is heavy-tailed. So this post is a nice complement. I particularly appreciate the specific data sets you point to.
For now I’m just going to make a couple of quick points, split between comments to allow for a more organized discussion.
--
One thing I would highly encourage you to do if you haven’t already is to take a look at Power-law distributions in empirical data by Clauset et al. In particular, as explained in this paper:
I’m not sure what exactly you did when saying “The (least-squares estimate of the) power parameter p equals 0,6 for the education interventions, 1,02 for the health interventions and 1,8 for the climate policies.”, but note that one needs to be careful here. In particular, doing a standard linear regression / least-squares fit in a log-log plot is generally not a valid way of estimating the power law parameters. You need to use something like maximum likelihood estimation.
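For reference, a minimal sketch of the maximum likelihood estimator for a continuous power law with density proportional to x^(-alpha) for x >= x_min, which is the closed-form estimator given in Clauset et al. Several things here are assumptions on my part: the function names, the synthetic data, treating x_min as known (Clauset et al. also describe how to estimate it), and the parameterization itself, since your “power parameter p” may be defined differently (e.g. as a rank-size exponent) and then would not be directly comparable to alpha.

```python
import math
import random

def sample_power_law(n, alpha, x_min, rng):
    """Inverse-transform sampling from p(x) proportional to x**(-alpha), x >= x_min."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]

def mle_alpha(xs, x_min):
    """Maximum likelihood estimate of alpha for a continuous power law.

    Uses alpha_hat = 1 + n / sum(ln(x_i / x_min)) over points with x_i >= x_min,
    with approximate standard error (alpha_hat - 1) / sqrt(n).
    """
    tail = [x for x in xs if x >= x_min]
    n = len(tail)
    alpha_hat = 1.0 + n / sum(math.log(x / x_min) for x in tail)
    std_err = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, std_err

# Synthetic data with a known exponent, purely to check the estimator.
rng = random.Random(42)
data = sample_power_law(10_000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat, std_err = mle_alpha(data, x_min=1.0)
print(f"alpha_hat = {alpha_hat:.3f} +/- {std_err:.3f} (true alpha = 2.5)")
```

Note that recovering a sensible exponent does not show that a power law is the right model; comparing against alternatives such as a lognormal is a separate step.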
More broadly, I think you put too much emphasis on power laws specifically, as opposed to just saying that many cost-effectiveness distributions look very heavy-tailed. I.e. I’d recommend describing the key observation in a way that’s more agnostic between power laws, lognormal, and other heavy-tailed distributions. It is very hard to empirically distinguish these distributions from one another. This doesn’t matter much if you just want to describe the data we’ve seen, but can make a large difference when extrapolating beyond the range of observed data.
I’m aware that you do mention that if something looks like an approximately straight line in a log-log plot it could be either a power law or a lognormal. I just think this and related points should be more prominent and also be visible in the high-level framing.
“Impartiality or cause-neutrality means that in order to be more effective, one should only look at the top level in the hierarchical classification, i.e. consider the whole world (instead of a specific country), all beings (instead of members from a specific species), and all diseases (instead of a specific type of diseases such as cancers).” That is why a theoretical and practical organization based on a global systematic approach is required for optimizing the alleviation of suffering in the world.