[See this research proposal for context. I’d appreciate pointers to other material.]
[WIP, not comprehensive] Collection of existing material on ‘impact being heavy-tailed’
Conceptual foundations
Newman (2005) provides a good introduction to power laws, and reviews several mechanisms generating them, including: combinations of exponentials; inverses of quantities; random walks; the Yule process [also known as preferential attachment]; phase transitions and critical phenomena; self-organized criticality.
Terence Tao, in Benford’s law, Zipf’s law, and the Pareto distribution, offers a partial explanation for why heavy-tailed distributions are so common empirically.
Clauset et al. (2009[2007]) explain why it is very difficult to empirically distinguish power laws from other heavy-tailed distributions (e.g. log-normal). In particular, seeing a roughly straight line in a log-log plot is not sufficient to identify a power law, despite such inferences being popular in the literature. Referring to power-law claims by others, they find that “the distributions for birds, books, cities, religions, wars, citations, papers, proteins, and terrorism are plausible power laws, but they are also plausible log-normals and stretched exponentials.” (p. 26) [NB on citations: Golosovsky & Solomon, 2012, claim to settle the question in favor of a power law (in fact an extreme tail even heavier than that), and they are clearly aware of the issues pointed out by Clauset et al. On the other hand, Brzezinski, 2014, using a larger data set, seems to confirm the results of Clauset et al., finding that a pure power law is the single best fit only for physics & astronomy papers, while in all other disciplines we either can’t empirically distinguish between several heavy-tailed distributions or a power law can be statistically rejected.]
Lyon (2014) argues that, contrary to common belief, the Central Limit Theorem cannot explain why normal distributions are so common. (The critique also applies to the analogous explanation for the log-normal distribution, an example of a heavy-tailed distribution.) Instead, Lyon suggests an appeal to the principle of maximum entropy.
Impact in general / cause-agnostic
Kokotajlo & Oprea (2020) argue that there is also a heavy left tail of harmful interventions.
Tobias Baumann, Is most expected suffering due to worst-case scenarios?
Open Phil’s approach of hits-based giving, in which they “expect the very few best projects to account for much (or most) of our impact”.
Owen Cotton-Barratt’s talk Prospecting for gold, section Heavy-tailed distributions
Brian Tomasik, Why Charities Usually Don’t Differ Astronomically in Expected Cost-Effectiveness
Discussion between Ben Pace and Richard Ngo on “how valuable it is for people to go into AI policy and strategy work”—but one of Ben’s premises is that impact generally has been heavy-tailed since the Industrial Revolution.
Recorded GWWC donations are extremely heavy-tailed. From here:
Less than 1% of our donors account for 50% of our recorded donations. This amounts to dozens of people, while the next 40% of donations (from both pledge donors and non-pledge donors) is distributed among hundreds. This suggests that most of our impact comes from a small-to-medium-size group of large donors (rather than from a very small group of very large donors, or from a large group of small donors).[6]
EA community building
CEA models of community building, especially the discussion of How much might these factors vary? in the ‘three-factor model’.
CEA’s current thinking [no longer endorsed by CEA], particularly Talent is high variance
Discussion between Ben Pace and Jan Kulveit on “whether you should expect to be able to identify individuals who will most shape the long term future of humanity”.
Discussion on this Shortform between me, Buck, and others, on whether “longtermism is bottlenecked by ‘great people’”.
Global health
Toby Ord’s classic paper The Moral Imperative toward Cost-Effectiveness in Global Health, in particular his observation that “According to the DCP2 data, if we funded all of these [health] interventions equally, 80 percent of the benefits would be produced by the top 20 percent of the interventions.” (p. 3)
Jeff Kaufman, The Unintuitive Power Laws of Giving (mostly based on the same DCP2 data)
Agenty Duck, Eva Vivalt Did Not Show QALYs/$ of Interventions Follow a Gaussian Curve
Jeff Kaufman, Effectiveness: Gaussian? (pushing back against the same claim by Robin Hanson as the Agenty Duck blog post above)
Misc
Will MacAskill, in an interview by Lynette Bye, on the distribution of impact across work time for a fixed individual: “Maybe, let’s say the first three hours are like two thirds the value of the whole eight-hour day. And then, especially if I’m working six days a week, I’m not convinced the difference between eight and ten hours is actually adding anything in the long term.”
I did some initial research/thinking on this before the pandemic came and distracted me completely. Here’s a very broad outline that might be helpful.
Great, thank you!
I saw that you asked Howie for input—are there other people you think it would be good to talk to on this topic?
You’re probably aware of this, but Anders Sandberg has done some thinking on this topic. Also presumably David Roodman based on his public writings (though I have not contacted him myself).
More broadly, I’m guessing that anybody who either you’ve referenced above, or who I’ve linked in my doc, would be helpful, though of course many of them are very busy.
[Mathematical definitions of heavy-tailedness. Currently mostly notes to myself—I might turn these into a more accessible post in the future. None of this is original, and might indeed be routine for a maths undergraduate specializing in statistics.]
There are different definitions of when a probability distribution is said to have a heavy tail, and several closely related terms. They are not extensionally equivalent, i.e. there are distributions that are heavy-tailed according to some but not all common definitions; this is for example true for the log-normal distribution.
Here I’ll collect all definitions I encounter, and what I know about how they relate to each other.
I don’t think the differences matter for most EA purposes, where the weakest definition that includes e.g. log-normals seems safe to use (except maybe #0 below, which might be too weak). I’m mainly collecting the definitions because I’m curious and because I think they can be an avoidable source of confusion for someone trying to understand discussions involving heavy-tailedness. (The differences might matter for more technical purposes, e.g. when deciding which statistical method to use to analyze certain data.)
There is also a less interesting way in which definitions can differ: a distribution can have a heavy right tail, a heavy left tail, or both. Some definitions thus come in three variants. I’m for now going to ignore this, stating only one variant per definition.
List of definitions
X will always denote a random variable.
0. X is leptokurtic (or super-Gaussian) iff its kurtosis is strictly larger than 3 (which is the kurtosis of e.g. all normal distributions), i.e. µ_4/σ^4 > 3, where µ_4 = E[(X − E[X])^4] is the fourth central moment and σ is the standard deviation.
1. X has a heavy right tail iff its moment-generating function E[e^(tX)] is infinite for all t > 0.
2. X is heavy-tailed iff it has an infinite nth moment for some n.
3. X is heavy-tailed iff it has infinite variance (i.e. infinite 2nd central moment).
4. X has a long right tail iff for all t > 0 the conditional probability P[X > x + t | X > x] converges to 1 as x goes to infinity.
4b. X has a heavy right tail iff there is a real number x_0 such that the conditional mean exceedance (CME) E[X − x | X > x] is a strictly increasing function of x for x > x_0. (This is a definition by Bryson, 1974, who may have coined the term ‘heavy-tailed’ and shows that distributions with constant CME are precisely the exponential distributions.)
5. X is subexponential (or fulfills the catastrophe principle) iff for all integers n ≥ 1 and i.i.d. random variables X_1, …, X_n with the same distribution as X, the quotient of probabilities P[X_1 + … + X_n > x] / P[max(X_1, …, X_n) > x] converges to 1 as x goes to infinity.
6. X has a regularly varying right tail with tail index 0 < α ≤ 2 iff there is a slowly varying function L: (0,+∞) → (0,+∞) such that for all x > 0 we have P[X > x] = x^(-α) * L(x). (L is slowly varying iff, for all a > 0, the quotient L(ax)/L(x) converges to 1 as x goes to infinity.)
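Definition #5 can be made concrete with a small Monte Carlo sketch (my own illustration; the two distributions, the sample size, and the threshold quantile are arbitrary choices). For a heavy-tailed Pareto distribution, a large sum of two i.i.d. draws is almost always due to a single large draw, so the quotient in #5 is close to 1; for a light-tailed exponential distribution, a large sum typically arises from two moderately large draws, so the quotient is well above 1:

```python
import random

random.seed(1)

def tail_ratio(draw_pair, n=200_000, quantile=0.995):
    """Estimate P[X_1 + X_2 > x] / P[max(X_1, X_2) > x] at a high threshold x."""
    pairs = [draw_pair() for _ in range(n)]
    sums = sorted(a + b for a, b in pairs)
    x = sums[int(quantile * n)]  # threshold near the 99.5th percentile of the sum
    p_sum = sum(1 for a, b in pairs if a + b > x) / n
    p_max = sum(1 for a, b in pairs if max(a, b) > x) / n
    return p_sum / p_max

# Pareto with tail index alpha = 1 (heavy-tailed), via inverse-CDF sampling.
def pareto_pair():
    return (1.0 - random.random()) ** -1.0, (1.0 - random.random()) ** -1.0

# Exponential with rate 1 (light-tailed).
def expo_pair():
    return random.expovariate(1.0), random.expovariate(1.0)

print("Pareto ratio:     ", round(tail_ratio(pareto_pair), 2))  # close to 1
print("exponential ratio:", round(tail_ratio(expo_pair), 2))    # well above 1
```

The limit in #5 is as x goes to infinity; this sketch only probes a finite threshold, so the Pareto ratio is near 1 rather than exactly 1.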
Relationships between definitions
(Note that even for those I state without caveats I haven’t convinced myself of a proof in detail.)
I’ll use #0 to refer to the clause on the right hand side of the “iff” statement in definition 0, and so on.
(For some of these one might have to use the suitable versions of heavy right tail / left tail etc.; e.g. perhaps #1 needs to be replaced with “heavy right and left tail” or “heavy right or left tail” etc.)
I suspect that #0 is the weakest condition, i.e. that all other definitions imply that X is super-Gaussian.
I suspect that #6 is the strongest condition, i.e. implies all others.
I think that: #3 ⇒ #2 ⇒ #1 and #5 ⇒ #4 ⇒ #1 (where ‘⇒’ denotes implication).
Why I think that:
#0 weakest: Heuristically, many other definitions state or imply that some higher moments don’t exist, or are at least “close” to such a condition (e.g. #1). By contrast, #0 merely requires that a certain moment is larger than for the normal distribution. Also, the exponential distribution is super-Gaussian but not usually considered to be heavy-tailed—in fact, “heavy-tailed” is sometimes loosely explained to mean “having heavier tails than an exponential distribution”.
#6 strongest: The condition basically says that the distribution behaves like a Pareto distribution (or “power law”) as we look further down the tail. And for Pareto distributions with α ≤ 2 it’s well known and easy to see that the variance doesn’t exist, i.e. #3 holds. Similarly, I’ve seen power laws being cited as examples of distributions fulfilling the catastrophe principle, i.e. #5.
#3 ⇒ #2 is obvious.
#2 ⇒ #1: A statement very close to the contrapositive is well known: if the moment-generating function exists in an open neighborhood around some value, then the nth moments about that value are given by the nth derivative of the moment-generating function at that value. (I’m not sure if there can be weird cases where the moment-generating function exists at some points but on no open interval.)
#5 ⇒ #4 and #4 ⇒ #1 are stated on Wikipedia.
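The claim that #0 is strictly weaker than the others can be checked on the standard counterexample, the exponential distribution: its kurtosis is 9 (so it is super-Gaussian), yet its moment-generating function E[e^(tX)] = 1/(1 − t) is finite for 0 < t < 1, so it fails #1. A quick Monte Carlo sanity check (my own illustration; sample size and the evaluation point t = 0.3 are arbitrary choices):

```python
import math
import random
from statistics import fmean

random.seed(2)

# 200,000 draws from an exponential distribution with rate 1.
xs = [random.expovariate(1.0) for _ in range(200_000)]

# Definition #0: kurtosis mu_4 / sigma^4. The true value for the
# exponential distribution is 9, well above the Gaussian value of 3.
m = fmean(xs)
var = fmean((x - m) ** 2 for x in xs)
mu4 = fmean((x - m) ** 4 for x in xs)
kurtosis = mu4 / var ** 2
print(f"kurtosis: {kurtosis:.1f}")  # around 9, so super-Gaussian

# Definition #1: the moment-generating function at t = 0.3. The estimate
# settles near the finite true value 1/(1 - 0.3) = 1.43, so the
# exponential distribution does not have a heavy right tail in sense #1.
mgf = fmean(math.exp(0.3 * x) for x in xs)
print(f"E[exp(0.3 X)]: {mgf:.2f}")  # close to 1.43, i.e. finite
```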
This is a good link-list. It seems undiscoverable here though. I think it’s worth thinking about how you can make such lists discoverable. Making it a top-level post seems an obvious improvement.
Thanks for the suggestion. I plan to make this list more discoverable once I feel like it’s reasonably complete, e.g. by turning it into its own top-level post or appending it to a top-level post writeup of my research on this topic.