[Mathematical definitions of heavy-tailedness. Currently mostly notes to myself—I might turn these into a more accessible post in the future. None of this is original, and might indeed be routine for a maths undergraduate specializing in statistics.]
There are different definitions of when a probability distribution is said to have a heavy tail, and several closely related terms. They are not extensionally equivalent: there are distributions that are heavy-tailed according to some but not all common definitions; this is true, for example, of the log-normal distribution.
Here I’ll collect all definitions I encounter, and what I know about how they relate to each other.
I don’t think the differences matter for most EA purposes, where the weakest definition that includes e.g. log-normals seems safe to use (except maybe #0 below, which might be too weak). I’m mainly collecting the definitions because I’m curious and because I think they can be an avoidable source of confusion for someone trying to understand discussions involving heavy-tailedness. (The differences might matter for more technical purposes, e.g. when deciding which statistical method to use to analyze certain data.)
There is also a less interesting way in which definitions can differ: a distribution can have a heavy right tail, a heavy left tail, or both. Some definitions thus come in three variants. I’m for now going to ignore this, stating only one variant per definition.
List of definitions
X will always denote a random variable.
0. X is leptokurtic (or super-Gaussian) iff its kurtosis is strictly larger than 3 (which is the kurtosis of e.g. all normal distributions), i.e. µ_4/σ^4 > 3, where µ_4 = E[(X − E[X])^4] is the fourth central moment and σ is the standard deviation.
1. X has a heavy right tail iff the moment-generating function of X is infinite at all t > 0.
2. X is heavy-tailed iff it has an infinite nth moment for some n.
3. X is heavy-tailed iff it has infinite variance (i.e. infinite 2nd central moment).
4. X has a long right tail iff for all real numbers t the conditional probability P[X > x + t | X > x] converges to 1 as x goes to infinity.
4b. X has a heavy right tail iff there is a real number x_0 such that the conditional mean exceedance (CME) E[X − x | X > x] is a strictly increasing function of x for x > x_0. (This definition is due to Bryson, 1974, who may have coined the term ‘heavy-tailed’; he shows that the distributions with constant CME are precisely the exponential distributions.)
5. X is subexponential (or fulfills the catastrophe principle) iff for all n > 0 and i.i.d. random variables X_1, …, X_n with the same distribution as X the quotient of probabilities P[X_1 + … + X_n > x] / P[max(X_1, …, X_n) > x] converges to 1 as x goes to infinity.
6. X has a regularly varying right tail with tail index 0 < α ≤ 2 iff there is a slowly varying function L: (0,+∞) → (0,+∞) such that for all x > 0 we have P[X > x] = x^(-α) * L(x). (L is slowly varying iff, for all a > 0, the quotient L(ax)/L(x) converges to 1 as x goes to infinity.)
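The exponential fact in definition 4b is easy to check numerically. Here is a quick Monte Carlo sketch (my own illustration, not from any source above, using only Python’s standard library): the empirical CME of an exponential is roughly constant at every threshold, while the empirical CME of a log-normal keeps increasing with the threshold.

```python
import random
import statistics

random.seed(0)
N = 200_000

# Exponential(1) and standard log-normal samples.
exp_sample = [random.expovariate(1.0) for _ in range(N)]
logn_sample = [random.lognormvariate(0.0, 1.0) for _ in range(N)]

def cme(sample, x):
    """Empirical conditional mean exceedance E[X - x | X > x]."""
    return statistics.mean(v - x for v in sample if v > x)

# Exponential(1): CME should be roughly constant (namely 1/lambda = 1).
print([round(cme(exp_sample, x), 2) for x in (0.5, 1.0, 2.0, 3.0)])
# Log-normal: CME should be increasing in the threshold.
print([round(cme(logn_sample, x), 2) for x in (0.5, 1.0, 2.0, 3.0)])
```

The thresholds 0.5–3.0 are arbitrary choices for illustration; with this sample size the exponential’s values cluster near 1 while the log-normal’s form an increasing sequence.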
Relationships between definitions
(Note that even for those I state without caveats I haven’t convinced myself of a proof in detail.)
I’ll use #0 to refer to the clause on the right hand side of the “iff” statement in definition 0, and so on.
(For some of these one might have to use the suitable right-tail/left-tail versions; e.g. perhaps #1 needs to be replaced with “heavy right and left tail” or “heavy right or left tail”.)
I suspect that #0 is the weakest condition, i.e. that all other definitions imply that X is super-Gaussian.
I suspect that #6 is the strongest condition, i.e. implies all others.
I think that: #3 ⇒ #2 ⇒ #1 and #5 ⇒ #4 ⇒ #1 (where ‘⇒’ denotes implication).
Why I think that:
#0 weakest: Heuristically, many other definitions state or imply that some higher moments don’t exist, or are at least “close” to such a condition (e.g. #1). By contrast, #0 merely requires that a certain moment is larger than for the normal distribution. Also, the exponential distribution is super-Gaussian but not usually considered to be heavy-tailed—in fact, “heavy-tailed” is sometimes loosely explained to mean “having heavier tails than an exponential distribution”.
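The exponential case can be checked numerically (my own sketch, Python standard library only): Exponential(1) has kurtosis 9 > 3, so it satisfies #0, yet its moment-generating function E[e^{tX}] = 1/(1 − t) is finite for all t < 1, so it fails #1.

```python
import math
import random

random.seed(1)
N = 400_000
sample = [random.expovariate(1.0) for _ in range(N)]

# Sample kurtosis: fourth central moment over squared variance.
mean = sum(sample) / N
m2 = sum((v - mean) ** 2 for v in sample) / N
m4 = sum((v - mean) ** 4 for v in sample) / N
kurtosis = m4 / m2 ** 2
print(round(kurtosis, 1))  # theoretical value for Exponential(1) is 9

# Monte Carlo estimate of the MGF at t = 0.25 < lambda = 1;
# the theoretical value 1/(1 - 0.25) = 4/3 is finite.
t = 0.25
mgf_estimate = sum(math.exp(t * v) for v in sample) / N
print(round(mgf_estimate, 3))
```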
#6 strongest: The condition basically says that the distribution behaves like a Pareto distribution (or “power law”) as we look further down the tail. And for Pareto distributions with α ≤ 2 it’s well known and easy to see that the variance doesn’t exist, i.e. #3 holds. Similarly, I’ve seen power laws being cited as examples of distributions fulfilling the catastrophe principle, i.e. #5.
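The catastrophe-principle part can be illustrated with a Monte Carlo sketch (my own, Python standard library only): for Pareto samples, a sum exceeds a high threshold almost exclusively because a single summand does, so the ratio in #5 is close to 1; for exponential samples the ratio is much larger and keeps growing with the threshold.

```python
import random

random.seed(2)

def exceedance_ratio(draw, x, n):
    """Empirical P[X1 + X2 > x] / P[max(X1, X2) > x] from n sample pairs."""
    n_sum = n_max = 0
    for _ in range(n):
        a, b = draw(), draw()
        if a + b > x:
            n_sum += 1
        if max(a, b) > x:
            n_max += 1
    return n_sum / n_max

N = 1_000_000
# Pareto with tail index 1.5 (x_m = 1, so P[X > x] = x**(-1.5)): ratio near 1.
r_pareto = exceedance_ratio(lambda: random.paretovariate(1.5), 100.0, N)
# Exponential(1): ratio well above 1.
r_exp = exceedance_ratio(lambda: random.expovariate(1.0), 9.0, N)
print(round(r_pareto, 2), round(r_exp, 2))
```

The thresholds (100 for the Pareto, 9 for the exponential) are chosen so that both events are rare but still well represented in a million sample pairs; note the ratio is always ≥ 1 for non-negative variables, since the sum exceeds x whenever the maximum does.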
#3 ⇒ #2 is obvious.
#2 ⇒ #1: A statement very close to the contrapositive is well known: if the moment-generating function exists in an open neighborhood around some value, then the nth moment about that value is given by the nth derivative of the moment-generating function at that value. (I’m not sure if there can be weird cases where the moment-generating function exists at some points but on no open interval.)
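For the right-tail version, the key bound behind the contrapositive can be sketched in one line (my own sketch, with X^+ denoting the positive part of X):

```latex
% Sketch: if M(t) = E[e^{tX}] < \infty for some t > 0, then all right-tail
% moments of X are finite. For x \ge 0 and any n \ge 1,
e^{tx} = \sum_{k=0}^{\infty} \frac{(tx)^k}{k!} \ge \frac{(tx)^n}{n!}
\quad\Longrightarrow\quad
x^n \le \frac{n!}{t^n}\, e^{tx},
% and therefore
E\bigl[(X^+)^n\bigr] \le \frac{n!}{t^n}\, E\bigl[e^{tX^+}\bigr]
\le \frac{n!}{t^n}\bigl(1 + M(t)\bigr) < \infty .
```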
#5 ⇒ #4 and #4 ⇒ #1 are stated on Wikipedia.