Yeah, I think we agree on the maths, and I’m quite sympathetic to your recommendations regarding framing based on this. In fact, emphasizing “top x% share” as a metric and avoiding any suggestion that it’s practically useful to treat “heavy-tailed” as a binary property were my key goals for the last round of revisions I made to the summary, but it seems like I didn’t fully succeed.
FWIW, I maybe wouldn’t go quite as far as you suggest in some places. I think the issue of “mathematically ‘heavy-tailed’ distributions may not be heavy-tailed in practice in the everyday sense” is an instance of a broader issue that crops up whenever one uses mathematical concepts that are defined in asymptotic terms in applied contexts.
To give just one example, consider that we often talk of “linear growth”, “exponential growth”, etc. I think this is quite useful, and that it would overall be bad to ‘taboo’ these terms and always replace them with some ‘model-agnostic’ metric that can be calculated for finitely many data points. But there we have the analogous issue that, depending on the parameters, an exponential function (for example) can for practical purposes look very much like a linear function over the relevant finite range of data.
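As a rough illustration of that last point, here is a minimal sketch comparing exp(rate * t) with the straight line 1 + rate * t over a finite range. The function name, the rates, and the horizon are all hypothetical choices made for the example. The tangent line at t = 0 is not even the best-fitting line, so a fitted line could only do better.

```python
import math


def max_relative_gap(rate: float, horizon: float, steps: int = 1000) -> float:
    """Largest relative gap between exp(rate * t) and 1 + rate * t on [0, horizon]."""
    worst = 0.0
    for i in range(steps + 1):
        t = horizon * i / steps
        exact = math.exp(rate * t)
        linear = 1.0 + rate * t  # tangent line at t = 0
        worst = max(worst, abs(exact - linear) / exact)
    return worst


# Same functional form and same range of data, only the growth rate differs.
slow = max_relative_gap(rate=0.01, horizon=10.0)
fast = max_relative_gap(rate=0.5, horizon=10.0)
print(f"rate 0.01: gap {slow:.2%}; rate 0.5: gap {fast:.2%}")
```

With the slow rate the line stays within a fraction of a percent of the exponential over the whole range, whereas with the fast rate the two curves bear no resemblance to each other.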
Another example would be computational complexity, e.g. when we talk about algorithms being “polynomial” or “exponential” regarding how many steps they require as a function of the size of their inputs.
Yet another example would be attractors in dynamical systems.
In these and many other cases we encounter the same phenomenon that we often talk in terms of mathematical concepts that by definition only tell us that some property holds “eventually”, i.e. in the limit of arbitrarily long amounts of time, arbitrarily much data, or similar.
Of course, being aware of this really is important. In practice it often is crucial to have an intuition or more precise quantitative bounds on e.g. whether we have enough data points to be able to use some computational method that’s only guaranteed to work in the limit of infinite data. And sometimes we are better off using some algorithm that for sufficiently large inputs would be slower than some alternative, etc.
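To make the algorithm example concrete, here is a small sketch with two made-up cost models: one “exponential” algorithm needing 2^n steps and one “polynomial” algorithm needing n^10 steps. Both cost functions are assumptions chosen purely for illustration and do not describe any real algorithm.

```python
def exponential_steps(n: int) -> int:
    """Hypothetical step count of an 'exponential' algorithm."""
    return 2 ** n


def polynomial_steps(n: int) -> int:
    """Hypothetical step count of a 'polynomial' algorithm."""
    return n ** 10


def first_size_where_polynomial_wins(start: int = 2, limit: int = 1000):
    """Smallest input size >= start at which the polynomial algorithm is cheaper."""
    for n in range(start, limit + 1):
        if exponential_steps(n) > polynomial_steps(n):
            return n
    return None  # no crossover found within the searched range


crossover = first_size_where_polynomial_wins()
print(f"polynomial algorithm only wins from n = {crossover} onwards")
```

Under these assumed cost models, the asymptotically worse algorithm is the better choice for every input size from 2 up to just below the crossover, which is the kind of situation the paragraph above has in mind.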
But on the other hand, talk in terms of ‘asymptotic’ concepts often is useful as well. I think one reason is that in practice, when we say that something “looks like a heavy-tailed distribution” or that something “looks like exponential growth”, we tend to mean “the top 1% share is relatively large / it would be hard to fit e.g. a normal distribution” or “it would be hard to fit a straight line to this data” etc., as opposed to just “there is a mathematically heavy-tailed distribution that with the right parameters provides a reasonable fit” or “there is an exponential function that with the right parameters provides a reasonable fit”. That is, the conventions for the use of these terms are significantly influenced by “practical” considerations (and things like Grice’s communication maxims) rather than just their mathematical definition.
So e.g. concretely when in practice we say that something is “log-normally distributed” we often do mean that it looks more heavy-tailed in the everyday sense than a normal distribution (even though it is a mathematical fact that there are log-normal distributions that are relatively thin-tailed in the everyday sense—indeed we can make most types of distributions arbitrarily thin-tailed or heavy-tailed in this sense!).
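To put rough numbers on the parenthetical: for a log-normal distribution whose logarithm has standard deviation sigma, the share of the total held by the top fraction p is Φ(sigma − z), where Φ is the standard normal CDF and z is its (1 − p) quantile. This follows from the log-normal Lorenz curve and does not depend on the location parameter. The sketch below uses only Python’s standard library; the function name and the sigma values are hypothetical choices for illustration.

```python
from statistics import NormalDist


def lognormal_top_share(sigma: float, top_fraction: float = 0.01) -> float:
    """Share of the total held by the top `top_fraction` of a log-normal.

    sigma is the standard deviation of the logarithm of the variable.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - top_fraction)  # quantile cutting off the top
    return std_normal.cdf(sigma - z)


for sigma in (0.1, 0.5, 1.0, 2.0):
    share = lognormal_top_share(sigma)
    print(f"sigma = {sigma}: top 1% share = {share:.1%}")
```

For sigma = 0.1 the top 1% hold only slightly more than 1% of the total, while for sigma = 2 they hold more than a third, even though both distributions are log-normal and hence “heavy-tailed” in the mathematical sense.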
So taking a step back for a second, I think the primary point of collaborative written or spoken communication is to take the picture or conceptual map in my head and put it in your head, as accurately as possible. Use of any terms should, in my view, be assessed against whether those terms are likely to create the right picture in a reader’s or listener’s head. I appreciate this is a somewhat extreme position.
If every time you use the term heavy-tailed (and it’s used a lot: a quick CTRL + F tells me it’s in the OP 25 times) I have to guess from context whether you mean the mathematical or the commonsense definition, it’s more difficult to parse what you actually mean in any given sentence. If someone is reading and doesn’t even know that those definitions substantially differ, they’ll probably come away with bad conclusions.
This isn’t a hypothetical corner case—I keep seeing people come to bad (or at least unsupported) conclusions in exactly this way, while thinking that their reasoning is mathematically sound and thus nigh-incontrovertible. To quote myself above:
The above, in my opinion, highlights the folly of ever thinking ‘well, log-normal distributions are heavy-tailed, and this should be log-normal because things got multiplied together, so the top 1% must be at least a few percent of the overall value’.
If I noticed that use of terms like ‘linear growth’ or ‘exponential growth’ were similarly leading to bad conclusions, e.g. by being extrapolated too far beyond the range of data in the sample, I would be similarly opposed to their use. But I don’t, so I’m not.
If I noticed that engineers at firms I have worked for were obsessed with replacing exponential algorithms with polynomial algorithms because they are better in some limit case, but worse in the actual use cases, I would point this out and suggest they stop thinking in those terms. But this hasn’t happened, so I haven’t ever done so.
I do notice that use of the term heavy-tailed (as a binary) in EA, especially with reference to the log-normal distribution, is causing people to make claims about how we should expect this to be ‘a heavy-tailed distribution’ and how important it therefore is to attract the top 1%, and so...you get the idea.
Still, a full taboo is unrealistic and was intended as an aside; closer to ‘in my ideal world’ or ‘this is what I aim for in my own writing’ than a practical suggestion to others. As I said, I think the actual suggestions made in this summary are good (replacing the question ‘is this heavy-tailed or not’ with ‘how heavy-tailed is this’ should do the trick), and I hope to see them become more widely adopted.
I’m not sure your general take on communication is all that extreme; at least, I think I have a fairly similar view.
I agree that the kind of practical experiences you mention can be a good reason to be more careful with the use of some mathematical concepts but not others. I think I’ve seen fewer instances of people making fallacious inferences based on something being log-normal, but if I had, I think I might have arrived at aspirations similar to yours regarding how to frame things.
(An invalid type of argument I have seen frequently is actually the “things multiply, so we get a log-normal” part. But as you have pointed out in your top-level comment, if we multiply a small number of thin-tailed and low-variance factors we’ll get something that’s not exactly a ‘paradigmatic example’ of a log-normal distribution even though we could reasonably approximate it with one. On the other hand, if the conditions of the ‘multiplicative CLT’ aren’t fulfilled we can easily get something with heavier tails than a log-normal. See also fn26 in our doc:
We’ve sometimes encountered the misconception that products of light-tailed factors always converge to a log-normal distribution. However, in fact, depending on the details the limit can also be another type of heavy-tailed distribution, such as a power law (see, e.g., Mitzenmacher 2004, sc. 5-7 for an accessible discussion and examples). Relevant details include whether there is a strictly positive minimum value beyond which products can’t fall (ibid., sc. 5.1), random variation in the number of factors (ibid., sc. 7), and correlations between factors.
)
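The point about multiplying a small number of thin-tailed, low-variance factors can be checked with a quick simulation. The sketch below is only illustrative: the choice of three factors, the uniform range [0.9, 1.1], the contrasting high-variance log-normal factors, the sample size, and the seed are all assumptions made for the example, not figures from the discussion.

```python
import random


def top_share(values, top_fraction=0.01):
    """Fraction of the total sum contributed by the largest `top_fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)


def simulate_products(draw_factor, n_factors, n_samples, rng):
    """Draw n_samples products of n_factors independent factors."""
    products = []
    for _ in range(n_samples):
        value = 1.0
        for _ in range(n_factors):
            value *= draw_factor(rng)
        products.append(value)
    return products


rng = random.Random(0)  # fixed seed so the run is repeatable

# Three low-variance factors: "things multiply", but the spread stays small.
narrow = simulate_products(lambda r: r.uniform(0.9, 1.1), 3, 100_000, rng)
# Three high-variance factors, for contrast.
wide = simulate_products(lambda r: r.lognormvariate(0.0, 1.0), 3, 100_000, rng)

narrow_share = top_share(narrow)
wide_share = top_share(wide)
print(f"low-variance factors:  top 1% share = {narrow_share:.1%}")
print(f"high-variance factors: top 1% share = {wide_share:.1%}")
```

In the low-variance case the top 1% hold barely more than 1% of the total, so “things multiply” on its own says very little about how concentrated the outcome is.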