Hi Max and Ben, a few related thoughts below. Many of these are mentioned in various places in the doc, so seem to have been understood, but nonetheless have implications for your summary and qualitative commentary, which I sometimes think miss the mark.
Many distributions are heavy-tailed mathematically, but not in the common use of that term, which I think is closer to "how concentrated is the thing into the top 0.1%/1%/etc.", and thus "how important is it that I find top performers" or "how important is it to attract the top performers". For example, you write the following:
What share of total output should we expect to come from the small fraction of people we're most optimistic about (say, the top 1% or top 0.1%) - that is, how heavy-tailed is the distribution of ex-ante performance?
Often, you can't derive this directly from the distribution's mathematical type. In particular, you cannot derive it from whether a distribution is heavy-tailed in the mathematical sense.
Log-normal distributions are particularly common and are a particular offender here, because they tend to occur whenever lots of independent factors are multiplied together. But here is the approximate* fraction of value that comes from the top 1% in a few different log-normal distributions:
EXP(N(0,0.0001)) → 1.02%
EXP(N(0,0.001)) → 1.08%
EXP(N(0,0.01)) → 1.28%
EXP(N(0,0.1)) → 2.22%
EXP(N(0,1)) → 9.5%
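For anyone who wants to check these numbers, here is a minimal sketch of the kind of quick simulation I mean (reading N(0, v) as mean 0 and variance v, so sigma = sqrt(v); exact figures will wobble a bit with the random seed and sample size):

```python
import numpy as np

def top_share(values, frac=0.01):
    """Fraction of the total accounted for by the top `frac` of values."""
    values = np.sort(values)
    k = max(1, int(round(frac * len(values))))
    return values[-k:].sum() / values.sum()

rng = np.random.default_rng(0)
n = 10_000  # same order of magnitude as the lazy simulation described in the footnote

for var in [0.0001, 0.001, 0.01, 0.1, 1.0]:
    samples = rng.lognormal(mean=0.0, sigma=np.sqrt(var), size=n)
    print(f"EXP(N(0,{var})) -> top 1% share ~ {top_share(samples):.2%}")

# Exact check, if you prefer it: for a log-normal with log-sd sigma, the share of the
# total held by the top fraction p is 1 - Phi(Phi^{-1}(1 - p) - sigma).
```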
For a real-world example, geometric Brownian motion is the most common model of stock prices and produces a log-normal distribution of prices, but models based on GBM actually produce pretty thin tails in the commonsense use, which are in turn much thinner than the tails in real stock markets, as (in?)famously chronicled in Taleb's Black Swan, among others. Since I'm a finance person who came of age right as that book was written, I'm particularly used to thinking of the log-normal distribution as "the stupidly-thin-tailed one", and have a brief moment of confusion every time it is referred to as "heavy-tailed".
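To make the GBM point concrete, here is a small sketch with made-up but vaguely realistic parameters (roughly 20% annualised volatility over one year); the numbers are purely illustrative, not calibrated to any real market:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Terminal price under GBM: S_T = S_0 * exp((mu - sigma^2/2) * T + sigma * sqrt(T) * Z)
s0, mu, sigma, T = 100.0, 0.05, 0.2, 1.0  # illustrative drift/volatility/horizon, not calibrated
z = rng.standard_normal(n)
s_T = s0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)

s_sorted = np.sort(s_T)
k = int(0.01 * n)
print("Top 1% share of total terminal value:", s_sorted[-k:].sum() / s_sorted.sum())
# With these parameters this lands around 1.5-2%: log-normal, but hardly "heavy-tailed"
# in the everyday sense.
```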
The above, in my opinion, highlights the folly of ever thinking "well, log-normal distributions are heavy-tailed, and this should be log-normal because things got multiplied together, so the top 1% must be at least a few percent of the overall value". Log-normal distributions with low variance are practically indistinguishable from normal distributions. In fact, as I understand it many oft-used examples of normal distributions, such as height and other biological properties, are actually believed to follow a log-normal distribution.
***
I'd guess we agree on the above, though if not I'd welcome a correction. But I'll go ahead and flag bits of your summary that look weird to me assuming we agree on the mathematical facts:
By contrast, a large meta-analysis reports "thin-tailed" (Gaussian) distributions for ex-post performance in less complex jobs such as cook or mail carrier [1]: the top 1% account for 3-3.7% of the total.
I haven't read the meta-analysis, but I'd tentatively bet that, much like biological properties, these jobs actually follow log-normal distributions and they just couldn't tell (and weren't trying to tell) the difference.
These figures illustrate that the difference between "thin-tailed" and "heavy-tailed" distributions can be modest in the range that matters in practice
I agree with the direction of this statement, but it's actually worse than that: depending on the tail of interest, "heavy-tailed distributions" can have thinner tails than "thin-tailed distributions"! For example, compare my numbers for the top 1% of various log-normal distributions to the right-hand side of a standard N(0,1) normal distribution where we cut off negative values (~3.5% in the top 1%).
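Again, a rough simulation check of that ~3.5% figure (a standard normal with negative values cut off is just a half-normal):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# N(0,1) with negative values cut off has the same distribution as |Z| (a half-normal),
# so folding is an easy way to sample it.
x = np.abs(rng.standard_normal(n))

x_sorted = np.sort(x)
k = int(0.01 * n)
print("Top 1% share:", x_sorted[-k:].sum() / x_sorted.sum())  # roughly 3.5%
```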
It's also somewhat common to see comments like this from 80k staff (this one from Ben Todd elsewhere in this thread):
You can get heavy tailed outcomes if performance is the product of two normally distributed factors (e.g. intelligence x effort).
You indeed can, but like the log-normal distribution this will tend to have pretty thin tails in the common use of the term. For example, multiplying two N(100,225) distributions (mean 100, variance 225, i.e. an SD of 15) together, chosen because this is roughly the distribution of IQ, gets you a distribution where the top 1% account for 1.6% of the total. Looping back to my above thought, I'd also guess that performance on jobs like cook and mail carrier looks very close to this, and empirically was observed to have similarly thin tails (aptitude x intelligence x effort might in fact be the right framing for these jobs).
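A quick way to reproduce that figure, assuming the two factors are independent and reading N(100,225) as mean 100 and variance 225:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two independent "IQ-like" factors: mean 100, SD 15 (variance 225).
a = rng.normal(100, 15, n)
b = rng.normal(100, 15, n)
perf = a * b

perf_sorted = np.sort(perf)
k = int(0.01 * n)
print("Top 1% share:", perf_sorted[-k:].sum() / perf_sorted.sum())  # roughly 1.6%
```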
***
Ultimately, the recommendation I would give is much the same as the bottom line presented, which I was very happy to see. Indeed, I'm mostly grumbling because I want to discourage anything which treats heavy-tailed as a binary property**, as parts of the summary/commentary tend to; see above.
Some advice for how to work with these concepts in practice:
In practice, don't treat "heavy-tailed" as a binary property. Instead, ask how heavy the tails of some quantity of interest are, for instance by identifying the frequency of outliers you're interested in (e.g. top 1%, top 0.1%, ...) and comparing them to the median or looking at their share of the total. [2]
Carefully choose the underlying population and the metric for performance, in a way that's tailored to the purpose of your analysis. In particular, be mindful of whether you're looking at the full distribution or some tail (e.g. wealth of all citizens vs. wealth of billionaires).
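To make the first of those suggestions concrete, here is a hypothetical little helper (names and thresholds are my own, not from the doc) that reports both the top-x% share of the total and how big the top x% are relative to the median:

```python
import numpy as np

def tail_summary(values, fracs=(0.01, 0.001)):
    """Top-x% share of the total, and mean of the top x% relative to the median.

    Assumes positive-valued data (e.g. output counts), so the ratios are meaningful."""
    values = np.sort(np.asarray(values, dtype=float))
    median = np.median(values)
    summary = {}
    for frac in fracs:
        k = max(1, int(round(frac * len(values))))
        top = values[-k:]
        summary[frac] = {
            "share_of_total": top.sum() / values.sum(),
            "top_mean_vs_median": top.mean() / median,
        }
    return summary

# Example with simulated data; in practice you'd pass in your own performance measurements.
rng = np.random.default_rng(0)
print(tail_summary(rng.lognormal(0, 1, 10_000)))
```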
*Approximate because I was lazy and just simulated 10000 values to get these and other quoted numbers. AFAIK the true values are not sufficiently different to affect the point I'm making.
**If it were up to me, I'd taboo the term "heavy-tailed" entirely, because having an oft-used term whose mathematical and commonsense notions differ is an obvious recipe for miscommunication in a STEM-heavy community like this one.
Yeah, I think we agree on the maths, and I'm quite sympathetic to your recommendations regarding framing based on this. In fact, emphasizing "top x% share" as a metric and avoiding any suggestion that it's practically useful to treat "heavy-tailed" as a binary property were my key goals for the last round of revisions I made to the summary - but it seems like I didn't fully succeed.
FWIW, I maybe wouldn't go quite as far as you suggest in some places. I think the issue of "mathematically 'heavy-tailed' distributions may not be heavy-tailed in practice in the everyday sense" is an instance of a broader issue that crops up whenever one uses mathematical concepts that are defined in asymptotic terms in applied contexts.
To give just one example, consider that we often talk of "linear growth", "exponential growth", etc. I think this is quite useful, and that it would overall be bad to "taboo" these terms and always replace them with some "model-agnostic" metric that can be calculated for finitely many data points. But there we have the analogous issue that, depending on the parameters, e.g. an exponential function can for practical purposes look very much like a linear function over the relevant finite range of data.
Another example would be computational complexity, e.g. when we talk about algorithms being "polynomial" or "exponential" regarding how many steps they require as a function of the size of their inputs.
Yet another example would be attractors in dynamical systems.
In these and many other cases we encounter the same phenomenon: we often talk in terms of mathematical concepts that by definition only tell us that some property holds "eventually", i.e. in the limit of arbitrarily long amounts of time, arbitrarily much data, or similar.
Of course, being aware of this really is important. In practice it often is crucial to have an intuition or more precise quantitative bounds on e.g. whether we have enough data points to be able to use some computational method that's only guaranteed to work in the limit of infinite data. And sometimes we are better off using some algorithm that for sufficiently large inputs would be slower than some alternative, etc.
But on the other hand, talk in terms of "asymptotic" concepts often is useful as well. I think one reason why is that in practice, when e.g. we say that something "looks like a heavy-tailed distribution" or that something "looks like exponential growth", we tend to mean "the top 1% share is relatively large / it would be hard to fit e.g. a normal distribution" or "it would be hard to fit a straight line to this data" etc., as opposed to just e.g. "there is a mathematically heavy-tailed distribution that with the right parameters provides a reasonable fit" or "there is an exponential function that with the right parameters provides a reasonable fit". That is, the conventions for the use of these terms are significantly influenced by "practical" considerations (and things like Grice's communication maxims) rather than just their mathematical definition.
So e.g. concretely when in practice we say that something is "log-normally distributed" we often do mean that it looks more heavy-tailed in the everyday sense than a normal distribution (even though it is a mathematical fact that there are log-normal distributions that are relatively thin-tailed in the everyday sense - indeed we can make most types of distributions arbitrarily thin-tailed or heavy-tailed in this sense!).
So taking a step back for a second, I think the primary point of collaborative written or spoken communication is to take the picture or conceptual map in my head and put it in your head, as accurately as possible. Use of any terms should, in my view, be assessed against whether those terms are likely to create the right picture in a reader's or listener's head. I appreciate this is a somewhat extreme position.
If every time you use the term heavy-tailed (and it's used a lot - a quick CTRL + F tells me it's in the OP 25 times) I have to guess from context whether you mean the mathematical or commonsense definition, it's more difficult to parse what you actually mean in any given sentence. If someone is reading and doesn't even know that those definitions substantially differ, they'll probably come away with bad conclusions.
This isn't a hypothetical corner case - I keep seeing people come to bad (or at least unsupported) conclusions in exactly this way, while thinking that their reasoning is mathematically sound and thus nigh-incontrovertible. To quote myself above:
The above, in my opinion, highlights the folly of ever thinking "well, log-normal distributions are heavy-tailed, and this should be log-normal because things got multiplied together, so the top 1% must be at least a few percent of the overall value".
If I noticed that use of terms like "linear growth" or "exponential growth" was similarly leading to bad conclusions, e.g. by being extrapolated too far beyond the range of data in the sample, I would be similarly opposed to their use. But I don't, so I'm not.
If I noticed that engineers at firms I have worked for were obsessed with replacing exponential algorithms with polynomial algorithms because they are better in some limit case, but worse in the actual use cases, I would point this out and suggest they stop thinking in those terms. But this hasn't happened, so I haven't ever done so.
I do notice that use of the term heavy-tailed (as a binary) in EA, especially with reference to the log-normal distribution, is causing people to make claims about how we should expect this to be "a heavy-tailed distribution" and how important it therefore is to attract the top 1%, and so... you get the idea.
Still, a full taboo is unrealistic and was intended as an aside; closer to "in my ideal world" or "this is what I aim for in my own writing", rather than a practical suggestion to others. As I said, I think the actual suggestions made in this summary are good - replacing the question "is this heavy-tailed or not" with "how heavy-tailed is this" should do the trick - and I hope to see them become more widely adopted.
I'm not sure how extreme your general take on communication is, and I think at least I have a fairly similar view.
I agree that the kind of practical experiences you mention can be a good reason to be more careful with the use of some mathematical concepts but not others. I think I've seen fewer instances of people making fallacious inferences based on something being log-normal, but if I had I think I might have arrived at similar aspirations as you regarding how to frame things.
(An invalid type of argument I have seen frequently is actually the "things multiply, so we get a log-normal" part. But as you have pointed out in your top-level comment, if we multiply a small number of thin-tailed and low-variance factors we'll get something that's not exactly a "paradigmatic example" of a log-normal distribution, even though we could reasonably approximate it with one. On the other hand, if the conditions of the "multiplicative CLT" aren't fulfilled, we can easily get something with heavier tails than a log-normal.) See also fn26 in our doc:
We've sometimes encountered the misconception that products of light-tailed factors always converge to a log-normal distribution. However, in fact, depending on the details the limit can also be another type of heavy-tailed distribution, such as a power law (see, e.g., Mitzenmacher 2004, sc. 5-7 for an accessible discussion and examples). Relevant details include whether there is a strictly positive minimum value beyond which products can't fall (ibid., sc. 5.1), random variation in the number of factors (ibid., sc. 7), and correlations between factors.
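To illustrate one of these mechanisms, here is a toy simulation (parameters are made up) comparing a product with a fixed number of factors to one where the number of factors is geometrically distributed - the "random variation in the number of factors" route to heavier-than-log-normal tails:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
mu, sigma = 0.05, 0.1  # made-up per-factor log-mean and log-sd

def top_share(values, frac=0.01):
    values = np.sort(values)
    k = max(1, int(round(frac * len(values))))
    return values[-k:].sum() / values.sum()

# The log of a product of k i.i.d. exp(N(mu, sigma^2)) factors is N(k*mu, k*sigma^2),
# so we can sample the products directly.
z = rng.standard_normal(n)

# Fixed number of factors: the product is exactly log-normal.
k_fixed = 10
fixed = np.exp(mu * k_fixed + sigma * np.sqrt(k_fixed) * z)

# Geometrically distributed number of factors (mean 10): one way the "multiplicative CLT"
# conditions fail; this route is known to produce power-law-like tails (ibid., sc. 7).
k_rand = rng.geometric(1 / 10, size=n)
random_len = np.exp(mu * k_rand + sigma * np.sqrt(k_rand) * z)

print("Fixed number of factors, top 1% share: ", top_share(fixed))
print("Random number of factors, top 1% share:", top_share(random_len))
# The second figure comes out noticeably larger (and noisier across seeds), even though
# both products are built from the same light-tailed factors.
```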
As an aside, for a good and philosophically rigorous criticism of cavalier assumptions of normality or (arguably) pseudo-explanations that involve the central limit theorem, I'd recommend Lyon (2014), Why are Normal Distributions Normal?
Basically I think that whenever we are in the business of understanding how things actually work / "why" we're seeing the data distributions we're seeing, often-invoked explanations like the CLT or "multiplicative" CLT are kind of the tip of the iceberg that provides the "actual" explanation (rather than being literally correct by themselves), this iceberg having to do with the principle of maximum entropy / the tendency for entropy to increase / "universality" and the fact that certain types of distributions are "attractors" for a wide range of generating processes. I'm too much of an "abstract algebra person" to have a clear sense of what's going on, but I think it's fairly clear that the folk story of "a lot of things 'are' normally distributed because of 'the' central limit theorem" is at best an "approximation" and at worst misleading.
(One "mathematical" way to see this is that it's fishy that there are so many different versions of the CLT rather than one clear "canonical" or "maximally general" one. I guess stuff like this also is why I tend to find common introductions to statistics horribly unaesthetic and have had a hard time engaging with them.)
I haven't read the meta-analysis, but I'd tentatively bet that, much like biological properties, these jobs actually follow log-normal distributions and they just couldn't tell (and weren't trying to tell) the difference.
I kind of agree with this (and this is why I deliberately said that "they report a Gaussian distribution" rather than e.g. "performance is normally distributed"). In particular, yes, they just assumed a normal distribution and then ran with this in all cases in which it didn't lead to obvious problems/bad fits no matter the parameters. They did not compare Gaussian with other models.
I still think it's accurate and useful to say that they were using (and didn't reject) a normal distribution as a model for low- and medium-complexity jobs, as this does tell you something about what the data looks like. (Since there is a lot of possible data where no normal distribution is a reasonable fit.)
I also agree that probably a log-normal model is "closer to the truth" than a normal one. But on the other hand I think it's pretty clear that actually neither a normal nor a log-normal model is fully correct. Indeed, what would it mean that "jobs actually follow a certain type of distribution"? If we're just talking about fitting a distribution to data, we will never get a perfect fit, and all we can do is provide goodness-of-fit statistics for different models (which usually won't conclusively identify any single one). This kind of brute/naive empiricism just won't and can't get us to "how things actually work".

On the other hand, if we try to build a model of the causal generating mechanism of job performance, it seems clear that the "truth" will be much more complex and messy - we will only have finitely many contributing factors (and a log-normal distribution is something we'd get at best "in the limit"), the contributing factors won't all be independent, etc. Indeed, "probability distribution" to me basically seems like the wrong type to talk about when we're in the business of understanding "how things actually work" - what we want then is really a richer and more complex model (in the sense that we could have several different models that would yield the same approximate data distribution but that would paint a fairly different picture of "how things actually work"; basically I'm saying that things like "quantum mechanics" or "the Solow growth model" or whatever have much more structure and are not a single probability distribution).
Briefly on this, I think my issue becomes clearer if you look at the full section.
If we agree that log-normal is more likely than normal, and log-normal distributions are heavy-tailed, then saying "By contrast, [performance in these jobs] is thin-tailed" is just incorrect? Assuming you meant the mathematical senses of heavy-tailed and thin-tailed here, which I guess I'm not sure if you did.
This uncertainty and resulting inability to assess whether this section is true or false obviously loops back to why I would prefer not to use the term "heavy-tailed" at all, which I will address in more detail in my reply to your other comment.
Ex-post performance appears "heavy-tailed" in many relevant domains, but with very large differences in how heavy-tailed: the top 1% account for between 4% to over 80% of the total. For instance, we find "heavy-tailed" distributions (e.g. log-normal, power law) of scientific citations, startup valuations, income, and media sales. By contrast, a large meta-analysis reports "thin-tailed" (Gaussian) distributions for ex-post performance in less complex jobs such as cook or mail carrier
I think the main takeaway here is that you find that section confusing, and that's not something one can "argue away", and it does point to room for improvement in my writing. :)
With that being said, note that we in fact don't say anywhere that anything "is thin-tailed". We just say that some paper "reports" a thin-tailed distribution, which seems uncontroversially true. (OTOH I can totally see that the "by contrast" is confusing on some readings. And I also agree that it basically doesn't matter what we say literally - if people read what we say as claiming that something is thin-tailed, then that's a problem.)
FWIW, from my perspective the key observations (which I apparently failed to convey in a clear way at least for you) here are:
The top 1% share of ex-post "performance" [though see elsewhere that maybe that's not the ideal term] data reported in the literature varies a lot, at least between 3% and 80%. So usually you'll want to know roughly where on the spectrum you are for the job/task/situation relevant to you rather than just whether or not some binary property holds.
The range of top 1% shares is almost as large for data for which the sources used a mathematically "heavy-tailed" type of distribution as a model. In particular, there are some cases where some source reports a mathematically "heavy-tailed" distribution but where the top 1% share is barely larger than for other data based on a mathematically "thin-tailed" distribution.
(As discussed elsewhere, it's of course mathematically possible to have a mathematically "thin-tailed" distribution with a larger top 1% share than a mathematically "heavy-tailed" distribution. But the above observation is about what we in fact find in the literature rather than about what's mathematically possible. I think the key point here is not so much that we haven't found a "thin-tailed" distribution with a larger top 1% share than some "heavy-tailed" distribution, but that the mathematical "heavy-tailed" property doesn't cleanly distinguish data/distributions by their top 1% share even in practice.)
So don't look at whether the type of distribution used is "thin-tailed" or "heavy-tailed" in the mathematical sense; ask how heavy-tailed in the everyday sense (as operationalized by top 1% share or whatever you care about) your data/distribution is.
So basically what I tried to do is mention that we find both mathematically thin-tailed and mathematically heavy-tailed distributions reported in the literature, in order to point out that this arguably isn't the key thing to pay attention to. (But yeah, I can totally see that this is not coming across in the summary as currently worded.)
As I tried to explain in my previous comment, I think the question of whether performance in some domain is actually "thin-tailed" or "heavy-tailed" in the mathematical sense is closer to ill-posed or meaningless than true or false. Hence why I set aside the issue of whether a normal distribution or a similar-looking log-normal distribution is the better model.