AGB comments on How much does performance differ between people?

AGB 2 Apr 2021 18:31 UTC
37 points
0 ∶ 0
Hi Max and Ben, a few related thoughts below. Many of these are mentioned in various places in the doc, so seem to have been understood, but nonetheless have implications for your summary and qualitative commentary, which I sometimes think misses the mark.
- Many distributions are heavy-tailed mathematically, but not in the common use of that term, which I think is closer to ‘how concentrated is the thing into the top 0.1%/1%/etc.‘, and thus ‘how important is it I find top performers’ or ‘how important is it to attract the top performers’. For example, you write the following:
What share of total output should we expect to come from the small fraction of people we’re most optimistic about (say, the top 1% or top 0.1%) – that is, how heavy-tailed is the distribution of ex-ante performance?
- Often, you can’t derive this directly from the distribution’s mathematical type. In particular, you cannot derive it from whether a distribution is heavy-tailed in the mathematical sense.
- Log-normal distributions are particuarly common and are a particular offender here, because they tend to occur whenever lots of independent factors are multiplied together. But here is the approximate* fraction of value that comes from the top 1% in a few different log-normal distributions:
  EXP(N(0,0.0001)) → 1.02%
  EXP(N(0,0001)) → 1.08%
  EXP(N(0,0.01)) → 1.28%
  EXP(N(0,0.1)) → 2.22%
  EXP(N(0,1)) → 9.5%
- For a real-world example, geometric brownian motion is the most common model of stock prices, and produces a log-normal distribution of prices, but models based on GBM actually produce pretty thin tails in the commonsense use, which are in turn much thinner than the tails in real stock markets, as (in?)famously chronicled in Taleb’s Black Swan among others. Since I’m a finance person who came of age right as that book was written, I’m particularly used to thinking of the log-normal distribution as ‘the stupidly-thin-tailed one’, and have a brief moment of confusion every time it is referred to as ‘heavy-tailed’.
- The above, in my opinion, highlights the folly of ever thinking ‘well, log-normal distributions are heavy-tailed, and this should be log-normal because things got multiplied together, so the top 1% must be at least a few percent of the overall value’. Log-normal distributions with low variance are practically indistinguishable from normal distributions. In fact, as I understand it many oft-used examples of normal distributions, such as height and other biological properties, are actually believed to follow a log-normal distribution.
***
I’d guess we agree on the above, though if not I’d welcome a correction. But I’ll go ahead and flag bits of your summary that look weird to me assuming we agree on the mathematical facts:
By contrast, a large meta-analysis reports ‘thin-tailed’ (Gaussian) distributions for ex-post performance in less complex jobs such as cook or mail carrier [1]: the top 1% account for 3-3.7% of the total.
I haven’t read the meta-analysis, but I’d tentatively bet that much like biological properties these jobs actually follow log-normal distributions and they just couldn’t tell (and weren’t trying to tell) the difference.
These figures illustrate that the difference between ‘thin-tailed’ and ‘heavy-tailed’ distributions can be modest in the range that matters in practice
I agree with the direction of this statement, but it’s actually worse than that: depending on the tail of interest “heavy-tailed distributions” can have thinner tails than “thin-tailed distributions”! For example, compare my numbers for the top 1% of various log-normal distributions to the right-hand-side of a standard N(0,1) normal distribution where we cut off negative values (~3.5% in top 1%).
It’s also somewhat common to see comments like this from 80k staff (This from Ben Todd elsewhere in this thread):
You can get heavy tailed outcomes if performance is the product of two normally distributed factors (e.g. intelligence x effort).
You indeed can, but like the log-normal distribution this will tend to have pretty thin tails in the common use of the term. For example, multipling two N(100,225) distributions together, chosen because this is roughly the distribution of IQ, gets you a distribution where the top 1% account for 1.6% of the total. Looping back to my above thought, I’d also guess that performance on jobs like cook and mail-carrier look very close to this, and empirically were observed to have similarly thin tails (aptitude x intelligence x effort might in fact be the right framing for these jobs).
***
Ultimately, the recommendation I would give is much the same as the bottom line presented, which I was very happy to see. Indeed, I’m mostly grumbling because I want to discourage anything which treats heavy-tailed as a binary property**, as parts of the summary/commentary tend to, see above.
Some advice for how to work with these concepts in practice:
- In practice, don’t treat ‘heavy-tailed’ as a binary property. Instead, ask how heavy the tails of some quantity of interest are, for instance by identifying the frequency of outliers you’re interested in (e.g. top 1%, top 0.1%, …) and comparing them to the median or looking at their share of the total. [2]
- Carefully choose the underlying population and the metric for performance, in a way that’s tailored to the purpose of your analysis. In particular, be mindful of whether you’re looking at the full distribution or some tail (e.g. wealth of all citizens vs. wealth of billionaires).
*Approximate because I was lazy and just simulated 10000 values to get these and other quoted numbers. AFAIK the true values are not sufficiently different to affect the point I’m making.
**If it were up to me, I’d taboo the term ‘heavy-tailed’ entirely, because having an oft-used term whose mathematical and commonsense notions differ is an obvious recipe for miscommunication in a STEM-heavy community like this one.
What links here?
- EA Forum Prize: Winners for March 2021 by Aaron Gertler (22 May 2021 4:34 UTC; 26 points)
- Max_Daniel 3 Apr 2021 10:22 UTC
  4 points
  0 ∶ 0
  Parent
  Yeah, I think we agree on the maths, and I’m quite sympathetic to your recommendations regarding framing based on this. In fact, emphasizing “top x% share” as metric and avoiding any suggestion that it’s practically useful to treat “heavy-tailed” as a binary property were my key goals for the last round of revisions I made to the summary—but it seems like I didn’t fully succeed.
  FWIW, I maybe wouldn’t go quite as far as you suggest in some places. I think the issue of “mathematically ‘heavy-tailed’ distributions may not be heavy-tailed in practice in the everyday sense” is an instance of a broader issue that crops up whenever one uses mathematical concepts that are defined in asymptotic terms in applied contexts.
  To give just one example, consider that we often talk of “linear growth”, “exponential growth”, etc. I think this is quite useful, and that it would overall be bad to ‘taboo’ these terms and always replace them with some ‘model-agnostic’ metric that can be calculated for finitely many data points. But there we have the analog issue that depending on the parameters an e.g. exponential function can for practical purposes look very much like a linear function over the relevant finite range of data.
  Another example would be computational complexity, e.g. when we talk about algorithms being “polynomial” or “exponential” regarding how many steps they require as function of the size of their inputs.
  Yet another example would be attractors in dynamical systems.
  In these and many other cases we encounter the same phenomenon that we often talk in terms of mathematical concepts that by definition only tell us that some property holds “eventually”, i.e. in the limit of arbitrarily long amounts of time, arbitrarily much data, or similar.
  Of course, being aware of this really is important. In practice it often is crucial to have an intuition or more precise quantitative bounds on e.g. whether we have enough data points to be able to use some computational method that’s only guaranteed to work in the limit of infinite data. And sometimes we are better off using some algorithm that for sufficiently large inputs would be slower than some alternative, etc.
  But on the other hand, talk in terms of ‘asymptotic’ concepts often is useful as well. I think one reason for why is that in practice when e.g. we say that something “looks like a heavy-tailed distribution” or that something “looks like exponential growth” we tend to mean “the top 1% share is relatively large / it would be hard to fit e.g. a normal distribution” or “it would be hard to fit a straight line to this data” etc., as opposed to just e.g. “there is a mathematically heavy-tailed distribution that with the right parameters provides a reasonable fit” or “there is an exponential function that with the right parameters provides a reasonable fit”. That is, the conventions for the use of these terms are significantly influenced by “practical” considerations (and things like Grice’s communication maxims) rather than just their mathematical definition.
  So e.g. concretely when in practice we say that something is “log-normally distributed” we often do mean that it looks more heavy-tailed in the everyday sense than a normal distribution (even though it is a mathematical fact that there are log-normal distributions that are relatively thin-tailed in the everyday sense—indeed we can make most types of distributions arbitrarily thin-tailed or heavy-tailed in this sense!).
  - AGB 3 Apr 2021 11:27 UTC
    21 points
    0 ∶ 0
    Parent
    So taking a step back for a second, I think the primary point of collaborative written or spoken communication is to take the picture or conceptual map in my head and put it in your head, as accurately as possible. Use of any terms should, in my view, be assessed against whether those terms are likely to create the right picture in a reader’s or listener’s head. I appreciate this is a somewhat extreme position.
    If everytime you use the term heavy-tailed (and it’s used a lot—a quick CTRL + F tells me it’s in the OP 25 times) I have to guess from context whether you mean the mathematical or commonsense definitions, it’s more difficult to parse what you actually mean in any given sentence. If someone is reading and doesn’t even know that those definitions substantially differ, they’ll probably come away with bad conclusions.
    This isn’t a hypothetical corner case—I keep seeing people come to bad (or at least unsupported) conclusions in exactly this way, while thinking that their reasoning is mathematically sound and thus nigh-incontrovertible. To quote myself above:
    The above, in my opinion, highlights the folly of ever thinking ‘well, log-normal distributions are heavy-tailed, and this should be log-normal because things got multiplied together, so the top 1% must be at least a few percent of the overall value’.
    If I noticed that use of terms like ‘linear growth’ or ‘exponential growth’ were similarly leading to bad conclusions, e.g. by being extrapolated too far beyond the range of data in the sample, I would be similarly opposed to their use. But I don’t, so I’m not.
    If I noticed that engineers at firms I have worked for were obsessed with replacing exponential algorithms with polynomial algorithms because they are better in some limit case, but worse in the actual use cases, I would point this out and suggest they stop thinking in those terms. But this hasn’t happened, so I haven’t ever done so.
    I do notice that use of the term heavy-tailed (as a binary) in EA, especially with reference to the log-normal distribution, is causing people to make claims about how we should expect this to be ‘a heavy-tailed distribution’ and how important it therefore is to attract the top 1%, and so...you get the idea.
    Still, a full taboo is unrealistic and was intended as an aside; closer to ‘in my ideal world’ or ‘this is what I aim for my own writing’, rather than a practical suggestion to others. As I said, I think the actual suggestions made in this summary are good—replacing the question ‘is this heavy-tailed or not’ with ‘how heavy-tailed is this’ should do the trick- and hope to see them become more widely adopted.
    - Max_Daniel 3 Apr 2021 16:31 UTC
      4 points
      0 ∶ 0
      Parent
      I’m not sure how extreme your general take on communication is, and I think at least I have a fairly similar view.
      I agree that the kind of practical experiences you mention can be a good reason to be more careful with the use of some mathematical concepts but not others. I think I’ve seen fewer instances of people making fallacious inferences based on something being log-normal, but if I had I think I might have arrived at similar aspirations as you regarding how to frame things.
      (An invalid type of argument I have seen frequently is actually the “things multiply, so we get a log-normal” part. But as you have pointed out in your top-level comment, if we multiply a small number of thin-tailed and low-variance factors we’ll get something that’s not exactly a ‘paradigmatic example’ of a log-normal distribution even though we could reasonably approximate it with one. On the other hand, if the conditions of the ‘multiplicative CLT’ aren’t fulfilled we can easily get something with heavier tails than a log-normal. See also fn26 in our doc:
      We’ve sometimes encountered the misconception that products of light-tailed factors always converge to a log-normal distribution. However, in fact, depending on the details the limit can also be another type of heavy-tailed distribution, such as a power law (see, e.g., Mitzenmacher 2004, sc. 5-7 for an accessible discussion and examples). Relevant details include whether there is a strictly positive minimum value beyond which products can’t fall (ibid., sc. 5.1), random variation in the number of factors (ibid., sc. 7), and correlations between factors.
      )
- Max_Daniel 3 Apr 2021 10:51 UTC
  2 points
  0 ∶ 0
  Parent
  As an aside, for a good and philosophically rigorous criticism of cavalier assumptions of normality or (arguably) pseudo-explanations that involve the central limit theorem, I’d recommend Lyon (2014), Why are Normal Distributions Normal?
  Basically I think that whenever we are in the business of understanding how things actually work/”why” we’re seeing the data distributions we’re seeing, often-invoked explanations like the CLT or “multiplicative” CLT are kind of the tip of the iceberg that provides the “actual” explanation (rather then being literally correct by themselves), this iceberg having to do with the principle of maximum entropy / the tendency for entropy to increase / ‘universality’ and the fact that certain types of distributions are ‘attractors’ for a wide range of generating processes. I’m too much of an ‘abstract algebra person’ to have a clear sense of what’s going on, but I think it’s fairly clear that the folk story of “a lot of things ‘are’ normally distributed because of ‘the’ central limit theorem” is at best an ‘approximation’ and at worst misleading.
  (One ‘mathematical’ way to see this is that it’s fishy that there are so many different versions of the CLT rather than one clear ‘canonical’ or ‘maximally general’ one. I guess stuff like this also is why I tend to find common introductions to statistics horribly unaesthetic and have had a hard time engaging with them.)
- Max_Daniel 3 Apr 2021 10:38 UTC
  2 points
  0 ∶ 0
  Parent
  I haven’t read the meta-analysis, but I’d tentatively bet that much like biological properties these jobs actually follow log-normal distributions and they just couldn’t tell (and weren’t trying to tell) the difference.
  I kind of agree with this (and this is why I deliberately said that “they report a Gaussian distribution” rather than e.g. “performance is normally distributed”). In particular, yes, they just assumed a normal distribution and then ran with this in all cases in which it didn’t lead to obvious problems/bad fits no matter the parameters. They did not compare Gaussian with other models.
  I still think it’s accurate and useful to say that they were using (and didn’t reject) a normal distribution as model for low- and medium-complexity jobs as this does tell you something about how the data looks like. (Since there is a lot of possible data where no normal distribution is a reasonable fit.)
  I also agree that probably a log-normal model is “closer to the truth” than a normal one. But on the other hand I think it’s pretty clear that actually neither a normal nor a log-normal model is fully correct. Indeed, what would it mean that “jobs actually follow a certain type of distribution”? If we’re just talking about fitting a distribution to data, we will never get a perfect fit, and all we can do is providing goodness-of-fit statistics for different models (which usually won’t conclusively identify any single one). This kind of brute/naive empiricism just won’t and can’t get us to “how things actually work”. On the other hand, if we try to build a model of the causal generating mechanism of job performance it seems clear that the ‘truth’ will be much more complex and messy—we will only have finitely many contributing things (and a log-normal distribution is something we’d get at best “in the limit”), the contributing factors won’t all be independent etc. etc. Indeed, “probability distribution” to me basically seems like the wrong type to talk about when we’re in the business of understanding “how things actually work”—what we want then is really a richer and more complex model (in the sense that we could have several different models that would yield the same approximate data distribution but that would paint a fairly different picture of “how things actually work”; basically I’m saying that things like ‘quantum mechanics’ or ‘the Solow growth model’ or whatever have much more structure and are not a single probability distribution).
  What links here?
  - Max_Daniel's comment on How much does performance differ between people? by Max_Daniel (3 Apr 2021 16:20 UTC; 2 points)
  - AGB 3 Apr 2021 10:55 UTC
    4 points
    0 ∶ 0
    Parent
    Briefly on this, I think my issue becomes clearer if you look at the full section.
    If we agree that log-normal is more likely than normal, and log-normal distributions are heavy-tailed, then saying ‘By contrast, [performance in these jobs] is thin-tailed’ is just incorrect? Assuming you meant the mathematical senses of heavy-tailed and thin-tailed here, which I guess I’m not sure if you did.
    
    This uncertainty and resulting inability to assess whether this section is true or false obviously loops back to why I would prefer not to use the term ‘heavy-tailed’ at all, which I will address in more detail in my reply to your other comment.
    Ex-post performance appears ‘heavy-tailed’ in many relevant domains, but with very large differences in how heavy-tailed: the top 1% account for between 4% to over 80% of the total. For instance, we find ‘heavy-tailed’ distributions (e.g. log-normal, power law) of scientific citations, startup valuations, income, and media sales. By contrast, a large meta-analysis reports ‘thin-tailed’ (Gaussian) distributions for ex-post performance in less complex jobs such as cook or mail carrier
    - Max_Daniel 3 Apr 2021 16:20 UTC
      2 points
      0 ∶ 0
      Parent
      I think the main takeaway here is that you find that section confusing, and that’s not something one can “argue away”, and does point to room for improvement in my writing. :)
      With that being said, note that we in fact don’t say anywhere that anything ‘is thin-tailed’. We just say that some paper ‘reports’ a thin-tailed distribution, which seems uncontroversially true. (OTOH I can totally see that the “by contrast” is confusing on some readings. And I also agree that it basically doesn’t matter what we say literally—if people read what we say as claiming that something is thin-tailed, then that’s a problem.)
      FWIW, from my perspective the key observations (which I apparently failed to convey in a clear way at least for you) here are:
      The top 1% share of ex-post “performance” [though see elsewhere that maybe that’s not the ideal term] data reported in the literature varies a lot, at least between 3% and 80%. So usually you’ll want to know roughly where on the spectrum you are for the job/task/situation relevant to you rather than just whether or not some binary property holds.
      The range of top 1% shares is almost as large for data for which the sources used a mathematically ‘heavy-tailed’ type of distribution as model. In particular, there are some cases where we some source reports a mathematically ‘heavy-tailed’ distribution but where the top 1% share is barely larger than for other data based on a mathematically ‘thin-tailed’ distribution.
      (As discussed elsewhere, it’s of course mathematically possible to have a mathematically ‘thin-tailed’ distribution with a larger top 1% share than a mathematically ‘heavy-tailed’ distribution. But the above observation is about what we in fact find in the literature rather than about what’s mathematically possible. I think the key point here is not so much that we haven’t found a ‘thin-tailed’ distribution with larger top 1% share than some ‘heavy-tailed’ distribution. but that the mathematical ‘heavy-tailed’ property doesn’t cleanly distinguish data/distributions by their top 1% share even in practice.)
      So don’t look at whether the type of distribution used is ‘thin-tailed’ or ‘heavy-tailed’ in the mathematical sense, ask how heavy-tailed in the everyday sense (as operationalized by top 1% share or whatever you care about) your data/distribution is.
      So basically what I tried to do is mentioning that we find both mathematically thin-tailed and mathematically heavy-tailed distributions reported in the literature in order to point out that this arguably isn’t the key thing to pay attention to. (But yeah I can totally see that this is not coming across in the summary as currently worded.)
      As I tried to explain in my previous comment, I think the question whether performance in some domain is actually ‘thin-tailed’ or ‘heavy-tailed’ in the mathematical sense is closer to ill-posed or meaningless than true or false. Hence why I set aside the issue of whether a normal distribution or similar-looking log-normal distribution is the better model.