Unravelling the Mystery of Distributions of Impact: Power law or Lognormal?

Summary: Cost-effectiveness is believed to follow a heavy-tailed distribution, but we don’t know how heavy the tail is. For example, it could be lognormally distributed (which is quite heavy tailed) or power law distributed (which is heavier tailed still). Knowing which applies could help us estimate the cost-effectiveness of future interventions and inform the explore-exploit trade-off. This post delves into:

  • The background and motivation for our work

  • Our future research plans, including how we plan to leverage concepts such as entropy to gain insight

About this post

This post is not a full-fledged piece of SoGive research, but more akin to a research proposal or pre-publication notice. We plan to do this more often with our work so that we can get useful feedback before we have done too much work, to tackle publication bias, and so that when work gets started and isn’t finished, others can be aware that some unfinished thoughts are available.

This post was written with help from GPT-4.

Thanks to other members of the SoGive community for their support on this work, notably Vasco Grilo for modelling the multiplier on the 99.9th percentile and to Rebca van de Ven for earlier work on this topic.

Background

Recent work by Ben Todd suggests that the way impact is distributed between charities likely follows a heavy-tailed distribution. However, the exact form of this distribution (lognormal, power law, or something else) remains unclear.

What do we mean by the distribution of impact or cost-effectiveness?

If you take everything in a large reference class, e.g. charities or impact investing opportunities or people, and assess the cost-effectiveness or impact or talent of each, what does that distribution look like?

Motivation

Lognormal and power law distributions look quite different in the tails. The tails matter because EA research seeks to find the very best interventions, not just those in the middle of the distribution.

To express this in more concrete terms, imagine that the EA community has already invested a certain amount of effort in exploring some charitable interventions and we have already found some which look high impact compared to others. How likely do we think it is that further research will yield something that is even better?

Although the previous paragraph referenced the split of impact between charitable interventions, the research we plan to do is abstract enough that it could be applied to other things too, e.g. the split of impact between impact investing opportunities, or the split of talent between people.

When considering the explore-exploit trade-off, two things are needed for the “explore” option (i.e. further research) to be favoured: (1) the underlying distribution must actually be heavy tailed, and (2) we must be able to identify the high impact things with our research capabilities. Item (1) is about the “territory” and item (2) is about our ability to “map” it. To manage the scope of our research, we are only focusing on (1).

We believe that the work done by Ben Todd adds a lot of value by examining relevant data. However, data is inherently poor at capturing the tails: extreme values are rare, so any dataset contains few observations of them. In order to complete the Bayesian picture, our aim is to capture the priors as well, which we believe should give relevant insight about the tails.

We will focus on the lognormal and power law distributions, since those two distributions seem to have been mentioned the most in the literature that we’ve seen thus far.

The difference between a lognormal and power law distribution is material. To illustrate this, imagine you have a dataset which doesn’t include much data about extreme values (as is normal for datasets). If you extrapolate using either a power law or lognormal assumption, the results can differ significantly. Our calculations suggest that if the dataset captured data up to the 99th percentile, then the 99.9th percentile estimated under a power law assumption would be 189 times larger than the same percentile estimated under a lognormal assumption.[1] This difference arises because the tail of a power law distribution decays much more slowly than the tail of a lognormal distribution.
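To see why the gap opens up so quickly, it may help to compare the quantile functions of the two families. The following is a sketch only: the symbols α and x_min (Pareto shape and minimum), μ and σ (lognormal parameters) and Φ (the standard normal CDF) are generic notation introduced here for illustration, not the parameters behind the 189 figure, and the approximation in the last line holds only to leading order as p approaches 1.

```latex
\begin{align}
Q_{\text{Pareto}}(p) &= x_{\min}\,(1-p)^{-1/\alpha},
&
Q_{\text{lognormal}}(p) &= \exp\!\bigl(\mu + \sigma\,\Phi^{-1}(p)\bigr), \\
\ln Q_{\text{Pareto}}(p) &= \ln x_{\min} + \frac{1}{\alpha}\,\ln\frac{1}{1-p},
&
\ln Q_{\text{lognormal}}(p) &\approx \mu + \sigma\sqrt{2\,\ln\frac{1}{1-p}} .
\end{align}
```

Under these assumptions, the log of the Pareto quantile grows linearly in ln(1/(1-p)), while the log of the lognormal quantile grows only like its square root. Each extra "nine" of percentile (99%, 99.9%, 99.99%, ...) therefore multiplies the Pareto quantile by a constant factor of 10^(1/α), whereas the lognormal quantile grows by a shrinking factor each time, so the ratio between the two keeps widening.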

Our next steps

We plan to explore the following:

  • Whether the distribution of impact between charities seems more likely to have the features associated with lognormal distributions or power law distributions

    • If something is the product of lots of similar underlying processes multiplied together, then we would expect to have a lognormal distribution.

    • If we have reason to believe that something should be scale-invariant (i.e. like a fractal, it in some sense “looks the same” no matter how much you zoom in or out), then we would expect it to have a power law distribution.

  • How the concept of entropy might help us: a good prior for a distribution is (at least some of the time) one with high entropy. We will review both information theoretic conceptions of entropy and thermodynamic conceptions of entropy.

    • We have encountered a paper which employs a thermodynamics concept of entropy (Kafri 2016), and this notion of entropy appears to lead to a power law distribution.

    • However, it appears that applying an information-theoretic approach to entropy could lead to a wide range of possible probability distributions, depending on which constraints are most appropriate. This is described in this paper by Keith Conrad and illustrated by the long list of maximum entropy probability distributions on the relevant Wikipedia page.

  • As well as the academic literature mentioned above, we will also be cross-referencing against relevant literature from within the EA community:

    • Benjamin Todd’s aforementioned article that analysed the distribution of the cost-effectiveness of different interventions across several problem areas.

    • Max Daniel and Benjamin Todd’s post on how much performance varies amongst people.

    • A piece by Stijn which also referred to this topic.
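The two generating mechanisms mentioned in the list above (many factors multiplied together, and scale invariance) can be illustrated numerically. The following is a minimal sketch using only the Python standard library. All names and parameter choices are hypothetical and chosen purely for illustration: 50 factors drawn uniformly from [0.5, 1.5], a Pareto shape of 1.5, and a standard lognormal. It is not part of our analysis.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def product_of_factors(n_factors, rng):
    """Multiply n_factors independent positive factors (assumed uniform on [0.5, 1.5])."""
    value = 1.0
    for _ in range(n_factors):
        value *= rng.uniform(0.5, 1.5)
    return value


def skewness(values):
    """Sample skewness: roughly 0 for symmetric data, large for a long right tail."""
    mean = fmean(values)
    spread = pstdev(values)
    return fmean([((v - mean) / spread) ** 3 for v in values])


def pareto_survival(x, alpha=1.5, x_min=1.0):
    """P(X > x) for a Pareto (power law) distribution; alpha is an assumed shape."""
    return (x_min / x) ** alpha


def lognormal_survival(x, mu=0.0, sigma=1.0):
    """P(X > x) for a lognormal distribution with assumed parameters mu, sigma."""
    return 1.0 - NormalDist(mu, sigma).cdf(math.log(x))


def doubling_ratio(survival, x):
    """How much rarer it is to exceed 2x than to exceed x."""
    return survival(2 * x) / survival(x)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [product_of_factors(50, rng) for _ in range(20_000)]

    # Multiplicative process: the raw products are strongly right-skewed,
    # but their logs are close to symmetric, as a lognormal would predict.
    print("skewness of products:    ", skewness(samples))
    print("skewness of log-products:", skewness([math.log(s) for s in samples]))

    # Scale invariance: for the power law the doubling ratio is the same at
    # every scale; for the lognormal it depends on where you look.
    for x in (2.0, 20.0):
        print(x, doubling_ratio(pareto_survival, x),
              doubling_ratio(lognormal_survival, x))
```

The first check reflects the central limit theorem applied to the sum of the log-factors. The second reflects the fact that, for a Pareto distribution, the chance of exceeding 2x relative to the chance of exceeding x is 2^(-α) whatever x is. Neither check says anything about which mechanism actually governs impact, which is the question we plan to explore.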

In addition, we are aware of the following relevant literature:

  • Clauset et al. 2009 sets out statistical methods for detecting and fitting power law distributions in empirical data, and seems quite relevant to our research question

  • We have skimmed this Terry Tao blog post and found it interesting. It was also useful because the comments to that post led us to Kafri 2016, which may end up containing some of the central ideas in our research. We found out about it because Max Daniel referenced it on the EA Forum.

  • Newman 2005 clearly explains a number of key concepts relating to power law distributions

  • Sneppen and Newman 1996 build an interesting model. It consists of N “agents”, which could be grains of sand in a sandpile or species in an evolving ecosystem, and those agents are subject to “stresses” η(t) at each time-step t. Each agent possesses a threshold of tolerance, above which it dies out or moves on. The paper shows that this model has power law outcomes. It appeared to refer to the concept of critical phenomena. However, on more careful reading, the root cause of the power law distributions seen in these models does not appear to be related to critical phenomena, so we don’t plan to focus on critical phenomena.

  • Max Daniel also has lots of other very useful content in a shortform which predates his work on the distribution of talent, which is essentially a fairly useful linkdump in its own right.

    • This includes multiple useful sources, such as Brian Tomasik’s early comment on the topic, Kokotajlo and Oprea’s 2020 paper (which Vasco has also mentioned on the EA Forum recently), and Tobias Baumann’s suffering-focused exploration of this question.

  • We learned from Max’s shortform that Linch started some work on this topic, but (as far as we’re aware) it’s still incomplete. It seems he was leaning away from the power law distribution being likely.

  • This article about Zipf’s law for Atlas models appeared, at first glance, to explain why Zipfian or power law distributions are so widespread. However, it now looks like it explains why one of the parameters in a power law distribution is likely to be one, which still leaves open the question of why the distribution is a power law in the first place. For this reason, we plan to deprioritise this article.

We would love to hear of any other potentially relevant literature, or ideas that may be worth exploring!

  1. ^

    We generated a Pareto distribution with 10^3 points up to the 99th percentile. We then fitted a lognormal distribution to the generated Pareto distribution. We then took the values at the 99.9th, 99.99th and 99.999th percentiles for both the Pareto and the fitted lognormal distribution. Dividing the Pareto value by the fitted lognormal value at each of these percentiles gives 189×, 1400× and 10760× respectively.
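    The procedure in this footnote can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the model that produced the figures above. The footnote does not give the Pareto shape parameter or the fitting method, so the sketch assumes a hypothetical shape of alpha = 1.5, reads the number of points as 1000 evenly spaced percentiles, and fits the lognormal by matching the mean and standard deviation of the logged values. With different choices the ratios will differ, and this sketch should not be expected to reproduce 189×, 1400× and 10760×.

```python
import math
from statistics import NormalDist, fmean, pstdev


def pareto_quantile(p, alpha, x_min=1.0):
    """Inverse CDF of a Pareto distribution with shape alpha and minimum x_min."""
    return x_min * (1.0 - p) ** (-1.0 / alpha)


def lognormal_quantile(p, mu, sigma):
    """Inverse CDF of a lognormal distribution with log-mean mu and log-sd sigma."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def tail_ratios(alpha=1.5, n_points=1000, cutoff=0.99,
                targets=(0.999, 0.9999, 0.99999)):
    """Ratio of Pareto to fitted-lognormal quantiles beyond the observed range.

    alpha, n_points and the fitting method are assumptions (see lead-in).
    """
    # "Observed" data: Pareto quantiles at evenly spaced percentiles up to the cutoff.
    probs = [cutoff * (i + 1) / n_points for i in range(n_points)]
    logs = [math.log(pareto_quantile(p, alpha)) for p in probs]

    # Assumed fitting method: match the mean and sd of the logged values.
    mu = fmean(logs)
    sigma = pstdev(logs)

    # Extrapolate both distributions past the cutoff and compare.
    return {
        p: pareto_quantile(p, alpha) / lognormal_quantile(p, mu, sigma)
        for p in targets
    }


if __name__ == "__main__":
    for percentile, ratio in tail_ratios().items():
        print(f"{percentile:.5%}: Pareto / lognormal = {ratio:.1f}")
```

    Whatever the exact parameters, the qualitative pattern is the one described in the footnote: the ratio exceeds one just beyond the fitted range and grows with each further step into the tail.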