Thanks for this post, I’m glad to see more attention on what I believe is an important topic (see also here).
I’m actually about to publish a related piece I wrote with Ben Todd specifically on whether the distribution of impact-related metrics across people is heavy-tailed. So this post is a nice complement. I particularly appreciate the specific data sets you point to.
For now I’m just going to make a couple of quick points, split between comments to allow for a more organized discussion.
--
One thing I would highly encourage you to do, if you haven’t already, is to take a look at “Power-law distributions in empirical data” by Clauset et al. In particular, as explained in this paper:
I’m not sure what exactly you did when saying “The (least-squares estimate of the) power parameter p equals 0,6 for the education interventions, 1,02 for the health interventions and 1,8 for the climate policies.”, but note that one needs to be careful here. In particular, doing a standard linear regression / least-squares fit in a log-log plot is generally not a valid way of estimating the power law parameters. You need to use something like maximum likelihood estimation.
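To make the estimation point concrete, here is a minimal sketch of the maximum likelihood estimator for a continuous power law, which as far as I recall is the one given by Clauset et al. It is an illustration on synthetic data, not a reanalysis of the data sets in the post. The function names, the sample size, the seed, and the choice of x_min = 1 are all assumptions of mine. Note also that alpha here is the exponent of the density p(x) ∝ x^(-alpha), which may not be the same convention as the post’s parameter p.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values with density p(x) proportional to x**(-alpha) for x >= x_min.

    Uses inverse transform sampling on the survival function
    P(X > x) = (x / x_min) ** -(alpha - 1).
    """
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha_mle(xs, x_min):
    """Maximum likelihood estimate of alpha for a continuous power law.

    Returns (alpha_hat, standard_error). Only values >= x_min are used;
    x_min is taken as given here, whereas in practice it also has to be chosen.
    """
    tail = [x for x in xs if x >= x_min]
    n = len(tail)
    alpha_hat = 1.0 + n / sum(math.log(x / x_min) for x in tail)
    standard_error = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, standard_error


# Synthetic example with a known exponent (all numbers are made up).
rng = random.Random(42)
data = sample_power_law(10_000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat, se = fit_alpha_mle(data, x_min=1.0)
print(f"true alpha = 2.5, MLE = {alpha_hat:.3f} +/- {se:.3f}")
```

The estimator has a closed form, so it is no harder to compute than a regression line, and it comes with a simple standard error. Choosing x_min is the harder part in real data; the sketch sidesteps that by treating it as known.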
More broadly, I think you put too much emphasis on power laws specifically, as opposed to just saying that many cost-effectiveness distributions look very heavy-tailed. I.e. I’d recommend describing the key observation in a way that’s more agnostic between power laws, lognormal, and other heavy-tailed distributions. It is very hard to empirically distinguish these distributions from one another. This doesn’t matter much if you just want to describe the data we’ve seen, but can make a large difference when extrapolating beyond the range of observed data.
I’m aware that you do mention that if something looks like an approximately straight line in a log-log plot it could be either a power law or a lognormal. I just think this and related points should be more prominent and also be visible in the high-level framing.
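To illustrate why the choice of distribution matters for extrapolation, here is a rough sketch on synthetic data. It draws values from a lognormal, fits both a lognormal and a power-law tail to the same sample, and compares the fraction of values each fit predicts above a threshold inside the data versus one far beyond the largest observation. Everything specific is an assumption of mine: the lognormal parameters, the sample size, and in particular the use of the 90th percentile as the power-law threshold x_min, which is an arbitrary simplification rather than a principled choice.

```python
import math
import random
import statistics

rng = random.Random(0)

# Synthetic "cost-effectiveness" values from a lognormal (made-up parameters).
data = sorted(rng.lognormvariate(0.0, 1.5) for _ in range(2000))

# Fit 1: lognormal by maximum likelihood (mean and standard deviation of log x).
logs = [math.log(x) for x in data]
mu_hat = statistics.fmean(logs)
sigma_hat = statistics.pstdev(logs)

# Fit 2: power-law tail above an arbitrary threshold (the 90th percentile).
x_min = data[int(0.9 * len(data))]
tail = [x for x in data if x >= x_min]
tail_share = len(tail) / len(data)
# MLE of the tail index (alpha - 1 in the density convention p(x) ~ x**-alpha).
k_hat = len(tail) / sum(math.log(x / x_min) for x in tail)


def lognormal_sf(x):
    """Fitted lognormal estimate of P(X > x)."""
    return 0.5 * math.erfc((math.log(x) - mu_hat) / (sigma_hat * math.sqrt(2.0)))


def power_law_sf(x):
    """Fitted power-law-tail estimate of P(X > x), valid for x >= x_min."""
    return tail_share * (x / x_min) ** (-k_hat)


inside = data[int(0.99 * len(data))]  # 99th percentile, within the observed range
beyond = 10.0 * data[-1]              # ten times the largest observed value

for label, x in [("within data", inside), ("beyond data", beyond)]:
    print(f"{label}: lognormal {lognormal_sf(x):.2e}, power law {power_law_sf(x):.2e}")
```

With these settings I would expect the two fits to give similar answers within the observed range and to diverge substantially beyond it, with the power-law fit predicting far more extreme values. That is the sense in which the distinction matters little for description but a lot for extrapolation.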