Neat post, and nice to see Squiggle in the wild.
Some points:
You could create a mixture distribution; fit a lognormal whose x% confidence interval is the range expressed by the points you’ve already found; use your subjective judgment to come up with a distribution which could fit it; or use kernel density estimation (https://en.wikipedia.org/wiki/Kernel_density_estimation). A sketch of two of these options follows below, after the next point.
In your number of habitable planets estimate, you have a planetsPerHabitablePlanet estimate. This is an interesting decomposition. I would have looked at the fraction of planets which are habitable, and probably fit a beta distribution to it, given that we know that the fraction is between 0 and 1. This seems a bit like a matter of personal taste, though.
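To make two of those options concrete (fitting a lognormal to a confidence interval, and kernel density estimation), here is a minimal Python sketch; the interval of 0.1 to 10 and the list of published estimates are made-up numbers, purely for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical point estimates from the literature, treated as a 90% confidence interval.
low, high = 0.1, 10.0

# Fit a lognormal whose 5th and 95th percentiles match the interval:
# log(X) ~ Normal(mu, sigma), so mu and sigma follow from the log-quantiles.
z = stats.norm.ppf(0.95)                       # ~1.645
mu = (np.log(low) + np.log(high)) / 2          # log of the geometric mean
sigma = (np.log(high) - np.log(low)) / (2 * z)
lognormal = stats.lognorm(s=sigma, scale=np.exp(mu))
print(lognormal.ppf([0.05, 0.95]))             # recovers ~[0.1, 10]

# Alternatively, smooth a handful of published estimates with a kernel density
# estimate (done on the log scale here, since the quantity is positive).
published = np.array([0.2, 0.5, 1.0, 4.0, 8.0])   # made-up values
kde = stats.gaussian_kde(np.log(published))
samples = np.exp(kde.resample(10_000, seed=0)[0])
print(samples.mean())
```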
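And for the beta-distribution suggestion in the previous point, a sketch that matches a beta to two percentiles for the habitable fraction; the 0.001 and 0.05 targets and the fit-by-search approach are illustrative assumptions, not from the post.

```python
import numpy as np
from scipy import stats, optimize

# Made-up 90% confidence interval for the fraction of planets that are habitable.
q5, q95 = 0.001, 0.05

def quantile_error(params):
    """Squared error between the beta's 5th/95th percentiles and the targets."""
    a, b = np.exp(params)                      # keep both shape parameters positive
    dist = stats.beta(a, b)
    return (dist.ppf(0.05) - q5) ** 2 + (dist.ppf(0.95) - q95) ** 2

result = optimize.minimize(quantile_error, x0=[0.0, 3.0], method="Nelder-Mead")
a, b = np.exp(result.x)
habitable_fraction = stats.beta(a, b)
print(habitable_fraction.ppf([0.05, 0.95]))    # should be close to (0.001, 0.05)
print(habitable_fraction.mean())               # stays within [0, 1] by construction
```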
I agree with your claim that lognormal distributions are a better choice than normal. However this doesn’t explain whether another distribution might be better (especially in cases where data is scarce, such as the number of inhabitable planets).
For example, the power law distribution has some theoretical arguments in its favour and also has a significantly higher kurtosis, meaning there is a much fatter tail.
Thanks. I’ll read up on the power law dist and at the very least put a disclaimer in: I’m only checking which is better out of normal/lognormal.
Great remark, Sanjay! Great piece, Stan!
Related to which type of distribution is better, in this episode of The 80,000 Hours Podcast (search for “So you mentioned, kind of, the fat tail-ness of the distribution.”), David Roodman suggests using the generalised Pareto distribution (GPD) to model the right tail (which often drives the expected value). David mentions the right tails of normal, lognormal and power law distributions are particular cases of the GPD.
So, fitting the right-tail empirical data to a GPD is arguably better than assuming (or fitting the data to) one particular type of distribution.
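As a rough illustration of the idea (not David’s actual procedure), the sketch below fits scipy’s genpareto to exceedances over a high threshold; the simulated lognormal data and the 95% threshold are assumptions made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Stand-in data; in practice this would be the empirical estimates of interest.
data = rng.lognormal(mean=0.0, sigma=1.5, size=10_000)

# Model only the right tail: keep exceedances over a high threshold.
threshold = np.quantile(data, 0.95)
exceedances = data[data > threshold] - threshold

# Fit a generalised Pareto distribution to the exceedances (location fixed at 0).
shape, loc, scale = stats.genpareto.fit(exceedances, floc=0)
tail = stats.genpareto(shape, loc=0, scale=scale)

# Contribution of the tail to the expected value (finite only if shape < 1).
p_tail = (data > threshold).mean()
print(p_tail * (threshold + tail.mean()))
```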
Cool. To be clear, I think if anyone was reading your piece with any level of care or attention, it would be clear that you were comparing normal and lognormal, and not making any stronger claims than that.
Nice post, Stan!
I think one caveat here is that, if we want to obtain an expected value as output, the input point estimates should refer to the mean instead of the median. They are the same or similar for non-heavy-tailed distributions (like uniform or normal), but can differ a lot for heavy-tailed ones (like exponential or lognormal). When reducing a lognormal to a point estimate, I think people often use the geometric mean of 2 percentiles (e.g. the 5th and 95th percentiles), which corresponds to the median, not the mean. Using the median in this case will underestimate the expected value, because the expected value equals (see here):
E(X) = Median(X)*e^(sigma^2/2), where sigma^2 is the variance of log(X).
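As a worked example with a made-up 90% confidence interval of 1 to 100:

```python
import numpy as np
from scipy import stats

# Illustrative 90% confidence interval for a lognormally distributed quantity.
low, high = 1.0, 100.0
z = stats.norm.ppf(0.95)

median = np.sqrt(low * high)                    # geometric mean of the percentiles
sigma = (np.log(high) - np.log(low)) / (2 * z)  # standard deviation of log(X)
mean = median * np.exp(sigma ** 2 / 2)          # E(X) = Median(X)*e^(sigma^2/2)

print(median)  # 10.0
print(mean)    # ~26.6, i.e. well over twice the median
```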
Here you mention that this “lognormal mean” can lead to extreme results, but I think that is a feature as long as we think the lognormal is modelling the right tail correctly. If we do not think so, we can still use the mean of:
Truncated lognormal distribution.
Minimum between a lognormal distribution and a maximum value (after which we think the lognormal no longer models the right tail well).
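A Monte Carlo sketch of both options, with made-up lognormal parameters and a made-up cap of 30:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.5       # made-up parameters of log(X)
cap = 30.0                 # made-up value above which we distrust the lognormal

samples = rng.lognormal(mean=mu, sigma=sigma, size=1_000_000)

untruncated_mean = samples.mean()                # ~e^(mu + sigma^2/2) ~= 3.1
truncated_mean = samples[samples <= cap].mean()  # drop the distrusted tail
capped_mean = np.minimum(samples, cap).mean()    # keep it, but cap its values

print(untruncated_mean, truncated_mean, capped_mean)
```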
In my mind:
Being objective is about faithfully representing the information we have about reality, even if that means being more uncertain.
The evidence base being sparse suggests we are uncertain about what reality actually looks like, which means a faithful representation of it will more easily be achieved by intervals, not point estimates. For example, I think using interval estimates in the Drake equation is much more important than in the cost-effectiveness analyses of GiveWell’s top charities.
One compromise to achieve transparency while maintaining the benefits of interval estimates is using pessimistic, realistic and optimistic point estimates. On the one hand, this may result in wider intervals, because the product of two 5th percentiles is rarer than a 5th percentile, so the pessimistic final estimate will be more pessimistic than its inputs. On the other hand, we can think of the wider intervals as accounting for structural uncertainty of the model.
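A quick Monte Carlo check of the point about multiplying pessimistic inputs, with two made-up independent lognormal inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two made-up, independent lognormal inputs.
x = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)
y = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)

pessimistic_product = np.quantile(x, 0.05) * np.quantile(y, 0.05)
product_5th = np.quantile(x * y, 0.05)

print(pessimistic_product)  # ~0.037, roughly the 1st percentile of x*y
print(product_5th)          # ~0.098
```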
Thanks for your feedback, Vasco. It’s led me to make extensive changes to the post:
More analysis on the pros/cons of modelling with distributions. I argue that sometimes it’s good that the crudeness of point-estimate work reflects the crudeness of the evidence available. Interval-estimate work is more honest about uncertainty, but runs the risk of encouraging overconfidence in the final distribution.
I include the lognormal mean in my analysis of means. You have convinced me that the sensitivity of lognormal means to heavy right tails is a strength, not a weakness! But the lognormal mean appears to be sensitive to the size of the confidence interval you use to calculate it—which means subjective methods are required to pick the size, introducing bias.
Overall I agree that interval estimation is better suited to the Drake equation than to GiveWell CEAs. But I’d summarise my reasons as follows:
The Drake Equation really seeks to ask “how likely is it that we have intelligent alien neighbours?”, but point-estimate methods answer the question “what is the expected number of intelligent alien neighbours?”. With such high variability the expected number is virtually useless, but the distribution of this number lets us estimate how likely it is that we have any intelligent alien neighbours at all. GiveWell CEAs probably have much less variation, and hence a point-estimate answer is relatively more useful.
Reliable research on the numbers that go into the Drake equation often doesn’t exist, so it’s not too bad to “make up” interval estimates to go into it. We know much more about the charities GiveWell studies, so made-up distributions (even those informed by reliable point-estimates) are much less permissible.
Thanks again, and do let me know what you think!
Nice, thanks for the update!
Yes, but only as long as we think the heavy right tail is being accurately modelled! Jaime Sevilla has this post on which methods to use to aggregate forecasts.
I think it is worth flagging that risk, but I would say:
In general, if a given method is more accurate, it seems reasonable to follow that method everything else equal.
One can always warn about not overweighting results estimated with intervals.
Intuitively, there seems to be much higher risk of being overconfident about a point estimate than about a mean estimated with intervals together with a confidence interval. For example, regarding Toby Ord’s best guess given in Table 6.1 of The Precipice for the existential risk from nuclear war between 2021 and 2120, I think it is easier to be overconfident about A than B:
A. 0.1 %.
B. 0.1 % (90 % confidence interval, 0.03 % to 0.3 %). Toby mentions that:
“There is significant uncertainty remaining in these estimates and they should be treated as representing the right order of magnitude—each could easily be a factor of 3 higher or lower”.
Yes, for the same median, the wider the interval, the greater the mean. If one is having a hard time linking 2 given estimates to a confidence interval, one can try the narrowest and widest reasonable intervals, and see if the lognormal mean will vary a lot.
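As an illustration with made-up intervals sharing the same median of 10:

```python
import numpy as np
from scipy import stats

def lognormal_mean(low, high, ci=0.90):
    """Mean of the lognormal whose central `ci` interval is [low, high]."""
    z = stats.norm.ppf(0.5 + ci / 2)
    median = np.sqrt(low * high)
    sigma = (np.log(high) - np.log(low)) / (2 * z)
    return median * np.exp(sigma ** 2 / 2)

print(lognormal_mean(5, 20))      # ~10.9
print(lognormal_mean(1, 100))     # ~26.6
print(lognormal_mean(0.1, 1000))  # ~504
```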
I think people with knowledge about GiveWell’s cost-effectiveness analyses would be able to come up with reasonable distributions. A point estimate is equivalent to assigning probability 1 to that estimate, and 0 to all other outcomes, so it is easy to come up with something better (although it may well not be worth the effort).
Thanks again!
I think I have been trying to portray the point-estimate/interval-estimate trade-off as a difficult decision, but probably interval estimates are the obvious choice in most cases.
So I’ve re-done the “Should we always use interval estimates?” section to be less about pros/cons and more about exploring the importance of communicating uncertainty in your results. I have used the Ord example you mentioned.
Makes sense, thanks!
Hi Stan,
One way of getting around this is transforming all divisions into multiplications. For example, one can calculate E(X/Y) from E(X)*E(1/Y) (assuming independence), instead of using E(X)/E(Y). Computing E(1/Y) will require using Guesstimate or similar, but then the mean can be used in a spreadsheet without having to run a full Monte Carlo simulation, which would take longer.
I am not sure, but I think a similar approach can be followed for most estimates. For example, one can use Guesstimate to obtain E(X^alpha) or E(log(X)), instead of using E(X)^alpha or log(E(X)).
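A minimal sketch of this with a Monte Carlo stand-in for Guesstimate, using made-up independent inputs (Y strictly positive so that 1/Y is well defined):

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up independent inputs.
x = rng.lognormal(mean=1.0, sigma=0.5, size=1_000_000)
y = rng.lognormal(mean=0.0, sigma=0.5, size=1_000_000)

e_x = x.mean()
e_inv_y = (1 / y).mean()    # the one number to carry into the spreadsheet

print(e_x * e_inv_y)        # E(X)*E(1/Y)
print((x / y).mean())       # E(X/Y); matches the line above given independence
print(e_x / y.mean())       # E(X)/E(Y); noticeably smaller, hence the adjustment
```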
Using intervals is still useful to get ranges for the outputs in a principled way, but I wonder whether the expected value alone is enough. I think expected utility is all that matters, so there is a sense in which the expected value captures all the relevant information.
I suppose I have been using interval estimates because I think they can inform how much the expected value might change in response to new information, which is useful to know. However, I am not confident that uncertainty, which is what can be directly observed from the outputted intervals, is a good proxy for resilience.
I think I have come to believe that assessing resilience by doing a sensitivity analysis with point estimates derived from distributions is usually better than trying to evaluate it based on the uncertainty of the final result.
Note that 1/Y is generally not well defined when Y’s range contains 0, it is messy when Y’s range approaches 0, and also when both X and Y contain both positive and negative parts. My preferred solution is to either look at Xs and Ys that are both positive, or to look at the joint pdf of X and Y, rather than the sum.
Hi Nuño,
Nice points!
I agree. Just one note, I think a distribution for Y which encompasses 0 cannot be correct, because it would lead to infinities, which I am happy to reject. Can you give some examples in which Y (i.e. a distribution in the denominator) is defined such that it could not be zero, but you still found messiness?
For this case, one can get point estimates from:
E(X) = P(X > 0)*E(X | X > 0) + P(X < 0)*E(X | X < 0).
E(1/Y) = P(Y > 0)*E(1/Y | Y > 0) + P(Y < 0)*E(1/Y | Y < 0).
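A Monte Carlo sketch of the decomposition for E(1/Y), using an illustrative Y that takes both signs but stays away from 0:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
# Illustrative Y: negative with probability 0.3, positive otherwise, never near 0.
negative = rng.random(n) < 0.3
y = np.where(negative, -rng.uniform(1, 5, n), rng.uniform(2, 10, n))

p_pos, p_neg = (y > 0).mean(), (y < 0).mean()
e_inv_pos = (1 / y[y > 0]).mean()
e_inv_neg = (1 / y[y < 0]).mean()

# E(1/Y) = P(Y > 0)*E(1/Y | Y > 0) + P(Y < 0)*E(1/Y | Y < 0)
print(p_pos * e_inv_pos + p_neg * e_inv_neg)
print((1 / y).mean())       # direct estimate; matches the decomposition
```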
This may not buy you enough. E.g., sometimes you may want to calculate the $/life saved, where life saved is a distribution which could be 0.
I think that in practice you (almost) always want to calculate lives/$, not $/life, and the cost is practically never zero.
Hi Lorenzo,
Yes, I prefer to calculate the cost-effectiveness in terms of benefits per unit cost. This way, the expected cost-effectiveness can be multiplied by the cost to obtain the expected benefits. In contrast, the cost cannot be divided by the expected cost per unit benefit to obtain the expected benefits.
Another advantage of benefits per unit cost is that they always increase with the goodness of the intervention, whereas the cost per unit benefit has a more confusing relationship (when it can be both positive and negative).
Yes, I do not think the cost can be zero. Even if the monetary cost is zero, there are always time costs.