I think there are probably cases where you want to do tail analysis and where using something more like the arithmetic mean produces much better estimates. I quickly tried to construct a toy model of this (but failed).
In particular, I think if you have 10 possible models of tail behavior, your prior is 10% on each, and you don’t update much on which model to use based on the data you have (due to limited data from the tail), then the right aggregation method is going to be the arithmetic mean (or something close to this, depending on the amount of update).
The fact that the mean isn’t robust to outliers is actually the right property in this case: indeed, low probabilities in the tail are dominated by outliers. (See work by Nassim Taleb, for instance.)
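To make the model-averaging intuition above concrete, here is a minimal sketch; the ten per-model tail probabilities are hypothetical, chosen purely for illustration. With a uniform prior over mutually exclusive tail models and essentially no update from the data, the pooled tail probability is the prior-weighted average of the per-model probabilities, i.e. the arithmetic mean, and it is dominated by the fattest-tailed model.

```python
# Minimal sketch of the model-averaging argument above.
# The per-model tail probabilities below are hypothetical, purely for illustration.
p_models = [1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8, 1e-9, 1e-10, 1e-11]
weights = [1 / len(p_models)] * len(p_models)  # 10% prior on each model, barely moved by the data

# P(tail event) = sum over models of P(model) * P(tail event | model), i.e. the arithmetic mean here
p_pooled = sum(w * p for w, p in zip(weights, p_models))
print(f"pooled tail probability: {p_pooled:.2e}")  # ~1.1e-03, dominated by the fattest-tailed model
```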
In particular, I think if you have 10 possible models of tail behavior, your prior is 10% on each, and you don’t update much on which model to use based on the data you have (due to limited data from the tail), then the right aggregation method is going to be the arithmetic mean (or something close to this, depending on the amount of update).
There is a sense in which I agree with the above in theory, because I think the models are mutually incompatible. The annual war deaths as a fraction of the global population cannot simultaneously follow e.g. a Pareto and a lognormal distribution. However, I would say the median or another method that does not overweight extremely high predictions is better in practice. For example:
The weighted/unweighted median performed better than the weighted/unweighted mean on Metaculus’ questions.
Samotsvety aggregated widely differing predictions from 7 forecasters[1] using the geometric mean after removing the lowest and highest values.
The geometric mean, like the median, does not overweight extremely high predictions.
The more extreme predictions one removes before taking the geometric mean, the closer the result gets to the median.
A priori, it seems sensible to use an aggregation method that one of the most accomplished forecasting groups uses.
The fact that the mean isn’t robust to outliers is actually the right property in this case: indeed, low probabilities in the tail are dominated by outliers. (See work by Nassim Taleb, for instance.)
I said “I did not use the mean because it is not resistant to outliers”, but I meant “because it ignores information from extremely low predictions” (I have updated the post):
The arithmetic mean of probabilities ignores information from extreme predictions
The arithmetic mean of probabilities ignores extreme predictions in favor of tamer results, to the extent that even large changes to individual predictions will barely be reflected in the aggregate prediction.
As an illustrative example, consider an outsider expert and an insider expert whose predictions about an event are being elicited. The outsider expert is reasonably uncertain about the event, and assigns a probability of around 10% to it. The insider has privileged information about the event, and assigns to it a very low probability.
Ideally, we would like the aggregate probability to be reasonably sensitive to the strength of the evidence provided by the insider expert—if the insider assigns a probability of 1 in 1,000, the outcome should be meaningfully different than if the insider assigns a probability of 1 in 10,000 [9].
The arithmetic mean of probabilities does not achieve this—in both cases the pooled probability is around (10% + 1/1,000)/2 ≈ (10% + 1/10,000)/2 ≈ 5.00%. The uncertain prediction has effectively overwritten the information in the more precise prediction.
The geometric mean of odds works better in this situation. We have that [(1:9) × (1:999)]^(1/2) ≈ 1:95, while [(1:9) × (1:9999)]^(1/2) ≈ 1:300. Those correspond to probabilities of 1.04% and 0.33% respectively, showing the greater sensitivity to the evidence the insider brings to the table.
See (Baron et al, 2014) for more discussion on the distortive effects of the arithmetic mean of probabilities and other aggregates.
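As a quick check of the numbers in the excerpt above, here is a small, purely illustrative sketch comparing the arithmetic mean of probabilities with the geometric mean of odds for the outsider/insider example:

```python
import math

def arithmetic_mean(ps):
    return sum(ps) / len(ps)

def geometric_mean_of_odds(ps):
    odds = [p / (1 - p) for p in ps]
    pooled = math.prod(odds) ** (1 / len(odds))
    return pooled / (1 + pooled)  # convert the pooled odds back to a probability

for insider in (1 / 1_000, 1 / 10_000):
    ps = [0.10, insider]
    print(f"insider {insider:.0e}: mean = {arithmetic_mean(ps):.2%}, "
          f"geometric mean of odds = {geometric_mean_of_odds(ps):.2%}")
# The mean stays at ~5% in both cases, while the geometric mean of odds
# gives ~1.04% vs ~0.33%, matching the figures quoted above.
```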
I have asked Jaime Sevilla to share his thoughts. Thanks for raising this important point!
For the question “What is the unconditional probability of London being hit with a nuclear weapon in October?”, the 7 forecasts were 0.01, 0.00056, 0.001251, 10^-8, 0.000144, 0.0012, and 0.001. The largest of these is 1 M (= 0.01/10^-8) times the smallest.
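For concreteness, here is a small illustrative sketch comparing aggregation methods on the seven forecasts listed above. It only shows how the aggregates differ; it does not claim to reproduce Samotsvety's published headline number.

```python
import math
from statistics import mean, median

forecasts = [0.01, 0.00056, 0.001251, 1e-8, 0.000144, 0.0012, 0.001]

def geometric_mean(ps):
    return math.prod(ps) ** (1 / len(ps))

trimmed = sorted(forecasts)[1:-1]  # drop the lowest and highest values, as in Samotsvety's aggregation

print(f"arithmetic mean:        {mean(forecasts):.2e}")            # ~2.0e-03, pulled up by the 0.01 outlier
print(f"median:                 {median(forecasts):.2e}")          # 1.0e-03
print(f"geometric mean:         {geometric_mean(forecasts):.2e}")  # ~2.0e-04, pulled down by the 1e-8 outlier
print(f"trimmed geometric mean: {geometric_mean(trimmed):.2e}")    # ~6.6e-04, much closer to the median
```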
Thanks for the detailed reply and for asking Jaime Sevilla!
FWIW on the Samotsvety nuclear forecasts, I’m pretty intuitively scared by that aggregation methodology and spread of numbers (as people discussed in comments on that post).
You are welcome!
For reference, here are Jaime’s thoughts:
Interesting case. I can see the intuitive case for the median.
I think the mean is more appropriate—in this case, what this is telling you is that your uncertainty is dominated by the possibility of a fat tail, and the priority is ruling it out.
I am still standing by the median. I think using a weighted mean could be reasonable, but not a simple one. Even if all distributions have the same prior weight, and the update to the weights based on the fit to the data is pretty negligible, I would say one should put less weight on predictions further away from the median. This effect can be simply captured by aggregating the predictions using the median[1].
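One hypothetical way to operationalise "less weight on predictions further away from the median" is to down-weight each forecast by its distance, in orders of magnitude, from the median. This is only an illustrative sketch with a weighting function of my own choosing, not the method used in the post.

```python
import math
from statistics import median

def median_distance_weighted_mean(ps):
    m = median(ps)
    # further from the median (in orders of magnitude) => smaller weight
    weights = [1 / (1 + abs(math.log10(p / m))) for p in ps]
    return sum(w * p for w, p in zip(weights, ps)) / sum(weights)

forecasts = [0.01, 0.00056, 0.001251, 1e-8, 0.000144, 0.0012, 0.001]  # the 7 forecasts from the footnote above
print(f"{median_distance_weighted_mean(forecasts):.2e}")
# ~1.8e-03: even with the down-weighting, the high outlier still dominates the weighted mean,
# whereas the median (1.0e-03) discards it entirely.
```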
I have now made a graph illustrating how the mean ignores information from extremely low predictions:
More importantly, I have now noticed the high annual probabilities of extinction are associated with an arguably unreasonably high probability of extinction conditional on an annual population loss of at least 10 %.
For the annual probability of a war causing human extinction to be at least 0.0122 %, which is similar to the “0.0124 %/year”[2] I inferred from Stephen’s results, the probability of a war causing human extinction conditional on it causing an annual population loss of at least 10 % has to be at least 14.8 %. I see this as super high.
[1] Or, if there were no null predictions, the geometric mean or the probability linked to the geometric mean of the odds.
[2] I have added this to the post now. Previously, I only had Stephen’s extinction risk per war.
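For readers who want to see where a figure like the 14.8 % above comes from, here is a minimal sketch of the identity linking the two quantities. The annual probability of a ≥10 % loss below is a hypothetical, back-calculated input chosen so the numbers match the quoted figures; it is not a number taken from the post.

```python
# Extinction from war implies an annual population loss of at least 10 %, so
#   P(extinction per year) = P(loss >= 10 % per year) * P(extinction | loss >= 10 %).
p_extinction_annual = 1.22e-4   # annual probability of a war causing extinction (quoted above)
p_loss10_annual = 8.24e-4       # hypothetical annual probability of a war causing a >= 10 % loss

p_extinction_given_loss10 = p_extinction_annual / p_loss10_annual
print(f"P(extinction | loss >= 10 %) = {p_extinction_given_loss10:.1%}")  # ~14.8 %
```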
“I inferred from Stephen’s results, the probability of a war causing human extinction conditional on it causing an annual population loss of at least 10 % has to be at least 14.8 %.”
This is interesting! I hadn’t thought about it that way and find this framing intuitively compelling.
That does seem high to me, though perhaps not ludicrously high. Past events have probably killed at least 10% of the global population, WWII was within an order of magnitude of that, and we’ve increased our warmaking capacity since then. So I think it would be reasonable to put the annual chance of a war killing at least 10% of the global population at at least 1%.
That could give some insight into the extinction tail, perhaps implying that my estimate was about 10x too high. That would still make it importantly wrong, but less egregiously than the many orders of magnitude you estimate in the main post?
Thanks for jumping in, Stephen!
That does seem high to me, though perhaps not ludicrously high. Past events have probably killed at least 10% of the global population, WWII was within an order of magnitude of that, and we’ve increased our warmaking capacity since then. So I think it would be reasonable to put the annual chance of a war killing at least 10% of the global population at at least 1%.
Note the 14.8 % I mentioned in my last comment refers to “the probability of a war causing human extinction conditional on it causing an annual population loss of at least 10 %”, not to the annual probability of a war causing a population loss of at least 10 %. I think 14.8 % for the former is super high[1], but I should note the Metaculus community might find it reasonable:
It is predicting:
A 5 % chance of a nuclear catastrophe causing a 95 % population loss conditional on it causing a population loss of at least 10 %.
A 10 % chance of a bio catastrophe causing a 95 % population loss conditional on it causing a population loss of at least 10 %.
I think a nuclear or bio catastrophe causing a 95 % population loss would still be far from causing extinction, so I could still believe the above suggests the probability of a nuclear or bio war causing extinction conditional on it causing a population loss of at least 10 % is much lower than 5 % and 10 %, and therefore much lower than 14.8 % too.
However, the Metaculus community may think extinction is fairly likely conditional on a 95 % population loss.
That could give some insight into the extinction tail, perhaps implying that my estimate was about 10x too high. That would still make it importantly wrong, but less egregiously than the many orders of magnitude you estimate in the main post?
Note “the probability of a war causing human extinction conditional on it causing an annual population loss of at least 10 %” increases quite superlinearly with the annual probability of a war causing human extinction (see graph in my last comment). So the annual probability of a war causing human extinction will be too high by more than 1 OOM if the 14.8 % I mentioned is too high by 1 OOM. To be precise, for the best fit distribution with a “probability of a war causing human extinction conditional on it causing an annual population loss of at least 10 %” of 1.44 %, which is roughly 1 OOM below 14.8 %, the annual probability of a war causing human extinction is 3.41*10^-7, i.e. 2.56 (= log10(1.24*10^-4/(3.41*10^-7))) OOMs lower. In reality, I suspect 14.8 % is too high by many OOMs, so an astronomically low prior still seems reasonable to me.
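As a quick sanity check of the order-of-magnitude arithmetic above, using only the figures quoted in that paragraph:

```python
import math

p_inferred = 1.24e-4   # annual extinction probability inferred from Stephen's results
p_best_fit = 3.41e-7   # annual extinction probability for the best fit distribution with a 1.44 % conditional

print(f"{math.log10(p_inferred / p_best_fit):.2f} OOMs")  # ~2.56
```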
I have just finished a draft where I get an inside view estimate of 5.53*10^-10 for the nearterm annual probability of human extinction from nuclear war, which is not too far from the best guess prior of 6.36*10^-14 I present in the post. Comments are welcome, but no worries if you have other priorities! Update: I have now published the post.
I understand war extinction risk may be majorly driven by AI and bio risk rather than nuclear war. However, I have a sense this is informed to a significant extent by Toby’s estimates for existential risk in The Precipice, which I have found to be consistently much higher than my estimates for extinction risk for the matters I have investigated. For example, in the draft I linked above, I say it is plausible extinction risk from nuclear war is similar to that from asteroids and comets.
[1] I updated “quite high” in my last comment to “super high”.