Thanks for the reply, Stan!

We have been thinking about aggregation methods a lot here at CEARCH, and our views on them are evolving. A few months ago we switched to using the geometric mean as our default aggregation method, although we are considering switching to the geometric mean of odds for probabilities, based on Simon M's persuasive post that you referenced (although in many cases the difference is very small).
Cool!
Firstly I’d like to say that our main weakness on the nuclear winter probability is a lack of information. Experts in the field are not forthcoming on probabilities, and most modeling papers use point-estimates and only consider one nuclear war scenario.
Right, I wish experts were more transparent about their best guesses and uncertainty (accounting for the limitations of their studies).
One of my top priorities as we take this project to the “Deep” stage is to improve on this nuclear winter probability estimate. This will likely involve asking more experts for inside views, and exploring what happens to some of the top models when we introduce some uncertainty at each stage.
Nice to know there is going to be more analysis! I think one important limitation of your current model, which I would try to eliminate in further work, is that it relies on the vague concept of nuclear winter to define the climatic effects. You calculate the expected mortality by multiplying:
Probability of a large nuclear war.
Probability of nuclear winter if there is a large nuclear war.
Expected mortality if there is a nuclear winter.
However, I believe it is better to rely on a more precise concept to assess the climatic effects, namely the amount of soot injected into the stratosphere, or the mean drop in global temperature over a certain period (e.g. 2 years) after the nuclear war. In my analysis, I relied on the amount of soot, estimating the expected famine deaths due to the climatic effects by multiplying (see the sketch after the list below):
Probability of a large nuclear war.
Expected soot injection into the stratosphere if there is a large nuclear war.
Expected famine deaths due to the climatic effects for the expected soot injection into the stratosphere.
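To make the contrast concrete, here is a minimal sketch of the two decompositions in Python, with purely made-up placeholder numbers (none of them come from your report or from my analysis):

```python
# Purely illustrative placeholder numbers, not the values from either analysis.
p_large_nuclear_war = 0.01      # probability of a large nuclear war (placeholder)

# Nuclear-winter-based decomposition:
# P(war) * P(nuclear winter | war) * E[deaths | nuclear winter].
p_winter_given_war = 0.2        # placeholder
deaths_given_winter = 0.3       # fraction of the population (placeholder)
expected_deaths_winter_based = p_large_nuclear_war * p_winter_given_war * deaths_given_winter

# Soot-based decomposition:
# P(war) * famine_deaths(E[soot | war]), with soot in Tg injected into the stratosphere.
expected_soot_tg = 30           # expected soot injection given a large nuclear war (placeholder)

def famine_deaths(soot_tg):
    # Placeholder mapping from soot to the fraction of famine deaths;
    # in practice this would be a logistic (or piecewise linear) fit to the literature.
    return min(1.0, soot_tg / 150)

expected_deaths_soot_based = p_large_nuclear_war * famine_deaths(expected_soot_tg)

print(expected_deaths_winter_based, expected_deaths_soot_based)
```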
Ideally, I would get the expected famine deaths by multiplying (see the sketch after this list):
Probability of a large nuclear war.
Expected famine deaths if there is a large nuclear war. To obtain the distribution of the famine deaths, I would:
Define a logistic function describing the famine deaths as a function of the soot injected into the stratosphere (or, even better, the mean drop in global temperature over a certain period). In my analysis, I approximated the logistic function as a piecewise linear function.
Input into the above function a distribution for the soot injected into the stratosphere if there is a large nuclear war (or, even better, the mean drop in global temperature over a certain period if there is a large nuclear war). To obtain this soot distribution, I would:
Define a function describing the soot injected into the stratosphere as a function of the number of offensive nuclear detonations.
Input into the above function a distribution for the number of offensive nuclear detonations if there is a large nuclear war.
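A minimal sketch of that pipeline (all the distributions and parameter values below are placeholders I made up for illustration; none of them come from my analysis or yours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# 1. Distribution for the number of offensive nuclear detonations given a large nuclear war
#    (lognormal placeholder; the real distribution should come from scenario analysis).
detonations = rng.lognormal(mean=np.log(500), sigma=1.0, size=n)

# 2. Soot injected into the stratosphere (Tg) as a function of the number of detonations
#    (linear placeholder; the real mapping should be fitted to the literature).
soot_tg = 0.05 * detonations

# 3. Famine deaths as a fraction of the population, as a logistic function of soot
#    (which could also be approximated as a piecewise linear function).
def famine_death_fraction(soot, midpoint_tg=60.0, steepness=0.1):
    return 1 / (1 + np.exp(-steepness * (soot - midpoint_tg)))

death_fraction = famine_death_fraction(soot_tg)

# 4. Expected famine deaths = P(large nuclear war) * E[death fraction | large nuclear war].
p_large_nuclear_war = 0.01  # placeholder
print(p_large_nuclear_war * death_fraction.mean())
```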
Luisa followed something like the above, although I think her results are super pessimistic.
I think you are generally right that we should go with the method that works the best on relatively large forecasting datasets like Metaculus. In this case I think there is a bit more room for personal discretion, given that I am working from only three forecasts, where one is more than two orders of magnitude smaller than the others.
Fair point; there is no data on which method is best when we are just aggregating 3 forecasts. That being said:
A priori, it seems reasonable to assume that the best method for large samples is also the best method for small samples.
Samotsvety aggregated predictions from 7 forecasters[1] that differed a lot from one another, and still used a modified version of the geometric mean, which ensures predictions smaller than 10 % of the mean are not ignored. A priori, it seems sensible to use an aggregation method that one of the most accomplished forecasting groups uses.
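For concreteness, here is a minimal sketch of one way such a modification could be implemented, assuming it amounts to flooring each forecast at 10 % of the forecasts' arithmetic mean before taking the geometric mean (the exact procedure Samotsvety used may well differ):

```python
import numpy as np

def modified_geometric_mean(probs, floor_fraction=0.1):
    """Geometric mean of the forecasts after flooring each one at `floor_fraction`
    of their arithmetic mean, so that very small forecasts still pull the aggregate
    down without single-handedly determining it. This is only a guess at the
    implementation; the exact procedure may differ."""
    probs = np.asarray(probs, dtype=float)
    floored = np.maximum(probs, floor_fraction * probs.mean())
    return float(np.exp(np.mean(np.log(floored))))

# The 7 forecasts from footnote [1].
forecasts = [0.01, 0.00056, 0.001251, 1e-8, 0.000144, 0.0012, 0.001]
print(modified_geometric_mean(forecasts))       # floored aggregate
print(float(np.exp(np.mean(np.log(forecasts)))))  # plain geometric mean, for comparison
```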
I feel that in this situation (some experts think nuclear winter is an almost-inevitable consequence of large-scale nuclear war, others think it is very unlikely) it would just feel unjustifiably confident to conclude that the probability is only 2 %. Especially since two of these three estimates are in-house estimates.
I think there is a natural human bias towards thinking that the probability of events whose plausibility is hard to assess (not lotteries) has to be somewhere between 10 % and 90 %. In general, my view is more that it feels overconfident to ignore predictions, and using the mean does this when the samples differ a lot from one another. To illustrate, if I am trying to aggregate N probabilities, 10 %, 1 %, 0.1 %, …, and 10^-N, for N = 7 (see the sketch after this list):
The probability corresponding to the geometric mean of odds is 0.0152 % (= 1/(1 + (1/9)^(-(1 + 7)/2*7/7)), taking the odds of the k-th probability to be (1/9)^k), which is 1.52 times the median of 0.01 %.
The mean is 1.59 % (= 0.1*(1 − 0.1^7)/(1 − 0.1)/7), i.e. 159 times the median.
I think the mean is implausible because:
Ignoring the 4 to 5 lowest predictions among only 7 seems unjustifiable, and using the mean is equivalent to using the probability corresponding to the geometric mean of odds putting 0 weight on the 4 to 5 lowest predictions, which would lead to 0.894 % (= 1/(1 + (1/9)^(-(1 + 5)/2*5/7))) to 4.15 % (= 1/(1 + (1/9)^(-(1 + 4)/2*4/7))).
Ignoring the 3 lowest and 3 highest predictions among 7 seems justifiable, and would lead to the median, whereas the mean is 159 times the median.
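As a quick check on those numbers, here is a small sketch that reproduces them, using the same approximation as the formulas above (the odds of the k-th probability are modeled as (1/9)^k):

```python
import numpy as np

k = np.arange(1, 8)             # 7 probabilities: 10 %, 1 %, ..., 10^-7
probs = 10.0 ** -k

print(f"mean:   {probs.mean():.4%}")        # ~1.59 %
print(f"median: {np.median(probs):.4%}")    # 0.01 %

# Geometric mean of odds, modeling the odds of the k-th probability as (1/9)^k,
# as in the formulas above.
odds = (1 / 9.0) ** k
pooled_odds = np.exp(np.mean(np.log(odds)))
print(f"geometric mean of odds: {pooled_odds / (1 + pooled_odds):.4%}")  # ~0.0152 %
```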
You say a 2 % probability of nuclear winter conditional on a large nuclear war seems unjustifiable, but note the geometric mean of odds implies 4 %. In any case, I suspect the reason even this would feel too low is that it may in fact be too low, depending on how one defines nuclear winter, but that you are overestimating famine deaths conditional on nuclear winter. You put a weight of:
1⁄3 on Luisa's results multiplied by 0.5, but I think the weight may still be too high given how pessimistic they are. Luisa predicts a 5 % chance of at least 36 % deaths (= 2.7/7.5), which looks quite high to me.
2⁄3 on Xia 2022's results multiplied by 0.75, but this seems like an insufficient adjustment given you are relying on 37.5 % famine deaths, which refers to the case of no adaptation. Reducing food waste, decreasing the consumption of animals, expanding cultivated area, and reducing the production of biofuels are all quite plausible adaptation measures to me. So I think their baseline scenario is quite pessimistic, unless you also want to account for deaths indirectly resulting from infrastructure destruction, which would happen even without any nuclear winter. I have some thoughts on reasons Xia 2022's famine deaths may be too low or too high here.
At the end of the day, I should say our estimates for the famine deaths are pretty much in agreement. I expect 4.43 % famine deaths due to the climatic effects of a large nuclear war, whereas you expect 6.16 % (20.2 % probability of nuclear winter if there is a large-scale nuclear war times 30.5 % deaths given nuclear winter).
[1] For the question “What is the unconditional probability of London being hit with a nuclear weapon in October?”, the 7 forecasts were 0.01, 0.00056, 0.001251, 10^-8, 0.000144, 0.0012, and 0.001. The largest of these is 1 M (= 0.01/10^-8) times the smallest, whereas in your case the largest probability is 2 k (= 0.6/0.000355) times the smallest.