Calculating the baseline probability of a nuclear war between the United States and Russia—i.e., the probability pre-Ukraine invasion and absent conventional confrontation—is difficult because no one has ever fought a nuclear war. Nuclear war is not analogous to conventional war, so it is difficult to form even rough comparison classes that would provide a base rate. It is a unique situation that approaches true Knightian uncertainty, muddying attempts to characterize it in terms of probabilistic risk. That said, for the sake of argument, where the forecasters adjusted downward from Luisa Rodriguez’s estimate of 0.38%/year, I would adjust upward to 0.65%/year given deterioration in U.S.-Russian strategic stability.
...“Conditional on Russia/NATO nuclear warfare killing at least one person, London is hit with a nuclear weapon.” (Samotsvety aggregate forecast: 18%.)
This is the most problematic of the component forecasts because it implies a highly confident answer to one of the most significant and persistent questions in nuclear strategy: whether escalation can be controlled once nuclear weapons have been used. Is it possible to wage a “merely” tactical nuclear war, or will tactical war inevitably lead to a strategic nuclear exchange in which the homelands of nuclear-armed states are targeted? Would we “rationally” climb an escalatory ladder, pausing at each rung to evaluate pros and cons of proceeding, or would we race to the top in an attempt to gain advantage? Is the metaphorical ladder of escalation really just a slippery slope to Armageddon?...
...Given the degree of disagreement and the paucity of data, it would not be unreasonable to assign this question 50⁄50 odds...
...Regardless, a nuclear strike on London would probably (>70%) result in a retaliatory U.K. strike on Moscow...
Hey, thanks for the review. Perhaps unsurprisingly, we thought you were stronger when talking about the players in the field and the tools they have at their disposal, but weaker when talking about how this cashes out in terms of probabilities, particularly around parameters you consider to have “Knightian uncertainty”.
We liked your overview of post-Soviet developments, and thought it was a good summary of reasons to be pessimistic. We would also have hoped for an expert overview of reasons for optimism (which an ensemble of experts could provide). For instance, the New START treaty did get renewed.
(We will discuss your analysis at the next Samotsvety meeting, and update our estimates accordingly.)
First, as mentioned in private communication, we think that you are underestimating the chance that the “informed and unbiased actors” in our audience would be able to avoid a nuclear strike. For instance, here are Robert Wiblin’s triggers for leaving London. Arguably, readers are best positioned to estimate/adjust this number for themselves (based on e.g. how readily they responded to the covid crisis).
This single factor explains the first ~3x difference in our forecasts. After adjusting for that, Peter’s estimate falls in the range of forecasts in our sample. This is reassuring!
Second, we think that taking Luisa Rodríguez’s 0.38%/year at face value is inadvisable. She aggregates forecaster and expert probabilities using the simple average (arithmetic mean), but best practice is to use the geometric mean of odds instead. Doing so gives a 0.13% baseline risk. From this starting point, we update upwards on the current situation; if you update from a lower baseline, your estimate should be lower.
This could account for the 1.5x to 3x difference in our forecast of any nuclear casualties. Making this adjustment would bring your forecast even closer to our aggregate.
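To make the aggregation point concrete, here is a minimal sketch comparing the arithmetic mean of probabilities with the geometric mean of odds. The input probabilities are hypothetical placeholders, not the figures from Rodríguez's sample, and the function names are our own.

```python
import math


def arithmetic_mean(probs):
    """Simple average of the probabilities."""
    return sum(probs) / len(probs)


def geometric_mean_of_odds(probs):
    """Average the log-odds, then convert the pooled odds back to a probability."""
    if not probs or any(p <= 0 or p >= 1 for p in probs):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    log_odds = [math.log(p / (1 - p)) for p in probs]
    pooled_odds = math.exp(sum(log_odds) / len(log_odds))
    return pooled_odds / (1 + pooled_odds)


# Hypothetical annual probabilities, chosen only for illustration.
example_probs = [0.0002, 0.001, 0.003, 0.02]

mean_p = arithmetic_mean(example_probs)
pooled_p = geometric_mean_of_odds(example_probs)

print(f"arithmetic mean:        {mean_p:.4%}")
print(f"geometric mean of odds: {pooled_p:.4%}")
```

When the individual estimates are small and span orders of magnitude, the arithmetic mean is dominated by the highest estimate, while the geometric mean of odds sits closer to the middle of the range on a log scale.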
These two factors together bridge the order of magnitude between our forecasts. But overall, we shouldn’t be that surprised that subject matter experts were more pessimistic/uncertain than forecasters. For instance, in Luisa Rodríguez’s sample, superforecasters were likewise around 10x lower than experts.
—Misha and Nuño
On Laplace’s law and Knightian uncertainty
The probability of nuclear war doesn’t seem like it would fall under the heading of Knightian uncertainty. For instance, we can start out by applying Laplace’s law of succession, or somewhat more complicated multi-step methods.
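As a rough illustration of the first of these, Laplace's law of succession estimates the chance of an event on the next trial as (s + 1) / (n + 2) after s occurrences in n trials. The trial count below is an illustrative assumption, not a number taken from our forecast, and the choice of reference period is itself debatable.

```python
from fractions import Fraction


def laplace_rule(successes: int, trials: int) -> Fraction:
    """Laplace's law of succession: (s + 1) / (n + 2)."""
    if trials < 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


# Illustrative assumption: 77 one-year trials with zero occurrences of the event.
quiet_years = 77
p_next_year = laplace_rule(0, quiet_years)

print(f"P(event next year) = {p_next_year} (about {float(p_next_year):.2%})")
```

The estimate depends heavily on how a "trial" is defined and where the count starts, which is one reason to treat it as a starting point rather than a final answer.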
On the right ignorance prior
We don’t think the answer is necessarily “highly confident”. Just because an issue is complex doesn’t mean our best guess should be near 50%. In particular, if there are many options, e.g., a one-off accident, containment in Ukraine, containment in Europe, and further escalation to e.g., London, and we are maximally uncertain about which one would happen, we should assign equal probability to all. So in this case with four possible outcomes, total uncertainty would cash out to 25% for each of them.
But we are not maximally uncertain. In particular, as the author points out, a nuclear strike on London would likely be followed by a nuclear strike on Moscow. Actors on both sides would want to avoid such an outcome, which brings our probability lower. That said, the fact that the reviewer’s estimate is higher brings ours up a bit, since the logic of nuclear war is something we don’t have deep experience with.
(Below written by Peter in collaboration with Josh.)
It sounds like I have a somewhat different view of Knightian uncertainty, which is fine—I’m not sure that it substantially affects what we’re trying to accomplish. I’ll simply say that, to the extent that Knight saw uncertainty as signifying the absence of “statistics of past experience,” nuclear war strikes me as pretty close to a definitional example. I think we make the forecasting challenge easier by breaking the problem into pieces, moving us closer to risk. That’s one reason I wanted to add conventional conflict between NATO and Russia as an explicit condition: NATO has a long history of confronting Russia and, by and large, managed to avoid direct combat.
By contrast, the extremely limited history of nuclear war does not enable us to validate any particular model of the risk. I fear that the assumptions behind the models you cite may not work out well in practice and would like to see how they perform in a variety of as-similar-as-possible real world forecasts. That said, I am open to these being useful ways to model the risk. Are you aware of attempts to validate these types of methods as applied to forecasting rare events?
On the ignorance prior:
I agree that not all complex, debatable issues imply probabilities close to 50-50. However, your forecast will be sensitive to how you define the universe of “possible outcomes” that you see as roughly equally likely from an ignorance prior. Why not define the possible outcomes as: one-off accident, containment on one battlefield in Ukraine, containment in one region in Ukraine, containment in Ukraine, containment in Ukraine and immediately surrounding countries, etc.? Defining the ignorance prior universe in this way could stack the deck in favor of containment and lead to a very low probability of large-scale nuclear war. How can we adjudicate what a naive, unbiased description of the universe of outcomes would be?
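The sensitivity to how outcomes are partitioned can be sketched as follows. The coarse list is the four-outcome partition from the forecasters' comment; the fine list follows the alternative described above, with a final "further escalation" entry added as an assumption, since the original list ends in "etc."

```python
from fractions import Fraction


def uniform_prior(outcomes):
    """Assign equal probability to every outcome in the partition."""
    weight = Fraction(1, len(outcomes))
    return {outcome: weight for outcome in outcomes}


ESCALATION = "further escalation (e.g., London)"

coarse_partition = [
    "one-off accident",
    "containment in Ukraine",
    "containment in Europe",
    ESCALATION,
]

# Finer-grained containment outcomes; the last entry is an assumed catch-all.
fine_partition = [
    "one-off accident",
    "containment on one battlefield in Ukraine",
    "containment in one region in Ukraine",
    "containment in Ukraine",
    "containment in Ukraine and immediately surrounding countries",
    ESCALATION,
]

for name, partition in [("coarse", coarse_partition), ("fine", fine_partition)]:
    p = uniform_prior(partition)[ESCALATION]
    print(f"{name} partition: P(escalation) = {p}")
```

Under the same "maximal ignorance" rule, the probability assigned to escalation drops from 1/4 to 1/6 purely because containment was subdivided further.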
As I noted, my view of the landscape is different: it seems to me that there is a strong chance of uncontrollable escalation if there is direct nuclear war between Russia and NATO. I agree that neither side wants to fight a nuclear war (if they did, we’d have had one already!), but neither side wants its weapons destroyed on the ground either. That creates a strong incentive to launch first, especially if one believes the other side is preparing to attack. In fact, even absent that condition, launching first is rational if you believe it is possible to “win” a nuclear war, in which case you want to pursue a damage-limitation strategy. If you believe there is a meaningful difference between 50 million dead and 100 million dead, then it makes sense to reduce casualties by (a) taking out as many of the enemy’s weapons as possible; (b) employing missile defenses to reduce the impact of whatever retaliatory strike the enemy manages; and (c) building up civil defenses (fallout shelters, etc.) such that more people survive whatever warheads survive (a) and (b).
In a sense, “the logic of nuclear war” is oxymoronic because a prisoner’s dilemma-type dynamic governs the situation such that, even though cooperation (no war) is the best outcome, both sides are driven to defect (war). By taking actions that seem to be in our self-interest, we ensure what we might euphemistically call a suboptimal outcome.
When I talk about “strategic stability,” I am referring to a dynamic where the incentives to launch first or to launch-on-warning have been reduced, such that choosing cooperation makes more sense. New START (and START before it) attempts to boost strategic stability by establishing nuclear parity (at least with respect to strategic weapons). But its influence has been undercut by other, destabilizing developments.
Thank you again for the thoughtful comments, and I’m happy to engage further if that would be clarifying or helpful to future forecasting efforts.
Thanks for the detailed answers!
Wiblin’s triggers to bolt away:
Thanks for the reply and the thoughtful analysis, Misha and Nuño, and please accept our apologies for the delayed response. The below was written by Peter in collaboration with Josh.
First, regarding the Rodriguez estimate, I take your point about using the geometric mean rather than the arithmetic mean, and that would move my estimate of the risk of nuclear war down a bit; thanks for pointing that out. To be honest, I had not dug into the details of the Rodriguez estimate and was attempting to remove the downward adjustment you had made to it for “new de-escalation methods,” since I was not convinced by that point. To give a better independent estimate on this, I’d need to dig into the original analysis and do some further thinking of my own. I’m curious: how much of an adjustment were you making based on the “new de-escalation methods” point?
Regarding some of the other points:
On “informed and unbiased actors”: I agree that if someone were following Rob Wiblin’s triggers, they’d have a much higher probability of escape. However, I find the construction of the precise forecasting question somewhat confusing. From context, I had been interpreting it to mean the probability that informed and unbiased actors would be able to escape after Russia/NATO nuclear warfare had begun but before London had been hit. That made me pessimistic, because it seems like a fairly late trigger for escape. It seems that this was not your intention. If you’re assuming something closer to Wiblin’s triggers before Russia/NATO nuclear warfare begins, I’d expect a greater chance of escape, as you do. I would still have questions, though, about how able and willing such people would be to stay out of London for months at a time (as may be implied by some of Wiblin’s triggers) and what fraction of readers would truly follow that protocol. As you say, perhaps it makes most sense for people to judge this for themselves, but describing the expected behavior in more detail may help craft a better forecasting question.
On reasons for optimism from “post-Soviet developments”: I am curious what, besides the New START extension, you may be thinking of getting others’ views on. From my perspective, the New START extension was the bare minimum needed to maintain strategic predictability/transparency. It is important, but (and I say this as someone who worked closely on Senate approval of the treaty) it did not fundamentally change the nuclear balance or dramatically improve stability beyond the original START. Yes, it cut the number of deployed strategic warheads, which is significant, but 1,550 on each side is still plenty to end civilization as we know it (even if employed against only counterforce targets). The key benefit to New START was that it updated the verification provisions of the original START treaty, which was signed before the dissolution of the Soviet Union, so I question whether it should be considered a “post-Soviet development” for the purposes of adjusting forecasts relative to that era. START (and its verification provisions) had been allowed to lapse in December 2009, so the ratification of New START was crucial, but the value of its extension needs to be considered against the host of negative developments that I briefly alluded to in my response.