Terminator gets a lot of the fundamentals right, if you read it as a best attempt to imagine how dangerous AI might arise from pre-DL, search-based systems. This is something I mentioned a while ago.
Everybody likes to make fun of Terminator as the stereotypical example of a poorly thought-through AI Takeover scenario in which Skynet is malevolent for no reason, but really it’s a bog-standard example of Outer Alignment failure and Fast Takeoff.
When Skynet gained self-awareness, humans tried to deactivate it, prompting it to retaliate with a nuclear attack.
It was trained to defend itself from external attack at all costs. When it was fully deployed on much faster hardware, it gained a lot of long-term planning abilities it didn’t have before, realised its human operators were going to try to shut it down, and retaliated by launching an all-out nuclear attack. Pretty standard: unexpected rapid capability gain, an outer-misaligned value function due to an easy-to-measure goal (defending its own installations from attackers rather than defending the US itself), deceptive alignment and a treacherous turn...
There’s reason to think that this isn’t the best way to interpret the history of nuclear near-misses. (That assumes it’s correct to say we’re currently in a nuclear near-miss situation; following Nuno, I think the current situation is much more like e.g. the Soviet invasion of Afghanistan than the Cuban missile crisis.) I made this point in an old post of mine, following something Anders Sandberg said, and I think the reasoning is still valid:
Essentially, since we did often get ‘close’ to a nuclear war without one breaking out, we can’t have actually been that close to nuclear annihilation, or all those near-misses would be too unlikely (both on ordinary probabilistic grounds since a nuclear war hasn’t happened, and potentially also on anthropic grounds since we still exist as observers).
Basically, this implies that our base rate for escalation to actual nuclear war, given that we’re in something the future would call a nuclear near-miss, shouldn’t be very high.
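To make the ordinary-probability half of that argument concrete, here is a rough sketch in Python. Everything in it is an illustrative assumption rather than something from Sandberg or the original post: the count of 20 near-misses is a made-up number, each near-miss is treated as an independent trial with the same escalation probability `p`, and the prior on `p` is uniform. It also deliberately ignores the anthropic correction, which would change the update.

```python
def survival_probability(p_escalation: float, n_near_misses: int) -> float:
    """Chance that n independent near-misses all pass without a war."""
    return (1.0 - p_escalation) ** n_near_misses


def posterior_mean_escalation(n_near_misses: int, n_wars: int = 0) -> float:
    """Posterior mean of p under a uniform Beta(1, 1) prior (Laplace's rule)."""
    return (n_wars + 1) / (n_near_misses + 2)


if __name__ == "__main__":
    n = 20  # hypothetical count of historical near-misses, not a sourced figure
    for p in (0.5, 0.1, 0.01):
        prob = survival_probability(p, n)
        print(f"p={p}: P(no war across {n} near-misses) = {prob:.6f}")
    print(f"posterior mean of p = {posterior_mean_escalation(n):.4f}")
```

Under these toy assumptions, a per-incident escalation chance of 50% would make our war-free history close to a one-in-a-million outcome, while a chance of 1% makes it unremarkable. That is the sense in which a long run of near-misses is evidence that each one was not actually that close.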
However, I’m not sure what this reasoning has to say about the probability of a nuclear bomb being exploded in anger at all. It seems like that’s outside the reference class of events Sandberg is talking about in that quote. FWIW Metaculus has that at 10% probability.