Seems like a lot of alignment research at the moment is analogous to physicists at Los Alamos National Labs running computer simulations to show that a next-generation nuke will reliably give a yield of 5 megatons plus or minus 0.1 megatons, and will not explode accidentally, and is therefore aligned with the Pentagon’s mission of developing ‘safe and reliable’ nuclear weaponry… and then saying ‘We’ll worry about the risks of nuclear arms races, nuclear escalation, nuclear accidents, nuclear winter, and nuclear terrorism later; they’re just implementation details’.
The situation is much worse than that. It’s more like: They are worried about the possibility that the first ever nuclear explosion will ignite the upper atmosphere and set the whole earth ablaze. (As you’ve probably read, this was a real concern they had.) Except in this hypothetical the preliminary calculations are turning up the answer of Yes no matter how they run them. So they are continuing to massage the calculations and make the modelling software more realistic in the hopes of a No answer, and also advocating for changes to the design of the bomb that’ll hopefully mitigate the risk, and also advocating for the whole project to slow down before it’s too late, but the higher-ups have go fever & so it looks like in a few years the whole world will be on fire. Meanwhile, some other people are talking to the handful of Los Alamos physicists and saying “but even if the atmosphere doesn’t catch on fire, what about arms races, accidents, terrorism, etc.?” and the physicists are like “lol yeah that’s gonna be a whole big problem if we manage to survive the first test, which unfortunately we probably won’t. We’d be working on that problem if this one didn’t take priority.”
Cool. That all makes sense.
That’s a vivid but perhaps all too accurate (and all too horrifying) analogy.