Executive summary: The author argues that Yudkowsky and Soares’s “If Anyone Builds It Everyone Dies” overstates the case by treating AI-driven extinction as near-certain, and defends a much lower p(doom) (2.6%) by pointing to several “stops on the doom train” where things could plausibly go well, while still emphasizing that AI risk is dire and warrants major action.
Key points:
The author summarizes IABIED’s core claim as “if anyone builds AI, everyone everywhere will die,” and characterizes Yudkowsky and Soares’s recommended strategy as effectively “ban or bust.”
They report their own credences as 2.6% for misaligned AI killing or permanently disempowering everyone, and “maybe about 8%” for extinction or permanent disempowerment from AI used in other ways in the near future, while also saying most value loss comes from “suboptimal futures.”
They present multiple conditional “blockers” to doom—e.g., a 10% chance we don’t build artificial superintelligent agents, a ~70% chance of no catastrophic misalignment by default, a ~70% chance alignment can be solved even if not by default, a ~60% chance of shutting systems down after “near-miss” warning shots, and a 20% chance ASI couldn’t kill or disempower everyone—and argue that compounding these uncertainties undermines claims of near-certain doom (a worked version of the multiplication is sketched after the key points).
They argue extreme pessimism is unwarranted given disagreement among informed people, citing median AI expert p(doom) around 5% (as of 2023), superforecasters often below 1%, and named individuals with a wide range (e.g., Ord ~10%, Lifland ~1/3, Shulman ~20%).
On “alignment by default,” they claim RLHF plausibly produces “a creature we like,” note current models are “nice and friendly,” and argue evolution-to-RL analogies are weakened by disanalogies such as off-distribution training aims, the nature of selection pressures, and RL’s ability to directly punish dangerous behavior.
They argue “warning shots” are likely on a misalignment trajectory (e.g., failed takeover attempts, interpretability revelations, high-stakes rogue behavior) and that sufficiently dramatic events would plausibly trigger shutdowns or bans, making a “0 to 100” world takeover with no intermediate incidents unlikely.
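The compounding argument in the blockers point can be made concrete with a back-of-the-envelope multiplication. This is a hypothetical reconstruction rather than a calculation stated in the summary: it assumes the five blockers are treated as independent conditional steps, that doom requires every blocker to fail, and that the failure probabilities are simply multiplied; under those assumptions the product happens to match the 2.6% headline figure.

```python
# Hypothetical reconstruction of the "stops on the doom train" compounding argument.
# Each factor is the chance that the corresponding blocker FAILS (1 minus the
# blocker probability reported in the summary); independence is assumed.
p_asi_built          = 1 - 0.10  # 10% chance we never build superintelligent agents
p_misaligned_default = 1 - 0.70  # ~70% chance of no catastrophic misalignment by default
p_alignment_unsolved = 1 - 0.70  # ~70% chance alignment is solvable even if not by default
p_no_shutdown        = 1 - 0.60  # ~60% chance warning shots lead to shutdowns/bans
p_takeover_succeeds  = 1 - 0.20  # 20% chance ASI couldn't kill/disempower everyone

p_doom = (p_asi_built * p_misaligned_default * p_alignment_unsolved
          * p_no_shutdown * p_takeover_succeeds)
print(f"p(doom) under these assumptions: {p_doom:.1%}")  # prints 2.6%
```

The point the sketch illustrates is structural: even if each step is judged pessimistically, multiplying several less-than-certain steps yields a product far from 1, which is the sense in which compounding uncertainty undermines near-certainty.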
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.