Thanks for sharing these. To prioritise my reading a bit (if you have time): which arguments do you find particularly useful and why?
Thanks for asking, Toby! Here are my ranking and quick thoughts:
AGI Battle Royale: Why “slow takeover” scenarios devolve into a chaotic multi-AGI fight to the death. I see this as an argument for takeover risk being tied to differences in power rather than absolute power, with takeover by a few agents remaining super difficult as long as power is not super concentrated. So I would say efforts to mitigate power inequality will continue to be quite important, although society already seems to be aware of this.
The Leeroy Jenkins principle: How faulty AI could guarantee “warning shots”. Quantitative illustration of how warning shots can decrease tail risk a lot. Judging from society’s reaction to very small visible harms caused by AI (e.g. self-driving car accidents), it seems to me like each time an AI disaster kills x humans, society will react in such a way that an AI disaster killing 10x humans is made significantly less likely. To illustrate how fast the risk can decrease, if each doubling of deaths is made 50% as likely, since there are 32.9 (= LN(8*10^9)/LN(2)) doublings between 1 and 8 billion deaths, then, starting at 1 death caused by AI per year, the annual risk of human extinction would be 1.25*10^-10 (= 0.5^32.9).
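For concreteness, here is a minimal Python version of that calculation. The halving-per-doubling rate and the starting point of 1 AI-caused death per year are the illustrative assumptions stated above, not empirical estimates:

```python
import math

# Assumption (illustrative): each doubling of deaths is made 50% as likely
# by society's reaction to the previous, smaller disaster.
halving_per_doubling = 0.5
world_population = 8e9  # 8 billion people

# Number of doublings between 1 death and 8 billion deaths.
doublings = math.log(world_population) / math.log(2)  # ~32.9

# Annual probability of an AI disaster killing everyone,
# starting from 1 AI-caused death per year.
extinction_risk = halving_per_doubling ** doublings    # ~1.25e-10

print(f"doublings: {doublings:.1f}")
print(f"annual extinction risk: {extinction_risk:.2e}")
```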
Chaining the evil genie: why “outer” AI safety is probably easy. Explainer of how an arbitrarily intelligent AI can seemingly be made arbitrarily constrained if it robustly adopts the goals humans give it.
The bullseye framework: My case against AI doom. Good overview of titotal’s posts.
How “AGI” could end up being many different specialized AI’s stitched together. Good pointers to the importance and plausibility of specialisation, and how this reduces risk.
Bandgaps, Brains, and Bioweapons: The limitations of computational science and what it means for AGI. Good illustration that some problems will not be solved by brute force computation, but it leaves room for AI to find efficient heuristics (as AlphaFold does).
“Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation. Great investigation, but it tackles a specific threat, so the lessons are less generalisable than those of the other posts.