Resources I send to AI researchers about AI safety

This is my list of resources I send to machine learning (ML) researchers when presenting arguments about AI safety. New resources have been coming out fast, and I’ve also been user-testing these, so the top part of this post is my updated (Nov 2022) recommendations. The rest of the post (originally posted June 2022) has been reorganized but mostly left for reference; I make occasional additions to it (last updated June 2023).

Core recommended resources

Core readings for ML researchers[1]


Arguments for risk from advanced AI systems


Research directions

Core readings for the public

Core readings for EAs

(These readings are more philosophical and involve x-risk and discussion of AGI-like systems, so I expect them to be less liked by ML researchers (I have some limited data suggesting this), but they’re anecdotally well-liked by EAs.)

Getting involved for EAs

If you haven’t read Charlie’s writeup about research or Gabe’s writeup about engineering, they’re worth a look! Richard Ngo’s AGI safety career advice is also good. If you’re interested in theory, see John Wentworth’s writeup about independent research, and Vivek wrote some alignment exercises to try (also see John Wentworth’s work in general). With respect to outreach, I’d use a more technical pitch than the one Vael used; I think Sam Bowman’s pitch is pretty great, and Marius also has a nice writeup of his pitch (not specific to NLP).

Full list of recommended resources

These reading choices are drawn from various other reading lists (including Victoria Krakovna’s); this isn’t original in any way, just something to draw from if you’re trying to send someone some of the more accessible resources.


Central Arguments

Technical Work on AI alignment

How does this lead to x-risk / killing people though?

Forecasting (When might advanced AI be developed?)

Calibration and Forecasting

Common Misconceptions

Counterarguments to AI safety (messy doc):

Collection of public surveys about AI

Miscellaneous older text

Text I’m no longer actively using but still refer to sometimes.

If you’re interested in getting into this:

Introduction to large-scale risks to humanity, including “existential risks” that could lead to the extinction of humanity

Chapter 3 is on natural risks, including risks of asteroid and comet impacts, supervolcanic eruptions, and stellar explosions. Ord argues that we can appeal to the fact that we have already survived for 2,000 centuries as evidence that the total existential risk posed by these threats from nature is relatively low (less than one in 2,000 per century).
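Ord’s track-record argument can be illustrated with a quick back-of-the-envelope calculation. This is a sketch of the general idea (a constant, independent per-century risk), not Ord’s exact reasoning:

```python
# Back-of-the-envelope check of the "we survived 2,000 centuries" argument.
# If each century independently carries extinction probability r, the chance
# of surviving n centuries in a row is (1 - r)**n.

def survival_probability(per_century_risk: float, centuries: int = 2000) -> float:
    """Probability of surviving `centuries` centuries at a constant risk level."""
    return (1 - per_century_risk) ** centuries

# At r = 1/2,000, a 2,000-century track record is still ~37% likely (about 1/e),
# so our survival so far is consistent with risk at that level...
p_low = survival_probability(1 / 2000)

# ...but at r = 1/100 the same track record would be astronomically unlikely,
# which is why the evidence pushes the natural-risk estimate down toward
# (or below) roughly 1 in 2,000 per century.
p_high = survival_probability(1 / 100)

print(f"P(survive 2,000 centuries | r = 1/2000) = {p_low:.3f}")
print(f"P(survive 2,000 centuries | r = 1/100)  = {p_high:.2e}")
```

The point is that low risk levels are compatible with our long survival, while high risk levels would make it a near-miracle.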

Chapter 4 is on anthropogenic risks, including risks from nuclear war, climate change, and environmental damage. Ord estimates these risks as significantly higher, each posing about a one in 1,000 chance of existential catastrophe within the next 100 years. However, the odds are much higher that climate change will result in non-existential catastrophes, which could in turn make us more vulnerable to other existential risks.

Chapter 5 is on future risks, including engineered pandemics and artificial intelligence. Worryingly, Ord puts the risk of engineered pandemics causing an existential catastrophe within the next 100 years at roughly one in thirty. With any luck the COVID-19 pandemic will serve as a “warning shot,” making us better able to deal with future pandemics, whether engineered or not. Ord’s discussion of artificial intelligence is more worrying still. The risk here stems from the possibility of developing an AI system that both exceeds every aspect of human intelligence and has goals that do not coincide with our flourishing. Drawing upon views held by many AI researchers, Ord estimates that the existential risk posed by AI over the next 100 years is an alarming one in ten.

Chapter 6 turns to questions of quantifying particular existential risks (some of the probabilities cited above do not appear until this chapter) and of combining these into a single estimate of the total existential risk we face over the next 100 years. Ord’s estimate of the latter is one in six.
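One simplified way to see how per-risk estimates might combine into a total is to assume the risks are independent, so the total is one minus the product of the individual survival probabilities. This sketch uses only the chapter estimates quoted above; it is not Ord’s actual aggregation, which covers additional risks (including unforeseen ones) and does not assume independence:

```python
# Combine individual 100-year existential-risk estimates under an
# (oversimplified) independence assumption: total = 1 - prod(1 - r_i).
import math

# Rough per-risk estimates quoted in the chapter summaries above.
risks = {
    "natural (asteroids, supervolcanoes, etc.)": 1 / 2000,  # Ch. 3: < 1 in 2,000
    "nuclear war": 1 / 1000,                                # Ch. 4: ~1 in 1,000 each
    "climate change": 1 / 1000,
    "environmental damage": 1 / 1000,
    "engineered pandemics": 1 / 30,                         # Ch. 5: ~1 in 30
    "unaligned AI": 1 / 10,                                 # Ch. 5: ~1 in 10
}

total = 1 - math.prod(1 - r for r in risks.values())
print(f"Combined risk under independence: {total:.3f} (~1 in {1 / total:.0f})")
```

This lands around 1 in 8 rather than Ord’s 1 in 6, which is expected: his total also includes risks not itemized above and reflects a holistic judgment rather than a product of these numbers.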

How AI could be an existential risk

  • AI alignment researchers disagree a weirdly high amount about how AI could constitute an existential risk, so I hardly think the question is settled. Some plausible scenarios people are considering (copied from the paper):

  • “Superintelligence”

    • A single AI system with goals that are hostile to humanity quickly becomes sufficiently capable for complete world domination, and causes the future to contain very little of what we value, as described in “Superintelligence”. (Note from Vael: Where the AI has an instrumental incentive to destroy humans and uses its planning capabilities to do so, for example via synthetic biology or nanotechnology.)

  • Part 2 of “What failure looks like”

    • This involves multiple AIs accidentally being trained to seek influence, and then failing catastrophically once they are sufficiently capable, causing humans to become extinct or otherwise permanently lose all influence over the future. (Note from Vael: I think we might have to pair this with something like “and in loss of control, the environment then becomes uninhabitable to humans through pollution or consumption of important resources for humans to survive”)

  • Part 1 of “What failure looks like”

    • This involves AIs pursuing easy-to-measure goals, rather than the goals humans actually care about, causing us to permanently lose some influence over the future. (Note from Vael: I think we might have to pair this with something like “and in loss of control, the environment then becomes uninhabitable to humans through pollution or consumption of important resources for humans to survive”)

  • War

    • Some kind of war between humans, exacerbated by developments in AI, causes an existential catastrophe. AI is a significant risk factor in the catastrophe, such that no catastrophe would have occurred without the developments in AI. The proximate cause of the catastrophe is the deliberate actions of humans, such as the use of AI-enabled nuclear or other weapons. See Dafoe (2018) for more detail. (Note from Vael: Though there’s a recent argument that it may be unlikely for nuclear weapons to cause an extinction event, and that the result would instead “just” be catastrophically bad. One could probably still do it with synthetic biology, though, to get all of the remote people.)

  • Misuse

    • Intentional misuse of AI by one or more actors causes an existential catastrophe (excluding cases where the catastrophe was caused by misuse in a war that would not have occurred without developments in AI). See Karnofsky (2016) for more detail.

  • Other

Governance, aimed at highly capable systems in addition to today’s systems

It seemed like a lot of your thoughts about AI risk went through governance, so I wanted to mention what the space looks like (spoiler: it’s preparadigmatic) in case you haven’t seen that yet!

AI Safety in China

AI Safety community building, student-focused (see academic efforts above)

If they’re curious about other existential / global catastrophic risks:

Large-scale risks from synthetic biology

Large-scale risks from nuclear weapons

Why I don’t think we’re on the right timescale to worry most about climate change:

List for “Preventing Human Extinction” class

I’ve also included a list of resources that I had students read through for the Stanford first-year course “Preventing Human Extinction”.

When might advanced AI be developed?

Why might advanced AI be a risk?

Thinking about making advanced AI go well (technical)

Thinking about making advanced AI go well (governance)

Optional (large-scale risks from AI)

Natural science sources

  1. ^
  2. ^

    I swear I didn’t set out to self-promote here—it’s just doing weirdly well on user testing for both EAs and ML researchers at the moment (this is partly because it’s relatively current; I expect it’ll do less well over time)

    Note: I’ve written a new version of this talk that goes over the AI risk arguments through March 2023, and there’s a new website talking about my interview findings.

  3. ^

    Hi X,

    [warm introduction]

    In the interests of increasing options, I wanted to reach out and say that I’d be particularly happy to help you explore synthetic biology pathways more, if you were so inclined. I think it’s pretty plausible we’ll get another, worse pandemic in our lifetimes, and it’s worth investing a career or part of a career to work on it. So few people will make that choice that a single person probably matters a lot compared to entering other, more popular careers.

    No worries if you’re not interested though—this is just one option out of many. I’m emailing you in a batch instead of individually so that hopefully you feel empowered to ignore this email and be done with this class :P. Regardless, thanks for a great quarter and hope you have great summers!

    If you are interested:

Crossposted from LessWrong (69 points, 12 comments)