A Brief Overview of AI Safety/Alignment Orgs, Fields, Researchers, and Resources for ML Researchers

Crossposted to LessWrong [Link]

TLDR: I’ve written an overview of the AI safety space, tagged by keywords and subject/field references (short version, long version). The aim is to let ML researchers quickly gauge their interest in the field based on their existing subfield skills and interests!

Overview

When ML researchers first hear about AI alignment and are interested in learning more, they often wonder how their existing skills and interests could fit within the research already taking place. With expertise in specific subfields, and momentum in their careers and projects, interested researchers want a picture of the overall AI alignment space and of which research projects they could invest in relatively easily. As one step towards addressing this, the AISFB Hub commissioned a collection of resources that could be given to technical researchers trying to quickly assess which areas seem like promising candidates to investigate further: (Short version, Long version).

These documents list a subset of the various organizations and researchers involved in the AI safety space, along with major papers. To allow quick scanning, I focused on keywords and subject/field references. Since the target audience is researchers who already have experience with ML, the summaries are primarily meant to let the reader quickly gauge interest in each area based on their existing subfield skills and interests.

This work contains papers and posts up through the end of 2022. Please contact me or Vael Gates if you would be willing to keep it updated!

Details and Disclaimers

As an attempt at collecting alignment research, I generally see this post as complementary to Larsen’s post on technical alignment. Neither entirely includes the other: Larsen’s post has a slightly stronger and more curated focus on fields and projects, while this collection emphasizes general resources and example areas of work for new researchers.

Overall, this list took a little over 40 hours of work to put together. It primarily involved looking into and summarizing the work of organizations I already knew about. This was supplemented by investigating a list of researchers provided by the AISFB Hub, work referenced in various other posts on LessWrong and the EA Forum, and further references found on the organizations’ and researchers’ own websites and papers.

More specifically, these lists include various AI organizations (e.g. DeepMind’s safety team, MIRI, OpenAI…) and individual researchers (both academic and independent) currently working on the subject, summaries of papers and posts they have produced, and a number of guides and other resources for those trying to get into the field. All entries include keyword tags for quicker scanning. Unfortunately, it is impossible to include every research direction and relevant piece of work while keeping this concise. Instead, I tried to limit paper selection to representative samples of the ideas being actively worked on, or to explicit overviews of the organizations’ and researchers’ agendas, while providing as many links as possible for those interested in looking deeper.

Still, with all of that said, I believe these documents can serve as an easily shareable resource for anyone who is interested in transitioning into alignment research (or knows someone who is) but lacks information about how to approach, learn about, or contribute to the field. Of course, if you just want to use it to check out some papers, that works too. Thank you for reading!