The Hubinger lectures on AGI safety: an introductory lecture series

In early 2023, I (Evan Hubinger) gave a series of recorded lectures to SERI MATS fellows, with the goal of building foundational introductory material on a variety of topics in AGI safety. Those lectures have now been edited and are available on YouTube for anyone who would like to watch them.

The basic goal of this lecture series is to provide longform, in-depth video content for people who are new to AGI safety but interested enough to spend a great deal of time engaging with it, and who prefer video to written content. We already have good introductory shortform video content and good introductory longform written content; the idea of this lecture series is to bridge the gap between the two.

Note that my choice of topics is highly opinionated: though this is introductory material, it is not intended to introduce the listener to every topic in AI safety. Rather, it is focused on the topics that I personally think are most important to understand. This is intentional: in my opinion, it is far more valuable to have one specific gears-level model of how to think about AI safety than a shallow overview of many different possible ways of thinking about it. The former allows you to actually start operationalizing that model to work on interventions that would be valuable under it, something the latter doesn't do.

The lecture series is composed of six lectures, each around 2 hours long, covering the topics:

  1. Machine learning + instrumental convergence

  2. Risks from learned optimization

  3. Deceptive alignment

  4. How to evaluate alignment proposals

  5. LLMs + predictive models

  6. Overview of alignment proposals

Each lecture features a good number of audience questions, both in the middle and at the end, the idea being to pre-empt any questions or confusions the listener might have.

The full slide deck for all the talks is available here.

EDIT: As a follow-up to the above lecture series covering some of my more recent work and thoughts on deceptive instrumental alignment, I would also recommend this talk I gave at FAR.

Crossposted from LessWrong (126 points, 0 comments)