Intro to ML Safety virtual program: 12 June – 14 August
The Intro to ML Safety course covers foundational techniques and concepts in ML safety for those interested in pursuing research careers in AI safety, with a focus on empirical research.
We think it’s a good fit for people with ML backgrounds who are looking to get into empirical research careers focused on AI safety.
Intro to ML Safety is run by the Center for AI Safety and designed and taught by Dan Hendrycks, a UC Berkeley ML PhD and director of the Center for AI Safety.
Apply to be a participant by May 22nd
Website: https://course.mlsafety.org/
About the Course
Intro to ML Safety is an 8-week virtual course that aims to introduce students with a deep learning background to the latest empirical AI Safety research. The program introduces foundational ML safety concepts such as robustness, alignment, monitoring, and systemic safety.
The course takes around 5-10 hours a week and consists of a mixture of:
Assigned readings and lecture videos (publicly available at course.mlsafety.org)
Homework and coding assignments
A facilitated discussion session with a TA and weekly optional office hours
The course will be virtual by default, though in-person sections may be offered at some universities.
The Intro to ML Safety curriculum
The course covers:
Hazard Analysis: An introduction to concepts from the field of hazard analysis and how they can be applied to ML systems; and an overview of standard models of risks and accidents.
Robustness: Ensuring that models behave acceptably when exposed to abnormal, unforeseen, unusual, highly impactful, or adversarial events. We cover techniques for generating adversarial examples (a minimal FGSM sketch follows this list) and for making models robust to them; benchmarks for measuring robustness to distribution shift; and approaches to improving robustness via data augmentation, architectural choices, and pretraining techniques.
Monitoring: We cover techniques to identify malicious use, hidden model functionality, data poisoning, and emergent behaviour in models; metrics for out-of-distribution (OOD) detection (see the maximum-softmax-probability sketch after this list); confidence calibration for deep neural networks; and transparency tools for neural networks.
Alignment: We define alignment as reducing inherent model hazards. We cover measuring honesty in models; power aversion; an introduction to ethics; and imposing ethical constraints in ML systems.
Systemic Safety: Beyond directly reducing hazards from AI systems, AI can also be used to make the world better equipped to handle the development of AI by improving sociotechnical factors like decision-making ability and safety culture. We cover using ML to improve epistemics; ML for cyberdefense; and ways in which AI systems could be made to cooperate better.
Additional X-Risk Discussion: The last section of the course explores the broader importance of the concepts covered: namely, existential risk and possible existential hazards. We cover specific ways in which AI could potentially cause an existential catastrophe, such as weaponization, proxy gaming, a treacherous turn, deceptive alignment, value lock-in, and persuasive AI. We also introduce considerations for influencing future AI systems and research on selection pressures.
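To give a flavour of the material, here is a minimal sketch of the fast gradient sign method (FGSM), one standard technique for generating adversarial examples of the kind the robustness unit covers. This is an illustrative sketch rather than a course assignment; model, images, labels, and epsilon are placeholder names for whatever classifier, data batch, and perturbation budget you are working with.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=8 / 255):
    """Generate adversarial examples with the fast gradient sign method.

    Each input is perturbed in the direction that increases the classifier's
    loss, then clipped back to the valid image range [0, 1].
    """
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Step by epsilon in the direction of the sign of the input gradient.
    adversarial = images + epsilon * images.grad.sign()
    return adversarial.clamp(0, 1).detach()
```

And a similarly minimal sketch of one standard baseline for out-of-distribution detection, scoring inputs by their maximum softmax probability, as a taste of the monitoring unit. Again, model and inputs are assumed placeholders for a trained classifier and a batch of data.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def msp_ood_score(model, inputs):
    """Score inputs for OOD detection via the maximum softmax probability.

    Confident predictions (high maximum softmax probability) tend to come
    from in-distribution inputs, so the value is negated to give an
    anomaly score where higher means more likely out-of-distribution.
    """
    probs = F.softmax(model(inputs), dim=-1)
    return -probs.max(dim=-1).values
```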
How is this program different from AGISF?
If you are interested in an empirical research career in AI safety, then you are in the target audience for this course. The ML Safety course does not overlap much with AGI Safety Fundamentals (AGISF), so we expect participants to get a lot out of Intro to ML Safety whether or not they have previously done AGISF.
Intro to ML Safety is focused on empirical ML research rather than conceptual work. Participants are required to watch recorded lectures and complete homework assignments that test their understanding of the technical material.
You can read more about the ML safety approach in Open Problems in AI X-risk.
Time Commitment
The program will last 8 weeks, beginning on June 12th and ending on August 14th.
Participants are expected to commit around 5-10 hours per week. This includes ~1-2 hours of recorded lectures, ~2-3 hours of readings, ~2 hours of written assignments, and 1.5 hours of facilitated discussion.
In order to give more people the opportunity to study ML Safety, we will provide a $500 stipend to eligible students who complete the course.
Eligibility
This is a technical course. A solid background in deep learning is required.
If you don’t have this background, we recommend Weeks 1-6 of MIT 6.036, followed by Lectures 1-13 of the University of Michigan’s EECS498 or Weeks 1-6 and 11-12 of NYU’s Deep Learning.
Apply to be a participant by May 22nd
Website: https://course.mlsafety.org/