I made an AI safety fellowship. What I wish I knew.


If you make an AI safety course: leverage other people, make sure to cover why AI safety is important, build a community, and just start. Use the resources and slides below.

Why Bother

Making an AI safety fellowship is highly worth doing. It has likely been a major update on my career path and those of other students. For approximately 150 hours of work, it has been the highest-impact thing I’ve done all year. I’ve learnt so much from teaching and have met some great people doing so. We made an 8-week AI safety course using the course materials from Blue Dot. Around 9 people attended each week, mostly computer science students. The course ran in person in Christchurch, New Zealand. We will run it again next semester.

What I wish I knew:

Leverage other people.

It surprised me just how willing other people were to help us. It’s likely because AI safety fellowships are cool and useful. Together they brought a diverse range of skills that I could never have replicated myself. For example, our university computer science society helped us advertise to most of the cohort. ANZ AI Safety is giving us marketing advice. Our Effective Altruism group gave us some funding. We have a new co-facilitator next semester. The best way to organize this is through direct messages and online or in-person conversations.

Beware the AI buzz

Many people joined because of an interest in AI. They didn’t yet know how important AI safety is. I recommend spending the first week discussing why AI safety is important. Blue Dot’s readings don’t spend enough time on this, likely because people who sign up for its course already recognize the importance of AI safety. We will likely do this by summarizing AI safety from first principles.

Build a community

People often stayed within their social groups. This made them less willing to share viewpoints they were unsure of. I should have encouraged a culture where it is OK to be wrong. An intro pizza session would have helped.

Just start

I didn’t feel ready when I started. I’m glad I did so anyway. You don’t have to know everything. You will learn from teaching and can leverage the group’s combined knowledge. In the long term, having the initiative to start is what ends up giving you useful skills. Facilitating is fun, which helps you keep going. People will appreciate the effort, so they will forgive your mistakes.

Course materials

Blue Dot’s course readings are high quality. I highly recommend showing Andrej Karpathy’s “Let’s build GPT: from scratch, in code, spelled out.” Some students found this to be the most valuable session. I wish I had had my slides when I started. They are adaptations of Blue Dot’s facilitation materials. I used them as a guide for conversations, not as a lecture.

My slides are ordered by week. Adapt these as needed.

AI safety and the years ahead
What is AI safety
Scalable Oversight
Reinforcement learning from human feedback
Mechanistic Interpretability
Technical governance
Contributing to AI alignment