Advice on Pursuing Technical AI Safety Research

1. Introduction

This post is a collection of tips and tricks for pursuing work in technical AI Safety (gathered by chatting with numerous Alignment researchers). A huge thanks to Richard Ngo, who provided all the advice on paper replication (section 2.1) and very kindly edited the rest. I hope this post is useful, though it is entirely possible I’ve gotten something wrong. If that’s the case, please do leave a comment letting me know! And don’t be afraid to ignore something if it sounds like it wouldn’t apply to you; none of this advice is a ‘hard rule’.

The following applies to both aspiring research scientists and research engineers, with a focus on the importance of machine learning (ML) programming. Since the best ML work has strong empirical components, the distinction between research scientist and research engineer is often blurred at AI Safety organisations – all but the most senior research scientists will need to dig into code (and this is often still true of senior roles).

2. Conducting independent research

When you’re first starting out, independent research can be daunting. Rather than trying to come up with totally novel projects, it’s often a better idea to start by replicating existing machine learning papers and then adding small modifications. The main goal of early independent work should be learning and practising skills, rather than making a significant, novel contribution. Being able to try new things quickly is incredibly helpful in doing good empirical research! Once you’re comfortable implementing and tinkering with neural networks, you’ll be in a good position to test out your own ideas.

2.1. Advice on paper replication

(from Richard Ngo)

i. Pick whichever paper will teach you the most. Start very simple: it’s much better for a project to take too little time than for you to get stuck halfway. The ideal goal is to have a finished, working deliverable you can show others. If you’re new to machine learning, you likely won’t want to start with a reinforcement learning or alignment paper. Instead, opt for supervised learning papers (e.g. image classification on CIFAR-10; a minimal sketch of this kind of starter project follows below).
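To make the suggestion concrete, here is a minimal sketch of such a starter project: a small convolutional network trained on CIFAR-10 with PyTorch. The architecture and hyperparameters are illustrative assumptions, not a replication of any particular paper.

```python
# Minimal CIFAR-10 classification sketch (PyTorch). Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
train_set = datasets.CIFAR10(root="data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

class SmallConvNet(nn.Module):
    """Two conv layers + one linear head: enough to get a working baseline."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # 32x32 -> 16x16
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # 16x16 -> 8x8
        return self.fc(x.flatten(start_dim=1))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SmallConvNet().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        loss = F.cross_entropy(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last-batch loss {loss.item():.3f}")
```

Once something this simple trains end to end, swapping in the architecture or loss from the paper you’re replicating becomes a much smaller step.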

ii. You should aim to get similar results to the original paper; however, this can sometimes be difficult due to cherry-picked seeds or omitted details. If you get inexplicably stuck, try emailing the authors with specific questions.

Once you’ve successfully replicated the original results, run the same algorithm on a new dataset or try adding small improvements. Get comfortable with not knowing what “should” happen. Log everything!
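As one way to take “Log everything!” literally, here is a minimal, dependency-free sketch that records a run’s configuration and per-step metrics to a JSON-lines file. The file layout and helper names are illustrative assumptions; tools like TensorBoard or Weights & Biases serve the same purpose.

```python
# Minimal "log everything" sketch: append config and metrics to a .jsonl file.
import json
import time
from pathlib import Path

def start_run(config: dict, log_dir: str = "runs") -> Path:
    """Create a log file for this run and record its configuration."""
    run_path = Path(log_dir) / f"run_{int(time.time())}.jsonl"
    run_path.parent.mkdir(parents=True, exist_ok=True)
    with run_path.open("a") as f:
        f.write(json.dumps({"type": "config", **config}) + "\n")
    return run_path

def log_metrics(run_path: Path, step: int, **metrics) -> None:
    """Append one line of metrics (loss, accuracy, etc.) for a training step."""
    with run_path.open("a") as f:
        f.write(json.dumps({"type": "metrics", "step": step, **metrics}) + "\n")

# Example usage inside a training loop:
run = start_run({"dataset": "CIFAR10", "lr": 1e-3, "batch_size": 128})
log_metrics(run, step=0, loss=2.31, accuracy=0.11)
```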

iii. Alignment work often requires reasoning about hypothetical agent behaviour, which has resulted in an important niche of conceptual/theoretical work. If you’d like to pursue such work, implementation experience in ML is still valuable; at the very least, try playing around with state-of-the-art models.

For example, what’s the most advanced task you can get GPT-3 to perform? What’s the least advanced task it fails at? Large language models are very new—you don’t need much context to learn something about them that nobody has discovered before. Play around and then write about it!
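If you don’t have API access to GPT-3, a small open model works as a stand-in for this kind of poking around. Below is a hedged sketch using GPT-2 via the Hugging Face transformers pipeline; the prompts are arbitrary examples, and the point is simply to record what the model can and can’t do.

```python
# Probe a small open language model (GPT-2) as a stand-in for larger models.
from transformers import pipeline, set_seed

set_seed(0)  # make the sampled completions repeatable
generator = pipeline("text-generation", model="gpt2")

prompts = [
    "Q: What is 17 plus 25?\nA:",                       # simple arithmetic
    "Translate to French: 'The cat sat on the mat.'",   # translation
    "List three arguments for and against remote work:" # structured output
]
for prompt in prompts:
    out = generator(prompt, max_new_tokens=40, do_sample=True, num_return_sequences=1)
    completion = out[0]["generated_text"][len(prompt):].strip()
    print("PROMPT:", prompt)
    print("MODEL: ", completion, "\n")
```

Failures are just as informative as successes here; keeping notes on both is exactly the kind of write-up worth publishing.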

iv. Writing a lot is the best way to get into conceptual/theoretical work. The easiest starting point is to summarise existing work. Once you’ve done this, you can try extending the work with your own thoughts. Lower your standards until you’re regularly producing stuff, and when in doubt, don’t put off publishing your work (LessWrong is a good place to do this; you can also apply for membership to the Alignment Forum). As you produce your summaries, always look for links to existing results or testable hypotheses.

You can also consider emailing your work to alignment researchers and asking for feedback. You’re not guaranteed a response; however, this is a more concrete request than generally asking for a call or mentorship (especially if you have specific questions you’d like them to answer). If you receive feedback, a great next step is to implement it and update the researcher. This demonstrates that their input will have an impact. It can also be helpful to try contacting more junior researchers who are less likely to be overwhelmed with such requests.

Once you’ve produced even a few things, you should probably start applying to places. Don’t delay that crucial step by underrating yourself! If you’re rejected, you can re-apply later.

v. If pursuing independent projects poses a financial burden, apply for funding with the Long-Term Future Fund. The application is relatively short and often useful for clarifying goals, regardless of the outcome. If you’re not sure whether applying is worth it, we recommend you at least open the application and take a look – it might be a lot less work than you’re imagining!

Lastly, good luck! Learning the skills needed for alignment research isn’t easy, but we appreciate everyone who’s putting effort towards it.

3. Applying for AI Safety research training programs

There are several safety-focussed training programs that can help you develop skills and access mentorship. These can be competitive, and facing rejection is totally normal – it doesn’t mean you’re not capable. Consider applying even if it seems unlikely you’ll get in.

3.1 Some resources to help you find programs/jobs:

If you’re fairly new to the AI Safety research landscape and want to learn more about the field in general, EA Cambridge’s AGI Safety Fundamentals program is a great place to start.

3.2 Skill up with AI Safety-adjacent research training programs

3.2.1 For Students

The EA Internships Board has summer research positions related to alignment, many of them intended for students. In addition, it’s usually a good idea to pursue opportunities that are not alignment-specific but will teach you relevant skills (e.g. ML research at your local university).

If you’re interested in graduate school, two important considerations are references and publications. You’ll want to put yourself in the best position when it’s time to apply, which usually means: a) prioritise positions that have the highest potential to result in a conference submission or publication, and b) prioritise supervisors who are well-known in your field or whom you can work with closely enough to secure a detailed reference. These tend to be more important than focusing on direct alignment work early on, since you’re still skilling up and want a generally competitive graduate school application. Finally, don’t forget to consider personal fit and prioritise work environments that are conducive to your learning.

3.2.2 For Professionals

If you’re pursuing industry work at an alignment org, go ahead and apply as early as possible! To reiterate, it’s simply too easy to underrate your skills and put off applying. Rejection is expected and common early on, and it can inform your next steps, which will often be to either: a) apply for funding to conduct independent work/study, or b) skill up at any ML organisation. Gaining related industry experience will set you up well for direct alignment work later, and skilling up in ML more generally has high returns, opening up several possible career options. It’s also worth noting that technical roles, such as software engineering, are in demand across other effective altruist cause areas (e.g. biosecurity). If you’d like to explore other EA-related causes that need your skills, check out 80,000 Hours career advising.

3.2.3 Funding options for independent research

  • Apply to the Long-Term Future Fund

  • There’s also a comprehensive list of funding sources here (though they won’t all be relevant)

4. Sign up for AI Safety-specific Career Coaching

If you’d like help solidifying your next steps in pursuing Technical AI Safety, consider signing up for AI Safety Support’s career coaching. This is open to everyone, from total beginners to students to more senior professionals looking to transition.