(My suggestions) On Beginner Steps in AI Alignment
Summary: This post aims to summarise my beginner advice for others taking their first steps in AI Alignment. I link useful resources and make concrete suggestions related to upgrading your thinking, working with others and utilising the community to make decisions.
Epistemic status: I’ve only spent about 75 hours taking first steps toward AI alignment/safety work. I used to be a data scientist, but I quit my job after getting a grant to work out how to contribute to AI safety. I think other resources are probably better in general (more in-depth, written from more experience). Some non-researchers/beginners have suggested the advice and resources are worth sharing.
Thanks to all who read my draft and provided feedback.
If you’re reading this post, I assume you understand why so many people are concerned about AI Alignment. It’s a really scary topic, and the possibilities in the not-so-distant future have lots of people trying to work out how to help.
If you’re like me, you might spend a lot of time bouncing around between different forum posts, 1:1s for advice, and your own thinking. This post might help you reduce the time it takes to start taking concrete steps (not including technical development, which is covered better elsewhere).
Let’s start with Failure Modes (how we might fail) from the perspective of the Alignment field, which has had a large influx of new talent recently.
Possible Failure Modes
Rather than consider how individuals might succeed, we can consider the success of the AI Alignment field in general, as it might interact with an influx of new people.
Some failure modes related to growth include (with my % of concern for each):
Upskilling many people in irrelevant ways that don’t help us solve Alignment. (70%).
Not adequately enabling new talent (20%).
Unnecessarily distracting current researchers (10%).
My % concern assigned to each is pretty subjective. I’m not an Alignment researcher and I’m interested in whether these concerns seem reasonable to others.
It’s hard to know if we’d be failing in each of these modes without inside views on questions like:
What Alignment strategies are currently most productive?
Where could more Alignment researchers be best utilized?
How productive might future AI researchers be if we heavily invest in them?
I used these failure modes and the resulting questions to motivate the advice below.
Advice
Think Well/Develop an inside-view
We can increase the chances of doing productive research, and of upskilling in ways that lead to productive research, by generally thinking well. In part, this means taking problems seriously: Alignment is probably very hard, and making meaningful steps toward solving it is unlikely to be easy.
Reading the Sequences on LessWrong is a good start if you want to think well in general. Thinking well in general is likely a prerequisite to thinking well about specific topics like Alignment. I think doing this alongside other efforts is reasonable, which side-steps the question of prioritising it over things like learning machine learning.
Understanding Alignment well is likely also very important for ensuring the projects you work on are valuable (and done well). There is a range of opinions on who should form an inside view (a detailed understanding of a topic), and what topics people should form inside views about when it comes to Alignment. Neel Nanda has some thoughts on this worth reading, but you should also read Rohin Shah’s comment response.
Specific advice below will outline ways to start forming informed opinions and inside views about Alignment topics. For beginners, I suspect that the failure mode of learning about the wrong things isn’t too concerning (since skills you pick up early will generalise), but the failure mode of working on stuff because that’s what you’re good at, rather than because there are good arguments for it, is dangerous. Thinking well and having inside views should mitigate the chance of this misdirected focus.
Read the Sequences
The Sequences are a collection of blog posts on Rationality (how to think well). They were written to help future researchers have the cognitive skills to work on Alignment. Harry Potter and the Methods of Rationality is rationalist fan-fiction which can be a great introduction to these ideas (and is personally my favorite book). Audiobooks should be available for HPMOR and the Sequences. The Sequence Highlights are also available as a shorter introduction.
If you can get over their quirkiness and think deeply about the ideas, they will improve your ability to think. This matters both for engaging with the community and ideas currently surrounding Alignment, and because thinking well is critical to solving hard problems like Alignment.
Write about your ideas
Writing is an excellent way to learn.
Trying to ELI5 something can show you whether you understand it, and sharing your writing with others allows them to give you feedback. Reviewing older drafts later can give you a sense of how your ideas have developed since writing, which can also be very useful.
It might be best to focus first on writing up your own ideas, and only later on distillations (summarising other people’s ideas), as bad distillations are a failure mode to avoid.
Publish on the EA Forum/LessWrong
Publishing your ideas on the forum is a good way to get feedback and quickly drop bad ideas/concepts. It is hard to get over the fear of being wrong or giving people bad advice, but that’s why we have comments and upvoting.
Personally, I’m aspiring to do a lot more of this. I have so many drafts I need to get out the door.
Read the EA Forum/Alignment Forum/LessWrong
A quick summary of posts useful for beginners interested in AI Alignment:
Levelling Up in AI Safety Research Engineering. A post summarising technical skills and ways to skill up for AI Safety Research Engineering. The first 3 levels can probably be pursued simultaneously.
How to pursue a career in technical AI Alignment. A comprehensive guide for those considering direct work on technical AI Alignment (which had input from many current researchers). Helpful for key decisions like whether to aim for conceptual or technical contribution. Has some great general advice.
Some posts to be aware of and come back to:
(My understanding of) What Everyone in Technical Alignment is Doing and Why. As of 2022, a summary of different organisations, researchers and strategies.
Most People Start With The Same Few Bad Ideas—LessWrong. I thought this was a really interesting post that is worth reading once you’ve got enough context to understand it. Understanding “the path of Alignment Maturity” seems important. John Wentworth has a sequence of rants that you might also enjoy if you like this post (I particularly like Godzilla Strategies).
Concrete Advice for Forming Inside Views on AI Safety—EA Forum. I linked this above in the discussion of inside views. It is worth coming back to, but be sure to read Rohin Shah’s comment.
Work with others near your level
Working with others near your level has many benefits, including motivation, support when you’re stuck, accountability buddies, exposure to good ideas/advice, and practice collaborating. There’s been a lot of movement to create communities working on upskilling in AI safety, so finding others shouldn’t be hard.
The real challenge might be finding people at the same level as you at the same time as you, but asking around online is likely to help you find those people.
It would be very reasonable to write up questions that you and others are stuck on and post them on LessWrong/Twitter. The process of writing up a question can be very useful (so useful you might end up finding the answer yourself).
Of course, it’s still valuable to engage with more experienced researchers, and if you feel the need, my experience is that people are very happy to chat if you reach out. I’d be careful not to expect experienced researchers to have magic answers that solve all your problems. Rather, they might model good thinking or point you towards valuable documents.
Join this Discord server
This Discord server might be worth joining for those interested in replicating ML papers while they upskill to work on AI Alignment.
Join this Slack group
This Slack group might be worth joining for those interested in having accountability partners/groups while they upskill to work on AI Alignment.
Engage with the broader community
The EA/LessWrong/Alignment communities and ecosystems are hugely valuable to anyone aspiring to contribute to Alignment. The ecosystem can help you:
Make important decisions
Accelerate your growth
Contribute at a meta-level.
Most importantly, the community can help you work out if Alignment is right for you, and if so how to contribute (such as technical vs governance, conceptual vs empirical technical work, researcher vs engineer). This guide by Richard Ngo is a good place to start. Discussing your specific case with 80k is where I’d go next, but talking to other people in the community might be helpful as well.
For accelerating your growth, reading/posting on the forum, doing courses, attending events, and having debates with people are all useful. I should note here that one personal failure mode might be over-engaging or prematurely engaging with the community. The forum alone can be overwhelming, with lots of activity on a range of topics. It’s hard to give general advice that works for everyone, so the onus is somewhat on individuals to consume the right content for themselves.
Specific ways to interact with the ecosystem to maximise your contribution are listed below.
Talk to an 80k coach.
80k focuses on career advice, and I and others have had great experiences with them. They can help you identify how you might eventually contribute to AI Alignment and what practical steps to take.
Do a course (or consume the content online)
If you’ve got the necessary prerequisites (knowledge/skill/time/location), then you might be suitable for these initiatives (or future iterations). I can’t speak in any detail about the advanced courses, but I recommend investigating them after you’ve done at least one of the introductory courses and have thought about (and maybe discussed with others) whether you want to be a researcher or an engineer and how those courses advance your goals.
I want to highlight here that upskilling lots of people in AI capabilities is not obviously good. If, as we skill up, we aren’t asking why what we’re learning is useful for solving Alignment, or if we use those skills to further increase AI capabilities, that would be very bad/wasteful. This is why I also recommend engaging with LessWrong/EA community and content, and thinking hard about how your efforts contribute to good outcomes.
Introductory:
The EA Cambridge AGI Safety Fundamentals courses (Technical Alignment/AI governance) are the best place to start, and no prior understanding of deep learning is necessary (although it might be useful). I’ve read the content but hope to participate online in the next iteration. This course will get you up to date on many of the most prominent ideas in Alignment and, if you complete it with a cohort, will probably help you form a network and engage much more deeply.
The Intro to ML Safety Course (an Empirical AI Safety Program) also aims to be an introductory program for those with a Deep Learning background. I’ve also watched the content for this program and browsed the readings. It has a very different flavour to AGISF and is much more focussed on empirical, as opposed to conceptual, Alignment topics. It covers a number of AI safety topics that might feel distinct from Alignment (like AI misuse), but I found these interesting too.
Online Deep learning courses. See Levelling Up in AI Safety Research Engineering for those.
Advanced: Bay Area
A research-oriented option for students that I’ve heard highly recommended is SERI-MATS: “Over four weeks, the participants will develop an understanding of a research agenda at the forefront of AI Alignment through online readings and cohort discussions, averaging 10 h/week”. Tentatively, the next intake is in January.
Redwood Research (an Alignment organisation) has run a few Machine Learning Alignment Bootcamps which have “a focus on ML skills and concepts that are relevant to doing the kinds of Alignment research that we think seem most leveraged for reducing AI x-risk”.
Advanced: UK
If you are planning to work on conceptual, as opposed to empirical, research, and you live in the UK or can travel to London, Refine is a three-month paid fellowship “for helping aspiring independent researchers find, formulate, and get funding for new conceptual Alignment research bets, ideas that are promising enough to try out for a few months to see if they have more potential”. I’m not sure when the next intake is, so this is something to keep an eye out for.
Another London program just getting started is ARENA. The goal of this program “is to skill people up to the point where they could become ML research engineers at AI safety teams”, so the program is very different from Refine (which is focused on conceptual research). It will likely run for 2–3 months.
Once you’ve completed courses, it’s important to apply your skills by doing tasks like distilling or writing about your own ideas (see above). Independent projects such as those described here and here might also be valuable next steps.
There are probably more courses/programs not listed here. Ask around!
Also, if you’re young enough or are still studying, then getting an undergraduate degree in mathematics or computer science seems like an obvious thing to do. 80k can help with these kinds of decisions.
Apply for Funding
On a personal note, I received an FTX Future Fund regrant this year, which enabled me to quit my job. I hear that there are lots of other ways to get funding, and that you don’t need to be doing independent research already to get it. Read this post’s section on funding if you think it might be valuable to you.
Move to the Bay Area (or another hub, or start an EA hub).
Uncomfortable as it might be to discuss, opportunities to contribute to AI Alignment are not evenly distributed. Berkeley and the Bay Area are where most activity occurs, so you might find it easiest to contribute there. Obviously, this is a big decision, so make it with due caution (having tested your fit for alignment, or maybe having spoken to some people in the area first).
There are some Alignment organizations elsewhere (such as in the UK). Remote work is also possible (but it will probably be less productive).
In the future, AI Alignment organizations/researchers will likely be more geographically diverse, so if you are otherwise constrained, accelerating the creation of EA hubs and coordinating those with similar interests in your city might be valuable too.
Attend an EA Global or EAGx Conference
These events happen several times a year in major cities. They are invaluable for meeting like-minded individuals at all career stages and getting up to speed on what people are currently thinking about. If you can, attending EAG San Francisco is likely best for engaging with the Alignment community.
Smaller EAGx events are also worth going to such as EAGx Berkeley.
More on the value of EAG events can be found here. For a sense of the kind of people you might meet at EA events, you can try the EA gathertown for remote working.
Final Thoughts
AI Alignment is a big scary problem, and having many people genuinely interested in helping is excellent. If you’re just starting out, I hope this is helpful.
For others, if you think there are any big omissions or mistakes in this post, please let me know!
Comment (Geoffrey): These are helpful suggestions; thanks.
They seem aimed mostly at young adults starting their careers—which is fine, but limited to that age-bracket.
It might also be helpful for someone who’s an AI alignment expert to suggest some ways for mid-career or late-career researchers from other fields to learn more. That can be easier in some ways, harder in others—we come to AI safety with our own ‘insider view’ of our field, and that may entail very different foundational assumptions about human nature, human values, cognition, safety, likely X-risks, etc. So, rather than learning from scratch, we may have to ‘unlearn what we have learned’ to some degree first.
For example, apart from young adults often starting with the same few bad ideas about AI alignment, established researchers from particular fields might often start with their own distinctive bad ideas about AI alignment—but those might be quite field-dependent. For example, psych professors like me might have different failure modes in learning about AI safety than economics professors, or moral philosophy professors.
Author’s reply: Thanks, Geoffrey, I appreciate the response.
It was definitely not my goal to describe how experienced people might “unlearn what they have learned”, but I’m not sure that much of the advice changes for experienced people.
“Unlearning” seems instrumentally useful if it makes it easier for you to contribute/think well, but using your previous experience might also be valuable. For example, Refine thinks that conceptual research is not varied enough and is looking for people with diverse backgrounds.
This is a good example, and I think I generally haven’t addressed that failure mode in this article. I’m not aware of any resources for mid- or late-career professionals transitioning into alignment, but I will comment here if I hear of such a resource, or someone else might suggest a link.
Comment: Strong endorse. This is good, and not Goodharted on genre-fitting or seeming professional.