(Even) More Early-Career EAs Should Try AI Safety Technical Research

tlevin · Jun 30, 2022, 9:14 PM



[May 2023 edit: I no longer endorse the conclusion of this post and think that most early-career EAs should be trying to contribute to AI governance, either via research, policy work, advocacy, or other stuff, with the exception of people with outlier alignment research talent or very strong personal fits for other things. I do stand by a lot of the reasoning, though, especially of the form “pick the most important thing and only rule it out if you have good reasons to think personal fit differences outweigh the large differences in importance.” The main thing that changed was that governance and policy stuff now seems more tractable than I thought, while alignment research seems less tractable.]

AI Safety Technical Research (AISTR) remains an extremely neglected career path, with only 100-200 people doing full-time AISTR.^[1] While that number might grow substantially in the next few years as the current generation of EAs in college makes career choices, I believe it will still be shockingly low by then (as an upper bound, fewer than 1,000).^[2]

This is the case despite a large and growing share of EAs agreeing in theory that AI x-risk is the most important problem, and despite years of 80,000 Hours strongly advocating AISTR careers. In my experience, even in settings where this view is a strong consensus (and where quantitative skills are abundant), a lot of people who seem like they could be good at AISTR are not planning to do it and have not given it a serious try. I think this is mostly for misguided reasons, and there are benefits to studying AI safety even if you don’t go into AISTR.

I claim that since AISTR seems like the highest-EV path for ambitious and smart people early in their careers, ~~most~~ the plurality of early-career EAs should at least try this path for a few months (or until they get good firsthand information that they aren’t suited for it). Trying it for a few months might mean reading this starter pack, taking an AI course, doing EA Cambridge’s Technical Alignment Curriculum, and/or trying to write up some distillations of existing research. For more, see “How to pursue a career in technical AI alignment.”^[3] (Importantly, if at any point in this early exploration you find yourself completely bored and miserable, you can simply quit there!)

The best reason not to do AISTR, if my core claims are right, is that community-building, ~~(and maybe~~ massively scalable projects, or policy) could be even higher-impact, depending on your personal fit for those paths. However, I argue that most community-builders and policy professionals should build ~~some fairly deep~~ basic familiarity with the field in order to do their jobs effectively. [Edit: following a comment from Locke_USA, I want to tone down some of the claims relating to policy careers. Locke points out that AI policy research is also extremely neglected; in my own research I have found that there are lots of really important questions that could use attention including from people without much technical background. Relatedly, they point out that getting deep technical knowledge has opportunity costs in building policy-related knowledge and skills, and while technical credentials like a CS degree could be useful in policy careers, self-studying ML for a few months could actually be negative. I think these are strong points, and I’ve modified a few claims above and below via strikethrough.^[4]]

Epistemic status: I’m pretty confident (c. 90%) that the optimal number of EAs planning on doing AISTR is higher than the status quo, less confident (c. 70%) that >50% of early-career EAs should try it, mostly because the “super-genius” thing discussed below seems plausible.

The core case for trying

The core case is very simple, so I’ll just state it here, note a couple other advantages, and explore the ways it could be wrong or not apply to an individual case:

1. You might be very good at a lot of different things, but it is hard to know whether you could be very good at a specific thing until you try it. I think the kinds of EAs who might get really good at policy or philosophy stuff tend to be good, model-based thinkers in general, which means they have some chance of being able to contribute to AISTR.
2. So, you should at least try the thing that, if you were very good at it, would give you the highest EV. To a first approximation, it seems like the differences between the impact of various careers are extremely large, so an impact-maximizing heuristic might mean something like “go down the list of highest-upside-if-true hypotheses about what you could be good at, in order.”
3. That thing is AISTR. AI seems potentially very powerful, and it seems like (in part because so few people are trying) we haven’t made tons of progress on making sure AI systems do what we want. While other careers in the AI space (policy work, community-building, ops/infrastructure, etc.) can be very highly impactful, that impact is predicated on the technical researchers, at some point, solving the problems, and if a big fraction of our effort is not on the object-level problem, this seems likely to be a misallocation of resources.^[5]

These three together should imply that if you think you might be very good at AISTR — even if this chance is small — you should give it a shot.
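The heuristic in the second claim can be sketched in Python: go down the list of options in order of upside, and rule one out only on good evidence of poor fit. This is a minimal illustration under stated assumptions rather than anything from the post. `CareerOption`, `next_option_to_try`, and every upside number are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CareerOption:
    name: str
    upside_if_very_good: float  # hypothetical impact if you turn out to be very good at it
    ruled_out: bool = False     # set only once you have *good* evidence of poor fit


def next_option_to_try(options: list[CareerOption]) -> CareerOption | None:
    """Return the highest-upside option not yet ruled out by good evidence."""
    for option in sorted(options, key=lambda o: o.upside_if_very_good, reverse=True):
        if not option.ruled_out:
            return option
    return None  # every option has been ruled out


# Hypothetical upside numbers, ordered the way the core case argues.
options = [
    CareerOption("AI policy", 30.0),
    CareerOption("AISTR", 100.0),
    CareerOption("Community-building", 40.0),
]
print(next_option_to_try(options).name)  # AISTR

options[1].ruled_out = True  # e.g. a few months of trying gave good evidence of poor fit
print(next_option_to_try(options).name)  # Community-building
```

The key design choice is that a weak signal ("I'm more of a people person") never flips `ruled_out`. Only good evidence from actually trying does, so the search keeps starting from the top.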

It also seems like technical-adjacent roles could be very impactful, i.e., people can make valuable contributions without being the ones solving very hard math problems. As John Wentworth writes: “Many technical alignment researchers are bad-to-mediocre at writing up their ideas and results in a form intelligible to other people. And even for those who are reasonably good at it, writing up a good intuitive explanation still takes a lot of work, and that work lengthens the turn-time on publishing new results.” Being technically fluent but not a “super-genius” (see below) would mean you could significantly improve communication around the field about new alignment concepts, which seems incredibly high-EV.

Other reasons to try

If your other career plans involve things related to AI, like AI policy/strategy or EA community-building, having an inside view on AI safety seems really important, and taking some first steps on alignment research seems like a great way to start building one.

AI Policy/Strategy

For people who eventually decide to do AI policy/strategy research, early exploration in AI technical material seems clearly useful, in that it gives you a better sense of how and when different AI capabilities might develop and helps you distinguish useful from “fake-useful” AI safety research, which seems really important for this kind of work. (Holden Karnofsky says, “I think the ideal [strategy] researcher would also be highly informed on, and comfortable with, the general state of AI research and AI alignment research, though they need not be as informed on these as for the previous section [about alignment].”)

Even for people who decide to follow a path like “accumulate power in the government/private actors to spend it at critical AI junctures,” it seems very good to develop views about timelines and key inputs; otherwise, I am concerned that they will not be focused on climbing the right ladders or will not know whom to listen to. ~~Spending a few months really getting familiar with the field, and then spending a few hours a week staying up to date, seems sufficient for this purpose.~~ I’m convinced by Locke_USA’s comment that for applied policy research or government influence, the optimal allocation of your time is very unlikely to include months of AISTR training, especially outside traditional contexts like CS coursework in college.

Community-Building

First of all, a huge part of a community-builder’s job is identifying the most promising people and accelerating their path to high impact as quickly as possible. In the context of AI alignment (which I claim is the most important cause area in which to build community), this involves the skill of identifying when someone has just said an interesting thought about alignment, which means you probably need to be conversant on some level in alignment research. More generally, a pair of recent forum posts from Emma Williamson and Owen Cotton-Barratt emphasize the benefits of community-builders spending time getting good at object-level things, and AI safety seems like an obvious object-level thing to spend some time on, for the above reasons.

It also seems like community-builders who are better informed on AISTR would make the community more appealing to the people we most want to attract: those who could do very good AISTR work. Communities, as a rule, attract people who like doing what people in the community do; if the community primarily talks about EA in fellowship-style settings and strategizes about outreach, it’ll mostly attract people who like to talk about EA. If the community is full of people spending significant amounts of time learning about AI, it will instead attract people who like to do that, which seems like a thing that reduces existential risks from AI.^[6] I am, consequently, very bullish on the Harvard AI Safety Team that Alexander Davies started this past spring as a community-building model.

Broader worldview-building benefits

Finally, the path towards trying AISTR yields benefits along the way. Learning some coding is broadly useful, and learning some ML seems like it builds some good intuitions about human cognition. (My evidence for this is just the subjective experience of learning a little bit about ML and it producing interesting thoughts about the human learning process, which was helpful as I wrote curricula and lesson plans for EA fellowships and a formal academic course.) Also, insofar as learning about the technical alignment field builds your sense of what the future might look like (for the reasons articulated in the policy/strategy section), it could just help you make better long-run decisions. Seems useful.

Uncertainties and caveats

Reasons this could be wrong, or at least not apply to an individual person, follow from each of the three core claims:

1. Some people already know they will not be very good at it. In my understanding, people in the field disagree about the value of different skills, but it seems like you should probably skip AISTR if you have good evidence that your ~~shape rotator~~ quantitative abilities aren’t reasonably strong, e.g., you studied reasonably hard for SAT Math but scored below 700 (90th percentile among all SAT takers).^[7] [Edit: comments from Linch and Lukas have convinced me that “shape rotator” is a counterproductive term here, so I’ve replaced it with “quantitative.”]
Other traits that should probably lead you to skip the AISTR attempt: you’re already in a position where the opportunity cost would be very high, you have good reason to think you could be exceptionally good at another priority path, or you have good reason to think you are ill-suited for research careers. But I emphasize good reason because I think people’s bar for reaching this conclusion is too low.^[8]
2a. Being world-class at a less important thing could be more impactful than being very good at the most important thing. People in the AISTR field also seem to disagree about the relative value of different skill levels. That is, some people think we need a handful of Von Neumann-style “super-geniuses” to bring about paradigm shifts, and that “mere geniuses” will just get in the way and should work on governance or other issues instead. Others, e.g. at Redwood Research and Anthropic, seem to think that “mere geniuses” doing ML engineering, interpretability research, “AI psychology,” or data labeling are an important part of gaining traction on the problem. If the former group is right, then merely being “very good” at alignment (say, 99th percentile) is still basically useless. And since you probably already know whether you’re a super-genius, based on whether you’re an IMO medalist or something, non-super-geniuses would already have good reason to think they won’t be useful (see point 1 above).
But my sense is that there’s enough probability that Redwood et al. are right, or that skills like distillation are important and scarce enough, that “mere geniuses” should give it a try, and I think there are probably “mere geniuses” who don’t know they are (including for racial/gendered reasons).
2b. You might produce more total AISTR-equivalents by community-building. The classic “multiplier effect” case: if you spend 5% of your career on community-building, you only need to produce 0.05 “you-equivalents” for this to be worth it. I hope to address this in a later post, but I think it’s actually much harder for committed EAs to produce “you-equivalents” than it seems. First of all, impact is fat-tailed, so even if you counterfactually get, say, three people to “join EA,” this doesn’t mean you’ve found three “you-equivalents” or even 0.05. Secondly, you might think of spending a year community-building now as equivalent to giving up the last year of your impact, not the first, so the bar could be a lot higher than 0.05 you-equivalents.^[9]
3. AISTR might not be the most important thing. You could think governance is even more important than technical research (e.g., political lock-in is a bigger problem than misaligned AI), that longtermism is wrong, or that biorisk is more pressing. This is well beyond the scope of this post; I will just say that I don’t agree, for the reasons listed in “the core case for trying” above.
Edit: both in light of Locke’s comment and to be consistent with my earlier claim about massively scalable projects, I should say that if you could be great at other neglected kinds of research, at influencing key decisions, or at management, this could definitely outweigh the importance/neglectedness of AISTR. I don’t want to convince the next great EA policy researcher that they’re useless if they can’t do AISTR. My intention with this post is mostly to convince people that if you’re starting from a prior that career impact is very fat-tailed, it takes good evidence from personal fit (see #1 above) to move to the second- or third-most (etc.) impactful careers. As I argue in #2 of the “core case for trying,” based on this fat-tailedness, I think the best career search process starts at the top and proceeds downward with good evidence, but the default process is more likely to look like the socio-emotional thing I describe below, followed by rationalizing this choice.
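A rough Python sketch can make the community-building multiplier argument concrete, together with footnote 9’s model of policy careers. It assumes a 20-year horizon with 4 evenly spaced promotions (at years 4, 8, 12 and 16), each tripling yearly impact. The even spacing and the helper `yearly_impact` are illustrative assumptions, not something the footnote specifies. The sketch compares the “you-equivalents” bar for one year of community-building under three framings: every year counts equally, the year given up is your first, or the year given up is your last.

```python
from typing import List


def yearly_impact(years: int = 20, promotions: int = 4, multiplier: float = 3.0) -> List[float]:
    """Impact in each year of a career whose yearly impact is multiplied by
    `multiplier` at each of `promotions` evenly spaced promotions (assumed schedule)."""
    years_per_level = years / (promotions + 1)
    return [multiplier ** int(year // years_per_level) for year in range(years)]


impact = yearly_impact()
total = sum(impact)

uniform_bar = 1 / len(impact)       # every year counts equally: 1/20
first_year_bar = impact[0] / total  # the year given up is your least-senior one
last_year_bar = impact[-1] / total  # the year given up is your most-senior one

print(f"{uniform_bar:.3f} {first_year_bar:.3f} {last_year_bar:.3f}")  # 0.050 0.002 0.167
```

Under these assumptions, the last-year framing sets a bar of about 0.17 you-equivalents. That is more than three times the naive 0.05 and inside the footnote’s 10-20% range. The first-year framing would make the bar tiny instead. The post suggests the last-year framing may be the more appropriate one.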

So, why are so few EAs doing AISTR?

I think EAs overrate how many other EAs are doing AI safety technical research because the few hundred people who are doing AISTR tend to be disproportionately visible in the community (perhaps because being super enmeshed in the EA/longtermist community is a strong factor in deciding to do AISTR).

This leads EAs who aren’t already doing AISTR to think, “My comparative advantage must not be in AISTR, since there are so many people way smarter than me [which sometimes means ‘people fluent in concepts and vocabulary that I might be able to learn with a couple weeks of focused effort’] who are doing technical research, but at least I can hang in policy and philosophy discussions.” This kind of comparative identity seems to be very powerful in shaping what we think we can or can’t do well. I observe that we prefer to spend more time and focus on things we’re already pretty good at compared to the people around us, probably as a social-status and self-esteem thing. (This is part of a broader observation that career decisions among high-achieving students are primarily identity-driven, social, and emotional, and highly responsive to status incentives.)

But reactive identity formation is a very bad way of making career decisions if your goal is to positively affect the world. Adding 0.01% of quality-adjusted work to AISTR is worth more than adding 0.1% of quality-adjusted policy/philosophy work if AISTR is >10x more valuable. It might mean a somewhat lower place on various totem poles, but I hope our egos can take that hit to reduce the odds that we go extinct.
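The comparison above holds if value scales linearly with the share of a field’s quality-adjusted work you add. Under that assumption, a short Python check with exact fractions shows that the break-even multiple is just the ratio of the two shares: 0.1% / 0.01% = 10. The constant and function names are hypothetical, and policy/philosophy work is normalized to a value of 1.

```python
from fractions import Fraction

# Shares of each field's total quality-adjusted work that you would add.
AISTR_SHARE = Fraction(1, 10_000)  # 0.01%
POLICY_SHARE = Fraction(1, 1_000)  # 0.1%


def aistr_contribution_wins(aistr_multiple: Fraction) -> bool:
    """Return True if adding 0.01% of AISTR work beats adding 0.1% of policy work.

    `aistr_multiple` is how many times more valuable the whole AISTR field is than
    policy/philosophy work (policy normalized to 1); value is assumed linear in share.
    """
    return AISTR_SHARE * aistr_multiple > POLICY_SHARE * 1


break_even = POLICY_SHARE / AISTR_SHARE
print(break_even)  # 10
for multiple in (Fraction(5), Fraction(10), Fraction(11)):
    print(multiple, aistr_contribution_wins(multiple))
```

Exact fractions avoid floating-point noise right at the 10x boundary. Exactly 10x is a tie, which is why the text’s condition is “>10x.”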

Thanks to Alexander Davies, Nikola Jurkovic, Mauricio, Rohan Subramani, Drake Thomas, Sarthak Agrawal, Thomas Kwa, and Caleb Parikh for suggestions and comments, though they do not necessarily endorse all the claims herein (and may not have read the final version of the post). All mistakes are my own.

Appendix by Aaron Scher: How To Actually Try AISTR

Trevor’s note: this is, in Aaron’s words, “Aaron Scher’s opinion, highly non-expert”; I’m including this more as a plausible path rather than a proposed default. Another idea I’ve heard from MIRI folks that I like is “write a solution to the alignment problem, right now, with your current state of knowledge, and then get feedback on why it doesn’t work, which will likely direct you to learning the relevant math and statistics and CS, and iterate this until you start having good novel insights.” Finally, you might just try building overall software engineering skills as a first step, which seems like a good career fit for some people anyway. With any of these, your mileage may vary.

Anyway, here’s Aaron’s suggestion:

Step 0: Spend at least 6 hours trying to find solutions for ELK (Eliciting Latent Knowledge)
- If these are some of the worst 6 hours of your week, maybe AISTR isn’t for you.^[10]
- If it’s kinda fun or interesting go do the longer steps below
Step 1: Learn the basics of the alignment space and key questions
- AGISF is a good place to start
- If you have the motivation to self-study, you could work through the AGISF curriculum, or some other reading lists (linked here), on your own. Learning on your own can be hard, though.
Step 2: Distill some research
- Deep dive into a topic, read the key papers and the things they cite etc. Then summarize one or some of these papers in order to make the content more digestible or understandable for a wider audience.
- There is a need in the community for distillers
- Part of the reason to do this is that good distillations make Step 1 easier for other people
Step 3: Understand key research agendas from top researchers
- Questions to ask: What is each researcher working on? Why is this person working on the things they are? How does this research agenda reduce existential risk from AI? How do they break up large topics into smaller questions?
- See this list from AI Safety Support
Step 4: Dive into questions you’re interested in and do original research
- Mentorship seems useful for doing original research, as does having some support. If you’ve made it this far, consider applying for funding or for a program
Flaws in this approach:
- It will take a long time. Step 1 takes ~20 hours minimum, but up to ~150 hours; Step 2 probably ~8 hours minimum; Step 3 probably ~8 hours minimum; Step 4: ??

^[1]
This order-of-magnitude estimate aggregates a few sources: “fewer than 100” as of 2017, “around 50” as of 2018, 430 on the EA survey working on AI (though this includes governance people), and a claim that Cambridge’s AI Safety Fundamentals Technical Track had “close to 100 facilitators” (but most of them don’t actually work in AISTR).
^[2]
At EAGxBoston, 89 out of ~970 attendees listed “AI Safety Technical Research” as an “Area of Expertise.” But attendees seem to interpret the term “expertise” loosely. The alphabetically first 5 of these included 4 undergraduates (including one whom I know personally and who does not plan to go into AISTR) and 1 PhD student. Naively, if there are 10,000 EAs and EAGxBoston was a representative sample by issue area, this would imply ~700 people interested in AISTR in EA, including undergrads but excluding non-EA researchers, and we should then downweight since AISTR people seem disproportionately likely to go to EAGx conferences (see the “Why Are So Few EAs Doing AISTR?” section) and many of these will probably do something else full-time.
^[3]
Thanks to Michael Chen’s comment for suggesting this.
^[4]
Please do not take this to mean that I don’t find any of the other comments persuasive! Several raise good objections to specific claims, but this is the only one that I think actually forces a modification of the core argument of the post.
^[5]
Much of this language comes from an excellent comment by Drake Thomas.
^[6]
This is largely inspired by Eli Tyre’s critiques of EA-community-building-as-Ponzi-scheme.
^[7]
Note the “e.g.”: I know this is an imperfect proxy, possibly neither precise nor accurate. That is, I imagine some people in the field would say >>90th percentile shape rotator skills are probably necessary. On the other hand, as Neel Nanda pointed out to me, some people might get lower SAT math scores but still be useful AISTRs; maybe you have executive function issues (in which case, see if you have ADHD!), or maybe you’re excellent in some other way such that you could be a great software engineer but aren’t great at visuospatial reasoning. As roon points out, “world-famous programmer of ReactJS Dan Abramov” admits to not being able to rotate a cube in his head. Seems like it would be pretty useful to establish what the lower bounds of math ability are; in the words of many a great scholar, further research is needed.
^[8]
For example, I think my own reasoning for going into policy, “I am very interested in policy, am a good writer and talker, and am a US citizen,” was insufficient to skip this stage, but if I had gotten rapidly promoted in a competitive policy job that would have been pretty good. Likewise, an identity like “I’m a people person” is a bad reason not to do research; an experience like working on a long project with weak feedback mechanisms and hating it for those reasons is more convincing.
^[9]
My model for policy careers is that delaying by one year costs 10-20% of your lifetime impact (assuming roughly 20-year AI timelines, over which time you get ~4 promotions that ~3x your impact). Maybe AISTR careers are less back-loaded, in which case the number would be lower.
^[10]
Note, however, that some people just “bounce off” ELK or other angles on AISTR but find other angles really interesting.
