A grand strategy to recruit AI capabilities researchers into AI safety research

Edit: I was told privately that the grand strategy was unclear. The proposed grand strategy is to:

  1. Convince everyone: recruit all AI capabilities researchers to transition into AI safety research until the alignment problem is solved.

  2. Scale up 1-on-1 outreach to AI capabilities researchers (the approach that seems to be working best so far), and learn from experience to optimize outreach strategies.

  3. Exponentially grow the AI safety movement by mentoring newcomers who in turn mentor newcomers, and so on (especially AI capabilities researchers); a rough sketch of this compounding is given after this list.

  4. Maximize the social prestige of the AI safety movement (especially from the perspective of AI capabilities researchers).
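
To make the compounding in step 3 concrete, here is a purely illustrative back-of-the-envelope sketch (the mentoring rate and the starting numbers below are assumptions of mine for the sake of the example, not data from anywhere): suppose each community member successfully mentors m newcomers per year, and each newcomer goes on to mentor in later years.

```latex
% Illustrative only: a simple geometric-growth model of the mentoring pipeline.
% Assumptions (mine, for the sake of the example): each member mentors m newcomers
% per year, and every newcomer becomes a mentor the following year.
N(t) \approx N_0 \, (1 + m)^{t}
% Example: N_0 = 300 researchers today and m = 1 mentee per person per year
% would give N(5) \approx 300 \cdot 2^{5} = 9{,}600 researchers after five years.
```

Even a modest per-person mentoring rate compounds quickly, which is why the mentoring bottleneck discussed at the end of this post matters so much.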

Original post: AI capabilities research seems to be substantially outpacing AI safety research. It is most likely true that solving the AI alignment problem before AGI is successfully developed is critical for the continued survival and thriving of humanity.

Assuming that AI capabilities research continues to outpace AI safety research, the former will eventually produce the most negative externality in history: a significant risk of human extinction. Despite this, a free-rider problem pushes AI capabilities research myopically forward, driven both by market competition and by great-power competition (e.g., between the U.S. and China). AI capabilities research is thus analogous to the production and use of fossil fuels, and AI safety research is analogous to green-energy research. We want to scale up and accelerate green-energy research as soon as possible, so that we can halt the negative externalities of fossil fuel use.

My claim: A task that seems extremely effective in expectation (and potentially maximally effective for a significant number of people) is to recruit AI capabilities researchers into AI safety research.

More generally, the goal is to convince people (especially, though not only, high-impact decision-makers like politicians, public intellectuals, and leaders of relevant companies) of why it is so important that AI safety research outpace AI capabilities research.

Logan has written an excellent post on the upside of gaining trial-and-error experience in convincing others of the importance of AI alignment. The upsides include, among others, developing an optimized curriculum for how to convince people, and using that curriculum to help movement-builders around the world pitch the importance of AI alignment to high-impact individuals. Logan has recently convinced me to collaborate on this plan. Please let us know if you are interested in collaborating as well! You can email me at pspark@math.harvard.edu.

I would also like to propose the following additional suggestions:

1) For behavioral-science researchers like me, prioritize research that is laser-focused on (a) practical persuasion, (b) movement-building, (c) how to solve coordination problems (e.g., market competition, U.S.-China competition) in high-risk domains, (d) how to scale up the AI safety research community, (e) how to recruit AI capabilities researchers into AI safety, and (f) how to help them transition. Examples include RCTs (and data collection in general; see Logan’s post!) that investigate effective ways to scale up AI safety research at the expense of AI capabilities research. More generally, scientific knowledge on how to help specialists in negative-externality or obsolete occupations transition to new fields would be broadly applicable.

2) For meta-movement-builders, brainstorm and implement practical ways for decentralized efforts around the world to scale up the AI safety research community and to recruit talented people from AI capabilities research. Examples of potential contributions include the development of optimized curricula for training AI safety researchers or for persuading AI capabilities researchers, a well-thought-out grand strategy, and coordination mechanisms.

3) For marketing specialists, brainstorm and implement practical ways to make the field of AI safety prestigious within the AI research community. Potential methods for recruiting AI capabilities researchers include providing fellowships and other financial opportunities, organizing enjoyable and well-attended AI safety social events like retreats and conferences, and inviting prestigious experts to these events. Wining and dining AI researchers might help too, to the extent that effective altruists feel comfortable doing so. Note that one barrier to AI safety researchers’ enthusiasm is the lack of traditional empirical feedback loops (“I’m proud of this tangible contribution, so I’m going to do more of it!”). Keeping the AI safety research community enthusiastic, social, and prestigious will therefore be in many ways a unique challenge.

An especially strong priority for the AI safety research community is to attract talented AI researchers from all over the world. What really matters is that AI safety research gains a monopoly on the relevant talent globally, not just in the San Francisco Bay Area. Ideally, this means continually communicating with AI research communities in other countries, facilitating the immigration of AI researchers, persuading them to pursue AI safety research rather than capabilities research, and creating an environment in which immigrant AI researchers want to stay (for example, the San Francisco Bay Area’s thriving AI safety community).

We should also seek to maximize the long-term prestige differential between AI safety research and AI capabilities research, especially in Silicon Valley (but also among the general public). Building this prestige differential would be difficult but potentially very impactful, given that the strategy of offering financial incentives to poach AI researchers may hit a wall at some point: many of the industry-leading research teams are employed by deep-pocketed companies that are enthusiastic about AI capabilities research. But given that working at a fossil fuel company is not prestigious despite the high pay, a long-term prestige differential between AI safety research and AI capabilities research might be achievable even without a large pay increase in favor of the former.

Finally, we really need to scale up the AI safety mentoring pipeline. I’ve heard anecdotally that many enthusiastic and talented people interested in transitioning into AI safety are “put on hold” due to a lack of mentors. They have to wait a long time to be trained, and an even longer time to make any contributions they are proud of. Sooner or later, they will probably lose their enthusiasm. This seems like a substantial loss of value. We should instead aim to supercharge the growth of the AI safety community. This means that streamlining and widening the availability of AI safety training opportunities, especially for potential and current AI capabilities researchers, should be an absolute top priority. Please feel free to email me at pspark@math.harvard.edu if you have any ideas on how to do this!

tl;dr. It is extremely important in expectation that AI safety research outpace AI capabilities research. Practically speaking, this means that AI safety research needs to scale up to the point of having a monopoly on talented AI researchers, and this must hold globally rather than just in the San Francisco Bay Area. We should laser-focus our persuasion, movement-building, and mentoring efforts on this goal. Our grand strategy should be to continue doing so until the bulk of global AI research is on safety rather than capabilities, or until the AI alignment problem is solved. To put it bluntly, we should, on all fronts, scale up efforts to recruit talented AI capabilities researchers into AI safety research, in order to slow down the former relative to the latter.