New roles on my team: come build Open Phil’s technical AI safety program with me!
Open Phil announced two weeks ago that we’re hiring for over 20 roles across our teams working on global catastrophic risk reduction — and we’ll answer questions at our AMA starting tomorrow. Ahead of that, I wanted to share some information about the roles I’m hiring for on my team (Technical AI Safety). This team is aiming to think through what technical research could most help us understand and reduce AI x-risk, and build thriving fields in high priority research areas by making grants to great projects and research groups.
First of all: since we initially listed roles on Sep 29, we’ve added three new roles in Technical AI Safety that you might not have seen yet if you only saw the original announcement! In addition to the (Senior) Program Associate role that was there originally, we added an Executive Assistant role last week, and yesterday we added a (Senior) Research Associate role and a role for a Senior Program Associate specializing in a particular subfield of AI safety research (e.g. interpretability or alignment theory). Check those out if they seem interesting! The Executive Assistant role in particular requires a very different, less technical skill set.
Secondly, before starting to answer AMA questions, I wanted to highlight that our technical AI safety giving is far below where it should be at equilibrium: there is considerable room to grow, and hiring more people is likely to lead quickly to more and better grants. My estimate is that last year we recommended around $25M in grants to technical AI safety, and so far this year I’ve recommended a similar amount. With more capacity for grant evaluation, research, and operations, we think this could pretty readily double or more.
All of our GCR teams (Technical AI Safety led by me, Capacity Building led by Claire Zabel, AI Governance and Policy led by Luke Muehlhauser, and Biosecurity led by Andrew Snyder-Beattie) are heavily capacity constrained right now — especially the teams that do work related to AI, given the recent boom in interest and activity in that area. I think my team currently faces even more severe constraints than other program teams. Compared to other teams, my team:
Is much smaller: Until literally last week, it was just me focusing primarily on technical AI safety (although Claire’s team sometimes funds technical AI safety work, primarily upskilling). Last week, Max Nadeau joined as my first Program Associate. In contrast, the capacity building team has eight people, and the biosecurity and AI governance teams each have five people.
Likely has worse “coverage” of its field:
Ideally, a robust and committed grantmaking team in a given field would:
Maintain substantive relationships with the most impactful / promising (say) 5-30% of existing grantees, potential grantees, and key non-grantee players (e.g. people working on AI safety in industry labs) in their field.
Have pretty robust systems for hearing about most of the plausible potential new grantees in their field (via e.g. application forms or strong referral networks).
Have the bandwidth to give non-trivial consideration to a large fraction of plausible potential grantees, in order to make an informed, explicit decision about whether to fund them and how much.
Have the bandwidth to retrospectively evaluate what came out of large grants or important categories of grant.
My team has absolutely nowhere near that level of coverage (for example, we haven’t had the time to open application forms or to get to know academics who could work on safety). While all our GCR program areas could use a lot more “field coverage,” my guess is that our coverage in technical AI safety is considerably worse than the coverage that at least Claire and Andrew get in their fields. Not only does this team have fewer people to cover its field, but the set of plausible potential players also feels like it could well be larger, since large numbers of technical people have recently become much more interested in AI safety.
Has a more nascent strategy: While we’ve been funding technical AI safety research in one form or another since 2015, the program area has switched leadership and strategic direction multiple times, and the current iteration is pretty close to a fresh slate — we’ve closed out most of our old programs and are looking to build out a fresh stable of grantmaking initiatives from the ground up.
One reason our strategy is up in the air is that the team in its current iteration is very new, and advances in AI capabilities are rapidly changing the landscape of tractable research projects. I’ve led the program area for less than a year, and most of the grants I’ve made have been to new groups that didn’t exist before 2021 and/or to research projects that weren’t even practically feasible to do before the last couple of years. In contrast, other program leads have been building out a strategy for a few years or more.
Another big reason is that we have a huge number of unanswered questions about what technical projects we most want to see, what kind of results would most change our mind about key questions or move the needle on key safety techniques, and how we should prioritize between different streams of object-level work. For example, better answers to questions like these could change what research areas we go big on and what we pitch to potential grantees:
How can we tell how promising an interpretability technique is? What are the best “internal validity” measures of success? What are the best downstream tasks to measure?
What are the elements of an ideal model organism for misalignment, and what are the challenges to creating such a model?
What is the most compelling theory of change / path to impact for research on adversarial attacks and defenses, and what is the most exciting version of that kind of research?
Are there some empirical research directions inspired by the assistance games / reward uncertainty tradition which could be helpful even in a language model paradigm?
If you join the technical AI safety team in this round, you could help relieve some severe bottlenecks while building this new iteration of the program area from the ground up. If this sounds exciting to you, I strongly encourage you to apply!
Interestingly, these grant totals (roughly $25M in each of 2022 and 2023) are considerably larger than our annual technical AI safety giving in the several years before that, even though we had fewer full-time-equivalent staff working in the area in 2022 and 2023 than in 2015-2021.
Initially, our program was led by Daniel Dewey. By around 2019, Catherine Olsson had joined the team, and eventually (I think by 2020-2021) it transitioned to being a team of three run by Nick Beckstead, who managed Catherine and Daniel, as well as Asya Bergal at half her time. In 2021, all three of Daniel, Catherine, and Nick left for other roles. For an interim period, there was no single point person: Holden was personally handling bigger grants (e.g. Redwood Research), and Asya was handling smaller grants (e.g. an RFP that Nick originally started and our PhD fellowship). Holden then moved on to direct work and Asya went full-time on capacity building. I began doing grantmaking in Oct 2022, and quickly ended up full-time handling FTXFF bailout grants. Since late January 2023 or so, I’ve been presiding over a more normal program area.