New roles on my team: come build Open Phil’s technical AI safety program with me!
Open Phil announced two weeks ago that we’re hiring for over 20 roles across our teams working on global catastrophic risk reduction, and we’ll answer questions at our AMA starting tomorrow. Ahead of that, I wanted to share some information about the roles I’m hiring for on my team (Technical AI Safety). This team is aiming to think through what technical research could most help us understand and reduce AI x-risk, and to build thriving fields in high-priority research areas by making grants to great projects and research groups.
First of all — since we initially listed roles on Sep 29, we’ve added three new roles in Technical AI Safety that you might not have seen yet if you only saw the original announcement! In addition to the (Senior) Program Associate role that was there originally, we added an Executive Assistant role last week — and yesterday we added a (Senior) Research Associate role and a role for a Senior Program Associate specializing in a particular subfield of AI safety research (e.g. interpretability, alignment theory, etc). Check those out if they seem interesting! The Executive Assistant role in particular requires a very different, less technical skill set.
Secondly, before starting to answer AMA questions, I wanted to highlight that our technical AI safety giving is far from where it should be at equilibrium, that there is considerable room to grow, and that hiring more people is likely to lead quickly to more and better grants. My estimate is that last year we recommended around $25M in grants to technical AI safety,[1] and so far this year I’ve recommended a similar amount. With more capacity for grant evaluation, research, and operations, we think this could pretty readily double or more.
All of our GCR teams (Technical AI Safety led by me, Capacity Building led by Claire Zabel, AI Governance and Policy led by Luke Muehlhauser, and Biosecurity led by Andrew Snyder-Beattie) are heavily capacity constrained right now — especially the teams that do work related to AI, given the recent boom in interest and activity in that area. I think my team currently faces even more severe constraints than other program teams. Compared to other teams, my team:
Is much smaller: Until literally last week, it was just me focusing primarily on technical AI safety (although Claire’s team sometimes funds technical AI safety work, primarily upskilling). Last week, Max Nadeau joined as my first Program Associate. In contrast, the capacity building team has eight people, and the biosecurity and AI governance teams each have five people.
Likely has worse “coverage” of its field:
Ideally, a robust and committed grantmaking team in a given field would:
- Maintain substantive relationships with the most impactful / promising (say) 5-30% of existing grantees, potential grantees, and key non-grantee players (e.g. people working on AI safety in industry labs) in their field.
- Have pretty robust systems for hearing about most of the plausible potential new grantees in their field (via e.g. application forms or strong referral networks).
- Have the bandwidth to give non-trivial consideration to a large fraction of plausible potential grantees, in order to make an informed, explicit decision about whether to fund them and how much.
- Have the bandwidth to retrospectively evaluate what came out of large grants or important categories of grant.
My team has absolutely nowhere near that level of coverage (for example, we haven’t had the time to open application forms or to get to know academics who could work on safety). While all our GCR program areas could use a lot more “field coverage,” my guess is that our coverage in technical AI safety is considerably worse than the coverage that at least Claire and Andrew get in their fields. Not only does this team have fewer people to cover its field with, the set of plausible potential players feels like it could well be larger, since large numbers of technical people have started to get a lot more interested in AI safety recently.
Has a more nascent strategy: While we’ve been funding technical AI safety research in one form or another since 2015, the program area has switched leadership and strategic direction multiple times,[2] and the current iteration is pretty close to a fresh slate — we’ve closed out most of our old programs and are looking to build out a fresh stable of grantmaking initiatives from the ground up.
One reason our strategy is up in the air is that the team in its current iteration is very new, and advances in AI capabilities are rapidly changing the landscape of tractable research projects. I’ve led the program area for less than a year, and most of the grants I’ve made have been to new groups that didn’t exist before 2021 and/or to research projects that weren’t even practically feasible to do before the last couple of years. In contrast, other program leads have been building out a strategy for a few years or more.
Another big reason is that we have a huge number of unanswered questions about what technical projects we most want to see, what kind of results would most change our mind about key questions or move the needle on key safety techniques, and how we should prioritize between different streams of object-level work. For example, better answers to questions like these could change what research areas we go big on and what we pitch to potential grantees:
- How can we tell how promising an interpretability technique is? What are the best “internal validity” measures of success? What are the best downstream tasks to measure?
- What are the elements of an ideal model organism for misalignment, and what are the challenges to creating such a model?
- What is the most compelling theory of change / path to impact for research on adversarial attacks and defenses, and what is the most exciting version of that kind of research?
- Are there some empirical research directions inspired by the assistance games / reward uncertainty tradition which could be helpful even in a language model paradigm?
If you join the technical AI safety team in this round, you could help relieve some severe bottlenecks while building this new iteration of the program area from the ground up. If this sounds exciting to you, I strongly encourage you to apply!
[1] Interestingly, these figures are actually considerably larger than annual technical AI safety giving in the several years before that, even though we had fewer full-time-equivalent staff working in the area in 2022 and 2023 compared to 2015-2021.
[2] Initially, our program was led by Daniel Dewey. By around 2019, Catherine Olsson had joined the team, and eventually (I think by 2020-2021) it transitioned to being a team of three run by Nick Beckstead, who managed Catherine and Daniel, as well as Asya Bergal at half her time. In 2021, all three of Daniel, Catherine, and Nick left for other roles. For an interim period, there was no single point person: Holden was personally handling bigger grants (e.g. Redwood Research), and Asya was handling smaller grants (e.g. an RFP that Nick originally started and our PhD fellowship). Holden then moved on to direct work and Asya went full-time on capacity building. I began doing grantmaking in Oct 2022, and quickly ended up full-time handling FTXFF bailout grants. Since late January 2023 or so, I’ve been presiding over a more normal program area.
Was there some blocker that caused this to happen now, rather than 6 months / 1 year ago?
I only got into grantmaking less than a year ago (in November 2022), and shortly after I unburied myself from FTXFF-collapse-related grants around January, I started hiring in a private round which led to Max joining (a private round is generally much less of a logistical lift than a big public round). I’m now joining this big public round along with other OP GCR teams because combining hiring rounds makes it easier on the back-end. See Luke’s AMA answers here and here for more detail on the “Why are you hiring now rather than previously?” question, and my comment here for more color on my personal working situation over the last ten months or so.
Excited to see this team expand! A few [optional] questions:
What do you think were some of your best and worst grants in the last 6 months?
What are your views on the value of “prosaic alignment” relative to “non-prosaic alignment?” To what extent do you think the most valuable technical research will look fairly similar to “standard ML research”, “pure theory research”, or other kinds of research?
What kinds of technical research proposals do you think are most difficult to evaluate, and why?
What are your favorite examples of technical alignment research from the past 6-12 months?
What, if anything, do you think you’ve learned in the last year? What advice would you have for a Young Ajeya who was about to start in your role?