Taking a leave of absence from Open Philanthropy to work on AI safety
I’m planning a leave of absence (aiming for around 3 months and potentially more) from Open Philanthropy, starting on March 8, to explore working directly on AI safety.
I have a few different interventions I might explore. The first I'll explore is AI safety standards: documented expectations (enforced via self-regulation at first, and potentially government regulation later) that AI labs won't build and deploy systems that pose too much risk to the world, as evaluated by a systematic evaluation regime. (More here.) There's significant interest from some AI labs in self-regulating via safety standards, and I want to see whether I can help with the work ARC and others are doing to hammer out standards that are both protective and practical—to the point where major AI labs are likely to sign on.
During my leave, Alexander Berger will serve as sole CEO of Open Philanthropy (as he did during my parental leave in 2021).
Depending on how things play out, I may end up working directly on AI safety full-time. Open Philanthropy will remain my employer for at least the start of my leave, but I’ll join or start another organization if I go full-time.
The reasons I’m doing this:
First, I’m very concerned about the possibility that transformative AI could be developed soon (possibly even within the decade—I don’t think this is >50% likely, but it seems too likely for my comfort). I want to be as helpful as possible, and I think the way to do this might be via working on AI safety directly rather than grantmaking.
Second, as a general matter, I’ve always aspired to help build multiple organizations rather than running one indefinitely. I think the former is a better fit for my talents and interests.
At both organizations I’ve co-founded (GiveWell and Open Philanthropy), I’ve had a goal from day one of helping to build an organization that can be great without me—and then moving on to build something else.
I think this went well with GiveWell thanks to Elie Hassenfeld’s leadership. I hope Open Philanthropy can go well under Alexander’s leadership.
Trying to get to that point has been a long-term project. Alexander, Cari, Dustin and I have been actively discussing the path to Open Philanthropy running without me since 2018.1 Our mid-2021 promotion of Alexander to co-CEO was a major step in this direction (putting him in charge of more than half of the organization's employees and giving), and this is another step, which we've spent over a year discussing and preparing for (and announced internally at Open Philanthropy on January 20).
I’ve become increasingly excited about various interventions to reduce AI risk, such as working on safety standards. I’m looking forward to experimenting with focusing my energy on AI safety.
Footnotes
1. This was only a year after Open Philanthropy became a separate organization, but it was several years after Open Philanthropy started as part of GiveWell under the title “GiveWell Labs.”
As AI heats up, I’m excited and frankly somewhat relieved to have Holden making this change. While I agree with 𝕮𝖎𝖓𝖊𝖗𝖆’s comment below that Holden had a lot of leverage on AI safety in his recent role, I also believe he has a vast amount of domain knowledge that can be applied more directly to problem solving. We’re in shockingly short supply of that kind of person, and the need is urgent.
Alexander has my full confidence in his new role as the sole CEO. I consider us incredibly fortunate to have someone like him already involved and prepared to succeed as the leader of Open Philanthropy.
My understanding is that Alexander has different views from Holden in that he prioritises global health and wellbeing over longtermist cause areas. Is there a possibility that Open Phil’s longtermist giving decreases due to having a “non-longtermist” at the helm?
I believe that’s an oversimplification of what Alexander thinks but don’t want to put words in his mouth.
In any case this is one of the few decisions the 4 of us (including Cari) have always made together, so we have done a lot of aligning already. My current view, which is mostly shared, is that we’re currently underfunding x-risk even without longtermism math, both because FTXF went away and because I’ve updated towards shorter AI timelines in the past ~5 years. And even aside from that, we weren’t at full theoretical budget last year anyway. So that all nets out to an expected increase, not decrease.
I’d love to discover new large x-risk funders though and think recent history makes that more likely.
OK, thanks for sharing!
And yes I may well be oversimplifying Alexander’s view.
In your recent Cold Takes post you disclosed that your wife owns equity in both OpenAI and Anthropic. (She was appointed to a VP position at OpenAI, as was her sibling, after you joined OpenAI’s board of directors[1]). In 2017, under your leadership, OpenPhil decided to generally stop publishing “relationship disclosures”. How do you intend to handle conflicts of interest, and transparency about them, going forward?
You wrote here that the first intervention that you’ll explore is AI safety standards that will be “enforced via self-regulation at first, and potentially government regulation later”. AI companies can easily end up with “self-regulation” that is mostly optimized to appear helpful, in order to avoid regulation by governments. Conflicts of interest can easily influence decisions w.r.t. regulating AI companies (mostly via biases and self-deception, rather than via conscious reasoning).
EDIT: you joined OpenAI’s board of directors as part of a deal between OpenPhil and OpenAI that involved recommending a $30M grant to OpenAI.
Can Holden clarify whether any of those shares in OpenAI and Anthropic are legally pledged for donation, and if so, what proportion?
For context, my wife is the President and co-founder of Anthropic, and formerly worked at OpenAI.
80% of her equity in Anthropic is (not legally bindingly) pledged for donation. None of her equity in OpenAI is. She may pledge more in the future if there is a tangible compelling reason to do so.
I plan to be highly transparent about my conflict of interest, e.g. I regularly open meetings by disclosing it if I’m not sure the other person already knows about it, and I’ve often mentioned it when discussing related topics on Cold Takes.
I also plan to discuss the implications of my conflict of interest for any formal role I might take. It’s possible that my role in helping with safety standards will be limited to advising with no formal powers (it’s even possible that I’ll decide I simply can’t work in this area due to the conflict of interest, and will pursue one of the other interventions I’ve thought about).
But right now I’m just exploring options and giving non-authoritative advice, and that seems appropriate. (I’ll also note that I expect a lot of advice and opinions on standards to come from people who are directly employed by AI companies; while this does present a conflict of interest, and a more direct one than mine, I think it doesn’t and can’t mean they are excluded from relevant conversations.)
Thanks for the clarification.
I notice that I am surprised and confused.
I’d have expected Holden to contribute much more to AI existential safety as CEO of Open Philanthropy (career capital, comparative advantage, specialisation, etc.) than via direct work.
I don’t really know what to make of this.
That said, it sounds like you’ve given this a lot of deliberation and have a clear plan/course of action.
I’m excited about your endeavours in the project!
RE direct work, I would generally think of the described role as still a form of “leadership” — coordinating actors in the present — unlike “writing research papers” or “writing code”. I expect Holden to have a strong comparative advantage at leadership-type work.
Yes, it would be very different if he’d said “I’m going to skill up on ML and get coding”!
(I work at Open Phil, speaking for myself)
FWIW, I think this could also make a lot of sense. I don’t think Holden would be an individual contributor writing code forever, but skilling up in ML and completing concrete research projects seems like a good foundation for ultimately building a team doing something in AI safety.
I don’t think Holden agrees with this as much as you might think. For example, he spent a lot of his time in the last year or two writing a blog.
I’ve been meaning to ask: Are there plans to turn your Cold Takes posts on AI safety and The Most Important Century into a published book? I think the posts would make for a very compelling book, and a book could reach a much broader audience and would likely get much more attention. (This has pros and cons of course, as you’ve discussed in your posts.)
Amazon: The Most Important Century Paperback – February 12, 2022 by Holden Karnofsky
Neat! Cover jacket could use a graphic designer in my opinion. It’s also slotted under engineering? Am I missing something?
I threw that book together for people who want to read it on Kindle, but it’s quite half-baked. If I had the time, I’d want to rework the series (and a more recent followup series at https://www.cold-takes.com/tag/implicationsofmostimportantcentury/) into a proper book, but I’m not sure when or whether I’ll do this.
For what it’s worth, I don’t see an option to buy a kindle version on Amazon—screenshot here
I think this was a goof due to there being a separate hardcover version, which has now been removed—try again?
This link works.
Is it at all fair to say you’re shifting your strategy from a “marathon” to a “sprint” strategy? I.e. prioritising work that you expect to help soon instead of later.
Is this move due to your personal timelines shortening?
I wouldn’t say I’m in “sprinting” mode—I don’t expect my work hours to go up (and I generally work less than I did a few years ago, basically because I’m a dad now).
The move is partly about AI timelines, partly about the opportunities I see, and partly about Open Philanthropy’s stage of development.
I’d love to chat with you about directions here, if you’re interested. I don’t know anyone with a bigger value of p(survival | West Wing levels of competence in major governments) - p(survival | leave it to OpenAI and DeepMind leadership). I’ve published technical AI existential safety research at top ML conferences/journals, and I’ve gotten two MPs in the UK onside this week. You can see my work at michael-k-cohen.com, and you can reach me at michael.cohen@eng.ox.ac.uk.
You may have already thought of this, but one place to start exploring what AI standards might look like is exploring what other safety standards for developing risky new things do in fact look like. The one I’m most familiar with (but not at all an expert on) is DO-178C Level A, the standard for developing avionics software where a bug could crash the plane. “Softer” examples worth looking at would include the SOC2 security certification standards.
I wrote a related thing here as a public comment to the NIST regulation framework developers, who I presume are high on your list to talk to as well: https://futuremoreperfect.substack.com/p/ai-regulation-wonkery
I’m in no position to judge how you should spend your time all things considered, but for what it’s worth, I think your blog posts on AI safety have been very clear and thoughtful, and I frequently recommend them to people (example). For example, I’ve started using the phrase “The King Lear Problem” from time to time (example).
Anyway, good luck! And let me know if there’s anything I can do to help you. 🙂
I think your first priority is promising and seemingly neglected (though I’m not familiar with a lot of the work done by governance folk, so I could be wrong here). I also get the impression that MIRI folk believe they have an unusually clear understanding of risks, would like to see risky development slow down, are pessimistic about their near-term prospects for solving the technical problems of aligning very capable intelligent systems, and generally don’t see any clearly good next steps. It appears to me that this combination of skills and views positions them relatively well for developing AI safety standards. I’d be shocked if you didn’t end up talking to MIRI about this issue, but I just wanted to point out that from my point of view there seems to be a substantial amount of fit here.
I don’t think they claim to have better longer-term prospects, though.
I think they do? Nate at least says he’s optimistic about finding a solution given more time.
“Believe” being the operative word here. I really don’t think they do.
I’m not sold on how well calibrated their predictions of catastrophe are, but I think they have contributed a large number of novel & important ideas to the field.
I don’t think they would claim to have significantly better predictive models in a positive sense; they just have far stronger models of what isn’t possible and cannot work for ASI, and that constrains their expectations about the long term far more. (I’m not sure I agree with, say, Eliezer’s view that governance is useless, but he has a very clear model, which is unusual.) I also don’t think their view about timelines or takeoff speeds is really a crux: they have claimed that even if ASI is decades away, we still can’t rely on current approaches to scale.