(Posting in a personal capacity unless stated otherwise.) I help allocate Open Phil’s resources to improve the governance of AI with a focus on avoiding catastrophic outcomes. Formerly co-founder of the Cambridge Boston Alignment Initiative, which supports AI alignment/safety research and outreach programs at Harvard, MIT, and beyond, co-president of Harvard EA, Director of Governance Programs at the Harvard AI Safety Team and MIT AI Alignment, and occasional AI governance researcher. I’m also a proud GWWC pledger and vegan.
There’s a grain that I agree with here, which is that people excessively plan around a median year for AGI rather than a distribution for various events, and that planning around that kind of distribution leads to more robust and high-expected-value actions (and perhaps less angst).
However, I strongly disagree with the idea that we already know “what we need.” Off the top of my head, here are several ways that narrowing the error bars on timelines—which I’ll operationalize as “the distribution over when the most important decisions with respect to building transformative AI get made”—would be incredibly useful:
To what extent will these decisions be made by the current US administration, or by people governed by the current administration? This affects the political strategy everyone—including, I propose, PauseAI—should adopt.
To what extent will the people making the most important AI decisions remember stuff people said in 2025? This is very important for the relative usefulness of public communications versus research, capacity-building, etc.
Will these decisions come soon enough that the costs of being “out of the action” outweigh the longer-term benefits of e.g. going to grad school, developing technical expertise, etc.? Clearly relevant for lots of individuals who want to make a big impact.
When should philanthropists spend their resources? As I and others have written, there are several considerations that point towards spending later; these are weakened a lot if the key decisions are in the next few years.
To what extent will the most transformative models be technically similar to the ones we have today? That answer determines the value of technical safety research.
I also strongly disagree with the framing that the important thing is us knowing what we know. Yes, people who have been immersed in AI content for years often believe that very scary and/or awesome AI capabilities are coming within the decade. But most people, including most of the people who might take the most important actions, are not in this category and do not share this view (or at least don’t seem to have internalized it). Work that provides an empirical grounding for AI forecasts has already been very useful in bringing attention to AGI and its risks from a broader set of people, including in governments, who would otherwise be focused on any one of the million other problems in the world.
Giving now vs giving later, in practice, is a thorny tradeoff. I think these add up to roughly equal considerations, so my currently preferred policy is to split my donations 50-50, i.e. give 5% of my income away this year and save/invest 5% for a bigger donation later. (None of this is financial/tax advice! Please do your own thinking too.)
In favor of giving now (including giving a constant share of your income every year/quarter/etc, or giving a bunch of your savings away soon):
Simplicity.
The effects of your donation might have compounding returns, e.g. field-building gets more people doing great stuff, this can in turn build the field, etc., or be path-dependent, e.g. someone does some writing that establishes better concepts for the field.
Value drift: maybe you don’t trust your future self to give as much, or to be as good at picking good stuff. (Some commitment mechanisms exist for this, like DAFs, but that really only fixes the “give as much” problem, and there are lots of opportunities that DAFs can’t fund, such as 501c4 advocacy organizations, individuals, political campaigns, etc.)
Expropriation risk: you might lose the money, including via global catastrophe.
In favor of giving later:
Value of information: especially in a fast-changing field like AI, we’ll continue learning more about what kinds of interventions work as time goes on.
Philanthropic learning: basically the opposite of value drift; you specifically might become a wiser donor, especially if you’re currently young and/or new to the field.
Returns to scale: it’s probably better to make e.g. a single $150k donation than ten donations averaging $15k, because orgs can act pretty decisively with an amount like that, like hire somebody or run a program. (Eventually you hit diminishing returns, but not for most individual donors.)
Tax bunching (only applies to donations that you can write off): in my understanding, at least in the US, there’s a threshold below which you effectively can’t write off donations (the standard deduction), which creates a fixed cost in any year that you make donations. This makes donating a fixed amount every year a pretty suboptimal strategy, other things equal; if you’re donating an amount below or not that far above the standard deduction to 501c3 orgs every year, you might be able to save or donate significantly more by instead donating once every few years (rough sketch below).
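To make the bunching effect concrete, here’s a minimal sketch with made-up round numbers: a hypothetical ~$15k standard deduction, a 24% marginal rate, $10k/year of giving, and no other itemizable deductions. Illustrative arithmetic only, not tax advice.

```python
# Minimal sketch of tax bunching; all numbers are hypothetical round figures.
STANDARD_DEDUCTION = 15_000
MARGINAL_RATE = 0.24
ANNUAL_DONATION = 10_000
YEARS = 3

def extra_tax_saved(donations_by_year):
    """Tax saved relative to just taking the standard deduction every year."""
    total = 0.0
    for donated in donations_by_year:
        itemized = donated  # assuming donations are the only itemizable deduction
        total += MARGINAL_RATE * max(0, itemized - STANDARD_DEDUCTION)
    return total

spread = [ANNUAL_DONATION] * YEARS         # give the same amount each year
bunched = [ANNUAL_DONATION * YEARS, 0, 0]  # bunch three years of giving into one

print(extra_tax_saved(spread))   # 0.0: never exceeds the standard deduction
print(extra_tax_saved(bunched))  # 3600.0: 24% of the $15k above the deduction
```

Under these made-up assumptions, spreading the giving out yields no extra write-off at all, while bunching three years of donations into one tax year saves $3,600.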
Are you a US resident who spends a lot of money on rideshares + food delivery/pickup? If so, consider the following:
Costco members can buy up to four Uber gift cards of $50 value every two weeks (that is, 2 packs of 2 $50 gift cards). Now, and I think typically, these sell at 20% off face value.
Costco membership costs $65/year.
It takes ~2 minutes per gift card all-in.
You can use them on rides, scooters, and Uber Eats.
According to o3-mini-high, this means the deal is worth it if you spend more than $1625 / (5 - the dollar value of your marginal minute) per year on these services, if you get no other use out of the Costco membership. (If you do, this number goes down, of course. A quick sanity check of the arithmetic is sketched below.)
Hooray, you now have more money for donations, consumption, savings, or investment for a small time cost!
I was not paid by Costco or Uber to say this, I swear.
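For what it’s worth, here’s a rough sanity check of that breakeven figure; the deal terms are the ones in the bullets above, and the $2/minute time value is just an arbitrary example.

```python
# Rough sanity check of the breakeven figure. Deal terms are from the bullets
# above (20% off, $50 cards, ~2 minutes per card, $65/year membership).
def net_annual_benefit(annual_spend, value_per_minute,
                       membership_fee=65, discount=0.20,
                       card_value=50, minutes_per_card=2):
    """Dollars gained per year from the gift-card route vs. paying full price."""
    savings = discount * annual_spend
    cards_needed = annual_spend / card_value
    time_cost = cards_needed * minutes_per_card * value_per_minute
    return savings - membership_fee - time_cost

# Setting the net benefit to zero gives 0.2*S - 65 - (S/50)*2*v = 0,
# i.e. S = 1625 / (5 - v), matching the figure above. With v = $2/minute:
v = 2
breakeven_spend = 1625 / (5 - v)  # ≈ $542/year
print(round(breakeven_spend), round(net_annual_benefit(breakeven_spend, v), 2))
```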
I think the opposite might be true: when you apply it to broad areas, you’re likely to mistake low neglectedness for a signal of low tractability, and you should just look at “are there good opportunities at current margins?” When you start looking at individual solutions, it starts being quite relevant whether they have already been tried. (This point was already made here.)
Would it be good to solve problem P?
Can I solve P?
What is gained by adding the third thing? If the answer to #2 is “yes,” then why does it matter if the answer to #3 is “a lot,” and likewise in the opposite case, where the answers are “no” and “very few”?

Edit: actually, yeah, the “will someone else” point seems quite relevant.
Fair enough on the “scientific research is super broad” point, but I think this also applies to other fields that I hear described as “not neglected” including US politics.
Not talking about AI safety polling, agree that was highly neglected. My understanding, reinforced by some people who have looked into the actually-practiced political strategies of modern campaigns, is that it’s just a stunningly under-optimized field with a lot of low-hanging fruit, possibly because it’s hard to decouple political strategy from other political beliefs (and selection effects where especially soldier-mindset people go into politics).
I sometimes say, in a provocative/hyperbolic sense, that the concept of “neglectedness” has been a disaster for EA. I do think the concept is significantly over-used (ironically, it’s not neglected!), and people should just look directly at the importance and tractability of a cause at current margins.
Maybe neglectedness is useful as a heuristic for scanning thousands of potential cause areas. But ultimately, it’s just a heuristic for tractability: how many resources are going towards something is evidence about whether additional resources are likely to be impactful at the margin, because more resources mean it’s more likely that the most cost-effective solutions have already been tried or implemented. But these resources are often deployed ineffectively, such that it’s often easier to just directly assess the impact of resources at the margin than to do what the formal ITN framework suggests, which is to break this hard question into two hard ones: you have to assess something like the abstract overall solvability of a cause (namely, “percent of the problem solved for each percent increase in resources,” as if this is likely to be a constant!) and the neglectedness of the cause.
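For reference, here is the decomposition I’m gesturing at, roughly as the standard ITN writeups (e.g. 80,000 Hours) present it:

$$
\underbrace{\frac{\text{good done}}{\text{extra dollar}}}_{\text{the thing you actually care about}}
=
\underbrace{\frac{\text{good done}}{\text{\% of problem solved}}}_{\text{importance}}
\times
\underbrace{\frac{\text{\% of problem solved}}{\text{\% increase in resources}}}_{\text{tractability}}
\times
\underbrace{\frac{\text{\% increase in resources}}{\text{extra dollar}}}_{\text{neglectedness}}
$$

Note that the last two factors multiply out to exactly “percent of the problem solved per extra dollar,” i.e. the direct marginal question; splitting it into tractability and neglectedness only helps if each factor is easier to estimate than their product.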
That brings me to another problem: assessing neglectedness might sound easier than abstract tractability, but how do you weigh up the resources in question, especially if many of them are going to inefficient solutions? I think EAs have indeed found lots of surprisingly neglected (and important, and tractable) sub-areas within extremely crowded overall fields when they’ve gone looking. Open Phil has an entire program area for scientific research, on which the world spends >$2 trillion, and that program has supported Nobel Prize-winning work on computational design of proteins. US politics is a frequently cited example of a non-neglected cause area, and yet EAs have been able to start or fund work in polling and message-testing that has outcompeted incumbent orgs by looking for the highest-value work that wasn’t already being done within that cause. And so on.
What I mean by “disaster for EA” (despite the wins/exceptions in the previous paragraph) is that I often encounter “but that’s not neglected” as a reason not to do something, whether at a personal or organizational or movement-strategy level, and it seems again like a decent initial heuristic but easily overridden by taking a closer look. Sure, maybe other people are doing that thing, and fewer or zero people are doing your alternative. But can’t you just look at the existing projects and ask whether you might be able to improve on their work, or whether there still seems to be low-hanging fruit that they’re not taking, or whether you could be a force multiplier rather than just an input with diminishing returns? (Plus, the fact that a bunch of other people/orgs/etc are working on that thing is also some evidence, albeit noisy evidence, that the thing is tractable/important.) It seems like the neglectedness heuristic often leads to more confusion than clarity on decisions like these, and people should basically just use importance * tractability (call it “the IT framework”) instead.
It’s also just jargon-y. I call them “AI companies” because people outside the AGI memeplex don’t know what an “AI lab” is, and (as you note) if they infer from someone’s use of that term that the frontier developers are something besides “AI companies,” they’d be wrong!
Biggest disagreement between the average worldview of people I met with at EAG and my own is something like “cluster thinking vs sequence thinking,” where people at EAG are like “but even if we get this specific policy/technical win, doesn’t it not matter unless you also have this other, harder thing?” and I’m more like, “Well, very possibly we won’t get that other, harder thing, but still seems really useful to get that specific policy/technical win, here’s a story where we totally fail on that first thing and the second thing turns out to matter a ton!”
Skepticism towards claims about the views of powerful institutions
Thanks, glad to hear it’s helpful!
Re: more examples, I co-sign all of my teammates’ AI examples here—they’re basically what I would’ve said. I’d probably add Tarbell as well.
Re: my personal donations, I’m saving for a bigger donation later; I encounter enough examples of very good stuff that Open Phil and other funders can’t fund, or can’t fund quickly enough, that I think there are good odds that I’ll be able to make a really impactful five-figure donation over the next few years. If I were giving this year, I probably would’ve gone the route of political campaigns/PACs.
Re: sub-areas, there are some forms of policy advocacy and moral patienthood research for which small-to-medium-size donors could be very helpful. I don’t have specific opportunities in mind that I feel like I can make a convincing public pitch for, but people can reach out if they’re interested.
A case for donating to AI risk reduction (including if you work in AI)
I hope to eventually/maybe soon write a longer post about this, but I feel pretty strongly that people underrate specialization at the personal level, even as there are lots of benefits to pluralization at the movement level and large-funder level. There are just really high returns to being at the frontier of a field. You can be epistemically modest about what cause or particular opportunity is the best, not burn bridges, etc, while still “making your bet” and specializing; in the limit, it seems really unlikely that e.g. having two 20 hr/wk jobs in different causes is a better path to impact than a single 40 hr/wk job.
I think this applies to individual donations as well; if you work in a field, you are a much better judge of giving opportunities in that field than if you don’t, and you’re more likely to come across such opportunities in the first place. I think this is a chronically underrated argument when it comes to allocating personal donations.
Thanks for running this survey. I find these results extremely implausibly bearish on public policy—I do not think we should be even close to indifferent between improving by 5% the AI policy of the country that can make binding rules on all of the leading labs plus many key hardware inputs, has a $6 trillion budget, and has the most powerful military on earth, versus having $8.1 million more for a good grantmaker, or having 32.5 “good video explainers,” or having 13 technical AI academics. I’m biased, of course, but IMO the surveyed population is massively overrating the importance of the alignment community relative to the US government.
How the AI safety technical landscape has changed in the last year, according to some practitioners
Fwiw, I think the main thing getting missed in this discourse is that if even 3 out of your 50 speakers (especially if they’re near the top of the bill) are mostly known for a cluster of edgy views that are not welcome in most similar spaces, then people who really want to gather to discuss those edgy and typically unwelcome views will be a seriously disproportionate share of attendees, and this will have significant repercussions for the experience of the attendees who were primarily interested in the other 47 speakers.
I recommend the China sections of this recent CNAS report as a starting point for discussion (it’s definitely from a relatively hawkish perspective, and I don’t think of myself as having enough expertise to endorse it, but I did move in this direction after reading).
From the executive summary:
Taken together, perhaps the most underappreciated feature of emerging catastrophic AI risks from this exploration is the outsized likelihood of AI catastrophes originating from China. There, a combination of the Chinese Communist Party’s efforts to accelerate AI development, its track record of authoritarian crisis mismanagement, and its censorship of information on accidents all make catastrophic risks related to AI more acute.
From the “Deficient Safety Cultures” section:
While such an analysis is of relevance in a range of industry- and application-specific cultures, China’s AI sector is particularly worthy of attention and uniquely predisposed to exacerbate catastrophic AI risks [footnote]. China’s funding incentives around scientific and technological advancement generally lend themselves to risky approaches to new technologies, and AI leaders in China have long prided themselves on their government’s large appetite for risk—even if there are more recent signs of some budding AI safety consciousness in the country [footnote, footnote, footnote]. China’s society is the most optimistic in the world on the benefits and risks of AI technology, according to a 2022 survey by the multinational market research firm Institut Public de Sondage d’Opinion Secteur (Ipsos), despite the nation’s history of grisly industrial accidents and mismanaged crises—not least its handling of COVID-19 [footnote, footnote, footnote, footnote]. The government’s sprint to lead the world in AI by 2030 has unnerving resonances with prior grand, government-led attempts to accelerate industries that have ended in tragedy, as in the Great Leap Forward, the commercial satellite launch industry, and a variety of Belt and Road infrastructure projects [footnote, footnote, footnote]. China’s recent track record in other high-tech sectors, including space and biotech, also suggests a much greater likelihood of catastrophic outcomes [footnote, footnote, footnote, footnote, footnote].
From “Further Considerations”
In addition to having to grapple with all the same safety challenges that other AI ecosystems must address, China’s broader tech culture is prone to crisis due to its government’s chronic mismanagement of disasters, censorship of information on accidents, and heavy-handed efforts to force technological breakthroughs. In AI, these dynamics are even more pronounced, buoyed by remarkably optimistic public perceptions of the technology and Beijing’s gigantic strategic gamble on boosting its AI sector to international preeminence. And while both the United States and China must reckon with the safety challenges that emerge from interstate technology competitions, historically, nations that perceive themselves to be slightly behind competitors are willing to absorb the greatest risks to catch up in tech races [footnote]. Thus, even while the United States’ AI edge over China may be a strategic advantage, Beijing’s self-perceived disadvantage could nonetheless exacerbate the overall risks of an AI catastrophe.
Yes, but it’s kind of incoherent to talk about the dollar value of something without having a budget and an opportunity cost; it has to be your willingness-to-pay, not some dollar value in the abstract. Like, it’s not the case that the EA funding community would pay $500B even for huge wins like malaria eradication, an end to factory farming, a robust AI alignment solution, etc., because it’s impossible: we don’t have $500B.
And I haven’t thought about this much but it seems like we also wouldn’t pay, say, $500M for a 1-in-1000 chance for a “$500B win” because unless you’re defining “$500B win” with respect to your actual willingness-to-pay, you might wind up with many opportunities to take these kinds of moonshots and quickly run out of money. The dollar size of the win still has to ultimately account for your budget.
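As a toy illustration of the “run out of money” point, here’s a sketch with purely hypothetical numbers (the $50B total budget is a placeholder, not anyone’s actual figure):

```python
# Toy illustration of why a "win" has to be priced against your actual budget.
# All numbers are hypothetical placeholders.
budget = 50e9               # pretend total philanthropic budget
cost_per_moonshot = 500e6   # price of each 1-in-1000 shot at a "$500B win"
win_probability = 1 / 1000

n_shots = int(budget // cost_per_moonshot)      # 100 shots before the money is gone
expected_wins = n_shots * win_probability       # 0.1 expected wins
p_zero_wins = (1 - win_probability) ** n_shots  # ~0.90 chance of nothing to show for it

print(n_shots, expected_wins, round(p_zero_wins, 2))
```

Under those assumptions you exhaust the entire budget on 100 moonshots, expect 0.1 wins, and have roughly a 90% chance of zero wins, which is why the size of the “win” has to be defined relative to what you could actually pay out of a finite budget.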
Well, it implies you could change the election with those amounts if you knew exactly how close the election would be in each state and spent optimally. But if you figure the estimates are off by an OOM, that half of your spending goes to states that turn out not to be useful (which matches a ~30 min analysis I did a few months ago), and that you have significant diminishing returns such that $10M-$100M is 3x less impactful than the first $10M and $100M-$1B is another 10x less impactful, you still get:
First $10M is ~$10k per key vote = 1,000 votes (enough to swing 2000)
Next $90M is ~$30k per key vote = 3,000 votes
Next $900M is ~$90k per key vote = 10,000 votes
If you think there’s a major difference between the candidates, you might put a value on the election in the billions—let’s say $10B for the sake of calculation; so the first $10M would be worth it if there’s a 0.1% chance the election is decided by <1000 votes (which of course happened 6 elections ago!), the next $90M is worth it if there’s a 0.9% chance the election is decided by >1000 but <4000 votes, and the next $900M is worth it if there’s a 9% chance the election is decided by >4000 but <14000 votes. IMO the first two probably pass and the last one probably doesn’t, but idk.
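Making that arithmetic explicit (same illustrative numbers as above, with the election valued at an assumed $10B):

```python
# Rough sketch of the expected-value arithmetic above; the $10B election value
# and the per-vote costs are the illustrative numbers from this comment.
ELECTION_VALUE = 10e9

# (marginal spend, key votes that spend buys) for each diminishing-returns tier
tiers = [
    (10e6, 1_000),    # first $10M at ~$10k per key vote
    (90e6, 3_000),    # next $90M at ~$30k per key vote
    (900e6, 10_000),  # next $900M at ~$90k per key vote
]

margin_low = 0
for spend, votes in tiers:
    # A tier is worth funding if the chance that the final margin falls inside
    # the window this tier could close exceeds the cost/value ratio.
    breakeven_prob = spend / ELECTION_VALUE
    margin_high = margin_low + votes
    print(f"${spend / 1e6:.0f}M closes a {margin_low}-{margin_high} vote margin; "
          f"worth it if P(margin in window) > {breakeven_prob:.1%}")
    margin_low = margin_high
```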
I definitely agree there are plenty of ways we should reach elites and non-elites alike that aren’t statistical models of timelines, and insofar as the resources going towards timeline models (in terms of talent, funding, bandwidth) are fungible with the resources going towards other things, maybe I agree that more effort should be going towards the other things (but I’m not sure—I really think the timeline models have been useful for our community’s strategy and for informing other audiences).
But also, they only sometimes create a sense of panic; I could see specificity being helpful for people getting out of the mode of “it’s vaguely inevitable, nothing to be done, just gotta hope it all works out.” (Notably the timeline models sometimes imply longer timelines than the vibes coming out of the AI companies and Bay Area house parties.)