(Posting in a personal capacity unless stated otherwise.) I help allocate Open Phil’s resources to improve the governance of AI with a focus on avoiding catastrophic outcomes. Formerly: co-founder of the Cambridge Boston Alignment Initiative, which supports AI alignment/safety research and outreach programs at Harvard, MIT, and beyond; co-president of Harvard EA; Director of Governance Programs at the Harvard AI Safety Team and MIT AI Alignment; and occasional AI governance researcher. I’m also a proud GWWC pledger and vegan.
tlevin
A case for donating to AI risk reduction (including if you work in AI)
Thanks for running this survey. I find these results extremely implausibly bearish on public policy. I do not think we should be even close to indifferent between, on the one hand, a 5% improvement in the AI policy of the country that can make binding rules on all of the leading labs plus many key hardware inputs, has a $6 trillion budget, and has the most powerful military on earth, and, on the other, an extra $8.1 million for a good grantmaker, or 32.5 “good video explainers,” or 13 technical AI academics. I’m biased, of course, but IMO the surveyed population is massively overrating the importance of the alignment community relative to the US government.
How the AI safety technical landscape has changed in the last year, according to some practitioners
Fwiw, I think the main thing getting missed in this discourse is that if even 3 of your 50 speakers (especially if they’re near the top of the bill) are mostly known for a cluster of edgy views that are not welcome in most similar spaces, then people who really want to gather to discuss those edgy and typically unwelcome views will make up a seriously disproportionate share of attendees, and this will have significant repercussions for the experience of the attendees who were primarily interested in the other 47 speakers.
I recommend the China sections of this recent CNAS report as a starting point for discussion (it’s definitely from a relatively hawkish perspective, and I don’t think of myself as having enough expertise to endorse it, but I did move in this direction after reading).
From the executive summary:
Taken together, perhaps the most underappreciated feature of emerging catastrophic AI risks from this exploration is the outsized likelihood of AI catastrophes originating from China. There, a combination of the Chinese Communist Party’s efforts to accelerate AI development, its track record of authoritarian crisis mismanagement, and its censorship of information on accidents all make catastrophic risks related to AI more acute.
From the “Deficient Safety Cultures” section:
While such an analysis is of relevance in a range of industry- and application-specific cultures, China’s AI sector is particularly worthy of attention and uniquely predisposed to exacerbate catastrophic AI risks [footnote]. China’s funding incentives around scientific and technological advancement generally lend themselves to risky approaches to new technologies, and AI leaders in China have long prided themselves on their government’s large appetite for risk—even if there are more recent signs of some budding AI safety consciousness in the country [footnote, footnote, footnote]. China’s society is the most optimistic in the world on the benefits and risks of AI technology, according to a 2022 survey by the multinational market research firm Institut Public de Sondage d’Opinion Secteur (Ipsos), despite the nation’s history of grisly industrial accidents and mismanaged crises—not least its handling of COVID-19 [footnote, footnote, footnote, footnote]. The government’s sprint to lead the world in AI by 2030 has unnerving resonances with prior grand, government-led attempts to accelerate industries that have ended in tragedy, as in the Great Leap Forward, the commercial satellite launch industry, and a variety of Belt and Road infrastructure projects [footnote, footnote, footnote]. China’s recent track record in other high-tech sectors, including space and biotech, also suggests a much greater likelihood of catastrophic outcomes [footnote, footnote, footnote, footnote, footnote].
From “Further Considerations”:
In addition to having to grapple with all the same safety challenges that other AI ecosystems must address, China’s broader tech culture is prone to crisis due to its government’s chronic mismanagement of disasters, censorship of information on accidents, and heavy-handed efforts to force technological breakthroughs. In AI, these dynamics are even more pronounced, buoyed by remarkably optimistic public perceptions of the technology and Beijing’s gigantic strategic gamble on boosting its AI sector to international preeminence. And while both the United States and China must reckon with the safety challenges that emerge from interstate technology competitions, historically, nations that perceive themselves to be slightly behind competitors are willing to absorb the greatest risks to catch up in tech races [footnote]. Thus, even while the United States’ AI edge over China may be a strategic advantage, Beijing’s self-perceived disadvantage could nonetheless exacerbate the overall risks of an AI catastrophe.
Yes, but it’s kind of incoherent to talk about the dollar value of something without having a budget and an opportunity cost; it has to be your willingness-to-pay, not some dollar value in the abstract. Like, it’s not the case that the EA funding community would pay $500B even for huge wins like malaria eradication, an end to factory farming, a robust AI alignment solution, etc., because it’s impossible: we don’t have $500B.
And I haven’t thought about this much, but it seems like we also wouldn’t pay, say, $500M for a 1-in-1000 chance of a “$500B win,” because unless you’re defining “$500B win” with respect to your actual willingness-to-pay, you might wind up with many opportunities to take these kinds of moonshots and quickly run out of money. The dollar size of the win still has to ultimately account for your budget.
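To make the budget point concrete, here’s a toy sketch with made-up numbers (the $50B pool is purely hypothetical, not anyone’s actual budget; it just illustrates how fast repeated moonshots eat a finite pool):

```python
# Toy illustration (hypothetical numbers) of why moonshot bets have to be
# priced against the actual budget, not an abstract dollar value of the win.

budget = 50e9         # hypothetical total pool of funds: $50B
cost_per_bet = 500e6  # $500M per moonshot
p_win = 1 / 1000      # each bet "wins" independently with 1-in-1000 odds

n_bets = int(budget // cost_per_bet)            # 100 bets exhaust the pool
p_at_least_one_win = 1 - (1 - p_win) ** n_bets  # ~9.5%

print(f"{n_bets} bets spend the entire ${budget/1e9:.0f}B "
      f"for a {p_at_least_one_win:.1%} chance of at least one win")
```

In other words, paying $500M per shot exhausts the whole (hypothetical) pool for well under a 10% chance of ever getting the “win,” which is the sense in which the dollar size of the win has to be defined relative to what you’d actually be willing and able to pay.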
Well, it implies you could change the election with those amounts if you knew exactly how close the election would be in each state and spent optimally. But if you figure the estimates are off by an OOM, that half of your spending goes to states that turn out not to be useful (which matches a ~30 min analysis I did a few months ago), and that you have significant diminishing returns such that $10M-$100M is 3x less impactful than the first $10M and $100M-$1B is roughly 10x less impactful than that first $10M, you still get:
First $10M is ~$10k per key vote = 1,000 votes (enough to swing the 2000 election)
Next $90M is ~$30k per key vote = 3,000 votes
Next $900M is ~$90k per key vote = 10,000 votes
If you think there’s a major difference between the candidates, you might put a value on the election in the billions; let’s say $10B for the sake of calculation. Then the first $10M is worth it if there’s at least a 0.1% chance the election is decided by <1,000 votes (which of course happened 6 elections ago!), the next $90M is worth it if there’s at least a 0.9% chance it’s decided by >1,000 but <4,000 votes, and the next $900M is worth it if there’s at least a 9% chance it’s decided by >4,000 but <14,000 votes. IMO the first two probably pass and the last one probably doesn’t, but idk.
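For concreteness, here’s a minimal sketch of that break-even arithmetic (the cost-per-vote tranches and the $10B election value are just the illustrative assumptions above, not empirical estimates):

```python
# Break-even probabilities for each marginal spending tranche, using the
# illustrative cost-per-vote and election-value assumptions from above.

election_value = 10e9  # assumed value of swinging the election: $10B

tranches = [
    # (marginal spend, assumed cost per key vote)
    (10e6,  10_000),   # first $10M
    (90e6,  30_000),   # next $90M
    (900e6, 90_000),   # next $900M
]

for spend, cost_per_vote in tranches:
    votes = spend / cost_per_vote
    breakeven_p = spend / election_value  # P(this tranche is decisive) needed to break even
    print(f"${spend/1e6:>4.0f}M buys ~{votes:,.0f} key votes; worth it if "
          f"P(margin falls in that band) > {breakeven_p:.1%}")
```

These reproduce the 0.1%/0.9%/9% thresholds above; whether each tranche actually clears its threshold is the judgment call.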
It seems like you might be under-weighting the cumulative amount of resources: even if you apply some pretty heavy decay rate (and it’s unclear you should; usually we think of philanthropic investments compounding over time), avoiding nuclear war was a top global priority for decades, and it feels like we have a lot of intellectual and policy “legacy infrastructure” from that.
Yeah, this is all pretty compelling, thanks!
I think some of the AI safety policy community has over-indexed on the visual model of the “Overton Window” and under-indexed on alternatives like the “ratchet effect,” “poisoning the well,” “clown attacks,” and other models where proposing radical changes can make you, your allies, and your ideas look unreasonable.
I’m not familiar with much systematic empirical evidence on either side, but it seems to me that the more effective actors in the DC establishment are much more in the habit of looking for small wins that are both good in themselves and shrink the size of the ask for their ideal policy than of pushing for their ideal vision and then making concessions. Possibly an ideal ecosystem has both strategies, but it seems possible that at least some “Overton Window-moving” strategies, as executed in practice, do more harm by associating their “side” with unreasonable-sounding ideas in the minds of very bandwidth-constrained policymakers (who lean heavily on signals of credibility and consensus when quickly evaluating policy options) than they do good by increasing the odds of the ideal policy and improving the framing for non-ideal but pretty good policies.
In theory, the Overton Window model is just a description of which ideas are taken seriously, so it can indeed accommodate backfire effects where you argue for an idea “outside the window” and this actually makes the window narrower. But I think the visual imagery of “windows” struggles to accommodate this (when was the last time you tried to open a window and accidentally closed it instead?), and as a result, people who rely on this model are more likely to underrate these kinds of consequences.
Would be interested in empirical evidence on this question (ideally actual studies from the psych, political science, sociology, econ, etc. literatures, rather than specific case studies, given reference-class-tennis-type issues).
Yes, some regulations backfire, and this is a good flag to keep in mind when designing policy, but to actually make the reference-class argument here work, you’d have to show that this is what we should expect from AI policy, which would include showing that failures like NEPA are either much more relevant for the AI case or more numerous than other, more successful regulations, like (in my opinion) the Clean Air Act, Sarbanes-Oxley, bans on CFCs or leaded gasoline, etc. I know it’s not quite as simple as “I would simply design good regulations instead of bad ones,” but it’s also not as simple as “some regulations are really counterproductive, so you shouldn’t advocate for any.” Among other things, this assumes that nobody else will be pushing for really counterproductive regulations!
This post correctly identifies some of the major obstacles to governing AI, but ultimately makes an argument for “by default, governments will not regulate AI well,” rather than the claim implied by its title, which is that advocating for (specific) AI regulations is net negative—a type of fallacious conflation I recognize all too well from my own libertarian past.
Interesting! I actually wrote a piece on “the ethics of ‘selling out’” in The Crimson almost 6 years ago (jeez) that was somewhat more explicit in its EA justification, and I’m curious what you make of those arguments.
I think randomly selected Harvard students (among those who have the option to do so) deciding to take high-paying jobs and donate double-digit percentages of their salaries to places like GiveWell is very likely better for the world than the random-ish other things they might have done, and for that reason I strongly support this op-ed. But for undergrads who are really committed to doing the most good, there are two things I would recommend instead. Both route through developing a solid understanding of the most important and tractable problems in the world, via reading widely, asking good questions of knowledgeable people, doing their own writing and seeking feedback, and probably networking aggressively among the people working on these problems.
This enables much more effective earning to give — I think very plugged-in and reasonably informed donors can outperform even top grantmaking organizations in various ways, including helping organizations diversify their funding, moving faster, spotting opportunities that the grantmakers don’t, etc.
And it’s also basically necessary for doing direct work on the world’s most important problems. I think the generic advice to earn to give misses the huge variation in performance between individuals in direct work; if I understand correctly, 80k agrees with this and thinks this should have been much more emphasized in their early writing and advice. Many Harvard students, in my view, could relatively quickly become excellent in roles like think tank research in AI policy or biosecurity or operations at very impactful organizations. A smaller but nontrivial number could be excellent researchers on important philosophical or technical questions. I think it takes a lot of earning potential to beat those.
I object to calling funding two public defenders “strictly dominating” being one yourself; while public defender isn’t an especially high-variance role with respect to performance compared to e.g. federal public policy, it doesn’t seem that crazy that a really talented and dedicated public defender could be more impactful than the 2 or 3 marginal PDs they’d fund while earning to give.
The shape of my updates has been something like:
Q2 2023: Woah, looks like the AI Act might have a lot more stuff aimed at the future AI systems I’m most worried about than I thought! Making that go well now seems a lot more important than it did when it looked like it would mostly be focused on pre-foundation model AI. I hope this passes!
Q3 2023: As I learn more about this, it seems like a lot of the value is going to come from the implementation process: depending on how the standard-setting orgs and member states operationalize it, the same text in the actual Act could wind up either specifically requiring things that meaningfully reduce the risks or just imposing a lot of costs at a lot of points in the process without actually aiming at the most important parts. But still, for that to happen at all, it needs to pass and not have the general-purpose AI stuff removed.
November 2023: Oh no, France and Germany want to take out the stuff I was excited about in Q2. Maybe this will not be very impactful after all.
December 2023: Oh good, actually it seems like they’ve figured out a way to focus the costs France/Germany were worried about on the very most dangerous AIs, so this will wind up being more like what I was hoping for pre-November, and it’s now highly likely to pass!
The text of the Act is mostly determined, but it delegates tons of very important detail to standard-setting organizations and implementation bodies at the member-state level.
(Cross-posting from LW)
Thanks for these thoughts! I agree that advocacy and communications are an important part of the story here, and I’m glad you’ve added some detail on that with your comment. I’m also sympathetic to the claim that serious thought about “ambitious comms/advocacy” is especially neglected within the community, though I think it’s far from clear that the effort that went into the policy research that identified these solutions, or into the work on the ground in Brussels, should have been shifted at the margin to the kinds of public communications you mention.
I also think Open Phil’s strategy is pretty bullish on supporting comms and advocacy work, but it has taken us a while to acquire the staff capacity to gain context on those opportunities and begin funding them, and perhaps there are specific opportunities that you’re more excited about than we are.
For what it’s worth, I didn’t seek significant outside input while writing this post and think that’s fine (given the alternative of writing it quickly, posting it here, disclaiming my non-expertise, and getting additional perspectives and context from commenters like yourself). However, I have spoken with about a dozen people working on AI policy in Europe over the last couple months (including one of the people whose public comms efforts are linked in your comment) and would love to chat with more people with experience doing policy/politics/comms work in the EU.
We could definitely use more help thinking about this stuff, and I encourage readers who are interested in contributing to OP’s thinking on advocacy and comms to do any of the following:
Write up these critiques (we do read the forums!);
Join our team (our latest hiring round specifically mentioned US policy advocacy as a specialization we’d be excited about, but people with advocacy/politics/comms backgrounds more generally could also be very useful, and while the round is now closed, we may still review general applications); and/or
Introduce yourself via the form mentioned in this post.
It uses the language of “models that present systemic risks” rather than “very capable,” but otherwise, a decent summary, bot.
I hope to eventually/maybe soon write a longer post about this, but I feel pretty strongly that people underrate specialization at the personal level, even as there are lots of benefits to pluralization at the movement level and large-funder level. There are just really high returns to being at the frontier of a field. You can be epistemically modest about what cause or particular opportunity is the best, not burn bridges, etc, while still “making your bet” and specializing; in the limit, it seems really unlikely that e.g. having two 20 hr/wk jobs in different causes is a better path to impact than a single 40 hr/wk job.
I think this applies to individual donations as well; if you work in a field, you are a much better judge of giving opportunities in that field than if you don’t, and you’re more likely to come across such opportunities in the first place. I think this is a chronically underrated argument when it comes to allocating personal donations.