The Soul of EA is in Trouble

This is a Forum Team crosspost from Substack.


Whither cause prioritization and connection with the good?

There’s a trend towards people who once identified as Effective Altruists now identifying solely as “people working on AI safety.”[1] For those in the loop, it feels less like a trend and more like a tidal wave. There’s an increasing sense that among the most prominent (formerly?) EA orgs and individuals, making AGI go well is functionally all that matters. For that end, so the trend goes, the ideas of Effective Altruism have exhausted their usefulness. They pointed us to the right problem – thanks; we’ll take it from here. And taking it from here means building organizations, talent bases, and political alliances at a scale incommensurate with attachment to a niche ideology or moralizing language generally. I think this is a dangerous path to go down too hard, and my impression is that EAs are going down it quite hard.

I’ll acknowledge right off the bat that it is emphatically not the case that everyone doing some version of rebranding themselves to AI safety is making a mistake, or even harming the ideas of EA on balance. A core EA insight, relative to other moral schools of thought, is that excessive purity about particular means and motivations often comes at too great a cost to good outcomes in the world, and that you should be willing to trade these off at least somewhat. Specific labels, ideologies, and communities do carry baggage that makes building broad alliances unnecessarily difficult, and not everyone working on AI safety or any other EA-inspired cause should feel obligated to foreground their inspiration and the intellectual history that led them to do what they’re doing.

A central point of my previous post on roughly this topic is that people have crossed the line from merely not foregrounding EA to things that look more like active disparagement. That seems like a straightforward mistake from many perspectives. Smart people will draw a line from Effective Altruism to specific perspectives on AI safety and associate one with the other. If you disparage EA, you disparage the specific parts of AI safety that you, as part of EA’s progeny, are supposed to care most about.

The worry I want to express here is sort of the inverse: if you glorify some relatively value-neutral conception of AI safety as the summum bonum of what is or used to be EA, there is just a good chance that you will lose the plot and end up not pursuing the actual highest good, the good itself.

What I see

The [fictionalized] impetus for writing this post came from going to a retreat held by what used to be a local EA group that had rebranded to a neutral name while keeping the same basic cause portfolio. The retreat was about 25 people, and I’d guess only 3-4 were vegan or vegetarian. Beyond that, when someone gave a lightning talk on earning to give to the presumably-relatively-high-context attendees, it seemed to go over like it might with a totally neutral, cold audience. Most nodded along but didn’t engage; a few came up to ask typical questions (e.g., shouldn’t the government do this? what about loans?); and a few were mildly offended or felt put-upon.

For those whose EA retreat experience is mostly pre-2023 like me, both the numbers and reactions here are kind of shocking. I would have expected the retreat to be ~70% vegetarian and for most of the response to be hard-nosed questions about the most effective interventions, not “huh, so do you think any charities actually work?” As you might predict, almost all the rest of the retreat was split between technical AI safety and AI policy, with some lip service to biosecurity along the way.

Perhaps the clearest and most predictive embodiment of the trend is 80,000 Hours’ new strategic focus on AI. 80k was always fundamentally about providing thorough, practical cause/intervention prioritization, and that exercise can fairly be regarded as the core of EA. They’re now effectively saying the analysis is done: doing the most good means steering AI development, so we’ll now focus only on the particulars of what to do in AI. Thanks, we’ll take it from here indeed.

Now, even though it’d be easy to frame these moves as reacting to external evidence – perhaps laudably noticing the acceleration of AI capabilities, and perhaps less laudably wanting to cut ties with the past after FTX – one claim is that this is a turn towards greater honesty and transparency with audiences. To some degree, it has always been the case that AI career changes have been the primary measure of success of EA commun– ahem– field-building programs and now we’re just being clearer about what we want and hope for from participants.

This response seems question-begging in this context. Do we want people to work on AI safety or do we want them to do the most good, all things considered? Arguably, we genuinely wanted the latter, so the process mattered here. Maybe someone’s personal fit and drive for animals really did make that the better overall outcome. Maybe we were wrong about some key assumption in the moral calculus of AI safety and would welcome being set straight.

Even putting the question begging concern to the side, exactly what people end up doing within “AI safety” matters enormously from the EA perspective. Don’t you remember all the years, up to and including the present, where it was hard to know whether someone really meant what we thought (or hoped) they did when they said “AI safety?” We actually care about the overall moral value of the long run future. Making AI less racist or preventing its use in petty scams doesn’t really cut it in those terms.

Some reduce the problem to AI-not-kill-everyone-ism, which seems straightforward enough and directed at the most robust source of value here, but I notice people in more sophisticated (and successful) orgs are skittish about parsing things in those terms, lest they turn off the most talented potential contributors and collaborators.

Even this assumes, however, that the problem and its dimensions are and will remain simple enough to communicate in principle without needing to delve into any philosophy or moralizing about the kind of future we want. The obviously-biggest bads will be obvious, and so too the obviously-biggest goods. Thank goodness that our new, most highly capable contributors won’t need to know the ins and outs of our end goals in order to drive progress towards them; they’d be a lot harder to recruit otherwise.

The threat means pose to ends

And this strategy spawns things like the BlueDot curriculum, whose most digestible summary reading on risks from AI covers discrimination, copyright infringement, worker exploitation, invasions of privacy, reduced social connection, and autonomous vehicle malfunctions before touching on what I might call “real risks.”[2] It might not be so bad if this were all just due diligence to cast the widest possible net before participants went on, in the course itself, to compare the seriousness of these risks. But on multiple occasions, I’ve had the sad experience of speaking to someone who had completed the course and seemed not even to be aware of existential risk as a concern.

I understand the temptation. The people I spoke to in this context were very impressive on paper. So you give them the course they want to take and maybe they get excited about doing work at an org you think is doing great and important work on AI. Once they’re there, they’ll catch on and see what’s up, or at least enough of them will do that to make this all worthwhile.

Well, then there’s the orgs. They’re also taking more and more steps to garner conventional credibility by working on more mundane and lower stakes questions than those aimed squarely at value. And it’s working. For those in the know, it’s hard to deny these EA-founded orgs are getting more prominent: better talent, more connections, more influence. A lot of it is a traceable consequence of moderating. The plan is that once there are clearer levers to pull to reduce existential risk (and I agree there aren’t really hugely ripe policy opportunities or ideas for this now), they’ll be in a great position to pull them.

Perhaps you see the worry. Compromise your goals now, pander to your constituents now, and later you’ll be able to cash it all in for what you really care about. The story of every politician ever. Begin as a young idealist, start making compromises, end up voting to add another $5 trillion to the debt because even though you’re retiring next term, you’d hate not to be a team player when these midterms are going to be so. close.

This isn’t just a problem for politics and public-facing projects. It’s a deep weakness of the human condition. People will often decide that some particular partner, or house, or car, or job, or number of kids will make them happy. So they fixate on whatever specific instrument of happiness they chose, and after enough time goes by, they fully lose their original vision in the day-to-day onrush of things involved in pursuing the instrument. It’s much easier to simply become the mask you put on to achieve your goals than it is to always remind yourself it’s a mask. In competitive, high-stakes, often zero-sum arenas like policy, it is even harder to pay the tax of maintaining awareness of your mask, lest you fall behind or out of favor, and this is exactly the situation I see AI safety orgs headed towards.

All the same, I don’t think we’re at a point of crisis. None of these tradeoffs seem too dumb at the moment (with some exceptions), and I generally trust EAs to be able to pull off this move more than most. But we’re not setting ourselves up well to escape this trap when we consciously run away from our values and our roots. Likewise when we don’t acknowledge or celebrate people doing the hard work of reflecting directly on what matters in the present. This all corresponds too neatly to the hollowing out of principles-first EA community building, whether by 80k or by local and university groups converting themselves into AI safety or, tellingly, “AI Security” programs.

The social signals are also powerful. The serious, important people no longer dabble in cause prioritization, obsess about personal donations, or debate population ethics. They build fancy think tanks staffed with Ivy Leaguers and take meetings with important people in government and tech about the hot AI topic of the month. And so community builders take their cues and skip as much of the background ethics, assumptions, and world modeling as they can to get their audiences looking and acting like the big people as fast as possible.

Again, the fanciness and the meetings are good moves and have a lot of value, but if the people executing them never show up to EAG or speak to a university group about the fundamentals, when are they reflecting on those? Even back when they did do that, was it all so clear and resolved that it’d be easy to pick up again in 5 years when you need it? And what will the composition of all your new collaborators be by then? Will they have done any of this reflection or even be on board for the maybe-unpopular actions it recommends?

Losing something more

Beyond possibly falling into a classic trap of making your instrumental goals the enemy of your terminal goals, motivations and reflection just matter a lot for their own sake. If you don’t check in on yourself and your first principles, you’re at serious risk of getting lost both epistemically and morally. When you make arguments aimed at giving you power and influence, the next tradeoff you make is how much scrutiny to give instrumentally useful arguments, hires, and projects.

Another byproduct of checking in from first principles is who and what it connects you with. Everyone knows the vegans are the good guys. You should regard feeling alien and disconnected from them as a warning sign that you might not be aiming squarely at the good. And the specifics of factory farming feel particularly clarifying here. Even strong-identity vegans push the horrors of factory farming out of their heads most of the time for lack of ability to bear it. It strikes me as good epistemic practice for someone claiming that their project most helps the world to periodically stare these real-and-certain horrors in the face and explain why their project matters more – I suspect having to weigh it up against something so visceral cuts away a lot of the more speculative arguments and clarifies various fuzzy assumptions underlying AI safety work. It also forces you to be less ambiguous about how your AI project cashes out in reduced existential risk or something equivalently important. Economizing on the regulatory burden faced by downstream developers? Come now, is that the balance in which the lightcone hangs?

Then there is the burden of disease for humans. The thing I suspect brought most now-AI-safety people into the broader ecosystem. Mothers burying their children. The amount of money that you would personally sacrifice to stop it or some equivalent nightmare. Both this problem and this mode of thinking about tradeoffs are greatly if not wholly deemphasized in circles where they were once the cornerstones of how to think about your potential effects on the world. Sure, you don’t want to miss out on a phenomenal safety engineer because you offered too small a salary or set too strong an example of personal poverty, but is there really no place for this discourse nearby to you? Is this something you want distance from?

The confession I’ll make at this especially-moralizing juncture is that, ironically, I am a bad EA for basically the opposite reason that the AI safety identitarians are bad EAs. They care so much about putting every last chip down on the highest-marginal-EV bet that they risk losing themselves. I wallow in my conservatism and abstraction because I care more about the idea of EA than impact itself. That – my part – is really not what it’s supposed to be about.

You, reader, are not doomed to fall into one or the other of these traps though. There are people like Joe, or Benjamin, or Rohin, or Neel who do very impressive and important work on AI safety that is aimed where the value lies, but who to my eye also keep in touch with their moral compasses, with the urgency of animal and human suffering, and with the centrality of goodness itself. I don’t think any of them, as individuals, disparage or even belittle by implication the practice of doing serious cross-cause prioritization.

Obviously, this is easier to do as an individual than as an organization. There’s clearly value to an organization making itself more legibly open to a broader range of partners and contributors. But as with all things, influence flows both ways. Your organization’s instrumental goals can rub off on you and how you orient yourself towards your life and work. Your terminal goals can be a north star for the shape your projects and initiatives take, even if there are hard tradeoffs to be made along the way. I worry that the people who care most about doing good in the world are being tempted by the former and becoming increasingly blind to the latter. I worry it’s being socially reinforced by people with weaker moral compasses who haven’t really noticed it’s a problem. I want both groups to notice and each of us individually to be the people we actually want to be.

  1. ^

    I would say “AI safety advocates,” but as will become clear, “advocacy” connotes some amount of moralizing and moralizing is the thing from which people are flinching.

  2. ^

    I pick on BlueDot because they’re the most public and legible, but I’ve seen even worse and more obfuscated curriculums on these terms from groups aiming at something very different than what their courses suggest.