Former AI safety research engineer, now AI governance researcher at OpenAI. Blog: thinkingcomplete.blogspot.com
richard_ngo
FWIW the reasons you’re giving here are closely related to the reasons why I’m sceptical that modern AI-focused EA is in fact as good.
In hindsight I shouldn’t have used the phrase “what made EA good”, since by this point I’m skeptical about both the AI safety version and the “original spirit” of EA. I guess that makes me one of the people you’re describing who joined a long time ago (in my case, over a decade) and have now disengaged.
I do think that int/a is less likely than EA to be significantly harmful, and I’m excited about that. Whether or not it has a decent chance of actually doing something meaningful will depend a lot on the vision of the founders (and Euan in particular). Right now I’m not seeing what will prevent it from dissolving into the background of general hippie-adjacent things (kinda like a lot of the Game B and metacrisis stuff seems to have done). But we’ll see.
I have some caution around pushing to generate precise object-level claims that “define int/a”, in that you have to believe these claims to be part of it.
Yeah, I phrased it badly when I said that the movement should be pinning down claims. I’m not suggesting that you use these claims to define membership. Indeed, even the framing of your original post feels too “we are a group defined by believing the same things” for my taste (as compared with, say, “we’re some collaborators with similar intellectual/emotional/ethical stances”).
But I’m excited about you (and the others you mention in this post) writing about the things you personally think the EA worldview gets wrong, ideally not just engaging with how the movement turned out in practice but the broken philosophical assumptions that led to practical mistakes.
As one example, EAs constantly use “value-aligned” as a metric of who to ally with. But it seems pretty plausible to me that SBF was extremely value-aligned with most of the stated philosophical principles of EA. The problem was that he wasn’t value-aligned with the background ethics of society that EA mostly takes for granted. Understanding this deeply enough I think would lead to you reconceptualize the whole concept of “value-aligned” towards things more reminiscent of int/a (in a way that would then have implications for e.g. what moral theories to believe, what alignment targets to aim AIs at, etc).
Yeah, I expected as much. Though as per my comment above, I’m much more concerned about representation of thought leaders. A better proxy for intellectual diversity is something like “are the few people from each of these clusters who are the biggest critics of the consensus view invited?” E.g. for the Pause AI cluster that’d probably be Holly; for the MIRI cluster that’d probably be Yudkowsky and Habryka; for the academic ML cluster that’d probably be Dan Hendrycks; for the sociopolitical safety cluster that’d probably be Ben Hoffman and Michael Vassar.
I don’t know exactly who was invited but I expect that the Summit gets a medium score on this metric: not great, not terrible.
IMO you should think of global health/factory farming etc as one paradigm of EA—which did focus on cost effectiveness—and AI safety as a different paradigm in which the concept of cost-effectiveness is simply not very useful, for a few reasons (see also this related comment):
Talent bottlenecks are a far bigger obstacle than financial bottlenecks, and you can’t buy talent. Often you can’t even spend money to persuade talent—e.g. the AI safety community ended up convincing several of the most influential AI researchers of AGI risk, and even they mostly don’t seem to be able to think clearly about the issue (e.g. it’s hard to imagine a coherent strategy behind SSI).
Insofar as there are financial bottlenecks, it’s mostly because the biggest funder is ideologically and politically constrained, and because the trust networks in AI safety aren’t robust enough to distribute most of the money available. This will be only more true as Anthropic equity becomes liquid.
There’s an extreme principal-agent problem where not only is it difficult for funders to tell who will do good research in advance, but it’s even difficult for them to tell what was good research in hindsight.
As you mention, a lot of the action is in figuring out how not to be net-negative.
It’s very hard to cash out what AI safety is even trying to achieve in terms of metrics of cost-effectiveness. Once you start talking about the transhuman future, then almost every metric you could come up with is better optimized via weird futuristic stuff than simply “keeping humans alive”.
Re “it’s pretty easy to show that it’s orders of magnitude more cost-effective than GiveDirectly”: the kinds of reasoning that one might use to show that are pretty similar to e.g. the kinds that a communist could use to show that a proletarian revolution is orders of magnitude more cost-effective than GiveDirectly. In other words, it’s mostly reasoning within the confines of a worldview, and then slapping on a “cost-effectiveness” framing at the end, rather than having the cost-effectiveness part of the reasoning be load-bearing in any meaningful way.
FWIW I think it’s historically pretty incorrect that the grounding in cost-effectiveness is what made EA good. E.g. insofar as you think that AI safety is valuable, reasoning about cost-effectiveness actually cut against EA’s ability to pivot towards that. Instead, the thing that helped most was something like intellectual openness.
Let’s limit ourselves for a moment to someone who wants to make their own life go well. They have various instinctive responses that were ingrained in them by evolution—e.g. a disgust response to certain foods, or a sense of anger at certain behavior. Should they follow those instincts even when they don’t know why evolution instilled them? Personally I think they often should, and that this is an example of rational non-model-based decision-making. (Note that this doesn’t rely on them understanding evolution—even 1000 years ago they could have trusted that some process designed their body and mind to function well, despite radical uncertainty about what that process was.)
Now consider someone who wants to make the future of humanity go well. Similar, they have certain cooperative instincts ingrained in them—e.g. the instinct towards honesty. All else equal, it seems pretty reasonable to think that following them will help humanity to cooperate better, and that this will allow humanity to avoid internal conflict.
How does this relate to cluelessness? Mostly I don’t really know what the term means, it’s not something salient in my ontology. I don’t feel clueless about how to have a good life, and I don’t feel clueless about how the long-term future should be better, and these two things do in fact seem analogous. In both cases the idea of avoiding pointless internal conflict (and more generally, becoming wiser and more virtuous in other ways) seems pretty solidly good. (Also, the evolution thing isn’t central, you have similar dynamics with e.g. intuitions you’ve learned at a subconscious level, or behaviors that you’ve learned via reinforcement.)
Another way of thinking about it is that, when you’re a subagent within a larger agent, you can believe that “playing your role” within the larger agent is good even when the larger agent is too complex for you to model well (i.e. you have Knightian uncertainty about it).
And yet another way of thinking about it is “forward-chaining often works, even when you don’t have a back-chained reason why you think it should work”.
Cool project :) There’s definitely something very important in the rough direction you’re pointing. Some thoughts on how to gain more clarity on it:
I suspect that it’d be worth your time to think a bunch about the relationship between altruism and ethics. In some sense, I think of ethics (and particularly virtue ethics) as already a kind of “integral altruism”—i.e. ethics as a set of principles and heuristics by which we can remain in integrity with ourselves and others, thereby allowing our compassion to actually make the world better.
I think that the hippie/metamodern/etc communities are very good at some aspects of ethics, but quite bad at others. In general they tend to err on the side of agreeableness, rather than e.g. being honest about unpleasant truths. It feels valuable to take this broad worldview and then try to add a bunch of moral courage that it’s currently missing (analogous to how you can think of EA as adding moral courage to econ-brained thinking).
However, I feel pretty confused about how to actually help people aim their moral courage towards being ethical, since IMO neither EA nor most inner work helps much with this. One litmus test that I use to evaluate whether inner work is actually making people braver is whether they’re more willing to break political taboos afterwards (e.g. for people in the UK, by making a fuss about the Pakistani rape gangs); however, this seldom comes up positive. Another litmus test is whether they’re more willing to face the possibility of physical violence when appropriate (e.g. when a crazy person is being a bit menacing in public, do they still just look away?). These are just illustrative examples but hopefully they point at what I think is missing by default.
The stuff on cluelessness feels like it’s conceding a little too much to the EA/bayesian frame. It’s implying that you should have a model of the entire future in order to make decisions. But what I think you actually want to claim is that it’s sensible and even “rational” to make non-model-based decisions (e.g. via heuristics, intuitions, etc). Some other terms that might be better: bounded rationality, group agency, Knightian uncertainty. I sometimes use “distributed agency” or “coalitional agency”, but I think they won’t make sense to most of your readers.
The problem with stuff like systems thinking & complexity science is that it’s not really aiming to make the same kind of scientific progress as sciences like physics or evolutionary biology have made. More generally, it seems easy for movements like integral altruism to fall into the trap of not pinning down core ideas and claims. But insofar as integral altruism is true, it suggests that something important about the expected utility maximization paradigm is false, which someone should pin down. In other words, imagine that someone from the 22nd century comes back and tells you that something like integral altruism was actually scientifically/mathematically correct. What’s the version of integral altruism that actually leads to you figuring that out?
I filled in your form, and am excited to see where you take this!
As an event focused on x-risk, yes, I think this is fair.
This seems like a misinterpretation of Jan’s point. There are multiple intellectual clusters which at least claim to care about x-risk which aren’t well-described as the “EA/Constellation/Trajan House” cluster. The main ones which come to mind are:
The MIRI cluster
The Pause AI cluster
The academic ML cluster
The multi-agent/sociopolitical safety cluster (which isn’t very well-defined right now but I’d put both Jan and myself in this, broadly speaking)
The Anthropic cluster (which e.g. is more positive on racing than the EA/Constellation cluster, though I’m not claiming that there’s a coherent intellectual worldview behind this)
I would only describe a few people in each cluster as actual thought leaders or key thinkers. So compared with Jan my concern is less about who gets invited, and more that sampling any gathering as large as the Summit averages together responses from people with too wide a range of levels of leadership to be accurately described as “AI safety leaders”.
Seems very odd that this comment has 6 “helpful” votes. It’s clearly very unhelpful for actually resolving the disagreement, the only way it’s helpful is in providing an authority figure’s reassurance.
I used to not actually believe in heavy-tailed impact. On some level I thought that early rationalists (and to a lesser extent EAs) had “gotten lucky” in being way more right than academic consensus about AI progress. And I thought on some gut level that e.g. Thiel and Musk and so on kept getting lucky, because I didn’t want to picture a world in which they were actually just skillful enough to keep succeeding (due to various psychological blockers).
Now, thanks to dealing with a bunch of those blockers, I have internalized to a much greater extent that you can actually be good not just lucky. This means that I’m no longer interested in strategies that involve recruiting a whole bunch of people and hoping something good comes out of it. Instead I am trying to target outreach precisely to the very best people, without compromising much.
Relatedly, I’ve updated that the very best thinkers in this space are still disproportionately the people who were around very early. The people you need to soften/moderate your message to reach (or who need social proof in order to get involved) are seldom going to be the ones who can think clearly about this stuff. And we are very bottlenecked on high-quality thinking.
(My past self needed a lot of social proof to get involved in AI safety in the first place, but I also “got lucky” in the sense of being exposed to enough world-class people that I was able to update my mental models a lot—e.g. watching the OpenAI board coup close up, various conversations with OpenAI cofounders, etc. This doesn’t seem very replicable—though I’m trying to convey a bunch of the models I’ve gained on my blog, e.g. in this post.)
My 2025 donations (so far)
My story is: Elon changing the twitter censorship policies was a big driver of a chunk of Silicon Valley getting behind Trump—separate from Elon himself promoting Trump, and separate from Elon becoming a part of the Trump team.
And I think anyone who bought Twitter could have done that.
If anything being Elon probably made it harder, because he then had to face advertiser boycotts.
Agree/disagree?
To be clear, my example wasn’t “I’m trying to talk to people in the south about racism” It’s more like, “I’m trying to talk to people in the south about animal welfare, and in doing so, I bring up examples around South people being racist.”
Yeah I got that. Let me flesh out an analogy a little more:
Suppose you want to pitch people in the south about animal welfare. And you have a hypothesis for why people in the south don’t care much about animal welfare, which is that they tend to have smaller circles of moral concern than people in the north. Here are two types of example you could give:
You could give an example which fits with their existing worldview—like “having small circles of moral concern is what the north is doing when they’re dismissive of the south”. And then they’ll nod along and think to themselves “yeah, fuck the north” and slot what you’re saying into their minds as another piece of ammunition.
Or you could give an example that actively clashes with their worldview: “hey, I think you guys are making the same kind of mistake that a bunch of people in the south have historically made by being racist”. And then most people will bounce off, but a couple will be like “oh shit, that’s what it looks like like to have a surprisingly small circle of moral concern and not realize it”.
My claims:
Insofar as the people in the latter category have that realization, it will be to a significant extent because you used an example that was controversial to them, rather than one which already made sense to them.
People in AI safety are plenty good at saying phrases like “some AI safety interventions are net negative” and “unilateralist’s curse” and so on. But from my perspective there’s a missing step of… trying to deeply understand the forces that make movements net negative by their own values? Trying to synthesize actual lessons from a bunch of past fuck-ups?
I personally spent a long time being like “yeah I guess AI safety might have messed up big-time by leading to the founding of the AGI labs” but then not really doing anything differently. I only snapped out of complacency when I got to observe first-hand a bunch of the drama at OpenAI (which inspired this post). And so I have a hypothesis that it’s really valuable to have some experience where you’re like “oh shit, that’s what it looks like for something that seems really well-intentioned that everyone in my bubble is positive-ish about to make the world much worse”. That’s what I was trying to induce with my environmentalism slide (as best I can reconstruct, though note that the process by which I actually wrote it was much more intuitive and haphazard than the argument I’m making here).
I’m nervous that you and/or others might slide into clearly-incorrect and dangerous MAGA worldviews.
Yeah, that is a reasonable fear to have (which is part of why I’m engaging extensively here about meta-level considerations, so you can see that I’m not just running on reflexive tribalism).
Having said that, there’s something here reminiscent of I can tolerate anything except the outgroup. Intellectual tolerance isn’t for ideas you think are plausible—that’s just normal discussion. It’s for ideas you think are clearly incorrect, e.g. your epistemic outgroup. Of course you want to draw some lines for discussion of aliens or magic or whatever, but in this case it’s a memeplex endorsed (to some extent) by approximately half of America, so clearly within the bounds of things that are worth discussing. (You added “dangerous” too, but this is basically a general-purpose objection to any ideas which violate the existing consensus, so I don’t think that’s a good criterion for judging which speech to discourage.)
In other words, the optimal number of people raising and defending MAGA ideas in EA and AI safety is clearly not zero. Now, I do think that in an ideal world I’d be doing this more carefully. E.g. I flagged in the transcript a factual claim that I later realized was mistaken, and there’s been various pushback on the graphs I’ve used, and the “caused climate change” thing was an overstatement, and so on. Being more cautious would help with your concerns about “epistemic slight-of-hands”. But for better or worse I am temperamentally a big-ideas thinker, and when I feel external pressure to make my work more careful that often kills my motivation to do it (which is true in AI safety too—I try to focus much more on first-principles reasoning than detailed analysis). In general I think people should discount my views somewhat because of this (and I give several related warnings in my talk) but I do think that’s pretty different from the hypothesis you mention that I’m being deliberately deceptive.
a lot of your framing matches incredibly well with what I see as current right-wing talking points
Occam’s razor says that this is because I’m right-wing (in the MAGA sense not just the libertarian sense).
It seems like you’re downweighting this hypothesis primarily because you personally have so much trouble with MAGA thinkers, to the point where you struggle to understand why I’d sincerely hold this position. Would you say that’s a fair summary? If so hopefully some forthcoming writings of mine will help bridge this gap.
It seems like the other reason you’re downweighting that hypothesis is because my framing seems unnecessarily provocative. But consider that I’m not actually optimizing for the average extent to which my audience changes their mind. I’m optimizing for something closer to the peak extent to which audience members change their mind (because I generally think of intellectual productivity as being heavy-tailed). When you’re optimizing for that you may well do things like give a talk to a right-wing audience about racism in the south, because for each person there’s a small chance that this example changes their worldview a lot.
I’m open to the idea that this is an ineffective or counterproductive strategy, which is why I concede above that this one probably went a bit too far. But I don’t think it’s absurd by any means.
Insofar as I’m doing something I don’t reflectively endorse, I think it’s probably just being too contrarian because I enjoy being contrarian. But I am trying to decrease the extent to which I enjoy being contrarian in proportion to how much I decrease my fear of social judgment (because if you only have the latter then you end up too conformist) and that’s a somewhat slow process.
Thanks for the comment.
I think you probably should think of Silicon Valley as “the place” for politics. A bunch of Silicon Valley people just took over the Republican party, and even the leading Democrats these days are Californians (Kamala, Newsom, Pelosi) or tech-adjacent (Yglesias, Klein).
Also I am working on basically the same thing as Jan describes, though I think coalitional agency is a better name for it. (I even have a post on my opposition to bayesianism.)
Good questions. I have been pretty impressed with:
Balaji, who tweets about the dismantling of the American Empire (e.g. here, here) are the best geopolitical analysis I’ve seen of what’s going wrong with the current Trump administration
NS Lyons, e.g. here.
Some of Samo Burja’s concepts (e.g. live vs dead players) have proven much more useful than I expected when I heard about them a few years ago.
I think there are probably a bunch of other frameworks that have as much or more explanatory power as my two-factor model (e.g. Henrich’s concept of WEIRDness, Scott Alexander’s various models of culture war + discourse dynamics, etc). It’s less like they’re alternatives though and more like different angles on the same elephant.
Thanks for the thoughtful comment! Yeah the OpenAI board thing was the single biggest thing that shook me out of complacency and made me start doing sociopolitical thinking. (Elon’s takeover of twitter was probably the second—it’s crazy that you can get that much power for $44 billion.)
I do think I have a pretty clear story now of what happened there, and maybe will write about it explicitly going forward. But for now I’ve written about it implicitly here (and of course in the cooperative AI safety strategies post).
No central place for all the sources but the one you asked about is: https://www.sebjenseb.net/p/how-profitable-is-embryo-selection
Ah, gotcha. Yepp, that’s a fair point, and worth me being more careful about in the future.
I do think we differ a bit on how disagreeable we think advocacy should be, though. For example, I recently retweeted this criticism of Abundance, which is basically saying that they overly optimized for it to land with those who hear it.
And in general I think it’s worth losing a bunch of listeners in order to convey things more deeply to the ones who remain (because if my own models of movement failure have been informed by environmentalism etc, it’s hard to talk around them).
But in this particular case, yeah, probably a bit of an own goal to include the environmentalism stuff so strongly in an AI talk.
Hmm, I think you’re overstating the disanalogy here. In the case of the individual life, you say “the reasons to trust “that some process designed their body and mind to function well” are relatively strong”. But we also have strong reasons to trust that some process designed our cooperative instincts to allow groups of humans to cooperate effectively.
I also think that many individuals need to decide how to make their lives go well in pretty confusing circumstances. Imagine deciding whether to immigrate to America in the 1700s, or how to live in the shadow of the Cold War, or whether to genetically engineer your children. There’s a lot of stuff that’s unprecedented, and which you only get one shot at.
Re the experience of AI safety so far: I certainly do think that a bunch of actions taken by the AI safety movement have backfired. I also think that a bunch have succeeded. If you think of the AI safety movement as a young organism going through its adolescence, it’s gaining a huge amount of experience from all of these interactions with the world—and the net value of the things that it’s done so far may well be dominated by what updates it makes based on that experience.
I guess that’s where we disagree. It seems like you think that we should update towards being clueless. Whereas I think we can extract a bunch of generalizable lessons that make us less clueless than we used to be—and one of those lessons is that many of these mistakes could have been prevented by using the right kinds of non-model-based decision-making.
EDIT: another way of trying to get at the crux: do you think that, if we had a theory of sociopolitics that was about as good as 20th-century economics, then we wouldn’t be clueless about how to do sociopolitical interventions (like founding AI safety movements) effectively?