Thanks! I agree strongly with that.
(Also, thank you for doing this analysis, it’s great stuff!)
😢
Rutger Bregman isn’t on the Forum, but sent me this message and gave me permission to share:
Great piece! I strongly agree with your point about PR. EA should just be EA, like the Quakers just had to be Quakers and Peter Singer should just be Peter Singer.
Of course EA had to learn big lessons from the FTX saga. But those were moral and practical lessons so that the movement could be proud of itself again. Not PR-lessons. The best people are drawn to EA not because it’s the coolest thing on campus, but because it’s a magnet for the most morally serious + the smartest people.
As you know, I think EA is at its best when it’s really effective altruism (“I deeply care about all the bad stuff in the world, desperately want to make a difference, so I gotta think really fcking hard about how I can make the biggest possible difference”) and not altruistic rationalism (“I’m super smart, and I might as well do a lot of good with it”).
This ideal version of EA won’t appeal to all super talented people of course, but that’s fine. Other people can build other movements for that. (It’s what we’re trying to do at The School for Moral Ambition.)
Argh, thanks for catching that! Edited now.
If this perspective involves a strong belief that AI will not change the world much, then IMO that’s just one of the (few?) things that are ~fully out of scope for Forethought
I disagree with this. There would need to be some other reason for why they should work at Forethought rather than elsewhere, but there are plausible answers to that — e.g. they work on space governance, or they want to write up why they think AI won’t change the world much and engage with the counterarguments.
I can’t speak to the “AI as a normal technology” people in particular, but a shortlist I created of people I’d be very excited about includes someone who just doesn’t buy at all that AI will drive an intelligence explosion or explosive growth.
I think there are lots of types of people where it wouldn’t be a great fit, though. E.g. continental philosophers; at least some of the “sociotechnical” AI folks; more mainstream academics who are focused on academic publishing. And if you’re just focused on AI alignment, probably you’ll get more at a different org than you would at Forethought.
More generally, I’m particularly keen on situations where V(X, Forethought team) is much greater than V(X) + V(Forethought team), either because there are synergies between X and the team, or because X is currently unable to do the most valuable work they could in any of the other jobs they could be in.
Thanks for writing this, Lizka!
Some misc comments from me:
I have the worry that people will see Forethought as “the Will MacAskill org”, at least to some extent, and therefore think you’ve got to share my worldview to join. So I want to discourage that impression! There’s lots of healthy disagreement within the team, and we try to actively encourage disagreement. (Salient examples include disagreement around: AI takeover risk; whether the better futures perspective is totally off-base or not; moral realism / antirealism; how much and what work can get punted until a later date; AI moratoria / pauses; whether deals with AIs make sense; rights for AIs; gradual disempowerment).
I think from the outside it’s probably not transparent just how involved some research affiliates or other collaborators are, in particular Toby Ord, Owen Cotton-Barratt, and Lukas Finnveden.
I’d in particular be really excited for people who are deep in the empirical nitty-gritty — think AI2027 and the deepest criticisms of that; or gwern; or Carl Shulman; or Vaclav Smil. This is something I wish I had more skill and practice in, and I think it’s generally a bit of a gap in the team.
While at Forethought, I’ve been happier in my work than I have in any other job. That’s a mix of: getting a lot of freedom to just focus on making intellectual progress rather than various forms of jumping through hoops; the (importance)*(intrinsic interestingness) of the subject matter; the quality of the team; the balance of work ethic and compassion among people — it really feels like everyone has each other’s back; and things just working and generally being low-drama.
I’m not even sure your arguments would be weak in that scenario.
Thanks—classic Toby point! I agree entirely that you need additional assumptions.
I was imagining someone who thinks that, say, there’s a 90% risk of unaligned AI takeover, and a 50% loss of EV of the future from other non-alignment issues that we can influence. So EV of the future is 5%.
If so, completely solving AI risk would increase the EV of the future to 50%; halving both would increase it only to 41%.
But, even so, it’s probably easier to halve both than to completely eliminate AI takeover risk, and more generally the case for a mixed strategy seems strong.
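To spell out the arithmetic behind those numbers (just a toy calculation with the illustrative figures above, nothing more):

```python
# Toy calculation using the illustrative numbers from the comment above.
p_takeover = 0.9   # assumed probability of unaligned AI takeover
other_loss = 0.5   # assumed fraction of the future's EV lost to non-alignment issues

baseline = (1 - p_takeover) * (1 - other_loss)              # 0.1 * 0.5   = 0.05   -> 5%
solve_takeover = 1.0 * (1 - other_loss)                     # 1.0 * 0.5   = 0.50   -> 50%
halve_both = (1 - p_takeover / 2) * (1 - other_loss / 2)    # 0.55 * 0.75 = 0.4125 -> ~41%

print(baseline, solve_takeover, halve_both)
```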
Haha, thank you for the carrot—please have one yourself!
“Harangue” was meant to be a light-hearted term. I agree, in general, on carrots rather than sticks. One style of carrot is commenting things like “Great post!”—even if not adding any content, I think it probably would increase the quantity of posts on the Forum, and somewhat act as a reward signal (more than just karma).
making EA the hub for working on “making the AI transition go well”
I don’t think EA should be THE hub. In an ideal world, loads of people and different groups would be working on these issues. But at the moment, really almost no one is. So the question is whether it’s better if, given that, EA does work on it, and at least some work gets done. I think yes.
(Analogy: was it good or bad that in the earlier days, there was some work on AI alignment, even though that work was almost exclusively done by EA/rationalist types?)
Bootstrapping to viatopia
I think it’s likely that without a long (e.g. multi-decade) AI pause, one or more of these “non-takeover AI risks” can’t be solved or reduced to an acceptable level.
I don’t understand why you’re framing the goal as “solving or reducing to an acceptable level”, rather than thinking about how much expected impact we can have. I’m in favour of slowing the intelligence explosion (and in particular of “Pause at human-level”.) But here’s how I’d think about the conversion of slowdown/pause into additional value:
Let’s say the software-only intelligence explosion (SOIE) lasts N months. The value of any slowdown effort is given by some function of N that’s at least as concave as log in N, the length of the SOIE.
So, if the function is log, you get as much value from going from 6 months to 1 year as you do from going from 1 decade to 2 decades. But the former is way easier to achieve than the latter. And, actually, I think the function is more-concave than log—the gains from going from 6 months to 1 year are greater than the gains from going from 1 decade to 2 decades. Reasons: I think that’s how it is in most areas of solving problems (esp research problems); and there’s an upper bound on how much we can achieve (if the problem gets totally solved), so the function must be more-concave than log. And I think there are particular gains from people not getting taken by surprise, and bootstrapping to viatopia (new post), which we get from relatively short pauses.
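As a toy illustration of the log point (assuming, purely for the sake of the sketch, that value is exactly logarithmic in the length of the SOIE):

```python
from math import log

# If value is log in the length N of the SOIE (measured here in months),
# doubling N adds the same value whether N starts small or large.
gain_six_months_to_one_year = log(12) - log(6)     # ~0.693
gain_one_decade_to_two = log(240) - log(120)       # ~0.693

print(gain_six_months_to_one_year, gain_one_decade_to_two)
# Equal gains under log; a more-concave function would make the first gain strictly larger.
```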
Whereas it seems like maybe you think it’s convex, such that smaller pauses or slowdowns do very little? If so, I don’t see why we should think that, especially in light of great uncertainty about how difficult these issues are.
Then, I would also see a bunch of ways of making progress on these issues that don’t involve slowdowns. Like: putting in the schlep to RL AIs and build scaffolds so that AI can start making progress on these problems months earlier than it otherwise would; having the infrastructure set up such that people actually do point AI towards these problems; having governance set up such that the most important decision-makers are actually concerned about these issues and listening to the AI-results that are being produced; etc. As well as picking the lowest-hanging fruit among ways to prevent very bad outcomes on these issues, e.g. AI-enabled coups (like getting agreement for AI to be law-following, or auditing models for backdoors), or people developing extremely partisan AI advisers that reinforce their current worldview.
Thanks, Nick, that’s helpful. I’m not sure how much we actually disagree — in particular, I wasn’t meaning this post to be a general assessment of EA as a movement, rather than pointing to one major issue — but I’ll use the opportunity to clarify my position at least.
The EA movement is not (and should not be) dependent on continuous intellectual advancement and breakthrough for success. When I look at your 3 categories for the “future” of EA, they seem to refer more to our relevance as thought leaders, rather than what we actually achieve in the world
It’s true in principle that EA needn’t be dependent in that way. If we really had found the best focus areas, had broadly allocated the right % of labour to each, had prioritised within them well, too, and the best focus areas didn’t change over time, then we could just focus on doing and we wouldn’t need any more intellectual advancement. But I don’t think we’re at that point. Two arguments:
1. An outside view argument: In my view, we’re more likely than not to see more change and more intellectual development in the next two decades than we saw in the last couple of centuries. (I think we’ve already seen major strategically-relevant change in the last few years.) It would be very surprising if the right prioritisation prior to this point is the right prioritisation through this period, too.
2. An inside view argument: Look at my list of other cause areas. Some might still turn out to be damp squibs, but I’m confident some won’t. The ideal portfolio involves a lot of effort on some of these areas, and we need thought and research in order to know which ones and how best to address them.
I love your list of achievements—I agree the EA movement has had a lot of wins and we should celebrate that. But EA is about asking whether we’re doing the most good, not just a lot of good. And, given the classic arguments around fat-tailed distributions and diminishing returns within any one area, I think if we mis-prioritise we lose a meaningful % of the impact we could have had.
So, I don’t care about intellectual progress intrinsically. I’m making the case that we need it in order to do as much good as we could.
More generally, I think a lot of social movements lose out on a lot of the impact they could have had (even on their own terms) via “ossification”—getting stuck on a set of ideas or priorities that it becomes hard, culturally, to change. E.g. environmentalists opposing nuclear, animal welfare advocates focusing on veganism, workers’ rights opposing capitalism, etc. I think this occurs for structural reasons that we should expect to apply to EA, too.
Thanks—I agree the latter is important, and I think it’s an error if “Attending carefully to the effect of communicating ideas in different ways” (appreciating that most of your audience is not extremely high-decoupling, etc) is rounded off to being overly focused on PR.
I agree with you that in the intervening time, the pendulum has swung too far in the other direction, and am glad to see your pushback.
Thank you for clarifying—that’s really helpful to hear!
“I think that most of the intellectual core continues to hold EA values and pursue the goals they pursue for EA reasons (trying to make the world better as effectively as possible, e.g. by trying to reduce AI risk), they’ve just updated against that path involving a lot of focus on EA itself”
And I agree strongly with this — and I think it’s a shame if people interpret the latter as meaning “abandoning EA” rather than “rolling up our sleeves and getting on with object-level work.”
Effective altruism in the age of AGI
Thank you so much for writing this; I found a lot of it quite moving.
Since I read Strangers Drowning, this quote has really stuck in my mind:
“for do-gooders, it is always wartime”
And this from what you wrote resonates deeply, too:
“appreciate how wonderful it is to care about helping others.”
“celebrate being part of a community that cares so much that we want to do so as effectively as we can.”
Meditation and the cultivation of gratitude has been pretty transformative in my own life for my own wellbeing and ability to cope with living in a world in which it’s always wartime. I’m so glad you’ve had the same experience.
(cross-posted from LW)
Hey Rob, thanks for writing this, and sorry for the slow response. In brief, I think you do misunderstand my views, in ways that Buck, Ryan and Habryka point out. I’ll clarify a little more.

Some areas where the criticism seems reasonable:
I think it’s fair to say that I worded the compute governance sentence poorly, in ways Habryka clarified.
I’m somewhat sympathetic to the criticism that there was a “missing mood” (cf e.g. here and here), given that a lot of people won’t know my broader views. I’m very happy to say: “I definitely think it will be extremely valuable to have the option to slow down AI development in the future,” as well as “the current situation is f-ing crazy”. (Though there was also a further vibe on twitter of “we should be uniting rather than disagreeing” which I think is a bad road to go down.)
Now, clarifying my position:
Here’s what I take IABI to be arguing (written by GPT5-Pro, on the basis of a pdf, in an attempt not to infuse my biases):
The book argues that building a superhuman AI would be predictably fatal for humanity and therefore urges an immediate, globally enforced halt to AI escalation—consolidating and monitoring compute under treaty, outlawing capability‑enabling research, and, if necessary, neutralizing rogue datacenters—while mobilizing journalists and ordinary citizens to press leaders to act.
And what readers will think the book is about (again written by GPT5-Pro):
A “shut‑it‑all‑down‑now” manifesto warning that any superintelligent AI will wipe us out unless governments ban frontier AI and are prepared to sabotage or bomb rogue datacenters—so the public and the press must demand it.
The core message of the book is not merely “AI x-risk is worryingly high” or “stopping or slowing AI development would be one good strategy among many.” I wouldn’t disagree with the former at all, and my disagreement with the latter would be more about the details.
Here’s a different perspective:
AI takeover x-risk is high, but not extremely high (e.g. 1%-40%). The right response is an “everything and the kitchen sink” approach — there are loads of things we can do that all help a bit in expectation (both technical and governance, including mechanisms to slow the intelligence explosion), many of which are easy wins, and right now we should be pushing on most of them.
This is my overall strategic picture. If the book had argued for that (or even just the “kitchen sink” approach part) then I might have disagreed with the arguments, but I wouldn’t feel, “man, people will come away from this with a bad strategic picture”.
(I think the whole strategic picture would include:
There are a lot of other existential-level challenges, too (including human coups / concentration of power), and ideally the best strategies for reducing AI takeover risk shouldn’t aggravate these other risks.
But I think that’s fine not to discuss in a book focused on AI takeover risk.)
This is also the broad strategic picture, as I understand it, of e.g. Carl, Paul, Ryan, Buck. It’s true that I’m more optimistic than they are (on the 80k podcast I say 1-10% range for AI x-risk, though it depends on what exactly you mean by that) but I don’t feel deep worldview disagreement with them.
With that in mind, some reasons why I think the promotion of the Y&S view could be meaningfully bad:
If it means more people don’t pursue the better strategy of focusing on the easier wins.
Or they end up making the wrong tradeoffs. (e.g. intense centralisation of AI development in a way that makes misaligned human takeover risk more likely)
Or people might lapse into defeatism: “Ok we’re doomed, then: a decades-long international ban will never happen, so it’s pointless to work on AI x-risk.” (We already see this reaction to climate change, given doomerist messaging there. To be clear, I don’t think that sort of effect should be a reason for being misleading about one’s views.)
Overall, I feel pretty agnostic on whether Y&S shouting their message is on net good for the world.
I think I’m particularly triggered by all this because of a conversation I had last year with someone who takes AI takeover risk very seriously and could double AI safety philanthropy if they wanted to. I was arguing they should start funding AI safety, but the conversation was a total misfire because they conflated “AI safety” with “stop AI development”: their view was that that will never happen, and they were actively annoyed that they were hearing what they considered to be such a dumb idea. My guess was that EY’s TIME article was a big factor there.
Then, just to be clear, here are some cases where you misunderstand me, just focusing on the most-severe misunderstandings:
he’s more or less calling on governments to sit back and let it happen
I really don’t think that!
He thinks feedback loops like “AIs do AI capabilities research” won’t accelerate us too much first.
I vibeswise disagree, because I expect massive acceleration and I think that’s *the* key challenge: See e.g. PrepIE, 80k podcast.
But there is a grain of truth in that my best guess is a more muted software-only intelligence explosion than some others predict. E.g. a best guess where, once AI fully automates AI R&D, we get 3-5 years of progress in 1 year (at current rates), rather than 10+ years’ worth, or rather than godlike superintelligence. This is the best analysis I know of on the topic. This might well be the cause of much of the difference in optimism between me and e.g. Carl.
(Note I still take the much larger software explosions very seriously (e.g. 10%-20% probability). And I could totally change my mind on this — the issue feels very live and open to me.)
Will thinks government compute monitoring is a bad idea
Definitely disagree with this one! In general, society having more options and levers just seems great to me.
he’s sufficiently optimistic that the people who build superintelligence will wield that enormous power wisely and well, and won’t fall into any traps that fuck up the future
Definitely disagree!
Like, my whole bag is that I expect us to fuck up the future even if alignment is fine!! (e.g. Better Futures)
He’s proposing that humanity put all of its eggs in this one basket
Definitely disagree! From my POV, it’s the IABI perspective that is closer to putting all the eggs in one basket, rather than advocating for the kitchen sink approach.
It seems hard to be more than 90% confident in the whole conjunction, in which case there’s a double-digit chance that the everyone-races-to-build-superintelligence plan brings the world to ruin.
But “10% chance of ruin” is not what EY&NS, or the book, is arguing for, and isn’t what I was arguing against. (You could logically have the view of “10% chance of ruin and the only viable way to bring that down is a global moratorium”, but I don’t know anyone who has that view.)
a conclusion like “things will be totally fine as long as AI capabilities trendlines don’t change.”
Also not true, though I am more optimistic than many on the takeover side of things.
to advocate that we race to build it as fast as possible
Also not true—e.g. I write here about the need to slow the intelligence explosion.
There’s a grain of truth in that I’m pretty agnostic on whether speeding up or slowing down AI development right now is good or bad. I flip-flop on it, but I currently lean towards thinking speed up at the moment is mildly good, for a few reasons: it stretches out the IE by bringing it forwards, means there’s more of a compute constraint and so the software-only IE doesn’t go as far, and means society wakes up earlier, giving more time to invest more in alignment of more-powerful AI.
(I think if we’d gotten to human-level algorithmic efficiency at the Dartmouth conference, that would have been good, as compute build-out is intrinsically slower and more controllable than software progress (until we get nanotech). And if we’d scaled up compute + AI to 10% of the global economy decades ago, and maintained it at that level, that also would have been good, as then the frontier pace would be at the rate of compute-constrained algorithmic progress, rather than the rate we’re getting at the moment from both algorithmic progress AND compute scale-up.)
In general, I think that how the IE happens and is governed is a much bigger deal than when it happens.
like, I still associate Will to some degree with the past version of himself who was mostly unconcerned about near-term catastrophes and thought EA’s mission should be to slowly nudge long-term social trends.
Which Will-version are you thinking of? Even in DGB I wrote about preventing near-term catastrophes as a top cause area.
I think Will was being unvirtuously cagey or spin-y about his views
Again really not intended! I think I’ve been clear about my views elsewhere (see previous links).
Ok, that’s all just spelling out my views. Going back, briefly, to the review. I said I was “disappointed” in the book — that was mainly because I thought that this was Y&S’s chance to give the strongest version of their arguments (though I understood they’d be simplified or streamlined), and the arguments I read were worse than I expected (even though I didn’t expect to find them terribly convincing).
Regarding your object-level responses to my arguments — I don’t think any of them really support the idea that alignment is so hard that AI takeover x-risk is overwhelmingly likely, or that the only viable response is to delay AI development by decades. E.g.
As Joe Collman notes, a common straw version of the If Anyone Builds It, Everyone Dies thesis is that “existing AIs are so dissimilar” to a superintelligence that “any work we do now is irrelevant,” when the actual view is that it’s insufficient, not irrelevant.
But if it’s a matter of “insufficiency”, the question is how one can be so confident that any work we do now (including with ~AGI assistance, including if we’ve bought extra time via control measures and/or deals with misaligned ~AGIs) is insufficient, such that the only thing that makes a meaningful difference to x-risk, even in expectation, is a global moratorium. And I’m still not seeing the case for that.
(I think I’m unlikely to respond further, but thanks again for the engagement.)
I, of course, agree!
One additional point, as I’m sure you know, is that potentially you can also affect P(things go really well | AI takeover). And actions that increase P(things go really well | AI takeover) might be quite similar to actions that increase P(things go really well | no AI takeover). If so, that’s an additional argument for those actions, compared to actions that affect P(no AI takeover).
Re the formal breakdown, people sometimes miss the BF supplement here which goes into this in a bit more depth. And here’s an excerpt from a forthcoming paper, “Beyond Existential Risk”, in the context of more precisely defining the “Maxipok” principle. What it gives is very similar to your breakdown, and you might find some of the terms in here useful (apologies that some of the formatting is messed up):
“An action x’s overall impact (ΔEV_x) is its increase in expected value relative to baseline. We’ll let C refer to the state of existential catastrophe, and b refer to the baseline action. We’ll define, for any action x: P_x = P[¬C | x] and K_x = E[V | ¬C, x]. We can then break overall impact down as follows:

ΔEV_x = (P_x – P_b)K_b + P_x(K_x – K_b)

We call (P_x – P_b)K_b the action’s existential impact and P_x(K_x – K_b) the action’s trajectory impact. An action’s existential impact is the portion of its expected value (relative to baseline) that comes from changing the probability of existential catastrophe; an action’s trajectory impact is the portion of its expected value that comes from changing the value of the world conditional on no existential catastrophe occurring.

We can illustrate this graphically, where the areas in the graph represent overall expected value, relative to a scenario with a guarantee of catastrophe.
With these in hand, we can then define:
Maxipok (precisified): In the decision situations that are highest-stakes with respect to the longterm future, if an action is near‑best on overall impact, then it is close-to-near‑best on existential impact.
[1] Here’s the derivation. Given the law of total expectation:

E[V | x] = P(¬C | x) E[V | ¬C, x] + P(C | x) E[V | C, x]

To simplify things (in a way that doesn’t affect our overall argument, and bearing in mind that the “0” is arbitrary), we assume that E[V | C, x] = 0 for all x, so:

E[V | x] = P(¬C | x) E[V | ¬C, x]

And, by our definition of the terms:

P(¬C | x) E[V | ¬C, x] = P_x K_x

So:

ΔEV_x = E[V | x] – E[V | b] = P_x K_x – P_b K_b

Then adding (P_x K_b – P_x K_b) to this and rearranging gives us:

ΔEV_x = (P_x – P_b)K_b + P_x(K_x – K_b)”
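For concreteness, here’s a quick numerical check of that decomposition (my own illustration with made-up numbers, not from the paper):

```python
# Illustrative numbers only: P = P(no existential catastrophe), K = E[value | no catastrophe].
P_b, K_b = 0.5, 100.0   # baseline action b
P_x, K_x = 0.6, 110.0   # action x

overall_impact = P_x * K_x - P_b * K_b    # ΔEV_x = 16.0
existential_impact = (P_x - P_b) * K_b    # from changing P(catastrophe): 10.0
trajectory_impact = P_x * (K_x - K_b)     # from changing value given no catastrophe: 6.0

# The two components sum to the overall impact, as in the derivation above.
assert abs(overall_impact - (existential_impact + trajectory_impact)) < 1e-9
print(existential_impact, trajectory_impact, overall_impact)
```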