Seth Herd comments on Should we aim for flourishing over mere survival? The Better Futures series.

Seth Herd 4 Aug 2025 23:06 UTC
12 points
3 ∶ 2
Copied from my comment on LW, because it may actually be more relevant over here where not everyone is convinced about alignment being hard. It’s a really sketchy presentation of what I think are strong arguments for why the consensus on this is wrong.

I really wish I could agree. I think we should definitely think about flourishing when it’s a win/win with survival efforts. But saying we’re near the ceiling on survival looks wildly too optimistic to me. This is after very deeply considering our position and the best estimate of our odds, primarily surrounding the challenge of aligning superhuman AGI (including surrounding societal complications).
There are very reasonable arguments to be made about the best estimate of alignment/AGI risk. But disaster likelihoods below 10% really just aren’t viable when you look in detail. And it seems like that’s what you need to argue that we’re near ceiling on survival.
The core claim here is “we’re going to make a new species which is far smarter than we are, and that will definitely be fine because we’ll be really careful how we make it” in some combination with “oh we’re definitely not making a new species any time soon, just more helpful tools”.
When examined in detail, assigning a high confidence to those statements is just as silly as it looks at a glance. Making such a species is obviously a very dangerous thing, and one we’ll do pretty much as soon as we’re able.
90% plus on survival looks like a rational view from a distance, but there are very strong arguments that it’s not. This won’t be a full presentation of those arguments; I haven’t written it up satisfactorily yet, so here’s the barest sketch.
Here’s the problem: The more people think seriously about this question, the more pessimistic they are.
(edit—we asymptote at different points but almost universally far above 10% p(doom))
And those who’ve spent more time on this particular question should be weighted far higher. Time-on-task is the single most important factor for success in every endeavor. It’s not a guarantee but it’s by far the most important factor. It dwarfs raw intelligence as a predictor of success in every domain (although the two are multiplicative).

The “expert forecasters” you cite don’t have nearly the time-on-task of thinking about the AGI alignment problem. Those who actually work in that area are very systematically more pessimistic the longer and more deeply we’ve thought about it. There’s not a perfect correlation, but it’s quite large.
This should be very concerning from an outside view.
This effect clearly goes both ways, but that only starts to explain the effect. Those who intuitively find AGI very dangerous are prone to go into the field. And they’ll be subject to confirmation bias. But if they were wrong, a substantial subset should be shifting away from that view after they’re exposed to every argument for optimism. This effect would be exaggerated by the correlation between rationalist culture and alignment thinking; valuing rationality provides resistance (but certainly not immunity!) to motivated reasoning/confirmation bias by aligning one’s motivations with updating based on arguments and evidence.
I am an optimistic person, and I deeply want AGI to be safe. I would be overjoyed for a year if I somehow updated to only 10% chance of AGI disaster. It is only my correcting for my biases that keeps me looking hard enough at pessimistic arguments to believe them based on their compelling logic.
And everyone is affected by motivated reasoning, particularly the optimists. This is complex, but after doing my level best to correct for motivations, it looks to me like the bias effects have far more leeway to work when there’s less to push against. The more evidence and arguments are considered, the less bias takes hold. This is from the literature on motivated reasoning and confirmation bias, which was my primary research focus for a few years and a primary consideration for the last ten.
That would’ve been better as a post or a short form, and more polished. But there it is FWIW, a dashed-off version of an argument I’ve been mulling over for the past couple of years.
I’ll still help you aim for flourishing, since having an optimistic target is a good way to motivate people to think about the future.

Edit: I realize this isn’t an airtight argument and apologize for the tone of confidence in the absence of presenting the whole thing carefully and with proper references.
- Arepo 5 Aug 2025 6:43 UTC
  15 points
  6 ∶ 3
  Parent
  Here’s the problem: The more people think seriously about this question, the more pessimistic they are.
  Citation needed on this point. I think you’re underrepresenting the selection bias for a start: it’s extremely hard to know how many people have engaged with and rejected the doomer ideas, since they have far less incentive to promote their views. And those who do often find sloppy argument and gross misuses of the data in some of the prominent doomer arguments. (I didn’t have to look too deeply to realise the orthogonality thesis was a substantial source of groupthink.)
  Even within AI safety workers, it’s far from clear to me that the relationship you assert exists. My impression of the AI safety space is that there are many orgs working on practical problems that they take very seriously without putting much credence in the human-extinction scenarios (FAR.AI, Epoch, UK AISI off the top of my head).
  One guy also looked at the explicit views of AI experts and found if anything an anticorrelation between their academic success and their extinction-related concern. That was looking back over a few years and obviously a lot can change in that time, but the arguments for AI extinction had already been around for well over a decade at the time of that survey.
  The “expert forecasters” you cite don’t have nearly the time-on-task of thinking about the AGI alignment problem.
  This is true for forecasting in every domain. There are virtually always domain experts who have spent their careers thinking about any given question, and yet superforecasters seem to systematically outperform them. If this weren’t true, superforecasting wouldn’t be a field—we’d just go straight to the domain experts for our predictions.
  What links here?
  - But Have They Engaged With The Arguments? [Linkpost] by Sharmake (14 Sep 2025 17:39 UTC; 29 points)
  - Will Aldred 6 Aug 2025 9:54 UTC
    11 points
    1 ∶ 0
    Parent
    Just want to quickly flag that you seem to have far more faith in superforecasters’ long-range predictions than do most people who have worked full-time in forecasting, such as myself.
    @MichaelDickens’ ‘Is It So Much to Ask?’ is the best public writeup I’ve seen on this (specifically, on the problems with Metaculus’ and FRI XPT’s x-risk/extinction forecasts, which are cited in the main post above). I also very much agree with:
    Excellent forecasters and Superforecasters™ have an imperfect fit for long-term questions
    Here are some reasons why we might expect longer-term predictions to be more difficult:
    No fast feedback loops for long-term questions. You can’t get that many predict/check/improve cycles, because questions many years into the future, tautologically, take many years to resolve. There are shortcuts, like this past-casting app, but they are imperfect.
    It’s possible that short-term forecasters might acquire habits and intuitions that are good for forecasting short-term events, but bad for forecasting longer-term outcomes. For example, “things will change more slowly than you think” is a good heuristic to acquire for short-term predictions, but might be a bad heuristic for longer-term predictions, in the same sense that “people overestimate what they can do in a week, but underestimate what they can do in ten years”. This might be particularly insidious to the extent that forecasters acquire intuitions which they can see are useful, but can’t tell where they come from. In general, it seems unclear to what extent short-term forecasting skills would generalize to skill at longer-term predictions.
    “Predict no change” in particular might do well, until it doesn’t. Consider a world which has a 2% probability of seeing a worldwide pandemic, or some other large catastrophe. Then on average it will take 50 years for one to occur. But at that point, those predicting a 2% will have a poorer track record compared to those who are predicting a ~0%.
    In general, we have been in a period of comparative technological stagnation, and forecasters might be adapted to that, in the same way that e.g., startups adapted to low interest rates.
    Sub-sampling artifacts within good short-term forecasters are tricky. For example, my forecasting group Samotsvety is relatively bullish on transformative technological change from AI, whereas the Forecasting Research Institute’s pick of forecasters for their existential risk survey was more bearish.
    —Nuño Sempere
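    The “predict no change” point in the quote above can be made concrete with a small calculation. This is only a minimal sketch: it assumes independent years with a constant 2% annual probability, and it uses the mean Brier score as the track-record measure. Neither assumption comes from the quoted text, and the function names are hypothetical.

```python
def expected_wait_years(p):
    """Mean number of years until the first event, for independent years
    that each have probability p (mean of a geometric distribution)."""
    return 1.0 / p


def prob_no_event(p, years):
    """Probability that no event occurs across the given number of years."""
    return (1.0 - p) ** years


def mean_brier(forecast, outcomes):
    """Mean Brier score (lower is better) for a constant annual forecast
    scored against a list of 0/1 yearly outcomes."""
    return sum((forecast - o) ** 2 for o in outcomes) / len(outcomes)


P_TRUE = 0.02  # annual probability taken from the quoted example

quiet_decade = [0] * 10      # ten years with no catastrophe
one_hit = [0] * 49 + [1]     # fifty years, catastrophe in the final year

print("expected wait (years):", expected_wait_years(P_TRUE))
print("P(quiet decade):", prob_no_event(P_TRUE, 10))
print("quiet decade, 2% vs 0% forecaster:",
      mean_brier(0.02, quiet_decade), mean_brier(0.0, quiet_decade))
print("one hit in 50y, 2% vs 0% forecaster:",
      mean_brier(0.02, one_hit), mean_brier(0.0, one_hit))
```

    Under these assumptions roughly 82% of decades pass with no event, and over any such stretch the ~0% forecaster has the better score. The 2% forecaster only pulls ahead once an event actually resolves, and under the Brier score the margin is small; a log score would instead penalise the 0% forecaster without bound.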
    How much weight should we give to these aggregates?
    My personal tier list for how much weight I give to AI x-risk forecasts to the extent I defer:
    Individual forecasts from people who seem to generally have great judgment, and have spent a ton of time thinking about AI x-risk forecasting e.g. Cotra, Carlsmith
    Samotsvety aggregates presented here
    A superforecaster aggregate (I’m biased re: quality of Samotsvety vs. superforecasters, but I’m pretty confident based on personal experience)
    Individual forecasts from AI domain experts who seem to generally have great judgment, but haven’t spent a ton of time thinking about AI x-risk forecasting (this is the one I’m most uncertain about, could see anywhere from 2-4)
    Everything else I can think of I would give little weight to.^[1]^[2]
    —Eli Lifland
    Separately, I think you’re wrong about UK AISI not putting much credence on extinction scenarios? I’ve seen job adverts from AISI that talk about loss of control risk (i.e., AI takeover), and I know people working at AISI who—last I spoke to them—put ≫10% on extinction.
    ^
    Why do I give little weight to Metaculus’s views on AI? Primarily because of the incentives to make very shallow forecasts on a ton of questions (e.g. probably <20% of Metaculus AI forecasters have done the equivalent work of reading the Carlsmith report), and secondarily that forecasts aren’t aggregated from a select group of high performers but instead from anyone who wants to make an account and predict on that question.
    ^
    Why do I give little weight to AI expert surveys such as When Will AI Exceed Human Performance? Evidence from AI Experts? I think most AI experts have incoherent and poor views on this because they don’t think of it as their job to spend time thinking and forecasting about what will happen with very powerful AI, and many don’t have great judgment.
    - Arepo 6 Aug 2025 10:43 UTC
      2 points
      0 ∶ 0
      Parent
      You might be right re forecasting (though someone willing in general to frequently bet on 2% scenarios manifesting should fairly quickly outperform someone who frequently bets against them—if their credences are actually more accurate).
      I think you’re wrong about UK AISI not putting much credence on extinction scenarios? I’ve seen job adverts from AISI talking about loss of control risk (i.e., AI takeover), and how ‘the risks from AI are not sci-fi, they are urgent.’ And I know people working at AISI who, last I spoke to them, put ≫10% on extinction.
      The two jobs you mention only refer to ‘loss of control’ as a single concern among many - ‘risks with security implications, including the potential of AI to assist with the development of chemical and biological weapons, how it can be used to carry out cyber-attacks, enable crimes such as fraud, and the possibility of loss of control.’
      I’m not claiming that these orgs don’t or shouldn’t take the lesser risks and extreme tail risks seriously (I think they should and do), but denying the claim that people who ‘think seriously’ about AI risks necessarily lean towards high extinction probabilities.
  - elifland 6 Aug 2025 16:30 UTC
    8 points
    1 ∶ 0
    Parent
    There are virtually always domain experts who have spent their careers thinking about any given question, and yet superforecasters seem to systematically outperform them.
    I don’t think this has been established. See here
  - Seth Herd 5 Aug 2025 22:12 UTC
    3 points
    1 ∶ 2
    Parent
    I don’t have a nice clean citation. I don’t think one exists. I’ve looked at an awful lot of individual opinions and different surveys. I guess the biggest reason I’m convinced this correlation exists is that arguments for low p(doom) very rarely actually engage arguments for risk at their strong points (when they do the discussions are inconclusive in both directions—I’m not arguing that alignment is hard, but that it’s very much unknown how hard it is).
    There appears to be a very high correlation between misunderstanding the state of play, and optimism. And because it’s a very complex state of arguments, the vast majority of the world misunderstands it pretty severely.
    I very much wish it was otherwise; I am an optimist who has become steadily more pessimistic as I’ve made alignment my full-time focus—because the arguments against are subtle (and often poorly communicated) but strong.
    The arguments for the difficulty of alignment are far too strong to be rationally dismissed down to the 1.4% or whatever it was that the superforecasters arrived at. They have very clearly missed some important points of argument.
    The anticorrelation with academic success seems quite right and utterly irrelevant. As a career academic, I have been noticing for decades that academic success has some quite perverse incentives.
    
    I agree that there are bad arguments for pessimism as well as optimism. The use of bad logic in some prominent arguments says nothing about the strength of other arguments. Arguments on both sides are far from conclusive. So you can hope arguments for the fundamental difficulty of aligning network-based AGI are wrong, but assigning a high probability they’re wrong without understanding them in detail and constructing valid counterarguments is tempting but not rational.
    If there’s a counterargument you find convincing, please point me to it! Because while I’m arguing from the outside view, my real argument is that this is an issue that is unique in intellectual history, so it can really only be evaluated from the inside view. So that’s where most of my thoughts on the matter go.
    All of which isn’t to say the doomers are right and we’re doomed if we don’t stop building network-based AGI. I’m saying we don’t know. I’m arguing that assigning a high probability right now based on limited knowledge to humanity accomplishing alignment is not rationally justified.
    
    I think that fact is reflected in the correlation of p(doom) with time-on-task only on alignment specifically. If that’s wrong I’d be shocked, because it looks very strong to me, and I do work hard to correct for my own biases. But it’s possible I’m wrong about this correlation. If so it will make my day and perhaps my month or year!
    
    It is ultimately a question that needs to be resolved at the object level; we just need to take guesses about how to assign resources based on outside views.
    - Arepo 6 Aug 2025 4:28 UTC
      3 points
      2 ∶ 2
      Parent
      because it’s a very complex state of arguments, the vast majority of the world misunderstands it pretty severely… They have very clearly missed some important points of argument.
      This seems like an argument from your own authority. I’ve read a number of doomer arguments and personally found them unconvincing, but I’m not asking anyone to take my word for it. Of course you can always say ‘you’ve read the wrong arguments’, but in general, if your argument amounts to ‘you need to read this 10s-of-thousands-of-words-argument’ there’s no reason for an observer to believe that you understand it better than other intelligent individuals who’ve read it and reject it.
      Therefore this:
      If there’s a counterargument you find convincing, please point me to it!
      … sounds like special pleading. You’re trying to simultaneously claim both that a) the arguments for doom are so so complicated that no-one’s anti-doom views have any weight unless they’ve absorbed a nebulous gestalt of pro-doomer literature and b) that the purported absence of a single gestalt-rebutting counterpoint justifies a doomer position.
      And to be clear, I don’t think the epistemic burden should be equalised—I think it should be the other way around. Arguments for extinction by AI are necessarily built on a foundation of a priori and partially empirical premises, such that the dissolution of any collapses the whole argument. To give a few examples, such arguments require one to believe:
      No causality of intelligence on goals, or too weak causality to outweigh other factors
      AI to develop malevolent goals despite the process of developing it inherently involving incremental steps towards making it closer to doing what its developers want
      AI to develop malevolent goals despite every developer working on it wanting it not to kill them
      Instrumental convergence
      Continued exponential progress
      Without the (higher) exponential energy demands we’re currently seeing
      An ability to adapt to rapidly changing circumstances that’s entirely absent from modern deep learning algorithms
      That the ceiling of AI will be sufficiently higher than that of humans to manipulate us on an individual or societal level without anyone noticing and sounding the alarm (or so good at manipulating us that even with the alarm sounding we do its bidding)
      That it gets to this level fast enough/stealthily enough that no-one shuts it down
      And to specifically believe that AI extinction is the most important thing to work on requires further assumptions, like
      nothing else (including AI) is likely to do serious civilisational damage before AI wipes us out
      or that civilisational collapse is sufficiently easy to recover from that it’s basically irrelevant to long term expectation
      it will be morally bad if AI replaces us
      that the thesis of the OP is false, and that survival work is either the same as or higher EV than flourishing work
      that there’s anything we can actually do to prevent our destruction, assuming all the above propositions are true
      Personally I weakly believe most of these propositions, but even if I weakly believed all of them, that would still leave me with extremely low total concern for Yudkowskian scenarios.
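      The conjunction point here can be illustrated numerically. This is a minimal sketch, assuming the premises are independent and using made-up credences: the value 0.6 and the count of nine are illustrative assumptions, not figures from this comment. Positively correlated premises would give a higher joint probability than this.

```python
from math import prod


def joint_probability(credences):
    """Probability that every premise holds, assuming the premises are
    independent (an assumption made for this illustration only)."""
    return prod(credences)


# Hypothetical inputs: nine premises, each "weakly believed" at 60%.
weak_beliefs = [0.6] * 9

print(f"joint probability: {joint_probability(weak_beliefs):.3f}")
```

      How far the result should move anyone depends on whether the premises really are independent and on whether all of them are strictly required, which is itself part of what the thread is disputing.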
      Obviously there are weaker versions of the AI thesis like ‘AI could cause immense harm, perhaps by accident and perhaps by human intent, and so is an important problem to work on’ which it’s a lot more reasonable to believe.
      But when you assert that ‘The more people think seriously about this question, the more pessimistic they are’, it sounds like you mean they become something like Yudkowskyesque doomers—and I think that’s basically false outside certain epistemic bubbles.
      Inasmuch as it’s true that people who go into the field both tend to be the most pessimistic and don’t tend to exit the field in large numbers after becoming insufficiently pessimistic, I’ll bet you for any specific version of that claim you want to make, something extremely similar is true of biorisk, climate change, s-risks in general, and longterm animal welfare in particular. I’d bet at slightly longer odds the same is true of national defence, global health, pronatalism, antinatalism, nuclear disarmament, conservation, gerontology and many other high-stakes areas.
      I think that fact is reflected in the correlation of p(doom) with time-on-task only on alignment specifically
      I think most people who’ve put much thought into it agree that the highest probability of human extinction by the end of the century comes from misaligned AI. But that’s not sufficient to justify a strong p(doom) position, let alone a ‘most important cause’ position. I also think it comes from a largely unargued-for (and IMO clearly false) assumption that we’d lose virtually no longterm expected value from civilisational collapse.
- Ben_West🔸 5 Aug 2025 5:04 UTC
  9 points
  0 ∶ 0
  Parent
  I think his use of “ceiling” is maybe somewhat confusing: he’s not saying that survival is near 100% (in the article he uses 80% as his example, and my sense is that this is near his actual belief). I interpret him to just mean that we are notably higher on the vertical axis than the horizontal one:
  - William_MacAskill 6 Aug 2025 9:57 UTC
    2 points
    0 ∶ 1
    Parent
    Man, was that unclear?
    
    Sorry for sucking at basic communication, lol.
  - Seth Herd 5 Aug 2025 21:48 UTC
    0 points
    0 ∶ 1
    Parent
    I see! Thanks for the clarification. It’s a fascinating argument if I’m understanding it correctly now: it could be worth substantially increasing our risk of extinction if we more substantially increased our odds of capturing more of the potential value in our light cone.
    
    I’m not a dedicated utilitarian, so I typically tend to value futures with some human flourishing and little suffering vastly higher than futures with no sentient beings. But I am actually convinced that we should tilt a little toward futures with more flourishing.
    
    Aligning AGI seems like the crux for both survival and flourishing (and aligning society, in the likely case that “aligned” AGI is intent-aligned to take orders from individuals). But there will be small changes in strategy that emphasize flourishing vs mere survival futures, and I’ll lean toward those based on this discussion, because outside of myself and my loved ones, my preferences become largely utilitarian.
    
    It should also be borne in mind that creating misaligned AGI runs a pretty big risk of wiping out not just us but any other sentient species in the lightcone.
    - William_MacAskill 6 Aug 2025 10:13 UTC
      4 points
      0 ∶ 0
      Parent
      Thanks—sorry my initial post was unclear.
      
      “I’m not a dedicated utilitarian, so I typically tend to value futures with some human flourishing and little suffering vastly higher than futures with no sentient beings. But I am actually convinced that we should tilt a little toward futures with more flourishing.”
      
      See the next essay, “no easy eutopia” for more on this!
- David Mathers🔸 5 Aug 2025 8:34 UTC
  7 points
  1 ∶ 0
  Parent
  “This effect would be exaggerated by the correlation between rationalist culture and alignment thinking”
  
  Being part of rationalist culture is a sign that someone highly values rationality, yes. But it’s also a sign that they belong to a relatively small group, with a strong sense of superiority to normies, some experiments in communal living, and a view of those outside the group as often morally and intellectually corrupt (“low decouplers”, “not truth-seeking”, etc.). Groups like that are not usually known for dispassionately and objectively looking at the evidence on beliefs that are central to group identity, and belief that AI risk is high seems fairly central to rationalist identity to me. It certainly could be (I mean this non-sarcastically) that rationalist culture is an exception to the general rule because it places such a high value on updating on new evidence and changing your mind, but I don’t think we can be confident that rationalists are more likely to evaluate information on AI risk fairly than other people of comparable intelligence. Though I agree they are certainly better informed on AI risk than Good Judgment superforecasters, and as a GJ superforecaster, my views on AI risk have trended towards those of the rationalists recently (though still far away from >50% p(doom)).
  
  I agree optimists have other biases too. Including most simply status quo bias, which is, frankly, generally not a “bias” at all for most of the things GJ people forecast, so much as a fallible but useful heuristic, but which is probably not a great idea to apply to a genuinely revolutionary new technology.
  - Seth Herd 5 Aug 2025 21:38 UTC
    1 point
    0 ∶ 0
    Parent
    Agreed on all counts, except that a strong value on rationality seems very likely to be an advantage in on-average reaching more-correct beliefs. Feeling good about changing one’s mind instead of bad is going to lead to more belief changes, and those tend to lead toward truth.
    
    Good points on the rationalist community being a bit insular. I don’t think about that much myself because I’ve never been involved with the bay area rationalist community, just LessWrong.
- [ ]
  [deleted]
