Why are longtermists so much less focused on human extinction these days?

Tldr;

  • EAs seem more focused on non-extinction risks from AI than they used to be

  • I haven’t seen much discussion of why, making reference to the very longrun

  • Changes in the world this might be responding to are: alignment seeming more promising than it used to, the speed of AI progress, and concerns around US governance.

  • There are also plausible sociological reasons for the shift, so I feel unsure how far to update

What’s the change and why care about it?

My sense is that there’s been a significant shift in how much longtermists prioritise non-extinction risks over the last few years. A decade ago people who were trying to ensure the flourishing of the universe trillions of years from now were very often focused on avoiding events that would kill all humans. My sense is that now they’re much more often focused on situations which they don’t think pose a risk of human extinction, such as extreme power concentration.

This change has felt confusing to me for a couple of reasons:

  • There’s reasonably little written about why longtermists should change their prioritisation in this direction. The notable exception is this paper by Will MacAskill.

  • To my mind, there’s a clear reason for longtermists to focus on extinction risks:

    • It seems extremely hard for any of our actions now to predictably affect what the world looks like in trillions of years time. (Events which predictably affect the very longrun future are often called ‘lock-in’ events.)

    • Preventing humans from going extinct in the next decade continues to affect the future indefinitely—human extinction seems like a clear ‘lock-in’ event.

You might think that our actual actions won’t be affected much by whether the risks we should prioritise most are totalitarianism or human extinction. There are indeed many actions which are useful for both of these. Pausing AI development will give us more time to allow the world to prepare itself for all sorts of risks that transformative AI might bring. Frontier AI companies having good whistleblower provisions gives us a better hope of finding out about many of the largest of those risks before they come about.

But there are ways in which you could disproportionately affect extinction risks compared to other risks. One is working directly to mitigate the ways humans could all plausibly be killed, such as AI enabled biorisks. Another is to focus more on the risk of misaligned AI risk than the risk of AI misuse by humans. The reason is that AI may be inherently alien to humans and therefore likely to have goals which are far reaching and extremely different to ours, and therefore cause a future which doesn’t include us. “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else”. By comparison, humans tend to have goals for which it’s useful to have other humans surviving.

I wanted to get a sense of why there’s been a shift in favour of longtermists prioritising extreme power concentration amongst humans compared to AI takeover, in order to understand better what’s highest priority for me to focus on. Below are the reasons I came across when talking to people about what has caused this trend.

Differences in the world that motivate the change

Alignment is more promising than some people expected.

For many longtermists, by far the most likely cause of human extinction is a misaligned AI. The reason is that humanity is actually reasonably resilient to large shocks, even ones that kill the vast majority of the population. That means natural disasters are reasonably unlikely to wipe everyone out. Whereas a misaligned AI for whom humans were in the way might take a more intentional approach to wiping us out.

Ten years ago, it seemed plausible that AI wouldn’t be able to understand human goals at all, so the idea of AI pursuing totally alien goals seemed more likely. We now have empirical evidence of AI understanding humans reasonably well—being able to get a fairly brief and vague prompt and accurately determine what we’d like it to do and how. To the extent their goals diverge from ours, they’re often very intelligible do us (such as wanting to complete the task and therefore cheating on it). There are some structural reasons for that—using large language models extensively imports a lot of human concepts into models.

Even aside from alignment being less challenging than some expected, the fact that there are now more people working on it provides some reason for thinking that marginal work on it might be less useful than it was in the past.

AI progress has been continuous, and medium-fast

In a world where AI progressed from largely useless to superintelligent in the course of days, there’s little opportunity for humans to figure out how to control it or wield it to their ends. In a world where AI progress is very slow, we have time to adjust our governance mechanisms to the existence of ASI. But where progress is continuous and medium-fast, some humans have enough time to notice it would likely be useful to control super-intelligence and make plans to do so, without there being time for things like governments

US governance is in a worse state than we might have expected for the point at which we hit transformative AI.

The federal government seems unusually opposed to safety-focused AI legislation (making it more likely than it might have been that I company execs could amass unusual levels of power), and less respectful of separation of powers. It’s possible that the people running AI companies also seem more power-seeking /​ less safety conscious than we would have expected (though that seems less compelling to me).

How much of an update should we make?

The above all seem like solid reasons to shift some prioritisation from avoiding AI takeover to avoiding extreme human power concentration.

I’m still not sure if I buy that these risks are in the same ballpark, even if they’re now closer. My hesitation stems from a strong intuition that it’s extremely hard to predictably affect anything over the long-term. Our actions have a very strong tendency to ‘wash-out’ - often over mere years, but particularly after centuries, millennia and onwards.

There are reasons to think that AGI will make it less likely that actions wash out—for example, it will plausibly enable immortality (perhaps by uploading rather than biologically). Dictatorships have often historically been brought down by the natural death of the dictator, so the impossibility of that could really increase the chance of lock in over the very long run. (For more on how AGI might enable authoritarian lock in, see the MacAskill and Finnveden et al papers linked about.) On the other hand, technological breakthroughs have historically swung the balances of power and otherwise washed out previous actions, and transformative AI will precipitate a time of unusually rapid technological progress.

There are also various things which might have causally led to a shift in views without epistemically justifying them:

  • When OpenAI and DeepMind were set up, their comms about their aims were heavily focused on keeping the world safe. Things like the OpenAI board turning over has made salient how much some of the key players seem squarely focused on amassing power, and how scary that is. Being viscerally aware of that (even if the extent to which it’s the case is within the bounds of what you would have theoretically expected) makes it harder to look away from risks from human power concentration.

  • The US has become more divided and tribal, making it feel more urgent and important to work on preserving democracy

  • As AI progress is discussed more in the mainstream, there’s a larger coalition working on ensuring that the transition to superintelligent AI proceeds safely, many of whose values don’t extend to the extremely longrun future. Preventing extreme concentration of power is a unifying issue in the sense that lots of people find it more plausible than extinction, so this provides a (perhaps unconscious) incentive to emphasise it more.

The existence of causal reasons like those above make me wary of over updating in towards prioritising extreme power concentration. On the other hand, I think it’s plausible that I previously focused too hard on human extinction compared to other plausible lock-in events. A couple of the reasons I might have done that:

  • I have a strong intuition about it being very hard to affect things over the longrun. And there is plenty of evidence about people trying to cause lasting change and totally failing. But I think it might be a mistake to trust my intuitions about this question at all. We should expect the world to be unrecognisable after a transition to ASI, so why would we expect it it to be similarly easy/​hard to lock things in for the future?

  • It does seem right to me that there is a chance of us being able to lock particular values or governance structures for the extremely long run, and also a chance that preventing human extinction does not lock in a change in value of the universe (for example, if humans went extinct decades later it would wash out the effect of saving them this decade). I think it’s easy to round the former off to ‘basically not lock in’ and the latter to ‘basically lock in’ in a way I don’t endorse.

All this mostly leaves me still feeling pretty unclear about how to prioritise between risks that might and might not plausibly cause human extinction. I’m inclined to prioritise those that won’t somewhat higher than I have historically. I still feel nervous that I’m partly doing that for bad reasons. Those reasons are less the explicit causal ones I listed, and more deferring to the ‘general zeitgeist’ without fully understanding its reasoning. I’d be very keen to hear other people’s views on this question.

Thanks to Arden Koehler for making this post (and so many of the things I write) much better, and to various people for the ideas behind the post, including Alex Lawsen and Nick Beckstead.