AI Safety Has a Very Particular Worldview

This is a Draft Amnesty Week draft. It may not be polished, up to my usual standards, fully thought through, or fully fact-checked.

Epistemic status: This post is based on vibes from interacting with people in AI safety over several years. I’m painting a large group of people with a broad brush, so obviously the things I say here won’t map perfectly onto everyone in the field.

Summary

Much of AI safety in EA is shaped by a very particular worldview. This worldview in turn shapes conversations, communities, funding, incentives, emotions, and more, and there tends to be a positive feedback loop that reinforces it. A quick browse around the EA Forum has made me realize that some people have apparently never even heard of arguments from outside this narrow worldview. I find this concerning.

What’s this worldview about?

What even is a worldview? It is the set of beliefs about fundamental aspects of Reality that ground and influence all one’s perceiving, thinking, knowing, and doing (source). For example, a social justice worldview primarily sees the world as a dynamic between the oppressed and the oppressor. In any worldview, there is a central organizing theme about what the world is “about” — everything else is just commentary.

EAs in AI safety also often have a particular worldview. This worldview is characterized by the belief that it is very likely that in the near term (perhaps 2027, or the next five years, or the next decade, or whatever), we will achieve this thing called artificial general intelligence (AGI), which may continue to scale to artificial superintelligence (ASI) soon after. It’s what those who hold the ‘normalist view’ refer to as the ‘superintelligence worldview’. It’s about “feeling the AGI” or being “AGI-pilled”.[1]

In this worldview, an AI takeover is so salient that it’s no longer about whether it will happen, but how it will happen. Is it FOOM? Or gradual? Pick your favourite ‘threat model’.

This also has the unfortunate implication that those who hold this worldview sometimes assume that people who don’t share it:

  • Have not yet engaged seriously with the arguments

  • Have really bad epistemics and constantly engage in motivated reasoning

  • Are actually evil

  • Are actually way too freaked out but just trying to ‘cope’

AI-centrism

A big part of this worldview is having an AI-centric view. Just as someone with a social justice worldview sees the world mostly through the lens of the oppressor versus the oppressed, the EA AI safety worldview sees the world mostly through the lens of AGI versus humans.

The implicit first step is to place the AGI at the centre of everything. Then, reasonable claims like the orthogonality thesis and the instrumental convergence thesis are invoked. Once this is established, the rest of the argument for existential risk follows, leading to the conclusion that we’re all going to die from a power-seeking AI.

The problem is not that these theses are wrong — in fact, they’re perfectly logical in theory. Rather, they are simply assumed to also be true in practice. For instance, the instrumental convergence thesis, which relies on agents behaving like strong optimizers, is implicitly assumed to hold even though AIs today behave nothing like strong optimizers.

Risks, capabilities, and model evals

Everyone is worried about AI risks. But what’s unique about those who have this worldview is that they are worried about AGI (which could eventually become ASI), which, if misaligned, could literally destroy the world.

How do we know if we’re getting close to AGI? Model capabilities. How do we measure model capabilities? Model evals. How do we know whether models will actually take over the world? Model propensities. How do we know what propensities these models have? Again, model evals.

So in this worldview, results from model evals count as strong evidence for risks. Model evals tell us a lot about model capabilities and propensities, which in turn tell us a lot about our chances of surviving an AGI existential catastrophe. Numbers go up = scary.

While they recognize that model evals are not perfect, they rarely challenge the underlying assumption that model evals translate well into capabilities, which in turn translate well into risks. To them, it’s unthinkable not to be worried by charts showing models performing better and better on all sorts of benchmarks. They do not seriously consider that the arguments they find meaningful or convincing only make sense from within the worldview, not outside of it.

On thinking concretely

Another part of this worldview seems to be about reasoning from high-level abstract concepts without being concrete.

Or rather, when those in this worldview talk about being concrete, what they mean is concrete stories about how we achieve AGI, not concrete pathways by which AGI would lead to actual physical harm. To them, anything that happens downstream of the AI is largely irrelevant, because if we all die anyway, it doesn’t matter how exactly we die.

Just as evidence that is meaningful to those inside this worldview tends to be rather meaningless to those outside it, what counts as concrete within the worldview often isn’t concrete at all to those outside it.

On forecasting

Given all of the above, those who adopt this worldview tend to be very interested in forecasting, but in a very specific way. It often involves forecasting when AGI will arrive (based on extrapolating certain data points) and how it will unfold.

Conversations around the theory of change of an AI safety research agenda tend to start with a question like “what’s your threat model?”, where the answer is generally some variation of “AIs will be very capable and misaligned and will take over the world”. Those within the worldview might disagree with the specifics of the answer, but those outside the worldview might reject the question itself.[2]

How did this happen?

So how did so many of those in AI safety within EA end up with this peculiar worldview?

I think it largely boils down to selection effects. The AI safety field naturally selects for people who adopt this worldview, because the worldview makes the problem of AI risk feel more visceral to them. Those who feel a strong sense of urgency are naturally motivated to engage with and contribute to the field.

Then the positive feedback loop starts. These people become more senior and start shaping the field. Research agendas are set around this worldview. Grants are given to those who do research on topics that fit the worldview. More people join, and newcomers start adopting it.

And the process repeats.

So what?

Even if some people in EA have this worldview, does it matter?

I think it does. I find it quite concerning that we constantly spread the meme that AGI is coming soon using arguments that only make sense to those within this particular worldview. It also concerns me that we seem to be prioritizing the transition to AGI so much that it is becoming the mission of EA.

So there are probably a few things we could do.

First, let’s stop thinking about AGI as something extremely qualitatively different from whatever we have right now. It’s not going to be a binary, straightforward “I know it when I see it” thing. If you had gone back in time and shown today’s ChatGPT to someone ten years ago, there’s a good chance they’d be convinced that ChatGPT is already AGI — they probably wouldn’t think “ChatGPT is definitely not AGI, but something else will surely be AGI”. We already have AIs that are dumb in some ways, human-level in some ways, and superhuman in many ways. Yet our lives remain largely the way they have been, changing over time as they always do.

Second, let’s at least recognize that the world is a complex place with 8 billion humans and all sorts of systems that underlie human civilization. The world is not just a bunch of humans sitting around, where a wild AGI would suddenly appear and take over the world.

Third, let’s consider that the problem is not that we aren’t sufficiently updating our priors with new evidence. Rather, we are updating our priors with evidence that provides very little information, built on an assumption-laden worldview as our foundation. Consider that model evals tell us very little about capabilities and propensities, and that capabilities and propensities tell us very little about actual risks.

Lastly, it goes without saying, but for many EAs in AI safety, it’s probably worth having a richer life outside of EA. There’s a big world out there, and interacting with the rest of it might help us realize that we are sometimes stuck in a very particular, idiosyncratic worldview.

  1. ^

    To be clear, I’m not saying that an AGI can never exist in the way people imagine it. After all, humans are algorithms wrapped in organic matter, constituting the very definition of (non-artificial) general intelligence. What I’m saying is that people often assume that a very scary “AGI” is very likely to come into existence very soon, and that it will transform the world in unimaginable ways, including leading us towards an existential catastrophe.

  2. ^

    As another example, in the debate between Daniel Kokotajlo and Sayash Kapoor, Daniel asked Sayash what his story is (i.e., how he sees the future playing out). Eventually, outside of the debate, Sayash (and Arvind Narayanan) wrote on their blog that “This kind of scenario forecasting is only a meaningful activity within their worldview. We are concrete about the things we think we can be concrete about.”