Talking publicly about AI risk

In the past year, I have started talking about AI risk publicly—in mainstream newspapers, public radio, national TV, some of the most popular podcasts. The twist, and reason why you probably haven’t noticed is I’m doing this in Czech. This has a large disadvantage—the positive impact is quite limited, compared to English. On the other hand, it also had a big advantage—the risk is very low, because it is very hard for memes and misunderstandings to escape the language bubble. Overall I think this is great for experiments with open public communication .

Following is an off-the-cuff list of notes and suggestions. In my view the debate in Czech media and Czech social networks is on average marginally more sensible and more informed than in English so far, so perhaps part of this was successful and could be useful for others.

Context: my views

For context, it’s probably good to briefly mention some of my overall views on AI risk, because they are notably different from some other views.

I do expect

1. Continuous takeoff, and overall a large amount of continuity of agency (note that continuous does not imply things move slowly)

2. Optimization and cognition distributed across many systems to be more powerful than any single system, making a takeover by a single system possible but unlikely

3. Multiagent interactions to matter

4. I also do expect the interactions between the memetics and governance and the so-called “technical problem” to be strong and important

As a result I also expect

5. There will be warning shots

6. There will be cyborg periods

7. World will get weird

8. Coordination mechanisms do matter

This perspective may be easier to communicate than e.g. sudden foom—although I don’t know.

In the following I’ll usually describe my approach, and illustrate it by actual quotes from published media interviews I’m sort of happy about (translated, unedited). Note that the specific ways how to say something or metaphors are rarely original.

Aim to explain, not to persuade

Overall I usually try to explain stuff and answer questions, rather than advocate for something. I’m optimistic about the ability of the relevant part of the public to actually understand a large part of the risk at a coarse-grained level, given enough attention.

So even though we invented these machines ourselves, we don’t understand them well enough?
We know what’s going on there at the micro level. We know how the systems learn. If one number in a series changes, we know how the next one changes. But there are tens of billions of such numbers. In the same way, we have some idea of how a neuron works, and we have maps of a network of thousands of neurons. But that doesn’t tell us that much about how human thinking works at the level of ideas.

Small versions of scaled problems

Often, I think the most useful thing to convey is a scaled-down, easier version of the scaled problem, such that thinking about the smaller version leads to correct intuitions about the scaled problem, or solutions to the scaled-down problem may generalise to the later, scaled problem.

This often requires some thought or finding a good metaphor.

Couldn’t we just shut down such a system?
We already have a lot of systems that we could hypothetically shut down, but if you actually tried to do so, it would be very difficult. For example, it is practically impossible to shut down the New York Stock Exchange, because there will always be enough people defending it. If the model manages to penetrate deeply enough into humanity’s activities, it is imaginable that humanity will actually lose control of it at some point.

Don’t focus on one scenario

Overall, I think it’s possible to explain the fact that in face of AI risk there isn’t one particular story we can identify and prevent, but the problem is that unaligned powerful systems can find many ways to threaten you.

How can we imagine the threat associated with the development of artificial intelligence? I expect not as shooting robots.
Not shooting robots, of course. I’d put it another way. One of the reasons humans have become dominant on the planet is their intelligence. Not only that, but it is also our ability to work together in large groups and share ideas.

However, if you look at the evolution of mankind from the perspective of a chimpanzee or a mammoth, at one point something started happening for them that they no longer understood. Today, people can tame the chimpanzee or kill it in a staggering number of ways that they no longer understand. The chimp doesn’t understand what happens when the tranquilliser injection hits.

And we could be in a similar position the moment we lose control of artificial intelligence.

…

I’m rather reluctant to describe one particular scenario. If a system is much more intelligent than I am, it can naturally develop ways to limit or threaten me that I can’t even imagine.

Some of the people who study the risks of AI think that someone is going to plug in a large system, it will somehow escape, and we will lose control of it very quickly. It could even happen that we’re all dead at once and we don’t know why.

Personally, I think it’s more likely that there will be some kind of continuous loss of control. Nothing dramatic will happen, but more likely we’ll get to a state where we don’t understand what’s going on with the world, and we won’t be able to influence it. Or we’ll get a sense of some unrealistic picture of the world in which we’re happy and we won’t complain, but we won’t decide anything.

But if I describe a very specific scenario to you, you would argue that something like that can be prevented in advance or easily solved. But that’s the problem I was describing a moment ago. We’d just be talking like two chimpanzees who also can’t even imagine most of the ways humans can threaten them.

The doom memeplex

Overall, I’m not excited about summoning the doom memeplex into public consciousness. (In a large contrast to Eliezer Yudkowsky, who seems to be investing heavily into this summon). Why?

Mostly, I don’t trust that the doom memeplex is all that helpful in solving the alignment problem. I broadly agree with Valentine that being in a state of mental pain and terror isn’t a great state to see the problem clearly enough.

Also similarly to AIs, memeplexes are much easier to summon than control.

Also similarly to some AIs, memeplexes have convergent instrumental goals and self-interests. Some worrisome implications are, among others:

- it’s not in the self-interest of the doom memeplex to recognize alignment solutions

- it is in the self-interest of he doom memeplex to reward high p(doom) beliefs

- the AGI doom memeplex has, to some extent, a symbiotic relationship with the race toward AGI memeplex

One specific implication is I find it much more productive and useful to focus on the fact that in AI risk scenarios we lose control over the future, rather than “your kids will die”.

A colleague of mine describes it with the metaphor that we can find ourselves in the role of a herd of cows, whose fate is being determined by the farmer. And I don’t think we want to get to that stage. While there are many interesting questions about what might happen to such a herd afterwards, I think they are distracting. We need to focus on not losing control of the AI

Relevant maps

Large part of the difference between how the general public understands AI and ML specialists understand AI is in what maps people rely on. The public often uses a map “like a human, but running on a computer” and has easy access to maps like “like a Google size corporation, but automated”. In contrast, many ML practitioners rely on maps like “my everyday experience with training ML models, which are small and sort of dumb”.

Good understanding of AI risk usually requires thinking about multiple maps, but basic understanding of the risk can be actually based on the maps accessible to the public.

The metaphor of maps is also useful in explaining why ML expertise is not enough, how it is possible that AI experts disagree, and how a layperson can orient toward who to trust.

For example, Yann LeCun, vice president of Meta and head of AI research at Facebook. Facebook has one of the worst reputations among the big players in terms of approach to safety. LeCun is a great expert in machine learning, but he also claims that the problem is much smaller than we think because, after all, as humanity we can tame various intelligent systems, such as raising children or regulating corporations. When the Vice President of Meta says this, it fills me with more dread than optimism. If I were to take the child-rearing metaphor seriously, it’s like believing we can raise alien children.
…
For me, VP of Meta drawing his confidence from humanity’s ability to align corporations is a reason to worry, not source of confidence.

The aim here is to explain that when e.g. Yann LeCun makes confident claims about AI risk, these are not mostly based on his understanding of machine learning, but often on analogies with different systems the reader knows and is able to evaluate independently.

The overall experience

I’m quite picky in who to talk to, refusing the majority of interview requests, but conditional on that, my experience so far was generally positive. In particular after the interest in AI exploded after the release of ChatGPT and GPT4, technical journalists became reasonably informed. There is no need to justify the plausibility of powerful AI systems anymore. Also the idea of AI risk is firmly in the Overton window, and privately, a decent fraction of the people I talked with admitted being concerned themselves.

Some of the resulting artefacts in machine translation:

What to do in English

I’d be pretty excited if the AI alignment community was able to generate more people able to represent various views of the field publicly, in a manner which makes both the public and policymakers more informed. I think this is particularly good fit for people with some publicly legible affiliations, such as in academic roles or tech companies not participating in the race directly.