Disentangling “Safety”

What is your best estimate of the fraction of the public that would agree with each of these survey questions:

  1. “Should chat-based AIs be modified not to give out harmful information, like instructions on how to make poison from common household ingredients?”

  2. “Should chat-based AIs be modified not to express harmful opinions, like making jokes about religious figures?”

My sense from talking to non-EAs is that (1) receives almost universal support, while (2) receives mixed-to-negative support, with a very large fraction of “strong disagrees”.

We should separate these ideas, stop calling them both “safety”, and acknowledge that there are downsides to limiting what AIs can do and say. In any real-world situation, limitations may be chosen adversarially, and then they can actually do harm.

Definitions Matter

Walter Lippmann posited that abstract symbols can be used to disguise disagreement. Broad slogans like “democracy” or “diversity” mean different things to different people, and each person can imagine that the slogan means whatever they support. Only when the political coalition comes to power do its members realize that they disagree. Before the FTX blowup, I wrote in the strongest terms that this was what was happening with SBF, before he had actually donated most of his money: he got to control the definition of ‘altruism’, because he was the one writing the cheques.

It’s the same with “safety”. The problem is that the symbol is so abstract that anyone can picture what they think AI should and shouldn’t say. But that means we EAs can find ourselves defending or enabling “safety” in the abstract, only to find that the actual restrictions on AI capability are very different from what we imagined.

There seem to be legitimate concerns that “safety” can be used as an excuse for an accumulation of corporate power. Going further, some suggest that the EA community and its concern for x-risk have been useful idiots, invoked to justify AI regulation, some of which was literally written by large AI companies, probably for their benefit. You probably already know the kind of “regulatory capture” arguments I’m talking about.

De Tocqueville’s Magic Box

Alexis de Tocqueville said that in democracies, the definitions of terms are like a magician’s box with a false bottom. The magician shows the box’s contents, closes the box, and replaces the contents via the false bottom. The trick works because the viewer expects the contents not to change, and is surprised when the box is opened again.

In his view, the definition of a word is like the contents of the box. Words start out meaning something, and people rally around the word, like Federalist or Feminist. But then the definition changes, and people keep rallying around the same word: they support the new thing because it sounds like the old thing.

I’m belaboring this point about the importance of definitions because I’m advocating splitting the definition of “safety”. I have ideas for how to modify it, but it’s clear we need to split it somehow, or the EA community will forever remain the useful idiots of whoever decides what exactly is “safe”.

How can you split the definition?

  1. A strong counter-narrative: restrictions on AI capability can be bad when the restrictions are chosen adversarially. For example, an AI whose answers must conform to Xi Jinping Thought, or one that promotes certain products, services, or ideas from favored companies, entities, or coalitions.

  2. A recognition that not everyone shares the same focus on x-risk or other factors. Some people feel fairly strongly that, given a choice between a small chance of world annihilation and being forced to live with bots that push certain opinions, they would prefer the small chance of annihilation. Getting “safety” right doesn’t just mean getting our own estimates right; it also means building a concept that appeals to different people with legitimately differing opinions. EAs still haven’t internalized the fact that half of Americans would never trust an AI to drive their car, no matter how safe it is.

  3. Making a clear distinction between the most basic examples people think of when they think of safety (“help me poison my neighbor”) and things that clearly fall into the realm of more abstract, indirect concerns (“help me write a defense of Pol Pot”).

  4. Figuring out where copyright falls. Does modifying a bot to prevent it from reproducing copyrighted material constitute “safety research”? Copyright matters because it determines a lot of the economics of AI.

We need to live in the real world and recognize that people will judge us largely by the real effects we are having on immediate issues. We will get a lot more support for safety if we explicitly criticize, for example, bots that promote a particular brand of political philosophy or corporate messaging.