(A clearer and more fleshed-out version of this argument is now a top-level post. Read that instead.)
I strongly dislike most AI risk analogies that I see EAs use. While I think analogies can be helpful for explaining a concept to people for the first time, I think they are frequently misused, and often harmful. The fundamental problem is that analogies are consistently mistaken for, and often deliberately intended as, arguments for particular AI risk positions. And the majority of the time when analogies are used this way, I think they are misleading and imprecise, routinely conveying the false impression of a specific, credible model of AI, when in fact no such credible model exists.
Here are two particularly egregious examples of analogies I see a lot that I think are misleading in this way:
The analogy that AIs could be like aliens.
The analogy that AIs could treat us just like how humans treat animals.
I think these analogies are typically poor because, when evaluated carefully, they establish almost nothing of importance beyond the logical possibility of severe AI misalignment. Worse, they give the impression of a model for how we should think about AI behavior, even when the speaker is not directly asserting that this is how we should view AIs. In effect, almost automatically, the reader comes away with a detailed picture of what to expect from AIs, with specious ideas about how future AIs will operate inserted into their mind.
While their purpose is to provide knowledge in place of ignorance, I think these analogies primarily misinform or confuse people rather than enlighten them; they give rise to unnecessary false assumptions in place of real understanding.
In reality, our situation with AI is disanalogous to aliens and animals in numerous important ways. In contrast to both aliens and animals, I expect AIs will be born directly into our society, deliberately shaped by us, for the purpose of filling largely human-shaped holes in our world. They will be socially integrated with us, having been trained on our data, and being fluent in our languages. They will interact with us, serving the role of assisting us, working with us, and even providing friendship. AIs will be evaluated, inspected, and selected by us, and their behavior will be determined directly by our engineering. We can see that LLMs are already being trained to be kind and helpful to us, having first been shaped by our combined cultural output. If anything, I expect this trend of AI assimilation into our society will intensify in the foreseeable future, as there will be consumer demand for AIs that people can trust and want to interact with.
This situation shares almost no relevant feature with our relationship to aliens and animals! These analogies are not merely slightly misleading: they are almost completely wrong.
Again, I am not claiming analogies have no place in AI risk discussions. I’ve certainly used them a number of times myself. But I think they can be, and frequently are, used carelessly, and they seem to regularly slip various incorrect illustrations of how future AIs will behave into people’s minds, even without any intent from the person making the analogy. It would be a lot better if, overall, as a community, we reduced our dependence on AI risk analogies and substituted detailed object-level arguments in their place.
I am not claiming analogies have no place in AI risk discussions. I’ve certainly used them a number of times myself.
Yes you have!—including just two paragraphs earlier in that very comment, i.e. you are using the analogy “future AI is very much like today’s LLMs but better”. :)
Cf. what I called “left-column thinking” in the diagram here.
For all we know, future AIs could be trained in an entirely different way from LLMs, in which case the way that “LLMs are already being trained” would be pretty irrelevant in a discussion of AI risk. That’s actually my own guess, but obviously nobody knows for sure either way. :)
I read your first paragraph and was like “disagree”, but when I got to the examples, I was like “well, of course I agree here, but that’s only because those analogies are stupid”.
At least one analogy I’d defend is the Sorcerer’s Apprentice one. (Some have argued that the underlying model has aged poorly, but I think that’s a red herring since it’s not the analogy’s fault.) I think it does share important features with the classical x-risk model.