Against most, but not all, AI risk analogies
I personally dislike most AI risk analogies that I’ve seen people use. While analogies can be helpful for explaining concepts and illustrating mental pictures, I think they are frequently misused and often harmful. At the root of the problem is that analogies are consistently mistaken for, and often deliberately intended as, arguments for particular AI risk positions. And a large fraction of the time[1] when analogies are used this way, I think they are misleading and imprecise, routinely conveying the false impression of a specific, credible model of AI even when no such credible model exists.
Here is a random list of examples of analogies that I found in the context of AI risk (note that I’m not saying these are bad in every context):
Stuart Russell: “It’s not exactly like inviting a superior alien species to come and be our slaves forever, but it’s sort of like that.”
Rob Wiblin: “It’s a little bit like trying to understand how octopuses are going to think or how they’ll behave — except that octopuses don’t exist yet, and all we get to do is study their ancestors, the sea snail, and then we have to figure out from that what’s it like to be an octopus.”
Eliezer Yudkowsky: “The character this AI plays is not the AI. The AI is an unseen actress who, for now, is playing this character. This potentially backfires if the AI gets smarter.”
Nate Soares: “My guess for how AI progress goes is that at some point, some team gets an AI that starts generalizing sufficiently well, sufficiently far outside of its training distribution, that it can gain mastery of fields like physics, bioengineering, and psychology [...] And in the same stroke that its capabilities leap forward, its alignment properties are revealed to be shallow, and to fail to generalize. The central analogy here is that optimizing apes for inclusive genetic fitness (IGF) doesn’t make the resulting humans optimize mentally for IGF.”
Norbert Wiener: “when a machine constructed by us is capable of operating on its incoming data at a pace which we cannot keep, we may not know, until too late, when to turn it off. We all know the fable of the sorcerer’s apprentice...”
Geoffrey Hinton: “It’s like nuclear weapons. If there’s a nuclear war, we all lose. And it’s the same with these things taking over.”
Joe Carlsmith: “I think a better analogy for AI is something like an engineered virus, where, if it gets out, it gets harder and harder to contain, and it’s a bigger and bigger problem.”
Ajeya Cotra: “Corporations might be a better analogy in some sense than the economy as a whole: they’re made of these human parts, but end up pretty often pursuing things that aren’t actually something like an uncomplicated average of the goals and desires of the humans that make up this machine, which is the Coca-Cola Corporation or something.”
Ezra Klein: “As my colleague Ross Douthat wrote, this is an act of summoning. The coders casting these spells have no idea what will stumble through the portal.”
SKLUUG: “AI risk is like Terminator! AI might get real smart, and decide to kill us all! We need to do something about it!”
These analogies cover a wide scope, and many of them can indeed sometimes be useful in conveying meaningful information. My point is not that they are never useful, but rather that these analogies can be shallow and misleading. The analogies establish almost nothing of importance about the behavior and workings of real AIs, but nonetheless give the impression of a model for how we should think about AIs.
And notice how these analogies can give an impression of a coherent AI model even when the speaker is not directly asserting it to be a model. Regardless of the speaker’s intentions, I think the actual effect is frequently to plant a detailed-yet-false picture in the audience’s mind, giving rise to specious ideas about how real AIs will operate in the future. Because the similarities are so shallow, reasoning from these analogies will tend to be unreliable.
A central issue here is that these analogies are frequently chosen selectively: picked because they evoke a particular favored image, rather than because they identify the most natural point of comparison available. Consider this example from Ajeya Cotra:
Rob Wiblin: I wanted to talk for a minute about different analogies and different mental pictures that people use in order to reason about all of these issues. [...] Are there any other mental models or analogies that you think are worth highlighting?
Ajeya Cotra: Another analogy that actually a podcast that I listen to made — it’s an art podcast, so did an episode on AI as AI art started to really take off — was that it’s like you’re raising a lion cub, or you have these people who raise baby chimpanzees, and you’re trying to steer it in the right directions. And maybe it’s very cute and charming, but fundamentally it’s alien from you. It doesn’t necessarily matter how well you’ve tried to raise it or guide it — it could just tear off your face when it’s an adult.
Is there any reason why Cotra chose “chimpanzee” as the point of comparison when “golden retriever” would have been equally valid? It’s hard to know, but plausibly, she didn’t choose golden retriever because that would have undermined her general thesis.
I agree that if her goal was to convey the logical possibility of misalignment, then the analogy to chimpanzees works well. But if her goal was to convey the plausibility of misalignment, or anything like a “mental model” of how we should think of AI, I see no strong reason to prefer the chimpanzee analogy over the golden retriever analogy. The mere fact that one analogy evokes a negative image and the other evokes a positive image seems, by itself, no basis for any preference in their usage.
Or consider the analogy to human evolution. If you are trying to convey the logical possibility of inner misalignment, the analogy to human evolution makes sense. But if you are attempting to convey the plausibility of inner misalignment, or a mental model of inner misalignment, why not choose instead to analogize the situation to within-lifetime learning among humans? Indeed, as Quintin Pope has explained, the evolution analogy seems to have some big flaws:
“human behavior in the ancestral environment” versus “human behavior in the modern environment” isn’t a valid example of behavioral differences between training and deployment environments. Humans weren’t “trained” in the ancestral environment, then “deployed” in the modern environment. Instead, humans are continuously “trained” throughout our lifetimes (via reward signals and sensory predictive error signals). Humans in the ancestral and modern environments are different “training runs”.
As a result, human evolution is not an example of:
We trained the system in environment A. Then, the trained system processed a different distribution of inputs from environment B, and now the system behaves differently.
It’s an example of:
We trained a system in environment A. Then, we trained a fresh version of the same system on a different distribution of inputs from environment B, and now the two different systems behave differently.
Many proponents of AI risk already seem quite happy to critique plenty of AI analogies, such as the anthropomorphic analogy, “it’s like a toaster”, and “it’s like Google Maps”. And of course, in these cases, we can easily identify the flaws:
Ajeya Cotra: I think the real disanalogy between Google Maps and all of this stuff and AI systems is that we are not producing these AI systems in the same way that we produced Google Maps: by some human sitting down, thinking about what it should look like, and then writing code that determines what it should look like.
To be clear, I agree Google Maps is a bad analogy, and it should rightly be criticized. But is the chimp analogy really so much better? Shouldn’t we be applying the same degree of rigor against our own analogies too?
My point is not simply “use a different analogy”. My point is that we should largely stop relying on analogies in the first place. Use detailed object-level arguments instead!
ETA: To clarify, I’m not against every use of analogies. I’m mostly just wary of having our arguments depend on analogies, rather than on detailed models. See this footnote for more information about how I view the proper use of analogies.[1]
While the purpose of analogies is to provide knowledge in place of ignorance — to explain an insight or a concept — I believe many AI risk analogies primarily misinform or confuse people rather than enlighten them; they can insert unnecessary false assumptions in place of real understanding. The basic concept they are intended to convey may be valuable to understand, but riding along with that concept is a giant heap of additional speculation.
Part of this is that I don’t share other people’s picture of what AIs will actually look like in the future. This is only a small part of my argument, because my main point is that we should rely much less on arguments by analogy, rather than switch to different analogies that convey different pictures. But this difference in how I view the future still plays a significant role in my frustration at the usage of AI risk analogies.
Maybe you think, for example, that the alien and animal analogies are great for reasons that I’m totally missing. But it’s still hard for me to see that. At least, let me compare my picture, and maybe you can see where I’m coming from.
Again: The next section is not an argument. It is a deliberately evocative picture, to help compare my expectations of the future against the analogies I cited above. My main point in this post is that we should move away from a dependence on analogies, but if you need a “picture” of what I expect from AI, to compare it to your own, here is mine.
The default picture, as I see it — the thing that seems to me like a straightforward extrapolation of current trends in 2024 into the medium-term future, as AIs match and begin to slightly exceed human intelligence — looks nothing like the caricatures evoked by most of the standard analogies. In contrast to the AIs-will-be-alien model, I expect AIs will be born directly into our society, deliberately shaped by us, for the purpose of filling largely human-shaped holes in our world. They will be socially integrated with us and will likely substantially share our concepts about the social and physical world, having been trained on our data and being fluent in our languages. They will be numerous and everywhere, interacting with us constantly, assisting us, working with us, and even providing friendship to hundreds of millions of people. AIs will be evaluated, inspected, and selected by us, and their behavior will be determined directly by our engineering.
I feel this picture is a relatively simple extension of existing trends, with LLMs already being trained to be kind and helpful to us, and to collaborate with us, having first been shaped by our combined cultural output. I expect this trend of assimilation into our society will intensify in the foreseeable future, as there will be consumer demand for AIs that people can trust and want to interact with. Progress will likely be incremental rather than arriving suddenly with a single super-powerful agent. And perhaps most importantly, I expect oversight and regulation to increase dramatically over time as AIs begin having large-scale impacts.
It is not my intention to paint a picture of uniform optimism here. There are still plenty of things that can go wrong in the scenario I have presented. And much of it is underspecified because I simply do not know what the future will bring. But at the very least, perhaps you can now sympathize with my feeling that most existing AI risk analogies are deeply frustrating, given my perspective.
Again, I am not claiming analogies have no place in AI risk discussions. I’ve certainly used them a number of times myself. But I think they can be, and frequently are, used carelessly, and they regularly slip incorrect illustrations of how future AIs will behave into people’s mental models, even without any intent on the part of the person making the analogy. In my opinion, it would be a lot better if, overall, we reduced our dependence on AI risk analogies and substituted specific object-level points in their place.
- ^
To be clear, I’m not against all analogies. I think that analogies can be good if they are used well in context. More specifically, analogies generally serve one of three purposes:
1. Explaining a novel concept to someone
2. Illustrating, or evoking a picture of a thing in someone’s head
3. An example in a reference class, to establish a base rate, or otherwise form the basis of a model

I think that in cases (1) and (2), analogies are generally bad as arguments, even if they might be good for explaining something. They’re certainly not bad if you’re merely trying to tell a story, or convey how you feel about a problem, or convey how you personally view a particular thing in your own head.
In case (3), I think analogies are generally weak arguments, until they are made more rigorous. Moreover, when the analogy is used selectively, it is generally misleading. The rigorous way of setting up this type of argument is to deliberately try to search for all relevant examples in the reference class, without discriminating in favor of ones that merely evoke your preferred image, to determine the base rate.
You seem to be saying that there is some alternative that establishes something about “real AIs,” but then you admit these real AIs don’t exist yet, and you’re discussing “expectations of the future” by proxy. I’d like to push back, and say that I think you’re not really proposing an alternative, or that to the extent you are, you’re not actually defending that alternative clearly.
I agree that arguing by analogy to discuss current LLM behavior is less useful than having a working theory of interpretability and LLM cognition—though we don’t have any such theory, as far as I can tell—but I have an even harder time understanding what you’re proposing is a superior way of discussing a future situation that isn’t amenable to that type of theoretical analysis, because we are trying to figure out where we do and do not share intuitions, and which models are or are not appropriate for describing the future technology. And I’m not seeing a gears level model proposed, and I’m not seeing concrete predictions.
Yes, arguing by analogy can certainly be slippery and confusing, and I think it would benefit from grounding in concrete predictions. And the use of any specific base rates is deeply contentious, since reference classes are always debatable. But at least it’s clear what the argument is, since it’s an analogy. In opposition to that, arguing by direct appeal to your intuitions, where you claim your views are a “straightforward extrapolation of current trends”, is being done without reference to your reasoning process. And that reasoning process, because it lacks an explicit gears-level model, rests on informal human reasoning, which is itself, as Lakens argues, deeply rooted in metaphor anyway. That seems worse: it’s reasoning by analogy with extra steps.
For example, what does “straightforward” convey, when you say “straightforward extrapolation”? Well, the intuition the words build on is that moving straight, as opposed to extrapolating exponentially or discontinuously, is better or simpler. Is that mode of prediction easier to justify than reasoning via analogies to other types of minds? I don’t know, but it’s not obvious, and dismissing one as analogy but seeing the other as “straightforward” seems confused.
Even if there are risks to using analogies for persuasion, we need analogies in order to persuade people. While a lot of people here are strong abstract thinkers, that is really rare. Most people need something more concrete to latch onto. Uniform disarmament is a losing strategy, and it isn’t justified here, since I don’t think the analogies are as weak as you think. If you tell me what you consider to be the two weakest analogies above, I’m pretty sure I’d be able to steelman at least one of them.
If we want to improve epistemics, a better strategy would probably be to always try to pair analogies (at least for longer texts/within reason). So identify an analogy that describes how you think about AI, identify an alternate plausible analogy for how you could think about it, and then explain why your analogy is better or whereabouts you believe AI lies between the two.
Of course! Has there ever been a single person in the entire world who has embraced all analogies instead of useful and relevant analogies?
Maybe you’re claiming that AI risk proponents reject analogies in general when someone uses an analogy that supports the opposite conclusion, but accept the validity of analogies when they support their own conclusion. If this were the case, it would be bad, but I don’t actually think this is what is happening. My guess would be that you’ve seen situations where someone used an analogy to critique AI safety, the AI safety person said something along the lines of “Analogies are often misleading”, and you took this as a rejection of analogies in general, as opposed to a reminder to check whether the analogy actually applies.
Then perhaps you can reply to the examples I used in the post when arguing that analogies are often used selectively? I named two examples: (1) a preference for an analogy to chimps rather than to golden retrievers when arguing about AI alignment, and (2) a preference for an analogy to human evolution rather than an analogy to within-lifetime learning when arguing about inner misalignment.
I do think that a major element of my thesis is that many analogies appear to be chosen selectively. While I advocate that we should not merely switch analogies, I think that if we are going to use analogies-as-arguments anyway, then we should try to find the most plausible and natural ones. And I don’t currently see much reason to prefer the chimp and evolution analogies over their alternatives in that case.
I actually thought that the discussion of the chimp analogy was handled pretty well in the podcast. Ajeya brought up that example and then Rob explicitly brought up an alternate mental model of it being a tool (like Google Maps). Discussing multiple possible mental models is exactly what you want to be doing to guard against biases. I agree that it would have been nice to discuss an analogy more like a golden retriever or a kid as well, but there are always additional issues that could be discussed.
I agree Ajeya didn’t really provide her reasons for seeing the chimp analogy as useful there, but I think it’s valuable as a way of highlighting the AI equivalent of the nature vs. nurture debate. Many people talk about AIs using the analogy of children, and they assume that we can produce moral AIs just by treating them well/copying good human parenting strategies. I think the chimp analogy is useful as a way of highlighting that appearances can be deceiving.
The tool analogy appeared to have been brought up as a way of strawmanning/weakmanning people who disagree with them. I think the analogy to Google Maps is not actually representative of how most intelligent AI optimists reason about AI as of 2023 (even if Holden Karnofsky used it in 2012, before the deep learning revolution). The full quote was,
As I said in the post, I think the chimp analogy can be good for conveying the logical possibility of misalignment. Indeed, appearances can be deceiving. I don’t see any particularly strong reasons to think appearances actually are deceiving here. What evidence is there that AIs won’t actually just be aligned by default given good “parenting strategies” i.e. reasonably good training regimes? (And again, I’m not saying AIs will necessarily be aligned by default. I just think this question is uncertain, and I don’t think the chimp analogy is actually useful as a mental model of the situation here.)
There are lots of people who think about AI as a tool.
A lot of people think about AI in all sorts of inaccurate ways, including those who argue for AI pessimism. “AI is like Google Maps” is not at all how most intelligent AI optimists such as Nora Belrose, Quintin Pope, Robin Hanson, and so on, think about AI in 2024. It’s a weakman, in a pretty basic sense.
I think that neither of those are selective uses of analogies. They do point to similarities between things we have access to and future ASI that you might not think are valid similarities, but that is one thing that makes analogies useful—they can make locating disagreements in people’s models very fast, since they’re structurally meant to transmit information in a highly compressed fashion.
Interesting post! I think analogies are good for public communication but not for understanding things at a deep level. They’re like a good way to quickly template something you haven’t thought about at all with something you are familiar with. I think effective mass communication is quite important and we shouldn’t let the perfect be the enemy of the good.
I wouldn’t consider my Terminator comparison an analogy in the sense of the other items on this list. Most of the other items have the character of “why might AI go rogue?” and then they describe something other than AI that is hard to understand or goes rogue in some sense and assert that AI is like that. But Terminator is just literally about an AI going rogue. It’s not so much an analogy as a literal portrayal of the concern. My point wasn’t so much that you should proactively tell people that AI risk is like Terminator, but that people are just going to notice this on their own (because it’s incredibly obvious), and contradicting them makes no sense.