Cooperation, Avoidance, and Indifference: Alternate Futures for Misaligned AGI

Concerns about “super-intelligence” loom large. The worry is that artificial general intelligence (AGI), once developed, might quickly reach a point at which (1) it becomes aware of its own capabilities and (2) it deploys those capabilities to subjugate or eradicate humanity. This is often described as the problem of “power-seeking, misaligned AGI” (with “misalignment” referring to divergence between the goals of the humans who create AGI and the endogenous goals of AGI itself).
Experts who have expressed concern about misaligned AGI rate the odds of catastrophe differently. But a common premise unites their views: that AGI is likely to wield power in a manner adverse to human welfare (Carlsmith 2021). Here, I scrutinize that premise. Inspired by evolutionary biology and game theory, I explore other ways — apart from subjugation and eradication — that systems composed of mutually-powerful agents (or groups of agents) tend to equilibrate. With these patterns in mind, I argue that AGI, even if misaligned, is quite unlikely to subjugate or eradicate humans. Rather, the strong likelihood is that misaligned AGI would do some combination of:
• Cooperating with us;
• Purposely avoiding us;
• Paying us no attention whatsoever.
Of course, it is possible — and I hardly mean to deny — that misaligned AGI could wield power in ways that, for humans, qualify as “catastrophic.” This could happen either intentionally or incidentally, depending on the goals AGI decided to pursue. The point is simply that “catastrophic” outcomes, though possible, are far less likely than alternate equilibria. Why? In short, because:
1. Historically, interactions among agents with misaligned goals — both across species lines and among human sub-groups — have frequently resulted in non-catastrophic equilibria; and
2. There is no reason to think the overall distribution of probabilities would differ in the specific case of humans and AGI.
In what follows, I bring these claims alive through a combination of real-life examples — drawn from the natural world — as well as hypotheticals meant to forecast (if not necessarily to predict) the imminent world of human-AGI interaction.
*
Claim one — interactions between mutually power-seeking agents with misaligned goals very frequently result in non-catastrophic equilibria
In practice, systems composed of mutually-powerful agents with divergent goals tend, overwhelmingly, to equilibrate in one of three (non-catastrophic) ways:
• Mutualism
• Conflict Avoidance
• Indifference
This taxonomy is meant to be an answer, of sorts, to Bostrom’s famous “parable of the sparrows,” which imagines a group of sparrows that endeavor to locate an owl chick and train it as their servant. Although Bostrom notes, cheekily, that it is “not known” how the parable ends, the implication is that things are unlikely to turn out in the sparrows’ favor (Bostrom 2014). And the same is certainly possible with regard to AGI and humans. The analogy Bostrom has in mind — sparrow : owl :: human : AGI — may well hold. But many other analogies exist. The question is one of relative likelihood.
Mutualism
The first form of non-catastrophic equilibrium is mutualism. Mutualist relationships produce net-benefits for all groups or species involved, often through intentional or incidental exchange; each group or species provides something of value to the other, leading to multilateral incentives for cooperation (Hale et al. 2020).
Often, mutualism occurs among groups or species that pose no threat to each other. For instance, we exist in a mutualist relationship with the bacteria that constitute our “gut flora.” The bacteria help regulate digestion, and we, in turn, provide them (luxury?) accommodation.
But mutualism can also occur among groups or species that do pose natural threats to each other — when the benefits of mutualism simply outweigh the risks of threat. Here is a good example: zoologists have observed that gelada monkeys — often called gelada baboons — allow Ethiopian wolves to roam freely in the vicinity of their young, even though the young are easy prey. Why? The reigning hypothesis is that the wolves provide the baboons protection from other predators, and the baboons help the wolves locate rodents — an easier source of food. In light of this, the wolves have learned to leave the young baboons alone (Holmes 2015).
The power differential between wolves and monkeys is relatively small. But this is not a necessary feature of inter-species mutualism. It can also transpire in settings where one species is vastly more powerful than the other. Take, for example, bees and humans. Bees can be a source of nuisance (and even, in some contexts, a more serious threat), and we certainly have the capacity to eradicate bees if we thought it worth our time and energy. But we have no incentive to do so. In fact — now that we understand pollination — we have an active incentive to keep bees alive and flourishing, and even to protect them from other threats, purely as a matter of self-interest.
Given this, consider the following thought-experiment: a “parable of the bees” on similar footing with that of the sparrows. It’s 10,000 BC, and certain members of the bee race — the Bee Intelligentsia — are worried about the growing capabilities of homo sapiens. They begin (doing the equivalent of) writing papers that frame the concern as follows:
Once homo sapiens turn their sights to reshaping the external environment, they will be more powerful than we can imagine, and there is a non-negligible chance that they will pursue goals either directly or incidentally adverse to our welfare — maybe catastrophically so. Accordingly, we should begin expending significant amounts of collective energy future-proofing against subjugation or eradication by homo sapiens.
Would the Bee Intelligentsia be “wrong” to think this way? Not exactly, for the core submission is true; human activity does raise some risk of catastrophe. The error would lie in over-weighting the risk. (Which is easy to do, especially when the risk in question is terrifying.) But if the Bee Intelligentsia understood pollination — if it were intelligent, let alone super-intelligent — it would be able to appreciate the possibility that bees offer humans a benefit that is not trivial to replace. Indeed, it might even be able to predict (some version of) the present-day dynamic, namely, that far from undermining bee welfare, humans have an active incentive to enhance it.
The same may be true of humans and AGI — with humans in the “bee” position. Depending on its goals, AGI may well conclude that humans are worth keeping around, or even worth nurturing, for the mutualist benefits they deliver. In fact, AGI could simply conclude that humans may deliver mutualist benefits at some point in the future, and this, alone, might be enough to inspire non-predation — as in the wolf-baboon example — or cultivation — as in the human-bee example — purely as a means of maintaining optionality. One assumption built into the “super-intelligence” problem, after all, is that AGI will be capable of developing functionalist (and possibly causal) theories of its world — to an equal, if not vastly greater, extent than humans have. From this, it follows that AGI would likely have an enormous set of mutualist dynamics to explore (and to think about safeguarding for the future) before picking out hostility as the most rational stance to adopt toward humanity.
Some versions of AGI “safeguarding mutualism” would likely resemble domestication; the way AGI would invest in the possibility of humans delivering mutualist benefits would be — in some measure — to “farm” us. (Researchers already use language like this when describing the wolf-baboon example.)
In this context, “domestication” may sound jarring. But — crucially — that is not the same as catastrophic. In fact, one can easily imagine “human domestication” scenarios that enable greater flourishing than we have been able to manage, or plausibly could manage, on our own. Consider, for example, whether domestication has been catastrophic for a species like Bengal cats. At the limit, questions like this may be more metaphysical than empirical; they may ride on deep (and likely non-falsifiable) conceptions of what flourishing involves and requires. But at a practical level, for many species, Bengal cats included, it would seem odd to describe domestication as catastrophic. Domestication has effectively relieved Bengal cats of the need to constantly spend energy looking for food. Whatever drawbacks this has also occasioned (do Bengal cats experience ennui?), it seems like a major improvement, and certainly not a catastrophic deprivation, as such.
Conflict Avoidance
The second form of non-catastrophic equilibrium is conflict avoidance. This involves relationships of threat or competition in which one or more parties deems it easier — more utility-maximizing overall — to simply avoid the other(s). For example, jellyfish are a threat to human beings, and human beings are a threat to jellyfish. But the global “equilibrium” between the two species is avoidant, not hopelessly (or catastrophically) conflictual. If, circa 10,000 BC, the Jellyfish Intelligentsia voiced concerns analogous to those of the Bee Intelligentsia above, they, too, would have had little reason to worry. In the course of history, humans may well have pursued the subjugation or eradication of jellyfish; and we still might. The most likely equilibrium, however, is one in which humans mostly just leave jellyfish alone — and vice versa.
Importantly, to qualify as non-catastrophic, an “avoidant” equilibrium need not involve zero conflict. Rather, the key property is that conflict does not tend to multiply or escalate because, in the median case, the marginal cost of conflict is greater than the marginal cost of avoidance. Take the jellyfish example above. Sometimes jellyfish harm humans, and sometimes humans harm jellyfish. What makes the equilibrium between them avoidant is not a total absence of conflict; it’s that humans generally find it less costly to avoid jellyfish (by swimming away, say, or by changing the location of a diving expedition) than to confront them. We certainly could eradicate — or come very close to eradicating — jellyfish if that were an overriding priority. But it’s not; nor is it likely to become one. Our energy is better spent elsewhere.
Similar dynamics transpire at the intra-species level. Many human subcultures, for example, are marked by dynamics of reciprocal threat and even mutual predation — think, say, of organized crime, or competition among large companies in the same economic sector. Yet here, too, avoidance is far more prevalent than subjugation or eradication. Commodity distribution organizations, licit and illicit alike, do not tend to burn resources trying to destroy one another — at least, not when they can use the same resources to locate new markets, improve their products, or lower production costs. In practice, these strategies are almost always less costly and/or more beneficial than their destructive counterparts.
At a high level of generality, the appeal of avoidance, relative to escalating conflict, is not hard to see. Even when one group possesses the in-principle capability to destroy another, destructive strategies typically become more costly to keep pursuing the more they have already been pursued; up until the point of completion, the marginal cost of maintaining a destructive strategy tends to increase exponentially. Why? Because counter-parties tend to respond to destructive strategies adaptively, and often in ways that impose costs in the other direction. Retaliation and subversion are the two most common examples. The history of human conflict suggests that small, less powerful groups — in some cases, vastly less powerful groups — are capable of inflicting significant harm on their larger, more powerful counterparts. When humans (and other intelligent animals) find themselves in desperate circumstances, the combination of survival instinct, tenacity, and ingenuity can result in extraordinarily outsized per capita power.
This is not always true; sometimes, small, less powerful groups get decimated. The point, however, is that the possibility of small groups wielding outsized per capita power often suffices to make avoidance a more appealing ex ante strategy. Anticipating that destruction may be costly to accomplish, powerful groups often opt — as with territorial disputes between criminal and corporate organizations — for some combination of (1) investing in the creation of new surplus and (2) informally splitting up existing surplus without resorting to (catastrophic forms of) conflict (Peltzman et al. 1995).
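To make the escalation logic concrete, here is a deliberately stylized numerical sketch in Python. Every number in it is an assumption chosen for illustration, not an estimate drawn from any real conflict; the point is only that when a target adapts — retaliating, subverting, dispersing — the marginal cost of persisting in a destructive strategy compounds, while the cost of avoidance stays roughly flat.

```python
# Stylized comparison (illustrative assumptions only): cumulative cost of an
# "eliminate" strategy whose marginal cost grows each round as the weaker party
# adapts, versus an "avoid" strategy with a roughly constant per-round cost.

def cumulative_elimination_cost(rounds: int, base: float = 1.0, growth: float = 1.5) -> float:
    """Marginal cost rises geometrically as the target retaliates and subverts."""
    return sum(base * growth**t for t in range(rounds))

def cumulative_avoidance_cost(rounds: int, per_round: float = 2.0) -> float:
    """Avoidance costs more per round at first, but does not compound."""
    return per_round * rounds

for rounds in (1, 3, 5, 10):
    eliminate = cumulative_elimination_cost(rounds)
    avoid = cumulative_avoidance_cost(rounds)
    print(f"rounds={rounds:2d}  eliminate={eliminate:7.1f}  avoid={avoid:5.1f}  "
          f"cheaper: {'avoid' if avoid < eliminate else 'eliminate'}")
```

On these assumed numbers, elimination looks cheaper only if it can be finished quickly; the longer the campaign runs, the more decisively avoidance wins — which is precisely the ex ante logic described above.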
There is reason, accordingly, to think that even if AGI found itself in potential conflict with humans — e.g., due to competition for the same resources — the most efficient response could be some combination of (1) creating new mechanisms for amassing the resource, or (2) finding ways to share the resource, even amid conflict. Imagine, for instance, if AGI determined that it was important to secure its own sources of energy. Would the answer be, as some have hypothesized, to seize control of all electricity infrastructure (Carlsmith 2021)? Possibly; but it’s also possible that AGI would simply devise a better means of collecting energy, or that it would realize its longer-term interests were best served by allowing us to maintain joint access to existing energy sources — if nothing else, for appeasement purposes, to avoid the attrition costs of ongoing conflict.
Indifference
The third form of non-catastrophic equilibrium — surely the most pervasive, considering the sheer multitude of inter-species relationships that exist on earth — is indifference. Most underwater life, for example, is beyond human concern. We do not “avoid” plankton, in the sense just described. We simply pay them no mind. If the Plankton Intelligentsia voiced concern on par with the Bee Intelligentsia or the Jellyfish Intelligentsia, they, too, would not be “wrong,” exactly. But they would also be foolish to attribute much significance to — or to organize their productive capacities around — the risk of human-led subjugation or eradication.
The same could easily be true of AGI. In fact, the plankton analogy may well prove the most felicitous. If, as the super-intelligence problem hypothesizes, AGI ends up possessing vastly greater capability than humans, it stands to reason that AGI may relate to us in roughly the same way that we relate to other species of vastly lesser capability. And how is that? As a general matter, by paying them little or no attention. This is not true of every such species; bees have already supplied a counter-example. But the general claim holds. With respect to most species, most of the time, we have no conscious interactions at all.
Of course, indifference is not always innocuous. In certain guises, it can be highly destructive — namely, when the goals of the powerful-but-indifferent group collide with the welfare of the less powerful group. For example, humans have been “indifferent” — in the sense described above — to many species of rainforest plants and animals, and the latter are considerably worse off for it. With this category, the important point is that catastrophic results, when they occur, do so incidentally. Catastrophe is not the goal; it is a mere collateral consequence (Yudkowsky 2007).
How, then, are humans likely to fare under the “indifference” model? It would depend entirely on the goals AGI decided to pursue. Some goals would likely ravage us. Suppose AGI decided that, in the interest of (say) astrophysical experimentation, one of its overriding objectives was to turn planet earth into a perfect sphere. In that case, human civilization may be doomed. But other goals would leave human social order — and human welfare, such as it is — effectively unaltered. If, for example, AGI decided the best use of its energy was to create and appreciate art of its own style, or to exhaustively master the game of Go, or anything else along such aesthetic lines, human civilization would be unlikely to register much, if any, effect. In fact, we might not even be aware of such happenings — in roughly the same sense that plankton are not aware of human trifles.
*
Claim two — there is no reason to think AGI-human interactions will break from past patterns of equilibration
The goal of the last section was to show that, across a wide range of inter- and intra-species interactions, non-catastrophic equilibria are common. They are not inevitable. But they are prevalent — indeed, hyper-prevalent — once we broaden the scope of analysis. (For instance, there are vastly more oceanic species with which humans exist in “avoidant” or “indifferent” equilibria than the total number of mammalian species humans have ever come into contact with.) This does not mean, of course, that catastrophic outcomes are nonexistent — just that, to date, they make up a small, even negligible, share of the overall distribution of inter- and intra-species interactions in the natural world.
The next question is whether the same distributional dynamics — catastrophic outcomes dwarfed by non-catastrophic equilibria — would apply to the specific case of humans and AGI. I believe so, for two reasons: (1) we have seen no empirical evidence to the contrary, and (2) the only plausible dimension along which the human-AGI case would differ from historical parallels — namely, AGI’s enhanced capability — supplies no reason, in principle, to think that catastrophic outcomes would become more likely, relative to their non-catastrophic counterparts.
To be sure, AGI’s enhanced capability may change the type or quality of outcomes, both catastrophic and non-catastrophic, that define human-AGI interactions. What it is unlikely to change (or, at any rate, what we have no a priori reason to think it will change) is the distribution of those outcomes. Why? In short, because any changes in capability that increase the danger of catastrophe are equally likely, for the same reason, to create new horizons of mutualism and avoidance — leaving the overall distribution of possibilities, in principle, unchanged.
To begin with, the empirical evidence is easily summarized. So far, AI systems that have shown signs of adaptive capability seem, uniformly, to follow familiar patterns of strategic decision-making. Under current technological conditions, adaptive AI tends to proceed exactly as one might think — by crafting and revising plans in response to the functional goals of the environment in which it is deployed (Carlsmith 2021). In other words, such systems do exactly what one would expect any agent to do: grasp the parameters of the problem-space and deploy familiar — roughly predictable — strategies in response.
But the stronger argument in favor of the “AGI is different” position is conceptual, not empirical. The idea is that because AGI may have capabilities that exponentially outstrip those of humans and other biological agents, it is difficult, if not incoherent, to compare the interactions of biological agents to potential dynamics of human-AGI interaction. In particular, the worry is that AGI might be exceptionally capable at eradicating or subjugating humans (or other animal species), and — the thought continues — if AGI comes to possess that capability, it will be more likely, as a consequence, to exercise it.
This logic invites two forms of rejoinder. The first scrutinizes the claim’s implicit assumptions about AGI’s intentions. In short, even granting the possibility that AGI comes to possess the capability to eradicate or subjugate the human race, it may, nonetheless, lack the desire or will to carry out those goals — mitigating, or even eliminating, their danger in practice. Some commentators, for example, have wondered whether the training environments in which AGI systems develop will actually conduce to aggressive tendencies, insofar as those environments typically involve little or no competition for resources, exponentially faster time-horizons for adaptation, and so forth (Garfinkel 2022).
This first rejoinder — related to AGI’s intentionality — could well be sound. After all, if the prevalence of “indifference” dynamics in the natural world is any indication, the overwhelming likelihood, as a statistical matter, may be that AGI pays us no regard whatsoever. It is possible, in other words, that the default paradigm here is (something like) humans and plankton — not humans and bees, or humans and other advanced mammals. If anything, the plankton paradigm only becomes more plausible the more advanced — and alienated from us — we imagine AGI’s capabilities to be.
Of course, indifference does not always mean long-term harmony. As noted above, of all the species that human activity has eradicated (or that it soon threatens to eradicate), many are species with whom we exist “indifferently.” The eradication of such species is collateral damage: the incidental effect of our activity, often beyond conscious regard. The same may be true, in reverse, of AGI. Human welfare could easily become the collateral damage of whatever scientific, aesthetic, and other goals — totally unrelated to us — AGI decided to pursue.
But there is also a second rejoinder to the idea that AGI is likely to pursue the eradication or subjugation of humanity — which is fortunate, since speculation about AGI’s intentionality (or lack thereof) seems like a slender reed on which to premise humanity’s future, especially given how little we know about our own intentionality. The second rejoinder runs as follows: even assuming (1) that AGI comes to possess the capability to eradicate or subjugate humanity; and, further, (2) that AGI has the will (or at least does not entirely lack the will) to pursue those goals, it does not follow that AGI will pursue those goals. Rather, the question is whether the pursuit of those goals, relative to all the other goals AGI might pursue, would be utility-maximizing.
There is good reason, I think, to operate under the assumption that the answer to this question — would powerful, misaligned AGI actually pursue the eradication or subjugation of humanity? — is no. Or, to put it more precisely, there is good reason to assume that the answer to this question is no more likely to be yes than the likelihood of catastrophic outcomes resulting, in general, from interactions between mutually-powerful agents in the natural world. The reason is this: the same enhancements of capability that would enable AGI to eradicate or subjugate humanity would also enable AGI to avoid or cooperate with humanity, and there is no reason, in principle, to prioritize one set of outcomes over the other.
Start with avoidance. Although we do not know (by hypothesis) which goals AGI will pursue, it is easy to imagine goals that conflict, in some important measure, with human activity — leading AGI to view us as competitors or threats. The question is how AGI would act on that view. Would it try to eradicate or subjugate us? And more to the point: would AGI’s enhanced capability make those catastrophic results more likely than they would be in the (already familiar) context of biological agents beset by conflict? No — at least, not in the abstract. AGI certainly could take steps to eliminate human threat. But it could also — equally — take steps to avoid that threat. And the route it chooses would depend, not on AGI’s capability per se, but rather on an analysis of relative efficiency. In other words, the question would be which route, elimination or avoidance, is cheaper — and enhanced capability would make both routes cheaper. So that variable, by itself, cannot answer the relative efficiency question. Rather, it begs that question.
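The point that capability, by itself, cannot settle the efficiency comparison can be put in simple arithmetic. In the sketch below — whose costs are, again, pure assumptions — multiplying every option’s cost by the same capability factor changes how cheap each option is, but never which option is cheapest.

```python
# Stylized sketch (assumed costs): if greater capability scales the cost of every
# strategy by the same factor, the ranking of strategies is unchanged.
baseline_costs = {"eliminate": 100.0, "avoid": 10.0, "cooperate": 25.0}

def cheapest(costs):
    return min(costs, key=costs.get)

for capability_factor in (1.0, 0.1, 0.001):  # smaller factor = greater capability
    scaled = {k: v * capability_factor for k, v in baseline_costs.items()}
    print(f"factor={capability_factor:>6}: cheapest={cheapest(scaled)}  costs={scaled}")
# "avoid" is cheapest at every capability level; only a change in *relative*
# costs — not capability per se — would tip the choice toward elimination.
```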
Consider the following thought-experiment. Humans spontaneously leap forward a few centuries of technological prowess, such that we now have the capability to instantly kill whole schools of jellyfish using electromagnetic energy. Would we use this technology to eradicate jellyfish across the board?
Maybe — but it would depend entirely on what other avoidance strategies the technology also enabled. If the same technology allowed individual human divers to kill specific jellyfish they happened to encounter, that solution (i.e., dealing with individual jellyfish on an ad hoc basis) would likely be preferable to large-scale eradication. Similarly, if the jellyfish grew to recognize that humans possess the capability to kill them relatively easily, they might start trying to avoid us — an “avoidant” equilibrium in its own right. Of course, we still might decide that eradicating jellyfish is worth it. The point is not that eradication is an impossible, or even an utterly implausible, end-state. The point is that enhanced capability is not the determinative variable. In fact, the biological world is replete with examples — of the human-jellyfish flavor — in which agents of vastly greater capability decide, under a relative efficiency analysis, that avoiding threats is more efficient than attempting to eliminate them.
Here, an important caveat bears noting. Namely, equilibria of an essentially “avoidant” nature — in which one or more agents decide that avoiding conflict is more efficient than trying to eliminate threats — can still be highly destructive. What distinguishes “avoidance” from eradication is not a total absence of conflict or violence; it is that the conflict tends not to escalate to catastrophic levels. Think, for example, of familiar dynamics of competition between criminal organizations — such as gangs and cartels. The interface between these groups is often marked by ongoing anti-social conduct; periods of stability are typically ephemeral, and violence and terror are often the norm. Nevertheless, the overall result is rarely catastrophic in the sense of one group being subject to total eradication or subjugation at another’s hand. Instead, the overall result is equilibrium, defined at the margin by unpredictable — but fundamentally stable — push and pull. (The same is true of, say, the “avoidant” interface between humans and mosquitos. In the aggregate, it entails the death of many, many mosquitos — but it is nowhere near “catastrophic” for mosquitos as a class.)
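One standard way to model this caveat, borrowed from evolutionary game theory, is the hawk-dove game: contests over a resource in which escalated fights cost more than the resource is worth. At equilibrium, some conflict persists, yet all-out escalation stays bounded. The sketch below computes the textbook mixed equilibrium under assumed payoff values; it is an illustration of the general pattern, not a model of any particular human-AGI scenario.

```python
# Hawk-dove game with assumed payoffs: V is the value of the contested resource,
# C the cost of an escalated fight. C > V makes mutual escalation a net loss,
# which is what keeps conflict from spiraling toward eradication.
V = 4.0
C = 10.0

p_hawk = V / C  # mixed-equilibrium probability of escalating ("hawk")

def expected_payoff(play_hawk: bool, opponent_p_hawk: float) -> float:
    """Expected payoff of a pure strategy against an opponent who escalates with the given probability."""
    if play_hawk:
        return opponent_p_hawk * (V - C) / 2 + (1 - opponent_p_hawk) * V
    return (1 - opponent_p_hawk) * V / 2

print(f"escalation rate per encounter:         {p_hawk:.2f}")
print(f"share of encounters with mutual fights: {p_hawk**2:.2f}")
print(f"payoff(hawk)={expected_payoff(True, p_hawk):.2f}  "
      f"payoff(dove)={expected_payoff(False, p_hawk):.2f}  # equal at equilibrium")
```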
An equivalent analysis applies to mutualism. If AGI came to view humans as threats or competitors, not only would it consider — per above — the relative efficiency of avoiding us; it would also consider the relative efficiency of cooperating with us. Furthermore, as with avoidance, enhanced capability would enable new strategies for cooperation, even as it enabled new means of eradication. In fact, cooperation is likely the dimension along which enhanced capability is poised to make the greatest difference — since, historically, the greatest bottlenecks on inter-species cooperation have been (1) the difficulty of identifying opportunities for cooperative enterprise and (2) the impossibility of sufficiently felicitous communication to effectuate those opportunities, once identified.
Consider, for instance, Katja Grace’s example of humans and ants: why, she asks, have the two species declined historically to develop joint enterprise? Ultimately, the reason is almost certainly not that the two species have nothing to offer one another. Rather, the reason we have not developed joint enterprise with ants is the impossibility of effective communication. If we could communicate with ants, there are many tasks for which we might gladly compensate ant-labor — for example, navigating hazardously small spaces, or performing inconspicuous surveillance (Grace 2023). That such tasks exist does not, of course, entail that their pursuit would prove mutually beneficial. Certain tasks might be too costly to ants at a physical level; others might offend their dignity; still others might simply command too great a premium (we wouldn’t be able to afford it!). The general trend, however, is that when parties are capable of (1) performing tasks of reciprocal benefit to one another and (2) communicating effectively, they locate avenues of cooperation.
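The “general trend” here is, at bottom, the logic of gains from trade: a joint task is viable whenever its value to one party exceeds its cost to the other (so that some price leaves both better off) and the parties can communicate well enough to strike the bargain. The toy sketch below illustrates that filter; the task names and numbers are hypothetical, chosen only to echo the ant example.

```python
# Hypothetical gains-from-trade filter for candidate joint tasks.
# A task supports cooperation only if (1) the parties can communicate and
# (2) the value to one side exceeds the cost to the other, so a mutually
# beneficial price exists somewhere in between.
tasks = {
    "navigate hazardously small spaces": {"value_to_humans": 10.0, "cost_to_ants": 2.0},
    "inconspicuous surveillance":        {"value_to_humans": 8.0,  "cost_to_ants": 5.0},
    "haul heavy construction loads":     {"value_to_humans": 6.0,  "cost_to_ants": 50.0},
}

def trade_is_viable(task: dict, can_communicate: bool) -> bool:
    return can_communicate and task["value_to_humans"] > task["cost_to_ants"]

for name, task in tasks.items():
    verdict = "cooperate" if trade_is_viable(task, can_communicate=True) else "no deal"
    print(f"{name}: {verdict}")
```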
What might human-AGI mutualism involve? Although we can expect AGI (again, by hypothesis) to have insight into this question that may transcend our own, a number of routes seem plausible. One would involve AGI capitalizing on our sensory capabilities, rather than trying to “reinvent the wheel.” Imagine, for instance, if AGI discerned a natural resource — say, a rare species of orchid — that it wished to amass. What would it do? There are a few possibilities. One is that it could build a small army of orchid-hunting robots, fit for dispatching all over the world. Another is that it could enlist humans to do all the labor (traveling, searching, extracting, etc.). A third would involve splitting the difference — with, say, AGI performing the “search” function and humans performing the “extract” function.
The orchid example is stylized, of course, but the core point — that AGI would face ongoing tradeoffs around which sensory functions to insource and which to outsource — is likely to generalize, at least on the assumption that AGI cares whatsoever about our sensory world. What is more, even if AGI took the “insource” route, it may still have (potentially enormous) use for human labor dedicated to training robotic systems, in much the same way that human laborers are currently being deployed — by other humans — to train their own replacements.
The training model of AGI-human mutualism could have other applications as well. Imagine, for example, if AGI decided it wanted to develop a sense of affect — emotional or moral interiority — and it wished to “learn” these traits from human coaches. Or, likewise, suppose AGI decided it wanted to attain aesthetic sensibilities, in hopes of replicating various modes of pleasure — artistic, athletic, culinary, and so on — that it had discerned among human beings. All of these endeavors (and innumerably more) would leave ample room, at least in principle, for human contribution to AGI enterprise. And incentives toward mutualism would naturally follow suit.
To sum up — at the risk of redundancy — let me be clear about what I am (and am not) arguing about enhanced capability. The claim is not that enhanced capability necessarily will result in an identical distribution of probabilities, or an identical level of catastrophic risk, for human-AGI interaction, relative to historical inter-species patterns. The claim is more modest. It is (1) that nothing about the fact of enhanced capability, on its own, supplies grounds to think that novel forms of catastrophe will be more likely than novel forms of non-catastrophic equilibria, and (2) that absent such grounds, the most rational assumption is that AGI-human interactions will track, not deviate from, historical patterns of equilibration.
*
Conclusion
If all the foregoing is — at least roughly — correct, how should it impact our overall assessment of the “catastrophe risk” associated with powerful, misaligned AGI? I hesitate to conjure specific probabilities associated with cooperative, avoidant, and indifferent end-states for human-AGI interactions. But it seems safe, at a minimum, to say that any chain of probabilities that aspires to capture the overall likelihood of catastrophe is missing something crucial if it focuses exclusively on (1) whether powerful, misaligned AGI is likely to emerge; and, if so, (2) whether we are likely to retain (or develop) the ability to counteract AGI’s catastrophic goals. These two variables have, perhaps understandably, received the lion’s share of attention to date (Grace 2022; Carlsmith 2021). A full account, however, requires thinking carefully about what AGI would actually do with its misaligned power.
Could misaligned AGI pursue goals deliberately adverse to human welfare? Sure. But it could also cooperate with us, avoid us, or ignore us. The history of inter-species interaction abounds with examples of these latter dynamics, even amid — sometimes especially amid — conditions of threat and competition. If that history is any guide, as I have argued it should be, the most plausible endgame for AGI-human interaction is not catastrophe. It is equilibrium.
*
Acknowledgments
Thanks are due to Jill Anderson, Thomas Brennan-Marquez, Brendan Maher, Nathaniel Moyer, Eric Seubert, Peter Siegelman, Carly Zubrzycki, and Jackie Zubrzycki for helping refine the arguments in this essay.
Bibliography
Nick Bostrom, Superintelligence: Paths, Dangers, Strategies (2014)
Joseph Carlsmith, Is Power-Seeking AI an Existential Risk? (2021)
Ben Garfinkel, Review of ‘Carlsmith, Is Power-Seeking AI an Existential Risk?’ (2022)
Katja Grace, Counterarguments to the Basic AI Risk Case (2022)
Katja Grace, We Don’t Trade With Ants: AI’s Relationship With Us Will Not Be Like Our Relationship With Ants (2023)
Kayla Hale et al., Mutualism Increases Diversity, Stability, and Function in Multiplex Networks that Integrate Pollinators Into Food Webs (2020)
Bob Holmes, Monkeys’ Cozy Alliance with Wolves Looks Like Domestication (2015)
Holden Karnofsky, Thoughts on the Singularity Institute (2012)
Sam Peltzman et al., The Economics of Organized Crime (1995)
Philip Pettit, Republicanism: A Theory of Freedom and Government (1997)
Nate Soares, Comments on ‘Is Power-Seeking AI an Existential Risk?’ (2021)
Eliezer Yudkowsky, Artificial Intelligence as a Positive and Negative Factor in Global Risk (2008)
Eliezer Yudkowsky, The Hidden Complexity of Wishes (2007)
Cooperation, Avoidance, and Indifference: Alternate Futures for Misaligned AGI
Concerns about “super-intelligence” loom large. The worry is that artificial general intelligence (AGI), once developed, might quickly reach a point at which (1) it becomes aware of its own capabilities and (2) it deploys those capabilities to subjugate or eradicate humanity. This is often described as the problem of “power-seeking, misaligned AGI” (with “misalignment” referring to divergence between the goals of the humans who create AGI and the endogenous goals of AGI itself).
Experts who have expressed concern about misaligned AGI rate the odds of catastrophe differently. But a common premise unites their views: that AGI is likely to wield power in a manner adverse to human welfare (Carlsmith 2021). Here, I scrutinize that premise. Inspired by evolutionary biology and game theory, I explore other ways — apart from subjugation and eradication — that systems comprised of mutually-powerful agents (or groups of agents) tend to equilibrate. With these patterns in mind, I argue that AGI, even if misaligned, is quite unlikely to subjugate or eradicate humans. Rather, the strong likelihood is that misaligned AGI would do some combination of:
• Cooperating with us;
• Purposely avoiding us;
• Paying us no attention whatsoever.
Of course, it is possible — and I hardly mean to deny — that misaligned AGI could wield power in ways that, for humans, qualify as “catastrophic.” This could happen either intentionally or incidentally, depending on the goals AGI decided to pursue. The point is simply that “catastrophic” outcomes, though possible, are far less likely than alternate equilibria. Why? In short, because:
1. Historically, interactions among agents with misaligned goals — both across species lines and among human sub-groups — have frequently resulted in non-catastrophic equilibria; and
2. There is no reason to think the overall distribution of probabilities would differ in the specific case of humans and AGI.
In what follows, I bring these claims alive through a combination of real-life examples — drawn from the natural world — as well as hypotheticals meant to forecast (if not necessarily to predict) the imminent world of human-AGI interaction.
*
Claim one — interactions between mutually power-seeking agents with misaligned goals very frequently result in non-catastrophic equilibria
In practice, systems compromised of mutually-powerful agents with divergent goals tend, overwhelmingly, to equilibrate in one of three (non-catastrophic) ways:
• Mutualism
• Conflict Avoidance
• Indifference
This taxonomy is meant to be answer, of sorts, to Bostrom’s famous “parable of the sparrows,” which imagines a group of sparrows that endeavor to locate an owl chick and train it as their servant. Although Bostrom notes, cheekily, that it is “not known” how the parable ends, the implication is that things are unlikely to turn out in the sparrows’ favor (Bostrom 2014). And the same is certainly possible with regard to AGI and humans. The analogy Bostrom has in mind — sparrow : owl :: human : AGI — may well hold. But many other analogies exist. The question is one of relative likelihood.
Mutualism
The first form of non-catastrophic equilibrium is mutualism. Mutualist relationships produce net-benefits for all groups or species involved, often through intentional or incidental exchange; each group or species provides something of value to the other, leading to multilateral incentives for cooperation (Hale et al. 2020).
Often, mutualism occurs among groups or species that pose no threat to each other. For instance, we exist in a mutualist relationship with the bacteria that constitute our “gut flora.” The bacteria help regulate digestion, and we, in turn, provide them (luxury?) accommodation.
But mutualism can also occur among groups or species that do pose natural threats to each other — when the benefits of mutualism simply outweigh the risks of threat. Here is a good example: zoologists have observed that gelada monkeys allow Ethiopian wolves to roam freely in the vicinity of their young, though the latter are easy prey. Why? The reigning hypothesis is that wolves provide the baboons protection from other predators, and the baboons help the wolves locate rodents — an easier source of food. In light of this, the wolves have learned to leave the young baboons alone (Holmes 2015).
The power differential between wolves and monkeys is relatively small. But this is not a necessary feature of inter-species mutualism. It can also transpire in settings where one species is vastly more powerful than the other. Take, for example, bees and humans. Bees can be a source of nuisance (and even, in some contexts, a more serious threat), and we certainly have the capacity to eradicate bees if we thought it worth our time and energy. But we have no incentive to do so. In fact — now that we understand pollination — we have an active incentive to keep bees alive and flourishing, and even to protect them from other threats, purely as a matter of self-interest.
Given this, consider the following thought-experiment: a “parable of the bees” on similar footing with that of the sparrows. It’s 10,000 BC, and certain members of the bee race — the Bee Intelligentsia — are worried about the growing capabilities of homo sapiens. They begin (doing the equivalent of) writing papers that frame the concern as follows:
Once homo sapiens turn their sights to reshaping the external environment, they will be more powerful than we can imagine, and there is a non-negligible chance that they will pursue goals either directly or incidentally adverse to our welfare — maybe catastrophically so. Accordingly, we should begin expending significant amounts of collective energy future-proofing against subjugation or eradication by homo sapiens.
Would the Bee Intelligentsia be “wrong” to think this way? Not exactly, for the core submission is true; human activity does raise some risk of catastrophe. The error would lie in over-weighting the risk. (Which is easy to do, especially when the risk in question is terrifying.) But if the Bee Intelligentsia understood pollination — if it were intelligent, let alone super-intelligent — it would be able to appreciate the possibility that bees offer humans a benefit that is not trivial to replace. Indeed, it might even be able to predict (some version of) the present-day dynamic, namely, that far from undermining bee welfare, humans have an active incentive to enhance it.
The same may be true of humans and AGI — with humans in the “bee” position. Depending on its goals, AGI may well conclude that humans are worth keeping around, or even worth nurturing, for the mutualist benefits they deliver. In fact, AGI could simply conclude that humans may deliver mutualist benefits at some point in the future, and this, alone, might be enough to inspire non-predation — as in the wolf-baboon example — or cultivation — as in the human-bee example — purely as a means of maintaining optionality. One assumption built into the “super-intelligence” problem, after all, is that AGI will be capable of developing functionalist (and possibly causal) theories of their world — to an equal, if not vastly greater, extent than humans have. From this, it follows that AGI would likely have an enormous set of mutualist dynamics to explore (and to think about safeguarding for the future) before picking out hostility as the most rational stance to adopt toward humanity.
Some versions of AGI “safeguarding mutualism” would likely resemble domestication; the way AGI would invest in the possibility of humans delivering mutualist benefits would be — in some measure — to “farm” us. (Researchers already use language like this when describing the wolf-baboon example.)
In this context, “domestication” may sound jarring. But — crucially — that is not the same as catastrophic. In fact, one can easily imagine “human domestication” scenarios that enable greater flourishing than we have been able to manage, or plausibly could manage, on our own. Consider, for example, if domestication has been catastrophic for a species like bengal cats. At their limit, questions like this may be more metaphysical than empirical; they may ride on deep (and likely non-falsifiable) conceptions of what flourishing involves and requires. But at a practical level, for many species, like bengal cats, it would seem odd to describe domestication as catastrophic. Domestication has effectively relieved bengal cats of the need to constantly spend energy looking for food. Whatever drawbacks this has also occasioned (do bengal cats experience ennui?), it seems like a major improvement, and certainly not a catastrophic deprivation, as such.
Conflict Avoidance
The second form of non-catastrophic equilibrium is conflict avoidance. This involves relationships of threat or competition in which one or more parties deems it easier — more utility-maximizing overall — to simply avoid the other(s). For example, jellyfish are a threat to human beings, and human beings are a threat to jellyfish. But the global “equilibrium” between the two species is avoidant, not hopelessly (or catastrophically) conflictual. If, circa 10,000 BC, the Jellyfish Intelligentsia voiced concerns analogous to those of the Bee Intelligentsia above, they, too, would have had little reason to worry. In the course of history, humans may well have pursued the subjugation or eradication of jellyfish; and we still might. The most likely equilibrium, however, is one in which humans mostly just leave jellyfish alone — and vice versa.
Importantly, to qualify as non-catastrophic, an “avoidant” equilibrium need not involve zero conflict. Rather, the key property is that conflict does not tend to multiply or escalate because, in the median case, the marginal cost of conflict is greater than the marginal cost of avoidance. Take the jellyfish example above. Sometimes jellyfish harm humans, and sometimes humans harm jellyfish. What makes the equilibrium between them avoidant is not a total absence of conflict; it’s that humans generally find it less costly to avoid jellyfish (by swimming away, say, or by changing the location of a diving expedition) than to confront them. We certainly could eradicate — or come very close to eradicating — jellyfish if that were an overriding priority. But it’s not; nor is it likely to become one. Our energy is better spent elsewhere.
Similar dynamics transpire at the intra-species level. Many human subcultures, for example, are marked by dynamics of reciprocal threat and even mutual predation — think, say, of organized crime, or competition among large companies in the same economic sector. Yet here, too, avoidance is far more prevalent than subjugation or eradication. Commodity distribution organizations, licit and illicit alike, do not tend to burn resources trying to destroy one another — at least, not when they can use the same resources to locate new markets, improve their products, or lower production costs. In practice, these strategies are almost always less costly and/or more beneficial than their destructive counterparts.
At a high level of generality, the appeal of avoidance, relative to escalating conflict, is not hard to see. Even when one group possesses the in-principle capability to destroy another, destructive strategies typically become more costly to keep pursuing the more they have already been pursued; up until the point of completion, the marginal cost of maintaining a destructive strategy tends to increase exponentially. Why? Because counter-parties tend to respond to destructive strategies adaptively, and often in ways that impose costs in the other direction. Retaliation and subversion are the two most common examples. The history of human conflict suggests that small, less powerful groups — in some cases, vastly less powerful groups — are capable of inflicting significant harm on their larger, more powerful counterparts. When humans (and other intelligent animals) find themselves in desperate circumstances, the combination of survival instinct, tenacity, and ingenuity can result in extraordinarily outsized per capita power.
This is not always true; sometimes, small, less powerful groups get decimated. The point, however, is that the possibility of small groups wielding outsized per capita power often suffices to make avoidance a more appealing ex ante strategy. Anticipating that destruction may be costly to accomplish, powerful groups often opt — as with territorial disputes between criminal and corporate organizations — for some combination of (1) investing in the creation of new surplus and (2) informally splitting up existing surplus without resorting to (catastrophic forms of) conflict (Peltzman et al. 1995).
There is reason, accordingly, to think that even if AGI found itself in potential conflict with humans — e.g., due to competition for the same resources — the most efficient response could be some combination of (1) creating new mechanisms for amassing the resource, or (2) finding ways to share the resource, even amid conflict. Imagine, for instance, if AGI determined that it was important to secure its own sources of energy. Would the answer be, as some have hypothesized, to seize control of all electricity-infrastructure? (Carlsmith 2021) Possibly; but it’s also possible that AGI would simply devise a better of means of collecting energy, or that it would realize its longer-term interests were best served by allowing us to maintain joint access to existing energy sources — if nothing else, for appeasement purposes, to avoid the attrition costs of ongoing conflict.
Indifference
The third form of non-catastrophic equilibrium — surely the most pervasive, considering the sheer multitude of inter-species relationships that exist on earth — is indifference. Most underwater life, for example, is beyond human concern. We do not “avoid” plankton, in the sense just described. We simply pay them no mind. If the Plankton Intelligentsia voiced concern on par with the Bee Intelligentsia or the Jellyfish Intelligentsia, they, too, would not be “wrong,” exactly. But they would also be foolish to attribute much significance to — or to organize their productive capacities around — the risk of human-led subjugation or eradication.
The same could easily be true of AGI. In fact, the plankton analogy may well prove the most felicitous. If, as the super-intelligence problem hypothesizes, AGI ends up possessing vastly greater capability than humans, it stands to reason that AGI may relate to us in roughly the same way that we relate to other species of vastly lesser capability. And how is that? As a general matter, by paying them little or no attention. This is not true of every such species; bees have already supplied a counter-example. But the general claim holds. With respect to the most species, most of the time, we have no conscious interactions at all.
Of course, indifference is not always innocuous. In certain guises, it can be highly destructive: if the goals of the powerful-but-indifferent group come into collision with the welfare of the less powerful group. For example, humans have been “indifferent” — in the sense described above — to many species of rainforest plants and animals, and the latter are considerably worse off for it. With this category, the important point is that catastrophic results, when they occur, do so incidentally. Catastrophe is not the goal; it is a mere collateral consequence (Yudlowsky 2007).
How, then, are humans likely to fare under the “indifference” model? It would depend entirely on the goals AGI decided to pursue. Some goals would likely ravage us. Suppose AGI decided that, in the interest of (say) astrophysical experimentation, one of its overriding objectives was to turn planet earth into a perfect sphere. In that case, human civilization may be doomed. But other goals would leave human social order — and human welfare, such as it is — effectively unaltered. If, for example, AGI decided the best use of its energy was to create and appreciate art of its own style, or to exhaustively master the game of Go, or anything else along such aesthetic lines, human civilization would be unlikely to register much, if any, effect. In fact, we might not even be aware of such happenings — in roughly the same sense that plankton are not aware of human trifles.
*
Claim two — there is no reason to think AGI-human interactions will break from past patterns of equilibration
The goal of the last section was to show that, across a wide range of inter- and intra-species interactions, non-catastrophic equilibria are common. They are not inevitable. But they are prevalent — indeed, hyper-prevalent — once we broaden the scope of analysis. (For instance, there are vastly more oceanic species with which humans exist in “avoidant” or “indifferent” equilibria than the total number of mammalian species humans have ever come into contact with.) This does not mean, of course, that catastrophic outcomes are nonexistent — just that, to date, they make up a small, even negligible, share of the overall distribution of inter- and intra-species interactions in the natural world.
The next question is whether the same distributional dynamics — catastrophic outcomes dwarfed by non-catastrophic equilibria — would apply to the specific case of humans and AGI. I believe so, for two reasons: (1) we have seen no empirical evidence to the contrary, and (2) the only plausible dimension along which the human-AGI case would differ from historical parallels — namely, AGI’s enhanced capability — supplies no reason, in principle, to think that catastrophic outcomes would become more likely, relative to their non-catastrophic counterparts.
To be sure, AGI’s enhanced capability may change the type or quality of outcomes, both catastrophic and non-catastrophic, that define human-AGI interactions. What they are unlikely to change (or, at any rate, what we have no a priori reason to think they will change) is the distribution of those outcomes. Why? In short, because any changes in capability that increase the danger of catastrophe are equally likely, for the same reason, to create new horizons of mutualism and avoidance — leaving the overall distribution of possibilities, in principle, unchanged.
To begin with, the empirical evidence is easily summarized. So far, AI systems that have shown signs of adaptive capability seem, uniformly, to follow familiar patterns of strategic decision-making. Under current technological conditions, adaptive AI tends to proceed exactly as one might think — by crafting and revising plans in response to the functional goals of the environment in which it is deployed (Carlsmith 2021). In other words, they do exactly what one would expect any agent to do: grasping the parameters of the problem-space and deploying familiar — roughly predictable — strategies in response.
But the stronger argument in favor of the “AGI is different” position is conceptual, not empirical. The idea is that because AGI may have capabilities that exponentially outstrip those of humans and other biological agents, it is difficult, if not incoherent, to compare the interactions of biological agents to potential dynamics of human-AGI interaction. In particular, the worry is that AGI might be exceptionally capable at eradicating or subjugating humans (or other animal species ), and — the thought continues — if AGI comes to possess that capability, it will be more likely, as a consequence, to exercise that capability.
This logic invites two forms of rejoinder. The first scrutinizes the claim’s implicit assumptions about AGI’s intentions. In short, even granting the possibility that AGI comes to possess the capability to eradicate or subjugate the human race, it may, nonetheless, lack the desire or will to carry out those goals — mitigating, or even eliminating, their danger in practice. Some commentators, for example, have wondered whether the training environments in which AGI systems develop will actually conduce to aggressive tendencies, insofar as they those environments typically involve little or no competition for resources, exponentially faster time-horizons for adaptation, and so forth (Garfinkel 2022).
This first rejoinder — related to AGI’s intentionality — could well be sound. After all, if the prevalence of “indifference” dynamics in the natural world is any indication, the overwhelming likelihood, as a statistical matter, may be that AGI pays us no regard whatsoever. It is possible, in other words, that the default paradigm here is (something like) humans and plankton — not humans and bees, or humans and other advanced mammals. If anything, the plankton paradigm only becomes more plausible the more advanced — and alienated from us — we imagine AGI’s capabilities to be.
Of course, indifference does not always mean long-term harmony. As noted above, of all the species that human activity has eradicated (or that it soon threatens to eradicate), many are species with whom we exist “indifferently.” The eradication of such species is collateral damage: the incidental effect of our activity, often beyond conscious regard. The same may be true, in reverse, of AGI. Human welfare could easily become the collateral damage of whatever scientific, aesthetic, and other goals — totally unrelated to us — AGI decided to pursue.
But there is also second rejoinder to the idea that AGI is likely to pursue the eradication or subjugation of humanity — which is fortunate, since speculation about AGI’s intentionality (or lack thereof) seems like a slender reed on which to premise humanity’s future, especially given how little we know about our own intentionality. The second rejoinder runs as follows: even assuming (1) that AGI comes to possess the capability to eradicate or subjugate humanity; and, further, (2) that AGI has the will (or at least does not entirely lack the will) to pursue those goals, it does not follow that AGI will pursue those goals. Rather, the question is whether the pursuit of those goals, relative to all the other goals AGI might pursue, would be utility-maximizing.
There is good reason, I think, to operate under the assumption that the answer to this question — would powerful, misaligned AGI actually pursue the eradication or subjugation of humanity? — is no. Or, to put it more precisely, there is good reason to assume that the answer to this question is no more likely to be yes than the likelihood of catastrophic outcomes resulting, in general, from interactions between mutually-powerful agents in the natural world. The reason is this: the same enhancements of capability that would enable AGI to eradicate or subjugate humanity would also enable AGI to avoid or cooperate with humanity, and there is no reason, in principle, to prioritize one set of outcomes over the other.
Start with avoidance. Although we do not know (by hypothesis) which goals AGI will pursue, it is easy to imagine goals that conflict, in some important measure, with human activity — leading AGI to view us as competitors or threats. The question is how AGI would act on that view. Would it try to eradicate or subjugate us? And more to the point: would AGI’s enhanced capability make those catastrophic results more likely than they would be in the (already familiar) context of biological agents beset by conflict? No — at least, not in the abstract. AGI certainly could take steps to eliminate human threat. But it could also — equally — take steps to avoid that threat. And the route it chooses would depend, not on AGI’s capability per se, but rather on an analysis of relative efficiency. In other words, the question would be which route, elimination or avoidance, is cheaper — and enhanced capability would make both routes cheaper. So that variable, by itself, cannot answer the relative efficiency question. Rather, it begs that question.
Consider the following thought-experiment. Humans spontaneously leap forward a few centuries in technological prowess, such that we now have the capability to instantly kill whole schools of jellyfish using electromagnetic energy. Would we use this technology to eradicate jellyfish across the board?
Maybe — but it would depend entirely on what other avoidance strategies the technology also enabled. If the same technology allowed individual human divers to kill specific jellyfish they happened to encounter, that solution (i.e., dealing with individual jellyfish on an ad hoc basis) would likely be preferable to large-scale eradication. Similarly, if the jellyfish grew to recognize that humans possess the capability to kill them relatively easily, they might start trying to avoid us — an “avoidant” equilibrium in its own right. Of course, we still might decide that eradicating jellyfish is worth it. The point is not that eradication is an impossible, or even an utterly implausible, end-state. The point is that enhanced capability is not the determinative variable. In fact, the biological world is replete with examples — of the human-jellyfish flavor — in which agents of vastly greater capability decide, under a relative efficiency analysis, that avoiding threats is more efficient than attempting to eliminate them.
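To make the relative-efficiency point concrete, here is a minimal sketch in Python. The strategy names, the cost figures, and the helper function are all invented for illustration; the only claim the sketch encodes is structural: a capability gain that scales down the cost of every strategy by the same factor leaves the ranking of strategies untouched, and so cannot, by itself, tell us whether elimination or avoidance will be chosen.

```python
# Toy illustration (invented numbers): enhanced capability scales down the
# cost of every strategy, so it changes what is affordable without changing
# which strategy is relatively cheaper.

def preferred_strategy(costs: dict[str, float], capability: float) -> str:
    """Return the lowest-cost strategy once costs are scaled by capability.

    `capability` > 1 models enhanced capability: every strategy gets cheaper
    by the same factor, so the ordering of strategies is preserved.
    """
    effective_costs = {name: cost / capability for name, cost in costs.items()}
    return min(effective_costs, key=effective_costs.get)

# Hypothetical costs of dealing with a perceived threat (arbitrary units).
strategies = {"eradicate": 100.0, "avoid": 15.0, "cooperate": 25.0}

for capability in (1, 10, 1000):
    print(capability, preferred_strategy(strategies, capability))
# Prints "avoid" at every capability level: scaling all routes down
# together never flips which one is cheapest.
```

Real capability gains need not, of course, be uniform across strategies; the sketch captures only the narrow claim that capability per se does not settle the efficiency question.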
Here, an important caveat bears noting. Namely, equilibria of an essentially “avoidant” nature — in which one or more agents decide that avoiding conflict is more efficient than trying to eliminate threats — can still be highly destructive. What distinguishes “avoidance” from eradication is not a total absence of conflict or violence; it is that the conflict tends not to escalate to catastrophic levels. Think, for example, of familiar dynamics of competition between criminal organizations — such as gangs and cartels. The interface between these groups is often marked by ongoing anti-social conduct; periods of stability are typically ephemeral, and violence and terror are often the norm. Nevertheless, the overall result is rarely catastrophic in the sense of one group being subjected to total eradication or subjugation at another’s hand. Instead, the overall result is equilibrium, defined at the margin by unpredictable — but fundamentally stable — push and pull. (The same is true of, say, the “avoidant” interface between humans and mosquitos. In the aggregate, it entails the death of many, many mosquitos — but it is nowhere near “catastrophic” for mosquitos as a class.)
An equivalent analysis applies to mutualism. If AGI came to view humans as threats or competitors, not only would it consider — per above — the relative efficiency of avoiding us; it would also consider the relative efficiency of cooperating with us. Furthermore, as with avoidance, enhanced capability would enable new strategies for cooperation, even as it enabled new means of eradication. In fact, cooperation is likely the dimension along which enhanced capability is poised to make the greatest difference — since, historically, the greatest bottlenecks on inter-species cooperation have been (1) the difficulty of identifying opportunities for cooperative enterprise and (2) the impossibility of sufficiently felicitous communication to effectuate those opportunities, once identified.
Consider, for instance, Katja Grace’s example of humans and ants: why, she asks, have the two species declined historically to develop joint enterprise? Ultimately, the reason is almost certainly not that the two species have nothing to offer one another. Rather, the reason we have not developed joint enterprise with ants is the impossibility of effective communication. If we could communicate with ants, there are many tasks for which we might gladly compensate ant-labor — for example, navigating hazardously small spaces, or performing inconspicuous surveillance (Grace 2023). That such tasks exist does not, of course, entail that their pursuit would prove mutually beneficial. Certain tasks might be too costly to ants at a physical level; others might offend their dignity; still others might simply command too great a premium (we wouldn’t be able to afford it!). The general trend, however, is that when parties are capable of (1) performing tasks of reciprocal benefit to one another and (2) communicating effectively, they locate avenues of cooperation.
What might human-AGI mutualism involve? Although we can expect AGI (again, by hypothesis) to have insight into this question that may transcend our own, a number of routes seem plausible. One would involve AGI capitalizing on our sensory capabilities, rather than trying to “reinvent the wheel.” Imagine, for instance, if AGI discerned a natural resource — say, a rare species of orchid — that it wished to amass. What would it do? There are a few possibilities. One is that it could build a small army of orchid-hunting robots, fit for dispatching all over the world. Another is that it could enlist humans to do all the labor (traveling, searching, extracting, etc.). A third would involve splitting the difference: with AGI, say, performing the “search” function, and humans performing the “extract” function.
The orchid example is stylized, of course, but the core point — that AGI would face ongoing tradeoffs around which sensory functions to insource and which to outsource — is likely to generalize, at least on the assumption that AGI cares at all about our sensory world. What is more, even if AGI took the “insource” route, it may still have (potentially enormous) use for human labor dedicated to training robotic systems, in much the same way that human laborers are currently being deployed — by other humans — to train their own replacements.
The training model of AGI-human mutualism could have other applications as well. Imagine, for example, if AGI decided it wanted to develop a sense of affect — emotional or moral interiority — and wished to “learn” these traits from human coaches. Or, likewise, suppose AGI decided it wanted to attain aesthetic sensibilities, in hopes of replicating various modes of pleasure — artistic, athletic, culinary, and so on — that it had discerned among human beings. All of these endeavors (and innumerable others) would leave ample room, at least in principle, for human contribution to AGI enterprise. And incentives toward mutualism would naturally follow suit.
To sum up — at the risk of redundancy — let me be clear about what I am (and am not) arguing about enhanced capability. The claim is not that enhanced capability necessarily will result in an identical distribution of probabilities, or an identical level of catastrophic risk, for human-AGI interaction, relative to historical inter-species patterns. The claim is more modest. It is (1) that nothing about the fact of enhanced capability, on its own, supplies grounds to think that novel forms of catastrophe will be more likely than novel forms of non-catastrophic equilibria, and (2) that absent such grounds, the most rational assumption is that AGI-human interactions will track, not deviate from, historical patterns of equilibration.
*
Conclusion
If all the foregoing is — at least roughly — correct, how should it impact our overall assessment of the “catastrophe risk” associated with powerful, misaligned AGI? I hesitate to conjure specific probabilities associated with cooperative, avoidant, and indifferent end-states for human-AGI interactions. But it seems safe, at a minimum, to say that any chain of probabilities that aspires to capture the overall likelihood of catastrophe is missing something crucial if it focuses exclusively on (1) whether powerful, misaligned AGI is likely to emerge; and, if so, (2) whether we are likely to retain (or develop) the ability to counteract AGI’s catastrophic goals. These two variables have, perhaps understandably, received the lion’s share of attention to date (Grace 2022; Carlsmith 2021). A full account, however, requires thinking carefully about what AGI would actually do with its misaligned power.
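To see the structural point, consider a deliberately simplified sketch in Python. The three probabilities are placeholders, not estimates, and the variable names are my own; the sketch shows only that the factor this essay has dwelt on, namely the probability that powerful, misaligned AGI would actually choose eradication or subjugation over cooperation, avoidance, or indifference, belongs in the chain alongside the two factors that usually dominate the discussion.

```python
# Deliberately simplified chain-of-probabilities sketch. The numbers are
# placeholders, not estimates; the point is only that the third factor
# belongs in the chain at all.

p_agi_emerges = 0.5          # (1) powerful, misaligned AGI is developed
p_counteraction_fails = 0.5  # (2) humans fail to counteract catastrophic goals
p_pursues_catastrophe = 0.1  # (3) AGI actually prefers eradication/subjugation
                             #     to cooperation, avoidance, or indifference

p_catastrophe = p_agi_emerges * p_counteraction_fails * p_pursues_catastrophe
print(f"{p_catastrophe:.3f}")  # 0.025 with these placeholder inputs
```

However the first two factors are estimated, the argument here is that the third is likely to be small, and that any chain omitting it will overstate the overall risk.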
Could misaligned AGI pursue goals deliberately adverse to human welfare? Sure. But it could also cooperate with us, avoid us, or ignore us. The history of inter-species interaction abounds with examples of these latter dynamics, even amid — sometimes especially amid — conditions of threat and competition. If that history is any guide, as I have argued it should be, the most plausible endgame for AGI-human interaction is not catastrophe. It is equilibrium.
*
Acknowledgments
Thanks are due to Jill Anderson, Thomas Brennan-Marquez, Brendan Maher, Nathaniel Moyer, Eric Seubert, Peter Siegelman, Carly Zubrzycki, and Jackie Zubrzycki for helping refine the arguments in this essay.
Bibliography
Nick Bostrom, Superintelligence: Paths, Dangers, Strategies (2014)
Joseph Carlsmith, Is Power-Seeking AI an Existential Risk? (2021)
Ben Garfinkel, Review of ‘Carlsmith, Is Power-Seeking AI an Existential Risk?’ (2022)
Katja Grace, Counterarguments to the Basic AI Risk Case (2022)
Katja Grace, We Don’t Trade With Ants: AI’s Relationship With Us Will Not Be Like Our Relationship With Ants (2023)
Kayla Hale et al., Mutualism Increases Diversity, Stability, and Function in Multiplex Networks that Integrate Pollinators Into Food Webs (2020)
Bob Holmes, Monkeys’ Cozy Alliance with Wolves Looks Like Domestication (2015)
Holden Karnofsky, Thoughts on the Singularity Institute (2012)
Sam Peltzman et al., The Economics of Organized Crime (1995)
Philip Pettit, Republicanism: A Theory of Freedom and Government (1997)
Nate Soares, Comments on ‘Is Power-Seeking AI an Existential Risk?’ (2021)
Eliezer Yudkowsky, Coherent Extrapolated Volition (2004)
Eliezer Yudkowsky, The Hidden Complexity of Wishes (2007)
Eliezer Yudkowsky, Artificial Intelligence as a Positive and Negative Factor in Global Risk (2008)