Analyzing the moral value of unaligned AIs

A crucial consideration in assessing the risks of advanced AI is the moral value we place on “unaligned” AIs—systems that do not share human preferences—which could emerge if we fail to make enough progress on technical alignment.

In this post I’ll consider three potential moral perspectives, and analyze what each of them has to say about the normative value of the so-called “default” unaligned AIs that humans might eventually create:

  1. Standard total utilitarianism combined with longtermism: the view that what matters most is making sure the cosmos is eventually filled with numerous happy beings.

  2. Human species preservationism: the view that what matters most is making sure the human species continues to exist into the future, independently from impartial utilitarian imperatives.

  3. Near-termism or present-person affecting views: the view that what matters most is improving the lives of those who currently exist, or will exist in the near future.

I argue that from the first perspective, unaligned AIs don’t seem clearly bad in expectation relative to their alternatives, since total utilitarianism is impartial to whether AIs share human preferences or not. A key consideration here is whether unaligned AIs are less likely to be conscious, or less likely to bring about consciousness, compared to alternative aligned AIs. On this question, I argue that there are considerations both ways, and no clear answers. Therefore, it tentatively appears that the normative value of alignment work is very uncertain, and plausibly approximately neutral, from a total utilitarian perspective.

However, technical alignment work is much more clearly beneficial from the second and third perspectives. This is because AIs that share human preferences are likely to both preserve the human species and improve the lives of those who currently exist. That said, from the third perspective, pausing or slowing down AI is far less valuable than from the second, since it forces existing humans to forgo benefits from advanced AI, which I argue will likely be very large.

I personally find moral perspectives (1) and (3) most compelling, and by contrast find view (2) to be uncompelling as a moral view. Yet it is only from perspective (2) that significantly delaying advanced AI for alignment reasons seems clearly beneficial, in my opinion. This is a big reason why I’m not very sympathetic to pausing or slowing down AI as a policy proposal.

While these perspectives do not exhaust the scope of potential moral views, and I do not address every relevant consideration in this discussion, I think this analysis can help to sharpen what goals we intend to pursue by promoting particular forms of AI safety work.

Unaligned AIs from a total utilitarian point of view

Let’s first consider the normative value of unaligned AIs from the first perspective. From a standard total utilitarian perspective, entities matter morally if they are conscious (under hedonistic utilitarianism) or if they have preferences (under preference utilitarianism). From this perspective, it doesn’t actually matter much intrinsically if AIs don’t share human preferences, so long as they are moral patients and have their preferences satisfied.

The following is a prima facie argument that utilitarians shouldn’t care much about technical AI alignment work. Utilitarianism is typically not seen as partial to human preferences in particular. Therefore, efforts to align AI systems with human preferences—the core aim of technical alignment work—may be considered approximately morally neutral from a utilitarian perspective.

The reasoning here is that changing the preferences of AIs to better align them with the preferences of humans doesn’t by itself clearly seem to advance the aims of utilitarianism, in the sense of filling the cosmos with beings who experience positive lives. That’s because AI preferences will likely be satisfied either way, whether we do the alignment work or not. In other words, on utilitarian grounds, it doesn’t really matter whether the preferences of AIs are aligned with human preferences, or whether they are distinct from human preferences, per se: all that matters is whether the preferences get satisfied.

As a result, prima facie, technical alignment work is not clearly valuable from a utilitarian perspective. That doesn’t mean such work is harmful, only that it’s not obviously beneficial, and it’s very plausibly neutral, from a total utilitarian perspective.

Will unaligned AIs be conscious, or create moral value?

Of course, the argument I have just given is undermined considerably if AI alignment work makes it more likely that future beings will be conscious. In that case, alignment work could clearly be beneficial on total hedonistic utilitarian grounds, as it would make the future more likely to be filled with beings who have rich inner experiences, rather than unconscious AIs with no intrinsic moral value.

But this proposition should be proven, not merely assumed, if we are to accept it. We can consider two general arguments for why the proposition might be true:

Argument one: Aligned AIs are more likely to be conscious, or have moral value, than unaligned AIs.

Argument two: Aligned AIs are more likely to have the preference of creating additional conscious entities and adding them to the universe than unaligned AIs, which would further the objectives of total utilitarianism better than the alternative.

Argument one: aligned AIs are more likely to be conscious, or have moral value, than unaligned AIs

As far as I can tell, the first argument appears to rest on a confusion. There seems to be no strong connection between alignment work and making AIs conscious. Intuitively, whether AIs are conscious is a fundamental property of their underlying cognition, rather than a property of their preferences. Yet, AI alignment work largely only targets AI preferences, rather than trying to make AIs more conscious. Therefore, AI alignment work seems to target AI consciousness only indirectly, if at all.

My guess is that the intuition behind this argument often derives from a stereotyped image of what unaligned AIs might be like. The most common stereotyped image of an unaligned AI is the paperclip maximizer. More generally, it is often assumed that, in the absence of extraordinary efforts to align AIs with human preferences, they are likely to be alien-like and/or have “random” preferences instead. Based on a Twitter poll of mine, I believe this stereotyped image likely plays a major role in why many EAs think that unaligned AI futures would have very little value from a utilitarian perspective.

In a previous post about the moral value of unaligned AI, Paul Christiano wrote,

Many people have a strong intuition that we should be happy for our AI descendants, whatever they choose to do. They grant the possibility of pathological preferences like paperclip-maximization, and agree that turning over the universe to a paperclip-maximizer would be a problem, but don’t believe it’s realistic for an AI to have such uninteresting preferences.

I disagree. I think this intuition comes from analogizing AI to the children we raise, but that it would be just as accurate to compare AI to the corporations we create. Optimists imagine our automated children spreading throughout the universe and doing their weird-AI-analog of art; but it’s just as realistic to imagine automated PepsiCo spreading throughout the universe and doing its weird-AI-analog of maximizing profit.

By contrast, I think I’m broadly more sympathetic to unaligned AIs. I personally don’t think my intuition here comes much from analogizing AI to the children we raise, but instead comes from trying to think clearly about the type of AIs humans are likely to actually build, even if we fall short of solving certain technical alignment problems.

First, it’s worth noting that even if unaligned AIs have “random” preferences (such as maximizing PepsiCo profit), they could still be conscious. For example, one can imagine a civilization of conscious paperclip maximizers, who derive conscious satisfaction from creating more paperclips. There does not seem to be anything contradictory about such a scenario. And if, as I suspect, consciousness arises naturally in minds with sufficient complexity and sophistication, then it may even be difficult to create an AI civilization without consciousness, either aligned or unaligned.

To understand my point here, consider that humans already routinely get conscious satisfaction from achieving goals that might at first seem “arbitrary” when considered alone. For example, one human’s personal desire for a luxury wristwatch does not seem, on its own, to be significantly more morally worthy to me than an AI’s desire to create a paperclip. However, in both cases, achieving the goal could have the side effect of fulfilling preferences inside of a morally relevant agent, creating positive value from a utilitarian perspective, even if the goal itself is not inherently about consciousness.

In this sense, even if AIs have goals that seem arbitrary from our perspective, this does not imply that the satisfaction of those goals won’t have moral value. Just like humans and other animals, AIs could be motivated to pursue their goals because it would make them feel good, in the broad sense of receiving subjective reward or experiencing satisfaction. For these reasons, it seems unjustified to hastily move from “unaligned AIs will have arbitrary goals” to “therefore, an unaligned AI civilization will have almost no moral value”. And under a preference utilitarian framework, this reasoning step seems even less justified, since in that case it shouldn’t matter much at all whether the agent is conscious in the first place.

Perhaps more important to my point, it is not clear why we should assume that a default “unaligned” AI will be more alien-like than an aligned AI in morally relevant respects, given that both will likely have similar pretraining. Even if we just focus on AI preferences rather than AI consciousness, it seems most likely to me that unaligned AI preferences will be similar to, but not exactly like, the preferences implicit in the distribution they were trained on. Since AIs will likely be pretrained in large part on human data, this distribution will likely include tons of human concepts about what things in the world hold value. These concepts will presumably play a large role in AI preference formation, even if not in a way that exactly matches the actual preferences of humans.

To be clear: I think it’s clearly still possible for an AI to pick up human concepts while lacking human preferences. However, a priori, I don’t think there’s much reason to assume that value misalignment among AGIs that humans actually create will be as perverse and “random” as wanting to maximize the number of paperclips in existence, assuming these AIs are pretrained largely on human data. In other words, I’m not convinced that alignment work per se will make AIs less “alien” in the morally relevant sense here.

For the reasons stated above, I find the argument that technical alignment work makes AIs more likely to be conscious very uncompelling.

Is consciousness rare and special?

My guess is that many people hold the view that unaligned AIs will have little moral value because they think consciousness might be very rare and special. On this view, consciousness is unlikely to arise by chance under almost any circumstances, unless humans deliberately aim to bring it about. However, I suspect this argument relies on a view of consciousness that is likely overly parochial and simplistic.

Consciousness has arguably already arisen independently multiple times in evolutionary history. For example, it is widely believed in many circles (such as among my Twitter followers) that both octopuses and humans are conscious, despite the fact that the last common ancestor of these species was an extremely primitive flatworm that lived prior to the Cambrian explosion 530 million years ago—a point which is often taken to be the beginning of complex life on Earth. If consciousness could arise independently multiple times in evolutionary history—in species that share almost no homologous neural structures—it seems unlikely to be a rare and special part of our universe.

More generally, most theories of consciousness given by philosophers and cognitive scientists do not appear to give much significance to properties that are unique to biology. Instead, these theories tend to explain consciousness in terms of higher-level information processing building blocks that AIs and robots will likely share with the animal kingdom.

For example, under either Global Workspace Theory or Daniel Dennett’s Multiple Drafts Model, it seems quite likely that very sophisticated AIs—including those that are unaligned with human preferences—will be conscious in a morally relevant sense. If future AIs are trained to perform well in complex, real-world physical environments, and are subject to the same types of constraints that animals faced during their evolution, the pressure for them to develop consciousness seems plausibly as strong as it was for biological organisms.

Why do I have sympathy for unaligned AI?

As a personal note, part of my moral sympathy for unaligned AI comes from generic cosmopolitanism about moral value, which I think is intrinsically downstream from my utilitarian inclinations. As someone with a very large moral circle, I’m happy to admit that strange—even very alien-like—beings could have substantial moral value in their own right, even if they do not share human preferences.

In addition to various empirical questions, I suspect a large part of the disagreement about whether unaligned AIs will have moral value comes down to how much people think these AIs need to be human-like in order for them to be important moral patients. In contrast to perhaps most effective altruists, I believe it is highly plausible that unaligned AIs will have just as much of a moral right to exist and satisfy their preferences as we humans do, even if they are very different from us.

Argument two: aligned AIs are more likely to have a preference for creating new conscious entities, furthering utilitarian objectives

The second argument for the proposition seems more plausible to me. The idea here is simply that one existing human preference is to bring about more conscious entities into existence. For example, total utilitarians have such a preference, and some humans are (at least partly) total utilitarians. If AIs broadly share the preferences of humans, then at least some AIs will share this particular preference, and we can therefore assume that at least some aligned AIs will try to further the goals of total utilitarianism by creating conscious entities.

While I agree with some of the intuitions behind this argument, I think it’s ultimately quite weak. The fraction of humans who are total utilitarians is generally considered to be small. And outside of a desire to have children—which is becoming progressively less common worldwide—most humans do not regularly express explicit and strong preferences to add new conscious entities to the universe.

Indeed, the most common motive humans have for bringing additional conscious entities into existence seems to be to use them instrumentally to satisfy some other preference, such as the human desire to eat meat. This is importantly distinct from wanting to create conscious creatures as an end in itself. Moreover, many people have moral intuitions that are directly contrary to total utilitarian recommendations. For example, while utilitarianism generally advocates intervening in wild animal habitats to improve animal welfare, most people favor habitat preservation and keeping habitats “natural” instead.

At the least, one can easily imagine an unaligned alternative that’s better from a utilitarian perspective. Consider a case where unaligned AIs place less value on preserving nature than humans do. In this situation, unaligned AIs might generate more utilitarian benefit by transforming natural resources to optimize some utilitarian objective, compared to the actions of aligned AIs constrained by human preferences. Aligned AIs, by respecting human desires to conserve nature, would be forgoing potential utilitarian gains. They would refrain from exploiting substantial portions of the natural world, thus preventing the creation of physically constructed systems that could hold great moral worth, such as datacenters built on protected lands capable of supporting immense numbers of conscious AIs.

The human desire to preserve nature is not the only example of a human preference that conflicts with utilitarian imperatives. See this footnote for an additional concrete example.[1]

My point here is not that humans are anti-utilitarian in general, but merely that most humans have a mix of moral intuitions, and some of these intuitions act against the recommendations of utilitarianism. Therefore, empowering human preferences does not, to a first approximation, look much like “giving control over to utilitarians” but rather like something different entirely. In general, the human world does not seem well-described as a bunch of utilitarian planners trying to maximize global utility.

And again, unaligned AI preferences seem unlikely to be completely alien or “random” compared to human preferences if AIs are largely trained from the ground-up on human data. In that case, I expect AI moral preferences will most likely approximate human moral preferences to some degree by sharing high-level concepts with us, even if their preferences do not exactly match up with human preferences. Furthermore, as I argued in the previous section, if AIs themselves are conscious, it seems natural for some of them to care about—or at least be motivated by—conscious experience, similar to humans and other animals.

In my opinion, the previous points further undermine the idea that unaligned AI moral preferences will be clearly less utilitarian than the (already not very utilitarian) moral preferences of most humans. In fact, by sharing moral concepts with us, unaligned AIs could be more utilitarian than humans (clearing an arguably already low bar), even if they do not share human preferences.

Moreover, I have previously argued that, if humans solve AI alignment in the technical sense, the main thing we’ll do with our resources is maximize the economic consumption of the humans who exist at the time of alignment. These preferences are distinct from utilitarian preferences because they are indexical: people largely value happiness, comfort, and wealth for themselves and their families, not for the world as a whole or for all future generations. Notably, this means that human moral preferences are likely to be comparatively unimportant even in an aligned future, relative to the more ordinary economic forces that already shape our world.

Consequently, in a scenario where AIs are aligned with human preferences, the consciousness of AIs will likely be determined mainly by economic efficiency factors during production, rather than by moral considerations. To put it another way, the key factor influencing whether AIs are conscious in this scenario will be the relative efficiency of creating conscious AIs compared to unconscious ones for producing the goods and services demanded by future people. As these efficiency factors are likely to be similar in both aligned and unaligned scenarios, we have little reason to believe that aligned AIs will generate more consciousness as a byproduct of consumption compared to unaligned AIs.

To summarize my argument in this section:

  1. To the extent AI preferences are unaligned with human moral preferences, it’s not clear this is worse than the alternative under an aligned scenario from a utilitarian perspective, since human moral preferences are a complex sum of both utilitarian and anti-utilitarian intuitions, and the relative strength of these forces is not clear. Some human moral intuitions, if empowered, would directly act against the recommendations of utilitarianism. Overall, I don’t see strong reasons to think that empowering human moral preferences would advance the objectives of utilitarianism better than a “default” unaligned alternative on net. Since unaligned AIs may share moral concepts with humans, they could plausibly even care more about achieving utilitarian objectives than humans do. At the least, it is easy to imagine unaligned alternatives that perform better by utilitarian lights compared to “aligned” scenarios.

  2. To the extent that AI preferences are unaligned with human economic consumption preferences, I also see no clear argument for why this would be worse, from a utilitarian perspective, than the alternative. Utilitarianism has no intrinsic favoritism for human consumption preferences over e.g. alien-like consumption preferences. In both the case of human consumption and alien-like consumption, consciousness will likely arise as a byproduct of consumption activity. However, if consciousness merely arises as a byproduct of economic activity, there’s no clear reason to assume it will be more likely to arise when it’s a byproduct of human consumption preferences compared to when it’s a byproduct of non-human consumption preferences.

  3. Aligned AIs will likely primarily be aligned to human economic consumption preferences, rather than human moral preferences, strengthening point (2).

In conclusion, I find only weak reasons to believe that utilitarian objectives, like filling the universe with happy beings, are more likely to be achieved if AIs are aligned with human preferences than if they are unaligned. While I do not find that premise particularly implausible, I also think there are reasonable considerations in both directions. In other words, it seems plausible to me that AI alignment could be either net-bad or net-good from a total utilitarian perspective, and I currently see no strong reasons to think the second possibility is more likely than the first.

As a consequence, this line of reasoning doesn’t move me very strongly towards thinking that AI alignment is morally valuable from a utilitarian perspective. Instead, competing moral considerations about AI alignment—and in particular, its propensity to make humans specifically better off—appear to be stronger reasons to think AI alignment is morally worth pursuing.

Human species preservationism

The case for AI alignment being morally valuable is much more straightforward from the perspective of avoiding human extinction. The reason is that, by definition, AI alignment is about ensuring that AIs share human preferences, and one particularly widespread and strong human preference is the desire to avoid death. If AIs share human preferences, it seems likely they will try to preserve the individual members of the human species, and as a side effect, they will likely preserve the human species itself.

According to a standard argument popular in EA and longtermism, reducing existential risk should be a global priority above most other issues, as it threatens not only currently living people, but also the lives of the much more numerous population of people who could one day inhabit the reachable cosmos. As Nick Bostrom put it, “For standard utilitarians, priority number one, two, three and four should consequently be to reduce existential risk. The utilitarian imperative ‘Maximize expected aggregate utility!’ can be simplified to the maxim ‘Minimize existential risk!’”

Traditionally, human extinction has been seen as the prototypical existential risk. However, as some have noted, human extinction from AI differs fundamentally from scenarios like a giant Earth-bound asteroid. This is because unaligned AIs would likely create a cosmic civilization in our absence. In other words, the alternative to a human civilization in the case of an AI existential catastrophe is merely an AI civilization, rather than an empty universe void of any complex life.

The preceding logic implies that we cannot simply assume that avoiding human extinction from AI is a utilitarian imperative, as we might assume for other existential risks. Indeed, if unaligned AIs are more utility-efficient than humans, it may even be preferable under utilitarianism for humans to create unaligned AIs, even if that results in human extinction.

Nonetheless, it is plausible that we should not be strict total longtermist utilitarians, and instead hold the (reasonable) view that human extinction would still be very bad, even if we cannot find a strong utilitarian justification for this conclusion. I concur with this perspective, but dissent from the view that avoiding human extinction should be a priority that automatically outranks other large-scale concerns, such as reducing ordinary death from aging, abolishing factory farming, reducing global poverty, and solving wild animal suffering.

In my view, the main (though not only) reason why human extinction from AI would be bad is that it would imply the deaths of all humans who exist at the time of AI development. But as bad as such a tragedy would be, in my view, it would not be far worse than the gradual death of billions of humans, over a period of several decades, which is the literal alternative humans already face in the absence of radical life extension.

I recognize that many people disagree with my moral outlook here and think that human extinction would be far worse than the staggered, individual deaths of all existing humans from aging over several decades. My guess is that many people disagree with me on this point because they care about the preservation of the human species over and above the lives and preferences of individual people who currently exist. We can call this perspective “human species preservationism”.

This view is not inherently utilitarian, as it gives priority to protecting the human species rather than trying to promote the equal consideration of interests. In this sense, it is a speciesist view, in the basic sense that it discriminates on the basis of species membership, ruling out even in principle the possibility of an equally valuable unaligned AI civilization. The fact that this moral view is speciesist is, in my opinion, a decent reason to reject it.[2]

Human species preservationism is also importantly distinguished from near-termism or present-person affecting views, i.e., the view that what matters is improving the lives of people who either already exist or will exist in the near future. That’s because, under the human species preservationist view, it is often acceptable to impose large costs on the current generation of humans, so long as doing so does not significantly risk the long-term preservation of the human species. For example, the human species preservationist view would find it acceptable to delay a cure for biological aging by 100 years (thereby causing billions of people to die premature deaths) if this had the effect of decreasing the probability of human extinction by 1 percentage point.

Near-termist view of AI risk

As alluded to previously, if we neither are strong longtermist total utilitarians nor care particularly strongly about preserving the human species for its own sake, a plausible alternative is to care primarily about the current generation of humans, or the people who will exist in, say, the next 100 years.[3] We can call this the “near-termist view”. On both the human species preservationist view and the near-termist view, AI alignment is clearly valuable, as aligned AIs would have strong reasons to protect the existence of humans. In fact, alignment is even more directly valuable on the near-termist view, as it would obviously be good for currently-existing humans if AIs share their preferences.

However, the near-termist ethical view departs significantly from the human species preservationist view in how it evaluates slowing down or pausing technological progress, including AI development. This is because slowing down technological progress would likely impose large opportunity costs on the present generation of humans.

Credible economic models suggest that AIs could make humans radically richer. And it is highly plausible that, if AIs can substitute for scientific researchers, they could accelerate technological progress, including in medicine, extending human lifespan and healthspan. This would likely raise human well-being dramatically over a potentially short period of time. Since these anticipated gains are so large, they can outweigh even relatively large probabilities of death at the individual level.

As an analogy, most humans are currently comfortable driving in cars, even though the lifetime probability of a car killing you is greater than 1%. In other words, most humans appear to judge—through their actions—the benefits and convenience of cars as outweighing a 1% lifetime probability of death. Given the credible economic models of AI, it seems likely to me that the benefits of adopting AI are far larger than even the benefits of having access to cars. As a consequence, from the perspective of people who currently exist, it is not obvious that we should delay AI progress substantially even if there is a non-negligible risk that AIs will kill all humans.

As far as I’m aware, the state of the art for modeling this trade-off comes from Chad Jones. A summary of his model’s findings is as follows:

The curvature of utility is very important. With log utility, the models are remarkably unconcerned with existential risk, suggesting that large consumption gains that A.I. might deliver can be worth gambles that involve a 1-in-3 chance of extinction.

For CRRA utility with a risk aversion coefficient (γ) of 2 or more, the picture changes sharply. These utility functions are bounded, and the marginal utility of consumption falls rapidly. Models with this feature are quite conservative in trading off consumption gains versus existential risk.

These findings even extend to singularity scenarios. If utility is bounded — as it is in the standard utility functions we use frequently in a variety of applications in economics — then even infinite consumption generates relatively small gains. The models with γ ≥ 2 remain conservative with regard to existential risk.

A key exception to this conservative view of existential risk emerges if the rapid innovation associated with A.I. leads to new technologies that extend life expectancy and reduce mortality. These gains are “in the same units” as existential risk and do not run into the sharply declining marginal utility of consumption. Even with a future-oriented focus that comes from low discounting, A.I.-induced mortality reductions can make large existential risks bearable. [emphasis mine]

In short, because of the potential benefits to human wealth and lifespan—from the perspective of people who exist at the time of AI development—it may be beneficial to develop AI even in the face of potentially quite large risks of human extinction, including perhaps a 1-in-3 chance of extinction. If true, this conclusion significantly undermines the moral case for delaying AI for safety or alignment reasons.
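To make the role of utility curvature concrete, here is a toy calculation of my own (a deliberately simplified sketch, not Jones’s calibrated model). Suppose an agent can keep current consumption for sure, or accept extinction probability δ in exchange for consumption being multiplied by k if things go well, with death normalized to utility 0 and an assumed “value of being alive” constant ū added to flow utility. The indifference condition (1 − δ)·u(k) = u(1) then pins down the largest extinction risk worth accepting. Because log utility is unbounded, that tolerated risk grows without limit as k grows; with CRRA utility and γ = 2, utility is bounded above, so the tolerated risk is capped no matter how large the gains:

```python
import math

def max_acceptable_risk(k, u_bar, gamma):
    """Largest extinction probability delta* an agent would accept in exchange
    for multiplying consumption by k, in a one-period toy model (an illustration,
    not Jones's calibrated model).

    Utility while alive at consumption c (current consumption normalized to 1):
    u(c) = u_bar + log(c) if gamma == 1, else u(c) = u_bar + c**(1-gamma)/(1-gamma).
    Death is normalized to utility 0. Indifference: (1 - delta) * u(k) = u(1),
    so delta* = 1 - u(1) / u(k).
    """
    def u(c):
        if gamma == 1:  # log utility (unbounded above)
            return u_bar + math.log(c)
        return u_bar + c ** (1 - gamma) / (1 - gamma)  # CRRA (bounded above for gamma > 1)

    return 1 - u(1) / u(k)

# u_bar = 5 is an assumed "value of being alive" constant, chosen purely for
# illustration; k is a hypothetical AI-driven consumption multiple.
for k in [10, 1_000, 1_000_000_000]:
    log_case = max_acceptable_risk(k, u_bar=5, gamma=1)
    crra_case = max_acceptable_risk(k, u_bar=5, gamma=2)
    print(f"k = {k:>13,}: log utility accepts {log_case:.2f}, CRRA (gamma=2) accepts {crra_case:.2f}")
```

Under these assumed numbers, log utility accepts roughly a 1-in-3 extinction risk for a tenfold consumption gain and even more for larger gains, while the γ = 2 case never accepts more than about 20% (= 1/ū) even for astronomically large gains—qualitatively matching the pattern described in the quoted summary.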

One counter-argument is that the potential for future advanced AIs to radically extend human lifespan is very speculative, and therefore it is foolish to significantly risk the extinction of humanity for a speculative chance at dramatically raising human lifespans. However, in my opinion, this argument fails because both the radically good and radically bad possibilities from AI are speculative, and both ideas are ultimately supported by the same underlying assumption: that future advanced AIs will be very powerful, smart, or productive.

To the extent you think that future AIs would not be capable of creating massive wealth for humans, or extending their lifespans, this largely implies that you think future AIs will not be very powerful, smart, or productive. Thus, by the same argument, we should also not think future AIs will be capable of making humanity go extinct. Since this argument symmetrically applies to both bad and good AI potential futures, it is not a strong reason to delay AI development.

A final point to make here is that pausing AI may still be beneficial to currently existing humans if the pause is brief and it causes AI to be much safer as a result (and not merely very slightly safer). This depends on an empirical claim that I personally doubt, although I recognize it as a reasonable counter-point within the context of this discussion. I am not claiming to have discussed every relevant crux in this debate in this short essay.

Conclusion

I have not surveyed anything like an exhaustive set of arguments or moral views regarding AI alignment work or AI pause advocacy. Having said that, I believe the following tentative conclusions likely hold:

  1. It seems difficult to justify AI alignment work via straightforward utilitarian arguments. Arguments that aligned AIs will be more likely to be conscious than unaligned AIs appear strained and confused. Arguments that aligned AIs will be more likely to pursue utilitarian objectives than unaligned AIs appear generally weak, although not particularly implausible either.

  2. Regarding whether we should delay AI development, a key consideration is whether you are a human species preservationist, or whether you care more about the lives and preferences of people who currently exist. In the second case, delaying AI development can be bad even if AI poses a large risk of human extinction. Both of these views come apart from longtermist total utilitarianism, as the first view is speciesist, and the second view is relatively unconcerned with what will happen to non-humans in the very long-term.

The table below summarizes my best guesses on how I suspect each of the three moral perspectives I presented should view the value of AI alignment work and attempts to delay AI development, based on the discussion I have given in this article.

| Moral view | Value of AI alignment | Value of delaying advanced AI |
| --- | --- | --- |
| Total longtermist utilitarianism | Unclear value, plausibly approximately neutral. | Unclear value. If the delay is done for AI alignment reasons, the value is plausibly neutral, since AI alignment is plausibly neutral under this perspective. |
| Human species preservationism | Clearly valuable in almost any scenario. | Clearly valuable if AI poses any non-negligible risk to the preservation of the human species. |
| Near-termism or present-person affecting views | Clearly valuable in almost any scenario. | In my opinion, the value seems likely to be net-negative if AI poses less than a 1-in-3 chance of human extinction or similarly bad outcomes (from the perspective of existing humans). |

Perhaps my primary intention while writing this post was to argue against what I perceive to be the naive application of Nick Bostrom’s argument in Astronomical Waste for the overwhelming value of reducing existential risk at the cost of delaying technological progress. My current understanding is that this argument—as applied to AI risk—rests on a conflation of existential risk with the risk of human replacement by another form of life. However, from an impartial utilitarian perspective, these concepts are sharply different.

In my opinion, if one is not committed to the preservation of the human species per se (independent of utilitarian considerations, and independent of the individual people who comprise the human species), then the normative case for delaying AI to solve AI alignment is fairly weak. On the other hand, the value of technical AI alignment by itself appears strong from the ordinary perspective that currently-existing people matter. For this reason, among others, I’m generally supportive of (useful) AI alignment work, but I’m not generally supportive of AI pause advocacy.

  1. ^

    Another example of an anti-total-utilitarian moral intuition that most humans have is the general human reluctance to implement coercive measures to increase human population growth. Total utilitarianism generally recommends increasing the population size as much as possible, so long as new lives make a net-positive contribution to total utility. However, humanity is currently facing a fertility crisis in which birth rates are falling around the world. As far as I’m aware, no country in the world has suggested trying radical policies to increase fertility that would plausibly be recommended under a strict total utilitarian framework, such as forcing people to have children, or legalizing child labor and allowing parents to sell their children’s labor, which could greatly increase the economic incentive to have children.

  2. ^

    A central point throughout this essay is that it’s important to carefully consider one’s reasons for wanting to preserve the human species. If one’s reasons are utilitarian, then the arguments in the first section of this essay apply. If one’s reasons are selfish or present-generation-focused, then the arguments in the third section apply. If neither of these explains why you want to preserve the human species, then it is worth reflecting on why you are motivated to preserve an abstract category like species rather than actually-existing individuals or things like happiness and preference satisfaction.

  3. ^

    An alternative way to frame near-termist ethical views is that near-termism is an approximation of longtermism under the assumption that our actions are highly likely to “wash out” over the long-term, and have little to no predictable impact in any particular direction. This perspective can be understood through the lens of two separate considerations:

    1. Perhaps our best guess for how to best help the long-term is to do what’s best in the short-term in the expectation that the values we helped promote in the short-term might propagate into the long-term future, even though this propagation of values is not guaranteed.

    2. If we cannot reliably impact the long-term future, perhaps it is best to focus on actions that affect the short-term future, since this is the only part of the future that we have predictable influence over.