In the absence of meaningful evidence about the nature of AI civilization, what justification is there for assuming that it will have less moral value than human civilization—other than a speciesist bias?
You know these arguments! You have heard them hundreds of times. Humans care about many things. Sometimes we collapse that into caring about experience for simplicity.
AIs will probably not care about the same things, and so the universe will be worse by our lights if controlled by AI civilizations. We don’t know exactly what those things are, but the only pointer to our values that we have is ourselves, and AIs will not share those pointers.
I think your response largely assumes a human-species-centered viewpoint, rather than engaging with my critique that is precisely aimed at re-evaluating this very point of view.
You say, “AIs will probably not care about the same things, so the universe will be worse by our lights if controlled by AI.” But what are “our lights” and “our values” in this context? Are you referring to the values of me as an individual, the current generation of humans, or humanity as a broad, ongoing species-category? These are distinct—and often conflicting—sets of values, preferences, and priorities. It’s possible, indeed probable, that I, personally, have preferences that differ fundamentally from the majority of humans. “My values” are not the same as “our values”.
When you talk about whether an AI civilization is “better” or “worse,” it’s crucial to clarify what perspective we’re measuring that from. If, from the outset, we assume that human values, or the survival of humanity-as-a-species, is the critical factor that determines whether an AI civilization is better or worse than our own, that effectively begs the question. It merely assumes what I aim to challenge. From a more impartial standpoint, the mere fact that AI might not care about the exact same things humans do doesn’t necessarily entail a decrease in total impartial moral value—unless we’ve already decided in advance that human values are inherently more important.
(To make this point clearer, perhaps replace all mentions of “human values” with “North American values” in the standard arguments about these issues, and see if it makes these arguments sound like they privilege an arbitrary category of beings.)
While it’s valid to personally value the continuation of the human species, or the preservation of human values, as a moral preference above other priorities, my point is simply that that’s precisely the species-centric assumption I’m highlighting, rather than a distinct argument that undermines my observations or analysis. Such a perspective is not substrate or species-neutral. Nor is it obviously mandated by a strictly utilitarian framework; it’s an extra premise that privileges the category “humankind” for its own sake. You may believe that such a preference is natural or good from your own perspective, but that is not equivalent to saying that it is the preference of an impartial utilitarian, who would, in theory, make no inherent distinction based purely on species, or substrate.
Are you assuming some kind of moral realism here? That there’s some deep moral truth, humans may or may not have insight into it, so any other intelligent entity is equally likely to?
If so, idk, I just reject your premise. I value what I choose to value, which is obviously related to human values, and an arbitrarily sampled entity is not likely to be better on that front.
From my perspective “caring about anything but human values” doesn’t make any sense. Of course, even more specifically, “caring about anything but my own values” also doesn’t make sense, but inasmuch as you are talking to humans, and making arguments about what other humans should do, you have to ground that in their values, and so it makes sense to talk about “human values”.
The AIs will not share the pointer to these values, in the same way that every individual does to their own values, and so we should assume a priori that the AIs will do worse things after we transfer all the power from humans to AIs.
Let’s define “shumanity” as the set of all humans who are currently alive. Under this definition, every living person today is a “shuman,” but our future children may not be, since they do not yet exist. Now, let’s define “humanity” as the set of all humans who could ever exist, including future generations. Under this broader definition, both we and our future children are part of humanity.
If all currently living humans (shumanity) were to die, this would be a catastrophic loss from the perspective of shuman values—the values held by the people who are alive today. However, it would not necessarily be a catastrophic loss from the perspective of human values—the values of humanity as a whole, across time. This distinction is crucial. In the normal course of events, every generation eventually grows old, dies, and is replaced by the next. When this happens, shumanity, as defined, ceases to exist, and as such, shuman values are lost. However, humanity continues, carried forward by the new generation. Thus, human values are preserved, but not shuman values.
Now, consider this in the context of AI. Would the extinction of shumanity by AIs be much worse than the natural generational cycle of human replacement? In my view, it is not obvious that being replaced by AIs would be much worse than being replaced by future generations of humans. Both scenarios involve the complete loss of the individual values held by currently living people, which is undeniably a major loss. To be very clear, I am not saying that it would be fine if everyone died. But in both cases, something new takes our place, continuing some form of value, mitigating part of the loss. This is the same perspective I apply to AI: its rise might not necessarily be far worse than the inevitable generational turnover of humans, which equally involves everyone dying (which I see as a bad thing!). Maybe “human values” would die in this scenario, but this would not necessarily entail the end of the broader concept of impartial utilitarian value. This is precisely my point.
Now, consider this in the context of AI. Would the extinction of shumanity by AIs be much worse than the natural generational cycle of human replacement?
I think the answer to this is “yes”, because your shared genetics and culture create much more robust pointers to your values than we are likely to get with AI.
Additionally, even if that wasn’t true, humans alive at present have obligations inherited from the past and relatedly obligations to the future. We have contracts and inheritance principles and various things that extend our moral circle of concern beyond just the current generation. It is not sufficient to coordinate with just the present humans, we are engaging in at least some moral trade with future generations, and trading away their influence to AI systems is also not something we have the right to do.
(Importantly, I think we have many fewer such obligations to very distant generations, since I don’t think we are generally borrowing or coordinating with humans living in the far future very much).
From a more impartial standpoint, the mere fact that AI might not care about the exact same things humans do doesn’t necessarily entail a decrease in total impartial moral value—unless we’ve already decided in advance that human values are inherently more important.
Look, this sentence just really doesn’t make any sense to me. From the perspective of humanity, which is composed of many humans, of course the fact that AI does not care about the same things as humans creates a strong presumption that a world optimized for those values will be worse than a world optimized for human values. Yes, current humans are also limited in the degree to which we can successfully delegate the fulfillment of our values to future generations, but we also just share, on average, a huge fraction of our values with future generations. That is a struggle every generation faces, and you are just advocating for… total defeat being fine for some reason? Yes, it would be terrible if the next generation of humans suddenly did not care about almost anything I cared about, but that is very unlikely to happen, whereas it is quite likely to happen with AI systems.
Because there is a much higher correlation between the values of the current generation of humans and the next one than there is between the values of humans and arbitrary AI entities.
I’m not talking about “arbitrary AI entities” in this context, but instead, the AI entities who will actually exist in the future, who will presumably be shaped by our training data, as well as our training methods. From this perspective, it’s not clear to me that your claim is true. But even if your claim is true, I was actually making a different point. My point was instead that it isn’t clear that future generations of AIs would be much worse than future generations of humans from an impartial utilitarian point of view.
(That said, it sounds like the real crux between us might instead be about whether pausing AI would be very costly to people who currently exist. If indeed you disagree with me about this point, I’d prefer you reply to my other comment rather than replying to this one, as I perceive that discussion as likely to be more productive.)
I don’t subscribe to moral realism. My own ethical outlook is a blend of personal attachments—my own life, my family, my friends, and other living humans—as well as a broader utilitarian concern for overall well-being. In this post, I focused on impartial utilitarianism because that’s the framework most often used by effective altruists.
However, to the extent that I also have non-utilitarian concerns (like caring about specific people I know), those concerns incline me away from supporting a pause on AI. If AI can accelerate technologies that save and improve the lives of people who exist right now, then slowing it down would cost lives in the near term. A more complete, and more rigorous version of this argument was outlined in the post.
What I find confusing about other EAs’ views, including yours, is why we would assign such great importance to “human values” as something specifically tied to the human species as an abstract concept, rather than merely being partial to actual individuals who exist. This perspective is neither utilitarian, nor is it individualistic. It seems to value the concept of the human species over and above the actual individuals that comprise the species, much like how an ideological nationalist might view the survival of their nation as more important than the welfare of all the individuals who actually reside within the nation.
For your broader point of impartiality, I feel like you are continuing to assume some bizarre form of moral realism and I don’t understand the case. Otherwise, why do you not consider rocks to be morally meaningful? Why is a plant not valuable? I can come up with reasons, but those reasons assume specific things about what is and is not morally valuable, in exactly the same way that I am relying on my specific preferences and values about what matters when I say arbitrary AI beings are on average substantially less valuable. I do not understand the philosophical position you are taking here—it feels like you’re saying that the standard position is speciesist and arbitrary and then drawing an arbitrary distinction slightly further out?
For your broader point of impartiality, I feel like you are continuing to assume some bizarre form of moral realism and I don’t understand the case. Otherwise, why do you not consider rocks to be morally meaningful? Why is a plant not valuable?
Traditionally, utilitarianism regards these things (rocks and plants) as lacking moral value because they do not have well-being or preferences. This principle does not clearly apply to AI, though it’s possible that you are making the assumption that future AIs will lack sentience or meaningful preferences. It would be helpful if you clarified how you perceive me to be assuming a form of moral realism (a meta-ethical theory), as I simply view myself as applying a standard utilitarian framework (a normative theory).
I do not understand the philosophical position you are taking here—it feels like you’re saying that the standard position is speciesist and arbitrary and then drawing an arbitrary distinction slightly further out?
Standard utilitarianism recognizes both morally relevant and morally irrelevant distinctions in value. According to a long tradition, following Jeremy Bentham and Peter Singer, among others, the species category is considered morally irrelevant, whereas sentience and/or preferences are considered morally relevant. I do not think this philosophy rests on the premise of moral realism: rather, it’s a conceptual framework for understanding morality, whether from a moral realist or anti-realist point of view.
To be clear, I agree that utilitarianism is itself arbitrary, from a sufficiently neutral point of view. But it’s also a fairly standard ethical framework, not just in EA but in academic philosophy too. I don’t think I’m making very unusual assumptions here.
Ah! Thanks for clarifying—if I understand correctly, you think that it’s reasonable to assert that sentience and preferences are what makes an entity morally meaningful, but that anything more specific is not? I personally just disagree with that premise, but I can see where you’re coming from
But in that case, it’s highly non-obvious to me that AIs will have sentience or preferences in ways that I consider meaningful—this seems like an open philosophical question. Actually defining what they are also seems like an open question to me—does a thermostat have preferences? Does a plant that grows towards the light? Meanwhile, I do feel fairly confident that humans are morally meaningful. Is your argument that even if there’s a good chance they’re not morally meaningful, the expected amount of moral significance is comparable to humans?
Thanks for clarifying—if I understand correctly, you think that it’s reasonable to assert that sentience and preferences are what makes an entity morally meaningful, but that anything more specific is not?
I don’t think there’s any moral view that’s objectively more “reasonable” than any other moral view (as I’m a moral anti-realist). However, I personally don’t have a significant moral preference for humans beyond the fact that I am partial to my family, friends, and a lot of other people who are currently alive. When I think about potential future generations who don’t exist yet, I tend to adopt a more impartial, utilitarian framework.
In other words, my moral views can be summarized as a combination of personal attachments and broader utilitarian moral concerns. My personal attachments are not impartial: for example, I care about my family more than I care about random strangers. However, beyond my personal attachments, I tend to take an impartial utilitarian approach that doesn’t assign any special value to the human species.
In other words, to the extent I care about humans specifically, this concern merely arises from the fact that I’m attached to some currently living individuals who happen to be human—rather than because I think the human species is particularly important.
Does that make sense?
But in that case, it’s highly non-obvious to me that AIs will have sentience or preferences in ways that I consider meaningful—this seems like an open philosophical question. Actually defining what they are also seems like an open question to me—does a thermostat have preferences?
I agree this is an open question, but I think it’s much clearer that future AIs will have complex and meaningful preferences compared to a thermostat or a plant. I think we can actually be pretty confident about this prediction given the strong economic pressures that will push AIs towards being person-like and agentic. (Note, however, that I’m not making a strong claim here that all AIs will be moral patients in the future. It’s sufficient for my argument if merely a large number of them are.)
In fact, a lot of arguments for AI risk rest on the premise that AI agents will exist in the future, and that they’ll have certain preferences (at least in a functional sense). If we were to learn that future AIs won’t have preferences, that would both undermine these arguments for AI risk, and many of my moral arguments for valuing AIs. Therefore, to the extent you think AIs will lack the cognitive prerequisites for moral patienthood—under my functionalist and preference utilitarian views—this doesn’t necessarily translate into a stronger case for worrying about AI takeover.
However, I want to note that the view I have just described is actually broader than the thesis I gave in the post. If you read my post carefully, you’ll see that I actually hedged quite a bit by saying that there are potential, logically consistent utilitarian arguments that could be made in favor of pausing AI. My thesis in the post was not that such an argument couldn’t be given. It was actually a fairly narrow thesis, and I didn’t make a strong claim that AI-controlled futures would create about as much utilitarian moral value as human-controlled futures in expectation (even though I personally think this claim is plausible).
I think that even the association between functional agency and preferences in a morally valuable sense is an open philosophical question that I am not happy taking as a given.
Regardless, it seems like our underlying crux is that we assign utility to different things. I somewhat object to you saying that your version of this is utilitarianism while notions of assigning utility that privilege the things humans value are not.
I agree that our main point of disagreement seems to be about what we ultimately care about.
For what it’s worth, I didn’t mean to suggest in my post that my moral perspective is inherently superior to others. For example, my argument is fully compatible with someone being a deontologist. My goal was simply to articulate what I saw standard impartial utilitarianism as saying in this context, and to point out how many people’s arguments for AI pause don’t seem to track what standard impartial utilitarianism actually says. However, this only matters insofar as one adheres to that specific moral framework.
As a matter of terminology, I do think that the way I’m using the words “impartial utilitarianism” aligns more closely with common usage in academic philosophy, given the emphasis that many utilitarians have placed on antispeciesist principles. However, even if you think I’m wrong on the grounds of terminology, I don’t think this disagreement subtracts much from the substance of my post, as I’m simply talking about the implications of a common moral theory (regardless of whatever we choose to call it).
Thanks for clarifying. In that case I think that we broadly agree.
If AI can accelerate technologies that save and improve the lives of people who exist right now, then slowing it down would cost lives in the near term.
Huh? This argument only goes through if you have a sufficiently low probability of existential risk or an extremely low change in your probability of existential risk, conditioned on things moving slower. I disagree with both of these assumptions. Which part of your post are you referring to?
Huh? This argument only goes through if you have a sufficiently low probability of existential risk or an extremely low change in your probability of existential risk, conditioned on things moving slower.
This claim seems false, though its truth hinges on what exactly you mean by a “sufficiently low probability of existential risk” and “an extremely low change in your probability of existential risk”.
To illustrate why I think your claim is false, I’ll perform a quick calculation. I don’t know your p(doom), but in a post from three years ago, you stated,
If you believe the key claims of “there is a >=1% chance of AI causing x-risk and >=0.1% chance of bio causing x-risk in my lifetime” this is enough to justify the core action relevant points of EA.
Let’s assume that there’s a 2% chance of AI causing existential risk, and that, optimistically, pausing for a decade would cut this risk in half (rather than barely decreasing it, or even increasing it). This would imply that the total risk would diminish from 2% to 1%.
According to OWID, approximately 63 million people die every year, although this rate is expected to increase, rising to around 74 million in 2035. If we assume that around 68 million people will die per year during the relevant time period, and that they could have been saved by AI-enabled medical progress, then pausing AI for a decade would kill around 680 million people.
This figure is around 8.3% of the current global population, and would constitute a death count higher than the combined death toll of World War I, World War II, the Mongol Conquests, the Taiping Rebellion, the transition from Ming to Qing, and the Three Kingdoms civil war.
(Note that, although we are counting deaths from old age in this case, these deaths are comparable to deaths in war from a years of life lost perspective, if you assume that AI-accelerated medical breakthroughs will likely greatly increase human lifespan.)
From the perspective of an individual human life, a 1% chance of death from AI is significantly lower than an 8.3% chance of death from aging—though obviously in the former case this risk would apply independently of age, and in the latter case, the risk would be concentrated heavily among people who are currently elderly.
Even a briefer pause lasting just two years, while still cutting risk in half, would not survive this basic cost-benefit test (roughly 136 million deaths, or about 1.7% of the global population, against the 1 percentage point of risk averted). Of course, it’s true that it’s difficult to directly compare the individual personal costs of AI existential risk to those of the diseases of old age. For example, death from an AI existential catastrophe has the potential to be briefer and less agonizing, which, all else being equal, should push us to count it as the lesser of the two. On the other hand, most people might consider death from old age to be preferable since it’s more natural and allows the human species to continue.
Nonetheless, despite these nuances, I think the basic picture that I’m presenting holds up here: under typical assumptions (such as the ones you gave three years ago), a purely individualistic framing of the costs and benefits of AI pause does not clearly favor pausing, from the perspective of people who currently exist. This fact was noted in Nick Bostrom’s original essay on Astronomical Waste, and more recently by Chad Jones in his paper on the tradeoffs involved in stopping AI development.
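To make the arithmetic in the comment above easy to check, here is a minimal Python sketch of the comparison, using the same illustrative inputs (a 2% baseline risk halved by a ten-year pause, roughly 68 million deaths per year, and a global population of about 8.2 billion). These numbers are assumptions taken from the comment for illustration, not forecasts.

```python
# Rough cost-benefit sketch using the illustrative numbers from the comment above.
# All inputs are assumptions for illustration, not forecasts.

baseline_risk = 0.02          # assumed probability of AI-caused existential catastrophe
paused_risk = 0.01            # assumed risk after a ten-year pause (cut in half)
deaths_per_year = 68_000_000  # approximate global deaths per year during the pause
pause_years = 10
population = 8_200_000_000    # rough current world population

# Cost of pausing: deaths that AI-enabled medical progress might otherwise have prevented
# (optimistically assuming all such deaths are preventable, as the comment does).
deaths_from_pause = deaths_per_year * pause_years
share_of_population = deaths_from_pause / population

# Benefit of pausing: the reduction in each person's chance of dying in a catastrophe.
risk_reduction = baseline_risk - paused_risk

print(f"Deaths during pause: {deaths_from_pause:,} ({share_of_population:.1%} of population)")
print(f"Reduction in existential risk: {risk_reduction:.1%}")
print("Pause favored on these numbers?", risk_reduction > share_of_population)
```

On these inputs the sketch reports roughly 680 million deaths (about 8.3% of the population) against a 1 percentage point reduction in risk, which is the comparison the comment is making; changing the assumed risk reduction or pause length changes the verdict.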
Ah, gotcha. Yes, I agree that if your expected reduction in p(doom) is less than around 1% per year of pause, and you assign zero value to future lives, then pausing is bad on utilitarian grounds
Note that my post was not about my actual numerical beliefs, but about a lower bound that I considered highly defensible—I personally expect notably higher than 1%/year reduction and was taking that as given, but on reflection I at least agree that that’s a more controversial belief (I also think that a true pause is nigh impossible)
I expect there are better solutions that achieve many of the benefits of pausing while still enabling substantially better biotech research, but that’s nitpicking
I’m not super sure what you mean by individualistic. I was modelling this as utilitarian but assigning literally zero value to future people. From a purely selfish perspective, I’m in my mid-20s and my chances of dying from natural causes in the next, say, 20 years are pretty damn low, and this means that given my background beliefs about doom and timelines, slowing down AI is a great deal from my perspective. Whereas if I expected to die from old age in the next 5 years, I would be a lot more opposed.
A typical 25-year-old man in the United States has around a 4.3% chance of dying before they turn 45, according to these actuarial statistics from 2019 (the most recent non-pandemic year in the data). I wouldn’t exactly call that “pretty damn low”, though opinions on these things differ. This is comparable to my personal credence that AIs will kill me in the next 20 years. And if AI goes well, it will probably make life really awesome. So from this narrowly selfish point of view, I’m still not really convinced pausing is worth it.
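For readers curious how a cumulative figure like 4.3% arises from a life table, here is a minimal sketch. The annual mortality rates below are made-up placeholder values, not the 2019 actuarial table the comment refers to.

```python
# Probability of dying between ages 25 and 44 (inclusive), given annual death
# probabilities q_x. The q_x values are placeholders for illustration only; the
# comment above refers to an actual 2019 US actuarial table.

q_x = [0.0015] * 10 + [0.0025] * 10  # hypothetical annual death probabilities, ages 25-44

survival = 1.0
for q in q_x:
    survival *= 1 - q  # probability of surviving each successive year

prob_die_before_45 = 1 - survival
print(f"P(death before age 45) ~ {prob_die_before_45:.1%}")  # about 3.9% with these placeholders
```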
Perhaps more importantly: do you not have any old family members that you care about?
4% is higher than I thought! Presumably much of that is people who had pre-existing conditions, which I don’t, or people who got into e.g. car accidents, which AI probably somewhat reduces, but this seems a lot more complicated and indirect to me.
But this isn’t really engaging with my cruxes. It seems pretty unlikely to me that we will pause until we have pretty capable and impressive AIs, and to me many of the non-doom scenarios come from uncertainty about when we will get powerful AI and how capable it will be. And I expect this to be much clearer the closer we get to these systems, or at the very least the empirical uncertainty about whether it’ll happen will be a lot clearer. I would be very surprised if there was the political will to do anything about this before we got a fair bit closer to the really scary systems.
And yep, I totally put more than a 4% chance that I get killed by AI in the next 20 years. But I can see this is a more controversial belief, and one that requires higher standards of evidence to argue for. If I imagine a hypothetical world where I know that in 2 years we could have aligned superintelligent AI with 98% probability (and it would kill us all with 2% probability), or we could pause for 20 years and get that from 98% to 99%, then I guess from a selfish perspective I can kind of see your point. But I know I do value humanity not going extinct a fair amount, even if I think that total utilitarianism is silly. I observe, though, that I’m finding this debate kind of slippery, and I’m afraid that I’m maybe moving the goalposts here: because I disagree on many counts, it’s not clear what exactly my cruxes are, or where I’m just attacking points in what you say that seem off.
I do think that the title of your post is broadly reasonable though. I’m an advocate for making AI x-risk cases that are premised on common sense morality like “human extinction would be really really bad”, and utilitarianism in the true philosophical sense is weird and messy and has pathological edge cases and isn’t something that I fully trust in extreme situations
I think what you’re saying about your own personal tradeoffs makes a lot of sense. Since I think we’re in agreement on a bunch of points here, I’ll just zero in on your last remark, since I think we still might have an important lingering disagreement:
I do think that the title of your post is broadly reasonable though. I’m an advocate for making AI x-risk cases that are premised on common sense morality like “human extinction would be really really bad”, and utilitarianism in the true philosophical sense is weird and messy and has pathological edge cases and isn’t something that I fully trust in extreme situations
I’m not confident, but I suspect that your perception of what common sense morality says is probably a bit inaccurate. For example, suppose you gave people the choice between the following scenarios:
In scenario A, their lifespan, along with the lifespans of everyone currently living, would be extended by 100 years. Everyone in the world would live for 100 years in utopia. At the end of this, however, everyone would peacefully and painlessly die, and then the world would be colonized by a race of sentient aliens.
In scenario B, everyone would receive just 2 more years to live. During this 2 year interval, life would be hellish and brutal. However, at the end of this, everyone would painfully die and be replaced by a completely distinct set of biological humans, ensuring that the human species is preserved.
In scenario A, humanity goes extinct, but we have a good time for 100 years. In scenario B, humanity is preserved, but we all die painfully in misery.
I suspect most people would probably say that scenario A is far preferable to scenario B, despite the fact that in scenario A, humanity goes extinct.
To be clear, I don’t think this scenario is directly applicable to the situation with AI. However, I think this thought experiment suggests that, while people might have some preference for avoiding human extinction, it’s probably not anywhere near the primary thing that people care about.
Based on people’s revealed preferences (such as how they spend their time, and who they spend their money on), most people care a lot about themselves and their family, but not much about the human species as an abstract concept that needs to be preserved. In a way, it’s probably the effective altruist crowd that is unusual in this respect by caring so much about human extinction, since most people don’t give the topic much thought at all.
This got me curious, so I had deep research make me a report on my probability of dying from different causes. It estimates that in the next 20 years I have maybe a 1.5% to 3% chance of death, of which 0.5-1% is chronic illness, where AI will probably help a lot. Infectious disease is less than 0.1%, so it doesn’t really matter. Accidents are 0.5-1%; AI probably helps, but it’s kind of unclear. 0.5-1% is other causes, mostly suicide. Plausibly AI also leads to substantially improved mental health treatments, which helps there? So yeah, I buy that having AGI today vs in twenty years has small but non-trivial costs to my chances of being alive when it happens.
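As a quick sanity check on the figures in this comment, here is a tiny sketch that adds up the quoted cause-specific ranges; the numbers are simply the ones stated above, with paraphrased labels.

```python
# Sum the cause-specific 20-year death-probability ranges quoted in the comment above.
# Ranges are (low, high) in percent; labels are paraphrased.
ranges = {
    "chronic illness": (0.5, 1.0),
    "infectious disease": (0.0, 0.1),
    "accidents": (0.5, 1.0),
    "other (mostly suicide)": (0.5, 1.0),
}

low_total = sum(low for low, _ in ranges.values())
high_total = sum(high for _, high in ranges.values())
print(f"Total 20-year risk: {low_total:.1f}% to {high_total:.1f}%")  # roughly 1.5% to 3.1%,
# consistent with the headline 1.5-3% estimate in the comment.
```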