A reflection on the posts I have written in the last few months, elaborating on my views
In a series of recent posts, I have sought to challenge the conventional view among longtermists that prioritizes the empowerment or preservation of the human species as the chief goal of AI policy. It is my opinion that this view is likely rooted in a bias that automatically favors human beings over artificial entities—thereby sidelining the idea that future AIs might create equal or greater moral value than humans—and treating this alternative perspective with unwarranted skepticism.
I recognize that my position is controversial and likely to remain unpopular among effective altruists for a long time. Nevertheless, I believe it is worth articulating my view at length, as I see it as a straightforward application of standard, common-sense utilitarian principles that merely lead to an unpopular conclusion. I intend to continue elaborating on my arguments in the coming months.
My view follows from a few basic premises. First, that future AI systems are quite likely to be moral patients; second, that we shouldn’t discriminate against them based on arbitrary distinctions, such as their being instantiated on silicon rather than carbon, or having been created through deep learning rather than natural selection. If we insist on treating AIs fundamentally differently from a human child or adult—for example, by regarding them merely as property to be controlled or denying them the freedom to pursue their own goals—then we should identify a specific ethical reason for our approach that goes beyond highlighting their non-human nature.
Many people have argued that consciousness is the key quality separating humans from AIs, thus rendering any AI-based civilization morally insignificant compared to ours. They maintain that consciousness has relatively narrow boundaries, perhaps largely confined to biological organisms, and would only arise in artificial systems under highly specific conditions—for instance, if one were to emulate a human mind in digital form. While I acknowledge that this perspective is logically coherent, I find it deeply unconvincing. The AIs I am referring to when I write about this topic would almost certainly not be simplistic, robotic automatons; rather, they would be profoundly complex, sophisticated entities whose cognitive abilities rival or exceed those of the human brain. For anyone who adopts a functionalist view of consciousness, it seems difficult to be confident that such AIs would lack a rich inner experience.
Because functionalism and preference utilitarianism—both of which could grant moral worth to AI preferences even if they do not precisely replicate biological states—have at least some support within the EA community, I remain hopeful that, if I articulate my position clearly, EAs who share these philosophical assumptions will see its merits.
That said, I am aware that explaining this perspective is an uphill battle. The unpopularity of my views often makes it difficult to communicate without instant misunderstandings; critics seem to frequently conflate my arguments with other, simpler positions that can be more easily dismissed. At times, this has caused me to feel as though the EA community is open to only a narrow range of acceptable ideas. This reaction, while occasionally frustrating, does not surprise me, as I have encountered similar resistance when presenting other unpopular views—such as challenging the ethics of purchasing meat in social contexts where such concerns are quickly deemed absurd.
However, the unpopularity of these ideas also has a benefit: it creates room for rapid intellectual progress by opening the door to new and interesting philosophical questions about AI ethics. If we free ourselves from the seemingly unquestionable premise that preserving the human species should be the top priority when governing AI development, we can begin to ask entirely new and neglected questions about the role of artificial minds in society.
These questions include: what social and legal frameworks should we pursue if AIs are seen not as dangerous tools to be contained but as individuals on similar moral footing with humans? How do we integrate AI freedom and autonomy into our vision of the future, creating the foundation for a system of ethical and pragmatic AI rights?
Under this alternative philosophical approach, policy would not focus solely on minimizing risks to humanity. Instead, it would emphasize cooperation and inclusion, seeing advanced AI as a partner rather than an ethical menace to be tightly restricted or controlled. This undoubtedly requires a significant shift in our longtermist thinking, demanding a re-examination of deeply rooted assumptions. Such a project cannot be completed overnight, but given the moral stakes and the rapid progress in AI, I view this philosophical endeavor as both urgent and exciting. I invite anyone open to rethinking these foundational premises to join me in exploring how we might foster a future in which AIs and humans coexist as moral peers, cooperating for mutual benefit rather than viewing each other as intrinsic competitors locked in an inevitable zero-sum fight.
Thanks for writing on this important topic!
I think it’s interesting to assess how popular or unpopular these views are within the EA community. This year and last year, we asked people in the EA Survey about the extent to which they agreed or disagreed that:
> Most expected value in the future comes from digital minds’ experiences, or the experiences of other nonbiological entities.
This year about 47% (strongly or somewhat) disagreed, while 22.2% agreed (roughly a 2:1 ratio).
However, among people who rated AI risks a top priority, respondents leaned towards agreement, with 29.6% disagreeing and 36.6% agreeing (a 0.8:1 ratio).[1]
Similarly, among the most highly engaged EAs, attitudes were roughly evenly split between 33.6% disagreement and 32.7% agreement (1.02:1), with much lower agreement among everyone else.
This suggests to me that the collective opinion of EAs, at least among those who strongly prioritise AI risks and the most highly engaged, is not so hostile to digital minds. Of course, for practical purposes, what matters most might be the attitudes of a small number of decision-makers, but I think the attitudes of engaged EAs matter for epistemic reasons.
Interestingly, among people who merely rated AI risks a near-top priority, attitudes towards digital minds were similar to the sample as a whole. Lower prioritisation of AI risks was associated with yet lower agreement with the digital minds item.
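For readers who want to check the quoted ratios, here is a minimal sketch in Python using only the percentages reported above (the underlying survey counts are not reproduced here, so this is an approximation):

```python
# Disagree/agree percentages for the digital minds survey item, as quoted above.
groups = {
    "full sample": (47.0, 22.2),
    "AI risks a top priority": (29.6, 36.6),
    "most highly engaged EAs": (33.6, 32.7),
}

for name, (disagree, agree) in groups.items():
    ratio = disagree / agree
    print(f"{name}: {disagree}% disagree vs {agree}% agree "
          f"(about {ratio:.2f}:1 disagree-to-agree)")
```

The small discrepancy between the computed 1.03:1 and the quoted 1.02:1 for the most highly engaged group presumably comes from rounding in the reported percentages.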
I haven’t read your other recent comments on this, but here’s a question on the topic of pausing AI progress. (The point I’m making is similar to what Brad West already commented.)
Let’s say we grant your assumptions (that AIs will have values that matter the same as or more than human values and that an AI-filled future would be just as or more morally important than one with humans in control). Wouldn’t it still make sense to pause AI progress at this important juncture to make sure we study what we’re doing, so we can set up future AIs to do as well as (reasonably) possible?
You say that we shouldn’t be confident that AI values will be worse than human values. We can put a pin in that. But values are just one feature here. We should also think about agent psychologies and character traits and infrastructure beneficial for forming peaceful coalitions. On those dimensions, some traits or setups seem (somewhat robustly?) worse than others?
We’re growing an alien species that might take over from humans. Even if you think that’s possibly okay or good, wouldn’t you agree that we can envision factors about how AIs are built/trained and about what sort of world they are placed in that affect whether the future will likely be a lot better or a lot worse?
I’m thinking about things like:
pro-social instincts (or at least the absence of anti-social ones)
more general agent character traits that do well/poorly at forming peaceful coalitions
agent infrastructure to help with coordinating (e.g., having better lie detectors, having a reliable information environment or starting out under the chaos of information warfare, etc.)
initial strategic setup (being born into AI-vs-AI competition vs. being born into a situation where the first TAI can take its time to proceed slowly and deliberately)
maybe: decision-theoretic setup to do well in acausal interactions with other parts of the multiverse (or at least not do particularly poorly)
If (some of) these things are really important, wouldn’t it make sense to pause and study this stuff until we know whether some of these traits are tractable to influence?
(And, if we do that, we might as well try to make AIs have the inclination to be nice to humans, because humans already exist, so anything that kills humans who don’t want to die frustrates already-existing life goals, which seems worse than frustrating the goals of merely possible beings.)
I know you don’t talk about pausing in your above comment—but I think I vaguely remember you being skeptical of it in other comments. Maybe that was for different reasons, or maybe you just wanted to voice disagreement with the types of arguments people typically give in favor of pausing?
FWIW, I totally agree with the position that we should respect the goals of AIs (assuming they’re not just roleplayed stated goals but deeply held ones—of course, this distinction shouldn’t be uncharitably weaponized against AIs ever being considered to have meaningful goals). I’m just concerned because whether the AIs respect ours in turn, especially when they find themselves in a position where they could easily crush us, will probably depend on how we build them.
In your comment, you raise a broad but important question about whether, even if we reject the idea that human survival must take absolute priority over other concerns, we might still want to pause AI development in order to “set up” future AIs more thoughtfully. You list a range of traits—things like pro-social instincts, better coordination infrastructures, or other design features that might improve cooperation—that, in principle, we could try to incorporate if we took more time. I understand and agree with the motivation behind this: you are asking whether there is a prudential reason, from a more inclusive moral standpoint, to pause in order to ensure that whichever civilization emerges—whether dominated by humans, AIs, or both at once—turns out as well as possible in ways that matter impartially, rather than focusing narrowly on preserving human dominance.
Having summarized your perspective, I want to clarify exactly where I differ from your view, and why.
First, let me restate the perspective I defended in my previous post on delaying AI. In that post, I was critiquing what I see as the “standard case” for pausing AI, as I perceive it being made in many EA circles. This standard case for pausing AI often treats preventing human extinction as so paramount that any delay of AI progress, no matter how costly to currently living people, becomes justified if it incrementally lowers the probability of humans losing control.
Under this argument, the reason we want to pause is that time spent on “alignment research” can be used to ensure that future AIs share human goals, or at least do not threaten the human species. My critique had two components: first, I argued that pausing AI is very costly to people who currently exist, since it delays medical and technological breakthroughs that could be made by advanced AIs, thereby forcing a lot of people to die who could have otherwise been saved. Second, and more fundamentally, I argued that this “standard case” seems to rest on an assumption of strictly prioritizing human continuity, independent of whether future AIs might actually generate utilitarian moral value in a way that matches or exceeds humanity.
I certainly acknowledge that one could propose a different rationale for pausing AI, one which does not rest on the premise that preserving the human species is intrinsically worth more than other moral priorities. This is, it seems, the position you are taking.
Nonetheless, I don’t find your considerations compelling for a variety of reasons.
To begin with, it might seem that granting ourselves “more time” robustly ensures that AIs come out morally better—pro-social, cooperative, and so on. Yet the connection between “getting more time” and “achieving positive outcomes” does not seem straightforward. Merely taking more time does not ensure that this time will be used to increase, rather than decrease, the relevant quality of AI systems according to an impartial moral view. Alignment with human interests, for example, could just as easily push systems in directions that entrench specific biases, maintain existing social structures, or limit moral diversity—none of which strongly aligns with the “pro-social” ideals you described. In my view, there is no inherent link between a slower timeline and ensuring that AIs end up embodying genuinely virtuous or impartial ethical principles. Indeed, if what we call “human control” is mainly about enforcing the status quo or entrenching the dominance of the human species, it may be no better—and could even be worse—than a scenario in which AI development proceeds at the default pace, potentially allowing for more diversity and freedom in how systems are shaped.
Furthermore, in my own moral framework—which is heavily influenced by preference utilitarianism—I take seriously the well-being of everyone who currently exists. As I mentioned previously, one major cost to pausing AI is that it would likely postpone many technological benefits. These might include breakthroughs in medicine—potential cures for aging, radical extensions of healthy lifespans, or other dramatic increases to human welfare that advanced AI could accelerate. We should not simply dismiss the scale of that cost. The usual EA argument for downplaying these costs rests on the Astronomical Waste argument. However, I find this argument flawed, and I spelled out exactly why in the post I just wrote.
If a pause sets back major medical discoveries by even a decade, that delay could contribute to the premature deaths of around a billion people alive today. It seems to me that an argument in favor of pausing should grapple with this tradeoff, instead of dismissing it as clearly unimportant compared to the potential human lives that could maybe exist in the far future. Such a dismissal would seem both divorced from common sense concern for existing people, and divorced from broader impartial utilitarian values, as it would prioritize the continuity of the human species above and beyond species-neutral concerns for individual well-being.
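To give a rough sense of how a figure of this magnitude arises, here is a back-of-the-envelope sketch. The roughly 60 million deaths per year is my own assumption about current global mortality, not a number taken from the post:

```python
# Back-of-the-envelope estimate of deaths occurring during a delay window.
# Assumption (not from the post): roughly 60 million deaths per year worldwide.
annual_deaths = 60_000_000
delay_years = 10

deaths_in_window = annual_deaths * delay_years
print(f"Deaths during a {delay_years}-year delay: {deaths_in_window / 1e6:.0f} million")
# ~600 million deaths fall inside the window itself; the post's "around a
# billion" figure presumably also counts people alive today whose lifespans
# the delayed technologies would otherwise have extended beyond the window.
```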
Finally, I take very seriously the possibility that pausing AI would cause immense and enduring harm by requiring the creation of vast regulatory controls over society. Realistically, the political mechanisms by which we “pause” advanced AI development would likely involve a lot of coercion, surveillance and social control, particularly as AI starts becoming an integral part of our economy. These efforts are likely to expand state regulatory powers, hamper open competition, and open the door to a massive intrusion of state interference in economic and social activity. I believe these controls would likely be far more burdensome and costly than, for example, our controls over nuclear weapons. If our top long-term priority is building a more free, prosperous, inclusive, joyous, and open society for everyone, rather than merely to control and stop AI, then it seems highly questionable that creating the policing powers required to pause AI is the best way to achieve this objective.
As I see it, the core difference between the view you outlined and mine is not that I am ignoring the possibility that we might “do better” by carefully shaping the environment in which AIs arise. I concede that if we had a guaranteed mechanism to spend a known, short period of time intentionally optimizing how AIs are built, without imposing any other costs in the meantime, that might bring some benefits. However, my skepticism flows from the actual methods by which such a pause would come about, its unintended consequences on liberty, the immediate harms it imposes on present-day people by delaying technological progress, and the fact that it might simply entrench a narrower or more species-centric approach that I explicitly reject. It is not enough to claim that “pausing gives us more time”, suggesting that “more time” is robustly a good thing. One must argue why that time will be spent well, in a way that outweighs the enormous and varied costs that I believe are incurred by pausing AI.
To be clear, I am not opposed to all forms of regulation. But I tend to prefer more liberal approaches, in the sense of classical liberalism. I prefer strategies that try to invite AIs into a cooperative framework, giving them legal rights and a path to peaceful integration—coupled, of course, with constraints on any actor (human or AI) who threatens to commit violence. This, in my view, simply seems like a far stronger foundation for AI policy than a stricter top-down approach in which we halt all frontier AI progress, and establish the associated sweeping regulatory powers required to enforce such a moratorium.
I think it’s interesting and admirable that you’re dedicated to a position that’s so unusual in this space.
I assume I’m in the majority here in that my intuitions are quite different from yours, however.
One quick point while we’re here:
> this view is likely rooted in a bias that automatically favors human beings over artificial entities—thereby sidelining the idea that future AIs might create equal or greater moral value than humans—and treating this alternative perspective with unwarranted skepticism.
I think that a common, but perhaps not well vocalized, utilitarian take is that humans don’t have much of a special significance in terms of creating well-being. The main option would be a much more abstract idea, some kind of generalization of hedonium or consequentialism-ium or similar. For now, let’s define hedonium as “the ideal way of converting matter and energy into well-being, after a great deal of deliberation.”
As such, it’s very tempting to try to separate concerns and have AI tools focus on being great tools, and separately optimize hedonium to be efficient at being well-being. While I’m not sure if AIs would have zero qualia, I’d feel a lot more confident that they will have dramatically less qualia per unit resources than a much more optimized substrate.
If one follows this general logic, then one might assume that it’s likely that the vast majority of well-being in the future would exist as hedonium, not within AIs created to ultimately make hedonium.
One less intense formulation would be to have both AIs and humans focus only on making sure we get to the point where we much better understand the situation with qualia and hedonium (a la the Long Reflection), and then re-evaluate. In my strategic thinking around AI I’m not particularly optimizing for the qualia of the humans involved in the AI labs or the relevant governments. Similarly, I’d expect not to optimize hard for the qualia in the early AIs, in the period when we’re unsure about qualia and ethics, even if I thought they might have experiences. I would be nervous if I thought this period could involve AIs having intense suffering or be treated in highly immoral ways.
I think for me, part of the issue with your posts on this (which I think are net positive to be clear, they really push at significant weak points in ideas widely held in the community) is that you seem to be sort of vacillating between three different ideas, in a way that conceals that one of them, taken on its own, sounds super-crazy and evil:
1) Actually, if AI development were to literally lead to human extinction, that might be fine, because it might lead to higher utility.
2) We should care about humans harming sentient, human-like AIs as much as we care about AIs harming humans.
3) In practice, the benefits to current people from AI development outweigh the risks, and the only moral views which say that we should ignore this and pause in the face of even tiny risks of extinction from AI, on the grounds that there are way more potential humans in the future, in fact imply 1) when taken seriously, which nobody believes.
1) feels extremely bad to me, basically a sort of Nazi-style view on which genocide is fine if the replacing people are superior utility generators (or I guess, inferior but sufficiently more numerous). 1) plausibly is a consequence of classical utilitarianism (even maybe on some person-affecting versions of classical utilitarianism I think), but I take this to be a reason to reject pure classical utilitarianism, not a reason to endorse 1). 2) and 3), on the other hand, seem reasonable to me. But the thing is that you seem at least sometimes to be taking AI moral patienthood as a reason to push on in the face of uncertainty about whether AI will literally kill everyone. And that seems more like 1) than 2) or 3). 1-style reasoning supports the idea that AI moral patienthood is a reason for pushing on with AI development even in the face of human extinction risk, but as far as I can tell 2) and 3) don’t. At the same time though I don’t think you mean to endorse 1).
I realize my position can be confusing, so let me clarify it as plainly as I can: I do not regard the extinction of humanity as anything close to “fine.” In fact, I think it would be a devastating tragedy if every human being died. I have repeatedly emphasized that a major upside of advanced AI lies in its potential to accelerate medical breakthroughs—breakthroughs that might save countless human lives, including potentially my own. Clearly, I value human lives, as otherwise I would not have made this particular point so frequently.
What seems to cause confusion is that I also argue the following more subtle point: while human extinction would be unbelievably bad, it would likely not be astronomically bad in the strict sense used by the “astronomical waste” argument. The standard “astronomical waste” argument says that if humanity disappears, then all possibility for a valuable, advanced civilization vanishes forever. But in a scenario where humans die out because of AI, civilization would continue—just not with humans. That means a valuable intergalactic civilization could still arise, populated by AI rather than by humans. From a purely utilitarian perspective that counts the existence of a future civilization as extremely valuable—whether human or AI—this difference lowers the cataclysm from “astronomically, supremely, world-endingly awful” to “still incredibly awful, but not on a cosmic scale.”
In other words, my position remains that human extinction is very bad indeed—it entails the loss of eight billion individual human lives, which would be horrifying. I don’t want to be forcibly replaced by an AI. Nor do I want you, or anyone else, to be forcibly replaced by an AI. I am simply pushing back on the idea that such an event would constitute the absolute destruction of all future value in the universe. There is a meaningful distinction between “an unimaginable tragedy we should try very hard to avoid” and “a total collapse of all potential for a flourishing future civilization of any kind.” My stance falls firmly in the former category.
This distinction is essential to my argument because it fundamentally shapes how we evaluate trade-offs, particularly when considering policies that aim to slow or restrict AI research. If we assume that human extinction due to AI would erase all future value, then virtually any present-day sacrifice—no matter how extreme—might seem justified to reduce that risk. However, if advanced AI could continue to sustain its own value-generating civilization, even in the absence of humans, then extinction would not represent the absolute end of valuable life. While this scenario would be catastrophic for humanity, attempting to avoid it might not outweigh certain immediate benefits of AI, such as its potential to save lives through advanced technology.
In other words, there could easily be situations where accelerating AI development—rather than pausing it—ends up being the better choice for saving human lives, even if doing so technically slightly increases the risk of human species extinction. This does not mean we should be indifferent to extinction; rather, it means we should stop treating extinction as a near-infinitely overriding concern, where even the smallest reduction in its probability is always worth immense near-term costs to actual people living today.
For a moment, I’d like to reverse the criticism you leveled at me. From where I stand, it is often those who strongly advocate pausing AI development, not myself, who can appear to undervalue the lives of humans. I know they don’t see themselves this way, and they would certainly never phrase it in those terms. Nevertheless, this is my reading of the deeper implications of their position.
A common proposition that many AI pause advocates have affirmed to me is that it very well could be worth it to pause AI, even if this led to billions of humans dying prematurely due to them missing out on accelerated medical progress that could otherwise have saved their lives. Therefore, while these advocates care deeply about human extinction (something I do not deny), their concern does not seem rooted in the intrinsic worth of the people who are alive today. Instead, their primary focus often seems to be on the loss of potential future human lives that could maybe exist in the far future—lives that do not yet even exist, and on my view, are unlikely to exist in the far future in basically any scenario, since humanity is unlikely to be preserved as a fixed, static concept over the long-run.
In my view, this philosophy neither prioritizes the well-being of actual individuals nor is it grounded in the utilitarian value that humanity actively generates. If this philosophy were purely about impartial utilitarian value, then I ask: why are they not more open to my perspective? Since my philosophy takes an impartial utilitarian approach—one that considers not just human-generated value, but also the potential value that AI itself could create—it would seem to appeal to those who simply took a strict utilitarian approach, without discriminating against artificial life arbitrarily. Yet, my philosophy largely does not appeal to those who express this view, suggesting the presence of alternative, non-utilitarian concerns.
Thanks, that is very helpful to me in clarifying your position.
I have read or skimmed some of his posts and my sense is that he does endorse 1). But at the same time he says
so maybe this is one of these cases and I should be more careful.
I distinguish believing that good successor criteria are brittle from speciesism. I think antispeciesism does not oblige me to accept literally any successor.
I do feel icky coalitioning with outright speciesists (who reject the possibility of a good successor in principle), but I think my goals and all of generalized flourishing benefits a lot from those coalitions so I grin and bear it.
FWIW, I completely agree with what you’re saying here, and I think that if you go seriously into consciousness research, especially into what we Westerners would label a sense of self rather than anything else, it quickly becomes infeasible to hold the position that the way we’re taking AI development, e.g. towards AI agents, will not lead to AIs having self-models.
For all intents and purposes, this encompasses most physicalist or non-dual theories of consciousness, which are the only feasible ones unless you want to bite some really sour apples.
There’s a classic “what are we getting wrong” question in EA, and I think it’s extremely likely that we will look back in 10 years and say, “wow, what were we doing here?”.
I think it’s a lot better to think in terms of systemic alignment and to look at the properties that we want for the general collective intelligences we’re participating in, such as our information networks or our institutional decision-making procedures, and to think about how we can optimise these for resilience and truth-seeking. If certain AIs deserve moral patienthood, then that truth will naturally arise from such structures.
(hot take) Individual AI alignment might honestly be counterproductive with respect to this goal.