I haven’t read your other recent comments on this, but here’s a question on the topic of pausing AI progress. (The point I’m making is similar to what Brad West already commented.)
Let’s say we grant your assumptions (that AIs will have values that matter as much as or more than human values, and that an AI-filled future would be just as or more morally important than one with humans in control). Wouldn’t it still make sense to pause AI progress at this important juncture, so we can study what we’re doing and set up future AIs to do as well as (reasonably) possible?
You say that we shouldn’t be confident that AI values will be worse than human values. We can put a pin in that. But values are just one feature here. We should also think about agent psychologies, character traits, and the infrastructure that helps with forming peaceful coalitions. On those dimensions, don’t some traits or setups seem (somewhat robustly?) worse than others?
We’re growing an alien species that might take over from humans. Even if you think that’s possibly okay or good, wouldn’t you agree that we can envision factors (about how AIs are built/trained and about what sort of world they are placed into) that affect whether the future is likely to turn out a lot better or a lot worse?
I’m thinking about things like:
pro-social instincts (or at least absence of anti-social ones)
more general agent character traits that do well/poorly at forming peaceful coalitions
agent infrastructure to help with coordination (e.g., having better lie detectors, having a reliable information environment vs. starting out amid the chaos of information warfare, etc.)
initial strategic setup (being born into AI-vs-AI competition vs. being born into a situation where the first TAI can take its time and proceed slowly and deliberately)
maybe: decision-theoretic setup to do well in acausal interactions with other parts of the multiverse (or at least not do particularly poorly)
If (some of) these things are really important, wouldn’t it make sense to pause and study this stuff until we know whether some of these traits are tractable to influence?
(And, if we do that, we might as well try to make AIs inclined to be nice to humans: humans already exist, so anything that kills humans who don’t want to die frustrates already-existing life goals, which seems worse than frustrating the goals of merely possible beings.)
I know you don’t talk about pausing in your above comment—but I think I vaguely remember you being skeptical of it in other comments. Maybe that was for different reasons, or maybe you just wanted to voice disagreement with the types of arguments people typically give in favor of pausing?
FWIW, I totally agree with the position that we should respect the goals of AIs (assuming they’re not just roleplayed stated goals but deeply held ones—though of course this distinction shouldn’t be uncharitably weaponized to deny that AIs could ever have meaningful goals). I’m just concerned that whether the AIs respect ours in turn, especially when they find themselves in a position where they could easily crush us, will probably depend on how we build them.
In your comment, you raise a broad but important question about whether, even if we reject the idea that human survival must take absolute priority over other concerns, we might still want to pause AI development in order to “set up” future AIs more thoughtfully. You list a range of traits—things like pro-social instincts, better coordination infrastructure, or other design features that might improve cooperation—that, in principle, we could try to incorporate if we took more time. I understand and agree with the motivation behind this: you are asking whether there is a prudential reason, from a more inclusive moral standpoint, to pause in order to ensure that whichever civilization emerges—whether dominated by humans, by AIs, or by some mix of the two—turns out as well as possible in ways that matter impartially, rather than focusing narrowly on preserving human dominance.
Having summarized your perspective, I want to clarify exactly where I differ from your view, and why.
First, let me restate the perspective I defended in my previous post on delaying AI. In that post, I was critiquing what I see as the “standard case” for pausing AI, as I perceive it being made in many EA circles. That case often treats preventing human extinction as so paramount that any delay of AI progress, no matter how costly to currently living people, becomes justified if it incrementally lowers the probability of humans losing control.
Under this argument, the reason to pause is that time spent on “alignment research” can be used to ensure that future AIs share human goals, or at least do not threaten the human species. My critique had two components. First, I argued that pausing AI is very costly to people who currently exist, since it delays medical and technological breakthroughs that advanced AIs could deliver, thereby allowing many people to die who could otherwise have been saved. Second, and more fundamentally, I argued that this “standard case” seems to rest on an assumption of strictly prioritizing human continuity, independent of whether future AIs might actually generate utilitarian moral value in a way that matches or exceeds humanity’s.
I certainly acknowledge that one could propose a different rationale for pausing AI, one which does not rest on the premise that preserving the human species is intrinsically worth more than other moral priorities. This, it seems, is the position you are taking.
Nonetheless, I don’t find your considerations compelling for a variety of reasons.
To begin with, it might seem that granting ourselves “more time” robustly ensures that AIs come out morally better—pro-social, cooperative, and so on. Yet the connection between “getting more time” and “achieving positive outcomes” does not seem straightforward. Merely taking more time does not ensure that this time will be used to increase, rather than decrease, the relevant qualities of AI systems according to an impartial moral view. Alignment with human interests, for example, could just as easily push systems in directions that entrench specific biases, maintain existing social structures, or limit moral diversity—none of which strongly aligns with the “pro-social” ideals you described. In my view, there is no inherent link between a slower timeline and AIs ending up embodying genuinely virtuous or impartial ethical principles. Indeed, if what we call “human control” is mainly about enforcing the status quo or entrenching the dominance of the human species, it may be no better—and could even be worse—than a scenario in which AI development proceeds at the default pace, potentially allowing for more diversity and freedom in how systems are shaped.
Furthermore, in my own moral framework—which is heavily influenced by preference utilitarianism—I take seriously the well-being of everyone who exists today. As I mentioned previously, one major cost of pausing AI is that it would likely postpone many technological benefits. These might include breakthroughs in medicine—potential cures for aging, radical extensions of healthy lifespans, or other dramatic improvements to human welfare that advanced AI could accelerate. We should not simply dismiss the scale of that cost. The usual EA argument for downplaying these costs rests on the Astronomical Waste argument. However, I find that argument flawed, and I spelled out exactly why in the post I just wrote.
If a pause sets back major medical discoveries by even a decade, that delay could contribute to the premature deaths of around a billion people alive today. It seems to me that an argument in favor of pausing should grapple with this tradeoff, instead of dismissing it as clearly unimportant compared to the potential human lives that might exist in the far future. Such a dismissal seems divorced both from common-sense concern for existing people and from broader impartial utilitarian values, as it would prioritize the continuity of the human species above and beyond species-neutral concern for individual well-being.
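For what it’s worth, here is a rough back-of-envelope sketch of how a figure on that scale could be reached; the global mortality rate, the pause length, and the preventable fraction below are illustrative assumptions supplied for the sketch, not precise estimates.

```python
# Rough back-of-envelope: extra premature deaths among people alive today
# if death-preventing medical technology arrives `delay_years` later.
# All three inputs are illustrative assumptions, not precise estimates.

deaths_per_year = 60_000_000      # approximate global deaths per year (assumed)
delay_years = 10                  # hypothetical length of the pause
preventable_fraction = 1.0        # share of those deaths advanced medicine is assumed to avert

premature_deaths = deaths_per_year * delay_years * preventable_fraction
print(f"{premature_deaths:,.0f}")  # prints 600,000,000 under these assumptions
```

Under these generous assumptions the figure lands around 600 million; different assumptions about how much of that mortality advanced AI could actually prevent, or about how long a pause would really last, move the estimate toward or past the billion mark. The point is only that the cost is of that rough magnitude, not that any particular number is exact.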
Finally, I take very seriously the possibility that pausing AI would cause immense and enduring harm by requiring the creation of vast regulatory controls over society. Realistically, the political mechanisms by which we “pause” advanced AI development would likely involve a great deal of coercion, surveillance, and social control, particularly as AI becomes an integral part of our economy. These efforts are likely to expand state regulatory powers, hamper open competition, and open the door to massive state interference in economic and social activity. I believe these controls would likely be far more burdensome and costly than, for example, our controls over nuclear weapons. If our top long-term priority is building a more free, prosperous, inclusive, joyous, and open society for everyone, rather than merely controlling and stopping AI, then it seems highly questionable that creating the policing powers required to pause AI is the best way to achieve this objective.
As I see it, the core difference between the view you outlined and mine is not that I am ignoring the possibility that we might “do better” by carefully shaping the environment in which AIs arise. I concede that if we had a guaranteed mechanism to spend a known, short period of time intentionally optimizing how AIs are built, without imposing any other costs in the meantime, that might bring some benefits. However, my skepticism flows from the actual methods by which such a pause would come about, its unintended consequences for liberty, the immediate harms it imposes on present-day people by delaying technological progress, and the fact that it might simply entrench a narrower or more species-centric approach that I explicitly reject. It is not enough to claim that “pausing gives us more time”, as though “more time” were robustly a good thing. One must argue that this time will be spent well, in a way that outweighs the enormous and varied costs that I believe a pause would incur.
To be clear, I am not opposed to all forms of regulation. But I tend to prefer more liberal approaches, in the sense of classical liberalism. I prefer strategies that try to invite AIs into a cooperative framework, giving them legal rights and a path to peaceful integration—coupled, of course, with constraints on any actor (human or AI) who threatens to commit violence. This, in my view, seems like a far stronger foundation for AI policy than a strict top-down approach in which we halt all frontier AI progress and establish the sweeping regulatory powers required to enforce such a moratorium.