I think you are somewhat missing the point. The point of a treaty whose enforcement mechanism includes bombing data centers is not to engage in implicit nuclear blackmail, which would indeed be dumb from a game-theoretic perspective. It is to actually stop AI training runs. You are not issuing a “threat” that you will escalate into greater and greater forms of blackmail once the first demand is acceded to; the point is not to extract resources in non-cooperative ways. It is to ensure that the state of the world is one in which no data center capable of performing AI training runs above a certain size exists.
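To make the game-theoretic distinction concrete, here is a toy sketch (my illustration, not anything from the comment; the demand schedule, the `treaty_enforcer` function, and the threshold are all invented): blackmail is contingent on what compliance can be made to yield, so acceding invites escalation, whereas a treaty enforcement rule fires only on a verifiable world-state and extracts nothing from compliance.

```python
# Toy contrast between escalating blackmail and rule-based enforcement.
# Every number and name here is an invented placeholder for illustration.

def blackmail_demand(round_num: int, complied_last_round: bool) -> int:
    """A blackmailer's demand grows whenever the target complies:
    compliance signals that further extraction will work."""
    base = 10
    return base * (2 ** round_num) if complied_last_round else base

def treaty_enforcer(cluster_flops: float, threshold: float = 1e26) -> str:
    """A fixed enforcement rule: respond iff a verifiable condition holds.
    There is no demand schedule and nothing gained from concessions."""
    return "enforce" if cluster_flops > threshold else "no action"

# Repeated compliance under blackmail produces escalating demands:
print([blackmail_demand(n, complied_last_round=True) for n in range(4)])
# -> [10, 20, 40, 80]

# The enforcement rule depends only on the state of the world:
print(treaty_enforcer(cluster_flops=1e25))  # -> "no action"
print(treaty_enforcer(cluster_flops=1e27))  # -> "enforce"
```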
The question of whether this would be correctly understood by the relevant actors is important but separate. I agree that in the world we currently live in, it doesn’t seem likely. But if you in fact lived in a world which had successfully passed a multilateral treaty like this, it seems much more plausible that people in the relevant positions would have updated far enough to understand that whatever was happening was at least not typical realpolitik.
2. If the world takes AI risk seriously, do we need threats?
Obviously if you live in a world where you’ve passed such a treaty, the first step in response to a potential violation is not going to be “bombs away!”, and nothing Eliezer wrote suggests otherwise. But having those intermediate options available ultimately bottoms out in the fact that your BATNA (best alternative to a negotiated agreement) is still to bomb the data center.
3. Don’t do morally wrong things
I think conducting cutting-edge AI capabilities research is pretty immoral, and in this counterfactual world that is a much more normalized position, even if the consensus is that the chance of x-risk absent a very strong plan for alignment is something like 10%. You can construct the least convenient possible world, in which some poor country has decided, for perfectly innocent reasons, to build data centers that will predictably get bombed; but unless you think the probability mass on something like that happening is noticeable, I don’t think it should be a meaningful factor in your reasoning. Like, we do not let people involuntarily subject others to Russian roulette, which is roughly analogous to the epistemic state of a world where 10% x-risk is the consensus position, and our response to someone actively preparing to play while declaring their intention to do so, in order to get some unrelated real benefit out of it, would be to stop them.
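For a sense of the numbers behind the roulette analogy, here is a rough sketch (the 10% figure is the hypothetical consensus from the paragraph above; treating successive frontier training runs as independent is my simplifying assumption):

```python
# Rough arithmetic behind the Russian roulette comparison.
# The 10% figure is the hypothetical consensus above; treating
# successive frontier training runs as independent is a simplification.

roulette_p = 1 / 6   # ~16.7% chance of death per trigger pull
xrisk_p = 0.10       # consensus per-run x-risk in the hypothetical

for n in (1, 3, 5, 10):
    cumulative = 1 - (1 - xrisk_p) ** n
    print(f"{n} run(s): cumulative risk ~ {cumulative:.0%}")
# 1 run(s): cumulative risk ~ 10%
# 3 run(s): cumulative risk ~ 27%
# 5 run(s): cumulative risk ~ 41%
# 10 run(s): cumulative risk ~ 65%
```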
4. Nuclear exchanges could be part of a rogue AI plan
I mean, no: in this world you’re already dead. Also, a nuclear exchange would in fact cost the AI quite a lot, so I expect many fewer nuclear wars in worlds where we’ve accidentally created an unaligned ASI.
The counterfactual here is between two treaties that are identical, except that one includes the policy “bomb data centers in nuclear-armed nations” and one does not. The only case where they differ is the scenario where a nuclear-armed nation starts building GPU clusters, in which case policy A demands resorting to nuclear blackmail once all other avenues have been exhausted, while policy B does not.
I think a missing ingredient here is the scenario that led up to this policy. If there had already been a warning shot, where an AI trained on a GPT-4-sized cluster killed millions of people, then it is plausible that such a clause might work, because both parties would be putting clusters in the “super-nukes” category.
If this hasn’t happened, or the case for clusters being dangerous is seen as flimsy, then we are essentially back at the “China threatens to bomb OpenAI” scenario. I think this is a terrible scenario, unless you actually do think that nuclear war is preferable to large data clusters being built. (To be clear, I think the chance of any individual data cluster causing the apocalypse is minuscule.)
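The disagreement over whether nuclear war is “preferable” can be phrased as a toy expected-cost comparison (my framing, not the commenter’s; every number below is a placeholder to show the structure of the tradeoff, not an estimate):

```python
# Toy expected-cost comparison: tolerate the cluster vs. enforce against it.
# All values are illustrative placeholders, not estimates.

p_cluster_doom = 1e-4     # "minuscule" per-cluster chance of apocalypse
p_escalation = 0.05       # chance enforcement against a nuclear power escalates
cost_doom = 1.0           # normalize apocalypse to 1
cost_nuclear_war = 0.3    # nuclear war, as a fraction of apocalypse-level cost

ev_tolerate = p_cluster_doom * cost_doom
ev_enforce = p_escalation * cost_nuclear_war

print(f"tolerate: {ev_tolerate:.4f}  enforce: {ev_enforce:.4f}")
# -> tolerate: 0.0001  enforce: 0.0150
# With "minuscule" per-cluster risk, enforcement looks far worse (the
# commenter's position); raise p_cluster_doom to ~0.1, the warning-shot
# world, and the comparison flips.
```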