I think it is useful, as you did here, to consider how something like this would play out in the current world, because our intuitions about the current world are better. Someone could say that they will torture animals unless vegans give them money, I guess. I think this doesn’t happen for multiple reasons. One of them is that it would be irrational for vegans to agree to give money, because then other people would continue exploiting them with this simple trick.
I think that the same applies to far-future scenarios. If an agent allows itself to be manipulated this easily, it won’t become powerful. It’s more rational to just make it publicly known that you refuse to engage with such threats. This is one of the reasons why most Western countries have a publicly declared policy of not negotiating with terrorists. So yeah, thinking about it this way, I am no longer concerned about these threats.
Someone could say that they will torture animals unless vegans give them money, I guess. I think this doesn’t happen for multiple reasons.
Interestingly, there is at least one instance where this apparently has happened. (It’s possible it was just a joke, though.) There was even a law review article about the incident.
I think this is an interesting point, but I’m not convinced it holds with high enough probability that the alternative isn’t worth taking seriously.
In particular, I can imagine luck/happenstance shaking out such that agents who are arbitrarily powerful along one dimension are less powerful/rational along other dimensions.
Another issue is the nature of precommitments[1]. In most games, and under most simple decision theories for playing them (e.g., “Chicken” under CDT), being the first to credibly precommit gives you a strategic edge in most circumstances. But if you’re second in those situations, it’s not clear whether “I don’t negotiate with terrorists” is a better or worse stance than swerving.
(And in the former case, with both sides precommitting, a lot of torture will still happen.)
[1] Using what I assume is the technical definition of precommitment.
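To make the Chicken point concrete, here is a minimal sketch of the game with illustrative payoffs (the specific numbers are my assumptions, not anything from the comment above). It shows why a credible first precommitment to “dare” is advantageous, and why mutual precommitment is the disaster case:

```python
# Chicken, with illustrative (assumed) payoffs: each entry is
# (player 1's payoff, player 2's payoff).
PAYOFFS = {
    ("swerve", "swerve"): (0, 0),
    ("swerve", "dare"):   (-1, 1),
    ("dare",   "swerve"): (1, -1),
    ("dare",   "dare"):   (-10, -10),  # mutual disaster: both precommitted
}

def best_response(opponent_move):
    """Our payoff-maximizing move, given the opponent's move is fixed
    (e.g., by a credible precommitment)."""
    return max(["swerve", "dare"],
               key=lambda my_move: PAYOFFS[(my_move, opponent_move)][0])

# If the opponent credibly precommits to "dare" first, a CDT-style
# best-responder swerves: losing -1 beats crashing at -10.
print(best_response("dare"))      # swerve

# Without that precommitment, daring against a swerver pays best.
print(best_response("swerve"))    # dare

# But if both sides precommit to "dare", both get the worst outcome.
print(PAYOFFS[("dare", "dare")])  # (-10, -10)
```

Under these assumed payoffs, whoever precommits first converts the other player’s best response into “swerve”; the open question in the comment is precisely whether the second mover should best-respond (swerve) or hold a counter-precommitment (“I don’t negotiate with terrorists”) and risk the (-10, -10) cell.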