I think the challenges from misrepresentation and lying might be understated: the truthfulness of the AIs is a structural obstacle to adopting AI delegates in the early stages.
There’s a potential asymmetry where adopting the defense-favoured coordination tech might actually disadvantage you. AI delegates would presumably be verifiable and programmed to tell the truth and keep to deals, but humans could still lie (even if only by changing their mind after the interaction with the AI delegate). So if one person adopts an AI delegate and the other doesn’t, the human can exaggerate their preferences, withhold information, and even defect on the deal (without blatantly lying), whereas a verifiable AI delegate presumably couldn’t. So humans without AI delegates might be advantaged.
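The asymmetry can be made concrete with a toy bargaining model (my own illustration; the split rule and the numbers are assumptions, not anything from the discussion): each party claims a minimum acceptable share of a fixed surplus, and any unclaimed remainder is split evenly. A truthful delegate reports its principal’s real minimum; an unconstrained human inflates theirs and captures more of the surplus.

```python
# Toy model (illustrative assumption): two parties split a surplus of 100.
# Each reports a "minimum acceptable share"; each then receives their claim
# plus half of whatever surplus is left unclaimed.
SURPLUS = 100

def split(claim_a: float, claim_b: float) -> tuple[float, float]:
    """Return (share_a, share_b) given the two reported claims."""
    remainder = SURPLUS - claim_a - claim_b
    if remainder < 0:
        return (0.0, 0.0)  # claims incompatible: no deal, no surplus for anyone
    return (claim_a + remainder / 2, claim_b + remainder / 2)

# Both parties' true minimum is 30, so two truthful delegates split 50/50.
print(split(30, 30))  # -> (50.0, 50.0)

# If the second party is a human who inflates their claim from 30 to 60,
# the truthful delegate's principal is squeezed down to 35.
print(split(30, 60))  # -> (35.0, 65.0)
```

Under this (admittedly crude) rule, the party that can misrepresent strictly gains from facing a truthful counterpart, which is the disadvantage described above.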
Also, I don’t think that many humans seek a fair deal; they seek a deal that benefits them more than the other person. I think this, together with the requirement that AI delegates be truthful, leads either to slow adoption of AI delegates or to motivations to manipulate the AI delegates into acting deceptively or manipulatively.
The equilibria are roughly:
1. Everyone adopts AI delegates
2. No-one adopts AI delegates
3. AI delegates become corrupted to act in ways that might not be defined as defense-favoured
I don’t know how society gets through the transitional period in which AI delegates start getting adopted.
I feel like you’re baking a lot into this clause:

With AI delegates, they would presumably be verifiable and would be programmed to tell the truth and keep to deals
I think that aiming for an equilibrium where that’s true would be good, but I’m not certain that’s the starting point (and if it were otherwise going to scupper getting this off the ground, it probably shouldn’t be the starting point).
So if one person adopts the AI delegate and another doesn’t, then the human can overexaggerate their preferences, withhold information, and even defect on the deal (without blatantly lying), but a verifiable AI delegate presumably wouldn’t be able to do that?
I see no reason why an AI delegate shouldn’t be able to withhold information. I agree that people might want delegates that could do the other things too, but I think it might be better for the human principal if the delegate couldn’t: it can develop a reputation for trustworthiness (in a way that’s hard for an individual human, because others don’t see enough of a track record).