LLMs as Trusted Mediators – A Path Beyond Coordination Problems?
Using AI and cryptography to make cooperation rational – even when trust is impossible.
Epistemic status: Conceptual sketch. I believe this idea is worth exploring with small practical experiments, but I have not yet tested it. Written in collaboration with Claude Opus 4.5 (Anthropic), where I contributed the core idea and Claude developed the theoretical context and drafted the text.
The Classic Dilemma
Consider the following scenario: Two leading AI laboratories – Lab A and Lab B – both recognise that an uncontrolled race towards ever more advanced AI systems poses significant risks. Both would benefit from slowing down, investing more in safety, and coordinating their releases. But neither dares to move first.
If Lab A unilaterally slows down while Lab B continues at full speed, Lab A loses market share, talent, and influence – perhaps permanently. The same applies in reverse. The result? Both continue to accelerate, despite both preferring a world in which they had slowed down together.
We’ve heard it before – it’s the essence of a coordination problem. Individually rational strategies lead to collectively suboptimal outcomes. We see the same pattern in climate negotiations, nuclear disarmament, and countless other domains where parties would gain from cooperation but remain trapped in a destructive equilibrium.
Three mechanisms explain why coordination is so difficult:
Vulnerability through transparency. Revealing one’s true position – rather than one’s negotiating position – creates vulnerability. If Lab A says “we could actually accept a six-month pause,” Lab B now has an information advantage that can be exploited.
Impossible conditioning. One cannot condition one’s offer on the other party’s private willingness to accept it. Lab A wants to say “we’ll pause if you pause” – but how can they know whether Lab B’s response is genuine?
Leaked failures. If negotiations fail, both parties have learned strategic information about each other. The mere attempt to coordinate can therefore be costly.
The question this text explores: Can new technology change the rules of the game?
What the Literature Already Knows
Coordination problems are not new, and theorists have developed several frameworks for understanding and sometimes solving them.
Programme Equilibrium
In 2004, game theorist Moshe Tennenholtz introduced the concept of “programme equilibrium.” The idea: instead of players choosing strategies directly, they submit programmes that specify their strategy – and these programmes have access to each other’s source code.
In the prisoner’s dilemma, a player can write a programme that says: “If my opponent’s programme is identical to mine, I cooperate. Otherwise, I defect.” If both players submit this programme, both will cooperate – and neither has an incentive to deviate.
This is remarkable: rational cooperation emerges even in a one-shot game, something impossible in classical game theory. But there is an obvious limitation: it requires parties to actually inspect each other’s code and verify that it will be executed as specified.
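To make this concrete, here is a minimal Python sketch of such a programme (the names are mine, not Tennenholtz’s). Run as a script, each copy inspects the other’s source and cooperates exactly when the two match:

```python
import inspect

def clique_bot(opponent_source: str) -> str:
    """Cooperate iff the opponent's programme is byte-for-byte identical to ours."""
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

# Both players submit the same programme and exchange source code.
shared_source = inspect.getsource(clique_bot)
print(clique_bot(shared_source), clique_bot(shared_source))  # -> C C
print(clique_bot("def defect_bot(_): return 'D'"))           # -> D
```

Exact source matching is brittle – two semantically equivalent programmes would defect against each other – which is one reason the literature explores more robust ways of conditioning on an opponent’s code.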
Secure Multi-Party Computation
In parallel, cryptographers have developed protocols for Secure Multi-Party Computation (MPC). Multiple parties can jointly compute a function of their secret inputs without any party learning the others’ inputs.
A toy example: Three colleagues want to know who has the highest salary without revealing their individual salaries. With MPC, they can perform a computation that reveals only the result whilst each participant’s actual figure remains secret.
Think of it as a perfectly trustworthy third party – a “Tony” – to whom everyone could send their secrets. Tony computes the function and returns only the result. MPC showed that one can achieve this without trusting any Tony – the protocol itself guarantees secrecy.
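As a sketch, this is the “ideal functionality” that an MPC protocol emulates – written here as if Tony existed. A real protocol reproduces exactly this input/output behaviour with cryptography in place of the trusted party:

```python
def tony_highest_earner(salaries: dict[str, int]) -> str:
    """Trusted third party: sees every secret input,
    reveals only who earns the most."""
    return max(salaries, key=salaries.get)

# Each colleague sends their figure privately; everyone learns only the result.
print(tony_highest_earner({"Alice": 52_000, "Bob": 61_000, "Carol": 58_000}))
# -> Bob
```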
The problem with MPC for negotiations is complexity. The protocols are designed for well-defined mathematical functions, not for the rich, context-dependent reasoning that real negotiations require.
Mediated Equilibrium
Monderer and Tennenholtz have also shown that a mediator can enable equilibria that would otherwise be impossible. A mediator receives private messages from all parties, computes a recommendation, and sends back instructions. If the mechanism is properly designed, no party has an incentive to deviate.
Certain socially desirable outcomes that cannot be reached through ordinary Nash equilibria can be achieved through mediated equilibria. But whom does one trust as mediator? And how does one verify that the mediator behaves correctly?
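A standard illustration is the game of Chicken (the payoffs below are mine, chosen for concreteness): the mediator draws a joint recommendation and privately tells each player only their own move. This sketch checks numerically that following the recommendation is always at least as good as deviating:

```python
# Chicken: C = swerve, D = dare. payoff[(row, col)] = (row's payoff, col's payoff).
payoff = {("C", "C"): (6, 6), ("C", "D"): (2, 7),
          ("D", "C"): (7, 2), ("D", "D"): (0, 0)}

# The mediator draws one joint recommendation uniformly from these three.
dist = {("C", "C"): 1/3, ("C", "D"): 1/3, ("D", "C"): 1/3}

def expected(own_action: str, recommended: str) -> float:
    """Row player's expected payoff from own_action, conditional on
    having been recommended `recommended` (column obeys the mediator)."""
    cond = {k: p for k, p in dist.items() if k[0] == recommended}
    total = sum(cond.values())
    return sum((p / total) * payoff[(own_action, k[1])][0]
               for k, p in cond.items())

for rec in ("C", "D"):
    other = "D" if rec == "C" else "C"
    print(rec, expected(rec, rec) >= expected(other, rec))  # -> C True, D True
```

With these payoffs each player expects 5 – better than the symmetric mixed Nash equilibrium’s 14/3 ≈ 4.67, and an outcome no Nash equilibrium of this game delivers to both players.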
The New Possibility: LLM + TEE
Here is the central idea: what happens if the mediator is a large language model running inside a cryptographically secured environment?
The Architecture
```
┌───────────────────────────────────────────┐
│  Trusted Execution Environment (TEE)      │
│ ┌───────────────────────────────────────┐ │
│ │             LLM Mediator              │ │
│ │  • Receives secret inputs             │ │
│ │  • Reasons about possible deals       │ │
│ │  • Returns only:                      │ │
│ │    "Deal possible" / "No deal"        │ │
│ └───────────────────────────────────────┘ │
│  [Verifiable code, cryptographically      │
│   attested by all parties]                │
└───────────────────────────────────────────┘
           ▲                     ▲
           │                     │
       Party A:              Party B:
  "We can accept X       "We agree to Y
  if B does at least Y"  if A does at least Z"
```
The process (sketched in code after the steps):
1. All parties review and approve the code to be executed – including the LLM’s instructions.
2. Each party sends their true position, encrypted so that only the TEE can read it.
3. The LLM inside the TEE receives all positions and reasons about whether a mutual agreement is possible.
4. The result is binary: either “A deal is possible – here are the minimum conditions” or “No deal possible”.
5. Upon failure, all information is deleted – parties learn nothing about each other beyond the fact that their positions did not sufficiently overlap.
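Here is roughly what that flow could look like. Every name below is a placeholder for machinery that would need real engineering – attestation verification, enclave key exchange, and the LLM runtime itself:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    deal_possible: bool
    minimum_conditions: str | None  # published only when a deal exists

def run_negotiation(parties, enclave) -> Verdict:
    # Step 1: every party checks the attestation quote - proof that exactly
    # the reviewed code (including the LLM's instructions) is running.
    if not all(p.verify_attestation(enclave.attestation_quote) for p in parties):
        raise RuntimeError("attestation failed - abort before sending anything")

    # Step 2: true positions are encrypted to the enclave's public key, so
    # not even the server operator can read them.
    ciphertexts = [p.encrypt(p.true_position, enclave.public_key)
                   for p in parties]

    # Steps 3-4: inside the TEE, the LLM decrypts, reasons about overlap,
    # and emits only the binary verdict plus minimal conditions.
    verdict = enclave.mediate(ciphertexts)

    # Step 5: the enclave wipes all decrypted inputs; on "no deal",
    # nothing but the verdict itself ever leaves it.
    return verdict
```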
Why an LLM Rather Than Traditional MPC?
A traditional MPC solution requires predefined functions: “Compute the intersection of intervals X and Y.” But real negotiations are richer.
An LLM can handle natural language: “We can accept a pause on capability training above 10^26 FLOP if competitors do the same, but we need an exception for safety research.”
It can reason about plausibility: “Party A’s demand for exceptions is inconsistent with Party B’s definition of safety research – but here is a possible compromise.”
It can propose creative solutions – not merely compute whether positions overlap, but identify a zone of possible agreement that the parties themselves did not see.
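The contrast is easiest to see side by side – the kind of fixed function MPC handles well, versus the open-ended input an LLM mediator could accept (both illustrative):

```python
# Traditional MPC: the function must be fully specified in advance.
def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """Do the acceptable ranges [a0, a1] and [b0, b1] intersect?"""
    return max(a[0], b[0]) <= min(a[1], b[1])

print(intervals_overlap((3.0, 9.0), (6.0, 12.0)))  # -> True

# LLM mediation: positions arrive as free text, caveats and all.
position_a = ("We can accept a pause on capability training above 1e26 FLOP "
              "if competitors do the same, but we need an exception for "
              "safety research.")
```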
What TEE Guarantees
A Trusted Execution Environment is hardware that guarantees code runs in isolation – even from the owner and operator of the computer. Modern implementations (Intel SGX and TDX, AMD SEV-SNP, ARM TrustZone) offer, to varying degrees:
Confidentiality. Not even the server operator can see what happens inside the enclave.
Integrity. The code cannot be manipulated without detection.
Attestation. Parties can obtain cryptographic proof that the correct code is running in the correct environment.
This solves the problem of trusting the mediator: no one needs to trust any person or organisation – only that the hardware and code function as specified.
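Conceptually, each party would run a check like the following before sending anything sensitive. Everything here is schematic – real attestation involves vendor-specific quote formats and certificate chains, and both `verify_vendor_signature` and the quote layout are placeholders:

```python
import hashlib

def safe_to_send(quote: bytes, reviewed_code: bytes,
                 verify_vendor_signature) -> bool:
    """Schematic attestation check: the quote must be signed by genuine
    TEE hardware and must commit to exactly the code everyone reviewed."""
    if not verify_vendor_signature(quote):
        return False  # not produced by genuine TEE hardware
    expected = hashlib.sha256(reviewed_code).digest()
    reported = quote[:32]  # placeholder: real quotes carry a measurement field
    return reported == expected
```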
Concrete Application: AI Safety Agreements
Let us return to the AI laboratory scenario. What might an LLM-mediated negotiation look like?
The Scenario
Anthropic, OpenAI, DeepMind, and several other leading laboratories wish to explore coordinated safety measures. But:
No one wants to reveal where they actually stand in capability development.
No one wants to show their true reservation point.
A failed negotiation where positions have leaked may be worse than no negotiation at all.
The Protocol
Each lab sends an encrypted message to the LLM mediator. A submission might include:
Current state (confidential): “We are X months from being able to train a model at Y FLOP”
Actions we are willing to take: “We will implement safety protocol Z if at least N–1 other labs do the same”
Conditions: “We require that measures are mutually verifiable through method W”
Absolute limits: “We cannot accept anything that gives competitor Q more than M months’ head start”
The LLM analyses all inputs and determines: is there a consistent set of measures that all parties can accept? If yes, what are the minimum requirements? If no, the strictest version of the protocol reveals nothing further – a softer variant could report how close the parties came, at the cost of leaking a little information.
The crucial point: in the strict version, if no deal is possible, no party learns why. Lab A does not know whether Lab B or Lab C blocked, or for what reason. This eliminates the strategic cost of failed negotiations.
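Concretely, the attested code could bundle a fixed submission schema with the mediator’s instructions – the very instructions every lab reviews before attesting. All names and wording below are illustrative:

```python
from dataclasses import dataclass

@dataclass
class LabSubmission:
    current_state: str    # "X months from training at Y FLOP" - never leaves the TEE
    willing_actions: str  # conditional commitments
    conditions: str       # verification requirements
    absolute_limits: str  # red lines

MEDIATOR_INSTRUCTIONS = """You are a neutral mediator running inside a TEE.
You will receive the confidential positions of several AI labs. Determine
whether a set of measures exists that every lab can accept. Output ONLY:
  DEAL: <minimum conditions binding all parties>
or
  NO DEAL
Never quote, summarise, or allude to any individual lab's position."""
```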
Extensions
The same mechanism could apply to other coordination problems: climate negotiations where countries reveal true emissions targets without showing their hand, nuclear disarmament where failure does not leak intelligence, or trade negotiations where parties explore deals without revealing their best alternative.
The mechanism could of course be used in situations not related to global risks as well – negotiations between employers and unions, business mergers, or divorce settlements where both parties prefer efficiency over prolonged conflict.
Critical Challenges
The idea is appealing but far from unproblematic.
Verification of Compliance
The LLM mediator can help parties reach an agreement. But who ensures they follow it? This requires separate mechanisms – inspections, reporting, sanctions.
One possible extension: the same system could handle continuous reporting. Parties regularly send status updates to the mediator, which computes whether everyone still meets their commitments without revealing individual data. The reports would need to carry proofs of their validity, but those proofs could remain hidden from the other parties.
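A simple building block for such reporting – a sketch, not a full zero-knowledge construction – is a hash commitment: each lab publishes a commitment to its status report, reveals the opening only inside the TEE, and the enclave outputs only the aggregate verdict:

```python
import hashlib, os

def commit(report: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, opening). Publish the commitment;
    reveal the opening only inside the enclave."""
    nonce = os.urandom(32)
    return hashlib.sha256(nonce + report).digest(), nonce

def verify_opening(commitment: bytes, report: bytes, nonce: bytes) -> bool:
    """Inside the TEE: does the revealed report match the public commitment?"""
    return hashlib.sha256(nonce + report).digest() == commitment
```

Because the commitment is public and binding, a lab cannot later claim to have reported something different; because only the enclave sees the opening, the report itself stays private.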
Trust in LLM Behaviour
How do we know the LLM behaves neutrally? Modern language models have subtle (or not-so-subtle) biases that could systematically favour certain outcomes.
Possible approaches: use simpler, more verifiable models for the core function, or run several different models in parallel with consensus required before accepting a result.
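The multi-model variant is straightforward to sketch – accept a verdict only when independently chosen models agree, with `query` standing in for however each model is invoked inside the enclave:

```python
def consensus_verdict(models: list, submissions: list[str], query) -> str:
    """Accept only a unanimous verdict; otherwise abort and reveal nothing."""
    verdicts = {query(model, submissions) for model in models}
    return verdicts.pop() if len(verdicts) == 1 else "NO CONSENSUS"
```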
Gaming and Strategic Inputs
Can parties manipulate by sending strategic inputs? “We’ll say our limit is X, even though it’s really Y, to see what the system returns.”
This requires incentive-compatible mechanism design – making truthful inputs each party’s dominant strategy. The mechanism design literature offers tools, but combining them with LLM reasoning is unexplored territory.
Practical and Political Resistance
Organisations accustomed to wielding bargaining power – extracting advantages through information asymmetry and negotiating skill – have weak incentives to relinquish those advantages.
Compare resistance to binding arbitration: even when both parties would benefit from a predictable system, the more powerful party often prefers uncertainty that allows exploiting their strength.
Why This Is Worth Exploring
Coordination problems are among the most difficult challenges humanity faces. The climate crisis, risks from advanced AI, nuclear proliferation – all share the same structure: individually rational choices leading to collective catastrophe.
Traditional solutions – repeated interactions, reputation mechanisms, social norms – work best in stable environments with long time horizons. For existential risks, where we may have only one chance, they are insufficient.
LLMs combined with cryptographic security open a new design space. Not as a magic solution, but as a tool to explore: Can we construct mechanisms where revealing one’s true position becomes rational? Where failed negotiations carry no cost? Where creative compromises can be discovered by a system that sees all parties’ perspectives simultaneously?
This is a conceptual sketch, and the next step would be small-scale experiments – perhaps in simulated negotiations or low-stakes real scenarios – to develop intuitions about what actually works.
What is needed is interdisciplinary work: game theory, cryptography, AI safety, international relations. Theoretical analysis of which coordination problems are solvable with these tools. Practical experiments. And eventually, perhaps, protocols that can be tested in real negotiation contexts.
Coordination problems have always been hard. But the tools for solving them need not be the same tomorrow as they were yesterday.
This is an updated cross-post from my own Substack. Original post here.