Claim: Credible plans for a “pivotal act” may drive AI race dynamics
(Epistemic status: I had Mathematica do all the grunt work and did not check the results carefully)
Consider a simple normal-form game with two equally capable agents A and B, each of which is deciding whether to aggressively pursue AI development, and three free parameters:
the probability $p_{\text{doom}}$ that accelerating AI development results in an existential catastrophe (with utility $-1$ for both agents, versus a utility $0$ status quo).
the utility $u_1$ of developing the first friendly AI
the utility $-1 < u_2 < u_1$ of the other agent developing friendly AI
We’ll first assume the $p_{\text{doom}}$ coin only gets flipped once, even if both agents accelerate: developing a friendly AI lets you immediately control all other AI development.
Since our choice of parameterization was in retrospect one that requires a lot of typing, we’ll define $u_+ = \frac{u_1 + u_2}{2}$, $u_- = u_1 - u_+ = u_+ - u_2$ and then rescale to get something more readable.
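| | Accelerate | Don’t |
|---|---|---|
| Accelerate | $u_+$ | $u_+ - u_-,\ u_+ + u_-$ |
| Don’t | $u_+ + u_-,\ u_+ - u_-$ | $\frac{p_{\text{doom}}}{1 - p_{\text{doom}}}$ |

(Each cell lists the column player’s payoff first; a single entry is the payoff to both players.)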
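For reference, here is one way to spell out the rescaling. This is a sketch under my reading of the setup, not something checked against the Mathematica output: since the agents are equally capable, I assume that if both accelerate and the coin comes up safe, each is first with probability $\frac{1}{2}$. The names $r_{AA}$, $r_{\text{solo}}$, $r_{\text{other}}$ and $r_{DD}$ for the raw expected utilities are introduced here purely for illustration.

$$
\begin{aligned}
r_{AA} &= (1 - p_{\text{doom}})\,u_+ - p_{\text{doom}} && \text{(both accelerate)} \\
r_{\text{solo}} &= (1 - p_{\text{doom}})(u_+ + u_-) - p_{\text{doom}} && \text{(the lone accelerator)} \\
r_{\text{other}} &= (1 - p_{\text{doom}})(u_+ - u_-) - p_{\text{doom}} && \text{(the agent who did not accelerate)} \\
r_{DD} &= 0 && \text{(status quo)}
\end{aligned}
$$

Applying the map $r \mapsto \frac{r + p_{\text{doom}}}{1 - p_{\text{doom}}}$ to each line gives $u_+$, $u_+ + u_-$, $u_+ - u_-$ and $\frac{p_{\text{doom}}}{1 - p_{\text{doom}}}$ respectively. For $p_{\text{doom}} < 1$ this map is increasing and affine, so it should leave every best-response comparison, and hence the set of equilibria, unchanged.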
(Accelerate, Accelerate) is always a Nash equilibrium, no matter how trivial the difference captured by $u_-$ is.
(Don’t, Don’t) is a Nash equilibrium when $(u_+ + u_-)(1 - p_{\text{doom}}) < p_{\text{doom}}$, as you would expect.
(Don’t, Don’t) is never a trembling-hand equilibrium, since (Don’t) does not weakly dominate (Accelerate) for either player.
When $(u_+ + u_-)(1 - p_{\text{doom}}) \ge p_{\text{doom}}$, (Accelerate) weakly dominates (Don’t) and (Accelerate, Accelerate) is a trembling-hand equilibrium.
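The Nash claims above can be spot-checked numerically. Below is a minimal sketch that builds the rescaled one-coin payoffs and brute-forces the pure-strategy Nash equilibria. The function names and the example parameter values are my own illustrative choices, and the code only looks at pure Nash equilibria (it says nothing about trembling-hand perfection).

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("Accelerate", "Don't")

def one_coin_payoffs(u_plus, u_minus, p_doom):
    """Rescaled payoffs, keyed by (A's action, B's action) -> (A's payoff, B's payoff)."""
    solo = u_plus + u_minus            # payoff to a lone accelerator
    other = u_plus - u_minus           # payoff to the agent who held back
    status_quo = p_doom / (1 - p_doom)
    return {
        ("Accelerate", "Accelerate"): (u_plus, u_plus),
        ("Accelerate", "Don't"): (solo, other),
        ("Don't", "Accelerate"): (other, solo),
        ("Don't", "Don't"): (status_quo, status_quo),
    }

def pure_nash(payoffs):
    """List every pure profile where neither agent gains by deviating alone."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        a_ok = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in ACTIONS)
        b_ok = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in ACTIONS)
        if a_ok and b_ok:
            equilibria.append((a, b))
    return equilibria

# Illustrative values only: u_1 = 3/5, u_2 = 2/5.
u_plus, u_minus = Fraction(1, 2), Fraction(1, 10)
high_risk = pure_nash(one_coin_payoffs(u_plus, u_minus, Fraction(1, 2)))
low_risk = pure_nash(one_coin_payoffs(u_plus, u_minus, Fraction(1, 5)))
print(high_risk)  # both symmetric profiles
print(low_risk)   # only (Accelerate, Accelerate)
```

With these numbers, $(u_+ + u_-)(1 - p_{\text{doom}}) < p_{\text{doom}}$ holds at $p_{\text{doom}} = \frac{1}{2}$ but not at $p_{\text{doom}} = \frac{1}{5}$, which matches which of the two runs reports (Don’t, Don’t). Exact fractions are used so that ties are not blurred by floating-point error.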
Now consider the case where (Accelerate, Accelerate) instead flips two coins.
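| | Accelerate | Don’t |
|---|---|---|
| Accelerate | $u_+(1 - p_{\text{doom}}) - p_{\text{doom}}$ | $u_+ - u_-,\ u_+ + u_-$ |
| Don’t | $u_+ + u_-,\ u_+ - u_-$ | $\frac{p_{\text{doom}}}{1 - p_{\text{doom}}}$ |

(Each cell lists the column player’s payoff first; a single entry is the payoff to both players.)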
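The new (Accelerate, Accelerate) entry can be recovered as follows. This is a sketch assuming the two coins are independent flips with the same $p_{\text{doom}}$ (my reading of "flips two coins"), with $r'_{AA}$ a name introduced here for the raw expected utility. Both flips must come up safe, so

$$
r'_{AA} = (1 - p_{\text{doom}})^2\,u_+ - \bigl(1 - (1 - p_{\text{doom}})^2\bigr),
$$

and applying the same rescaling $r \mapsto \frac{r + p_{\text{doom}}}{1 - p_{\text{doom}}}$ as before gives

$$
\frac{(1 - p_{\text{doom}})^2\,u_+ + (1 - p_{\text{doom}})^2 - (1 - p_{\text{doom}})}{1 - p_{\text{doom}}}
= (1 - p_{\text{doom}})\,u_+ + (1 - p_{\text{doom}}) - 1
= u_+(1 - p_{\text{doom}}) - p_{\text{doom}}.
$$

The other three cells involve at most one flip, so they are the same as in the one-coin game.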
This is potentially a much safer situation:
(Accelerate, Accelerate) is only a Nash equilibrium when $p_{\text{doom}} < \frac{u_-}{u_+ + 1}$
(Don’t, Don’t) is still a Nash equilibrium when $(u_+ + u_-)(1 - p_{\text{doom}}) < p_{\text{doom}} \iff p_{\text{doom}} > \frac{u_+ + u_-}{u_+ + u_- + 1}$
(Don’t, Don’t) is a trembling-hand equilibrium if it’s a Nash equilibrium and (Accelerate, Accelerate) is not.
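As a sanity check on the two thresholds, here is a small sketch that compares them against the rescaled two-coin payoffs directly. The function names and parameter values are illustrative assumptions of mine, and the threshold formulas rely on $u_+ + 1 > 0$ and $u_+ + u_- + 1 > 0$, which follow from $-1 < u_2 < u_1$.

```python
from fractions import Fraction

def two_coin_thresholds(u_plus, u_minus):
    """Return (race_cap, hold_floor) on p_doom for the two-coin game."""
    u_first = u_plus + u_minus
    race_cap = u_minus / (u_plus + 1)      # (Accelerate, Accelerate) is Nash below this
    hold_floor = u_first / (u_first + 1)   # (Don't, Don't) is Nash above this
    return race_cap, hold_floor

def symmetric_nash_direct(u_plus, u_minus, p_doom):
    """Check the same two conditions straight from the rescaled payoff table."""
    both_accelerate = u_plus * (1 - p_doom) - p_doom
    lone_accelerator = u_plus + u_minus
    lone_holdout = u_plus - u_minus
    status_quo = p_doom / (1 - p_doom)
    # First flag: staying in the race beats unilaterally stopping.
    # Second flag: staying out beats unilaterally accelerating.
    return both_accelerate > lone_holdout, status_quo > lone_accelerator

# Illustrative values only: u_1 = 3/5, u_2 = 2/5.
u_plus, u_minus = Fraction(1, 2), Fraction(1, 10)
race_cap, hold_floor = two_coin_thresholds(u_plus, u_minus)
for p_doom in (Fraction(1, 20), Fraction(1, 5), Fraction(1, 2)):
    by_threshold = (p_doom < race_cap, p_doom > hold_floor)
    assert by_threshold == symmetric_nash_direct(u_plus, u_minus, p_doom)
    print(p_doom, by_threshold)
```

For these particular numbers the thresholds come out to $\frac{1}{15}$ and $\frac{3}{8}$, so at $p_{\text{doom}} = \frac{1}{5}$ neither symmetric profile is a Nash equilibrium. The loop only confirms that the closed-form thresholds agree with the table at a few points; it is not a proof.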