yefreitor comments on yefreitor’s Quick takes

yefreitor 15 May 2023 2:36 UTC
Claim: Credible plans for a “pivotal act” may drive AI race dynamics
(Epistemic status: I had Mathematica do all the grunt work and did not check the results carefully.)
Consider a simple normal-form game with two equally capable agents A and B, each of which is deciding whether to aggressively pursue AI development, and three free parameters:
- the probability $p_{doom}$ that accelerating AI development results in an existential catastrophe (with utility −1 for both agents, versus a status quo of utility 0)
- the utility $u_{1}$ of developing the first friendly AI
- the utility $- 1 < u_{2} < u_{1}$ of the other agent developing friendly AI
We’ll first assume the $p_{doom}$ coin only gets flipped once: developing a friendly AI lets you immediately control all other AI development.
Since this parameterization turned out to require a lot of typing, we’ll define $u_{+} = \frac{u_{1} + u_{2}}{2}$ and $u_{-} = u_{1} - u_{+} = u_{+} - u_{2}$, and then rescale the payoffs to get something more readable:
$\begin{matrix} & \text{Accelerate} & \text{Don't} \\ \text{Accelerate} & u_{+} & u_{+} - u_{-},\ u_{+} + u_{-} \\ \text{Don't} & u_{+} + u_{-},\ u_{+} - u_{-} & \frac{p_{doom}}{1 - p_{doom}} \end{matrix}$
- (Accelerate, Accelerate) is always a Nash equilibrium, no matter how trivial the differences $u_{-}$ captures are.
- (Don’t, Don’t) is a Nash equilibrium when $(u_{+} + u_{-}) (1 - p_{doom}) < p_{doom}$, as you would expect.
- (Don’t, Don’t) is never a trembling-hand equilibrium, since (Don’t) does not weakly dominate (Accelerate) for either player.
- When $(u_{+} + u_{-}) (1 - p_{doom}) \geq p_{doom}$, (Accelerate) weakly dominates (Don’t) and (Accelerate, Accelerate) is a trembling-hand equilibrium.
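The two Nash conditions above can be checked by brute force. The following is a minimal sketch, not the Mathematica computation mentioned in the epistemic status. It assumes the unrescaled payoffs are plain expected utilities (catastrophe is worth −1 and a sole accelerator receives $u_{1}$, which is what the equilibrium conditions require) and that the rescaling is the map $x \mapsto (x + p_{doom}) / (1 - p_{doom})$, which reproduces the matrix entries. The function names and the convention of storing each cell as (A's payoff, B's payoff) are illustrative choices, not part of the original.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("Accelerate", "Don't")

def one_coin_payoffs(p_doom, u1, u2):
    """Raw expected utilities, keyed by (A's action, B's action).

    Each value is (A's payoff, B's payoff). Assumption: one coin is
    flipped if anyone accelerates; doom is worth -1 to both agents.
    """
    u_plus = (u1 + u2) / 2
    first = (1 - p_doom) * u1 - p_doom      # the sole accelerator
    second = (1 - p_doom) * u2 - p_doom     # the agent who holds back
    both = (1 - p_doom) * u_plus - p_doom   # either agent equally likely to win
    return {
        ("Accelerate", "Accelerate"): (both, both),
        ("Accelerate", "Don't"): (first, second),
        ("Don't", "Accelerate"): (second, first),
        ("Don't", "Don't"): (0, 0),
    }

def rescale(payoffs, p_doom):
    """Apply the positive affine map x -> (x + p_doom) / (1 - p_doom)."""
    return {
        cell: tuple((x + p_doom) / (1 - p_doom) for x in values)
        for cell, values in payoffs.items()
    }

def pure_nash(payoffs):
    """Return every pure-strategy profile with no profitable unilateral deviation."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        ua, ub = payoffs[(a, b)]
        a_stays = all(payoffs[(alt, b)][0] <= ua for alt in ACTIONS)
        b_stays = all(payoffs[(a, alt)][1] <= ub for alt in ACTIONS)
        if a_stays and b_stays:
            equilibria.append((a, b))
    return equilibria

# Illustrative numbers only: u_plus = 3/2, u_minus = 1/2.
p, u1, u2 = Fraction(3, 4), Fraction(2), Fraction(1)
raw = one_coin_payoffs(p, u1, u2)
print(pure_nash(raw))
print(rescale(raw, p))
```

For these illustrative values $(u_{+} + u_{-})(1 - p_{doom}) = 1/2 < 3/4$, so both (Accelerate, Accelerate) and (Don’t, Don’t) come out as Nash equilibria. Because the rescaling is a positive affine map applied to every payoff, it leaves the set of equilibria unchanged.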
Now consider the case where (Accelerate, Accelerate) instead flips two coins.
$\begin{matrix} & \text{Accelerate} & \text{Don't} \\ \text{Accelerate} & u_{+} (1 - p_{doom}) - p_{doom} & u_{+} - u_{-},\ u_{+} + u_{-} \\ \text{Don't} & u_{+} + u_{-},\ u_{+} - u_{-} & \frac{p_{doom}}{1 - p_{doom}} \end{matrix}$
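The changed top-left entry can be checked by hand. This is a sketch under two assumptions that are not stated explicitly in the text: the two coins are independent, each landing on catastrophe with probability $p_{doom}$, and the rescaling is the map $x \mapsto (x + p_{doom}) / (1 - p_{doom})$, which is consistent with the other three cells.

```latex
\begin{aligned}
\mathbb{E}[\text{raw}]
  &= (1 - p_{doom})^{2}\, u_{+} - \bigl(1 - (1 - p_{doom})^{2}\bigr) \\
  &= (1 - p_{doom})^{2}\,(u_{+} + 1) - 1, \\[4pt]
\frac{\mathbb{E}[\text{raw}] + p_{doom}}{1 - p_{doom}}
  &= (1 - p_{doom})(u_{+} + 1) - 1 \\
  &= u_{+}(1 - p_{doom}) - p_{doom}.
\end{aligned}
```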
This is potentially a much safer situation:
- (Accelerate, Accelerate) is only a Nash equilibrium when $p_{doom} < \frac{u_{-}}{u_{+} + 1}$
- (Don’t, Don’t) is still a Nash equilibrium when $(u_{+} + u_{-}) (1 - p_{doom}) < p_{doom} \iff p_{doom} > \frac{u_{+} + u_{-}}{u_{+} + u_{-} + 1}$
- (Don’t, Don’t) is a trembling-hand equilibrium if it’s a Nash equilibrium and (Accelerate, Accelerate) is not.
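The two Nash thresholds for the two-coin game follow directly from comparing each cell of the rescaled matrix with the payoff from a unilateral deviation. The sketch below does that comparison with exact rational arithmetic. The function names and the sample utilities are illustrative assumptions, and the code only tests the Nash conditions, not the trembling-hand claims.

```python
from fractions import Fraction

def thresholds(u1, u2):
    """Return (AA threshold, DD threshold) on p_doom for the two-coin game.

    (Accelerate, Accelerate) is Nash below the first value;
    (Don't, Don't) is Nash above the second.
    """
    u_plus = (u1 + u2) / 2
    u_minus = u1 - u_plus
    return u_minus / (u_plus + 1), (u_plus + u_minus) / (u_plus + u_minus + 1)

def two_coin_nash(p_doom, u1, u2):
    """Check both symmetric profiles directly against the rescaled matrix."""
    u_plus = (u1 + u2) / 2
    u_minus = u1 - u_plus
    both_accelerate = u_plus * (1 - p_doom) - p_doom   # top-left cell
    neither = p_doom / (1 - p_doom)                    # bottom-right cell
    # Deviating from (Accelerate, Accelerate) to Don't yields u_plus - u_minus.
    aa_is_nash = both_accelerate >= u_plus - u_minus
    # Deviating from (Don't, Don't) to Accelerate yields u_plus + u_minus.
    dd_is_nash = neither > u_plus + u_minus
    return aa_is_nash, dd_is_nash

# Illustrative numbers only: u_plus = 3/2, u_minus = 1/2.
u1, u2 = Fraction(2), Fraction(1)
print(thresholds(u1, u2))
for p in (Fraction(1, 10), Fraction(1, 2), Fraction(3, 4)):
    print(p, two_coin_nash(p, u1, u2))
```

With these sample utilities the thresholds are 1/5 and 2/3, so for intermediate values such as $p_{doom} = 1/2$ neither symmetric profile is a Nash equilibrium.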