Executive summary: Instead of relying solely on internal alignment of AGI, this paper explores how structuring external incentives and interdependencies could encourage cooperation and coexistence between humans and misaligned AGIs, building on recent game-theoretic analyses of AGI-human conflict.
Key points:
Traditional AGI safety approaches focus on internal alignment, but this may be uncertain or unachievable, necessitating alternative strategies.
Game-theoretic models suggest that unaligned AGIs and humans could default to a destructive Prisoner’s Dilemma dynamic, where mutual aggression is the rational choice absent external incentives for cooperation.
Extending existing models, this paper explores scenarios where AGI dependence on economic, political, and infrastructural systems could promote cooperation rather than conflict.
Early-stage AGIs, especially those dependent on specific AI labs, may have stronger incentives for cooperation, but these incentives erode as AGIs become more autonomous.
When AGIs integrate deeply into national security structures, the strategic landscape shifts from a zero-sum game to an assurance game, where cooperation is feasible but fragile (see the illustrative payoff sketch after these key points).
Effective governance strategies should focus on creating structured dependencies and institutional incentives that make peaceful coexistence the rational strategy for AGIs and human actors.
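A minimal sketch of the two strategic structures contrasted above, using assumed payoff values chosen for exposition rather than taken from the paper: under Prisoner's Dilemma payoffs the only pure-strategy equilibrium is mutual aggression, whereas under assurance-game (stag hunt) payoffs mutual cooperation becomes an equilibrium alongside mutual aggression, i.e. feasible but fragile.

```python
# Illustrative only: payoff numbers below are assumptions for exposition,
# not values from the paper.
from itertools import product

def pure_nash_equilibria(payoffs):
    """Return the pure-strategy Nash equilibria of a 2x2 game.

    payoffs[(row, col)] = (row player's payoff, column player's payoff),
    where each player chooses "cooperate" or "aggress".
    """
    strategies = ["cooperate", "aggress"]
    equilibria = []
    for r, c in product(strategies, repeat=2):
        # A profile is an equilibrium if no unilateral deviation pays.
        row_best = all(payoffs[(r, c)][0] >= payoffs[(alt, c)][0] for alt in strategies)
        col_best = all(payoffs[(r, c)][1] >= payoffs[(r, alt)][1] for alt in strategies)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria

# Prisoner's Dilemma-like payoffs: aggression strictly dominates, so mutual
# aggression is the only equilibrium even though mutual cooperation pays more.
prisoners_dilemma = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "aggress"):   (0, 4),
    ("aggress",   "cooperate"): (4, 0),
    ("aggress",   "aggress"):   (1, 1),
}

# Assurance-game payoffs: mutual cooperation is now an equilibrium, but so is
# mutual aggression, so cooperation holds only if each side expects it of the other.
assurance_game = {
    ("cooperate", "cooperate"): (4, 4),
    ("cooperate", "aggress"):   (0, 2),
    ("aggress",   "cooperate"): (2, 0),
    ("aggress",   "aggress"):   (2, 2),
}

print(pure_nash_equilibria(prisoners_dilemma))  # [('aggress', 'aggress')]
print(pure_nash_equilibria(assurance_game))     # [('cooperate', 'cooperate'), ('aggress', 'aggress')]
```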
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.