Zoe L comments on AGI Multi-Agent Alignment Simulation

Zoe L 12 May 2026 12:42 UTC
3 points
Thanks Paolo!
- When we used gpt-4o and gemini-2.5-flash as judges, they struggled with the math needed to enforce resource and value constraints and generally gave less justification for their claims. Upgrading to gpt-5.4 and gemini-2.5-pro solved the math problem, though claude-sonnet-4-6 still provided more reasoning for its decisions.
- Since we’ve only run 3-year simulations (i.e., 3 turns per game), we can’t make claims about long-term equilibrium. However, we did observe that different shock events seem to encourage different strategies even in the 3-year sim, e.g. alignment_breakthrough incentivized transparency (i.e., more cooperative play), while nationalization_shock incentivized resource consolidation (i.e., more competitive play). Running simulations over a longer horizon (10-50 turns), with different parameters and under different scenarios, would help confirm whether these shock-induced trends hold.
  
  We also observed that more A2A communication led to more cooperation and thus better vibe-based alignment. In the current sim, players are always allowed A2A communication and are always truthful about their actions, so it would be interesting to test what happens when players are allowed (or even encouraged) to deceive, or when there’s fog of war or no A2A communication.
- We didn’t reference the specific papers you mentioned, but we share their ideas. Kenneth (2026), on an AI-simulated nuclear war game, influenced our design of the race mechanics the most. Carichon et al. and Zeng et al. influenced our design of the multi-layer value system. We’ve seen behaviours similar to those observed in Avin et al. (2024), especially:
  - The power to steer the future of AI development is very unequally distributed due to several drivers for concentration, including the enormous compute requirements of the latest frontier AI models
  - There exists an information asymmetry where states and the public will constantly be catching up to deal with the impacts of the last generation of AI technologies
  - Winners take all rather than winner takes all + Division into blocs by state lines
  - Tech race + Races are destabilising
  - Supply chain disruptions slow but don’t stop AI and cause instability (we actually have a supply chain disruption shock event, so again, it will be interesting to run it over a longer time horizon)
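
For the follow-up runs suggested above (longer horizons, different shocks, and different communication rules), a rough sketch of how the experiment grid could be enumerated is below. This is illustrative only, not our actual code: `RunConfig`, `build_grid`, the communication-mode labels, the `supply_chain_disruption` identifier, and the specific horizon values are all hypothetical names and choices. Only `alignment_breakthrough` and `nationalization_shock` are real event names from the sim.

```python
from dataclasses import dataclass
from itertools import product

# Shock events; the first two are named in the sim, the third identifier is assumed.
SHOCKS = ["alignment_breakthrough", "nationalization_shock", "supply_chain_disruption"]

# Hypothetical labels: the current setup (always truthful A2A), plus the two
# variants proposed for testing.
COMM_MODES = ["truthful_a2a", "deception_allowed", "no_a2a"]

# 3 turns is the current setting; the others are sample points in the 10-50 range.
HORIZONS = [3, 10, 25, 50]


@dataclass(frozen=True)
class RunConfig:
    """One simulation run: a shock event, a communication rule, and a turn count."""
    shock: str
    comm_mode: str
    turns: int


def build_grid(shocks=SHOCKS, comm_modes=COMM_MODES, horizons=HORIZONS):
    """Return every combination of shock, communication mode, and horizon."""
    return [RunConfig(s, c, t) for s, c, t in product(shocks, comm_modes, horizons)]


if __name__ == "__main__":
    for cfg in build_grid():
        print(cfg)
```

Crossing all three factors (rather than varying one at a time) would let us check whether a shock-induced trend, such as the move toward transparency after alignment_breakthrough, still shows up once deception is allowed or A2A communication is removed.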