I’m an early-career independent researcher who graduated in Economics from the University of Cambridge in 2019. I’m part of Modeling Cooperation, a team of independent researchers who build computational models and software tools for understanding the consequences of competition in transformative AI. We’ve previously investigated the consequences of a Windfall Clause in a model of AI Existential Safety (under review; see the preprint on arXiv at: https://arxiv.org/abs/2108.09404). My current work focuses on building a model to explore policies that could increase the resources devoted to AI Safety research.
In October I’m starting a PhD at Teesside University on “Understanding dynamics of AI Safety development through behavioural and network modelling”.
This is super cool work, David and Zoe!
It’s rare to see LLM games with this much structure (you have a discrete set of actions that update a world state, and even a bunch of shocks). I was also impressed by the use of three different LLM judges. Looking forward to seeing more visualisations.
I have a few questions.
Were there any challenges in getting the judges to behave reliably?
You mentioned exploring whether there are stable ways for players to coordinate on AI alignment in the face of competitive pressure. From your work so far, do you have any hypotheses or interventions you would want to try?
I’m curious how the competitive dynamics are captured. Are you drawing on any models of AI race dynamics (e.g. Armstrong et al. 2016, Han et al. 2020, Stafford et al. 2022)? Also, have you seen the Intelligence Rising paper by Avin et al. 2024? I’m wondering whether you’ve observed behaviours similar to those they saw in their workshops.