Executive summary: This exploratory report examines how Claude AI models perform as autonomous agents in the social deduction game Blood on the Clocktower. It finds that individual reasoning is often sound, but cooperation remains shallow, even among agents who know they are on the same team, and groupthink can emerge from misinterpretations that go unchallenged.
Key points:
Setup and Scaffolding: The author built a digital version of Blood on the Clocktower in which Claude models play individual characters with private histories, public actions, and strategy-writing prompts to simulate reasoning and collaboration.
Limited AI Cooperation: Despite explicit instructions and full access to teammates’ identities (for evil players), AI agents struggled to develop deep cooperative strategies unless directly prompted, and even then they followed instructions superficially rather than creatively building on them.
Emergent Groupthink: In multiple games, AI players adopted incorrect rules or logic without question, sometimes because they trusted proven teammates, resulting in flawed (but occasionally lucky) decision-making.
Semantic Biases: Certain words like “dead” and “Investigator” carried unintended semantic weight that interfered with reasoning, leading to misunderstandings that could be mitigated with alternate phrasing (e.g., “ghost player”).
Model Comparison: Claude 4 Opus demonstrated the most advanced reasoning and was the only model to reason about unmentioned roles, while Claude 3.5 Haiku uniquely showed spontaneous (albeit weak) team messaging among evil players.
Suggestions for Further Work: The author proposes exploring cross-game memory, richer strategy scaffolding, and techniques like rule-citation to reduce hallucinated group consensus, pointing to a broader research opportunity in multi-agent collaboration and misalignment.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.