Across 40 games of an anonymous online Diplomacy league, Cicero achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game.
Wow, this sounds impressive on the face of it, though I also wonder how well the best No Press Diplomacy AI would do here (perhaps with a very mediocre chat capability added). Maybe the dialogue isn’t actually doing that much work? I’d be interested in more information on this.
There’s a commentated video by someone who plays as the only human in an otherwise all-Cicero game, which at least makes it seem like the dialogue is doing a lot.
A random thought I had while watching the video: the commentator pointed out that the bots predictably behave in their own self-interest, whereas human players in bad or losing positions will generally throw all their forces against whoever backstabbed them rather than try to salvage a hopeless position. My personal style of play is much more bot-like than what the commentator described as typical of human professionals. If the game weren’t anonymous, I could see the incentive to retaliate in order to deter backstabbing in future games, but given that players’ usernames are anonymous, it seems to me that the bots’ approach of always trying to improve their position is best.
There’s no comparison to prior full-press Diplomacy agents, but if I’m reading the prior-work cites right, this is because basically none of them work: not only do they not beat humans, they apparently don’t even always improve over themselves playing the game as if it were no-press Diplomacy (i.e., not using dialogue at all). That gives an idea of how big a jump this is for full-press Diplomacy.
gwern on /r/machinelearning:
Helpful, thanks!
I watched the commentated video you and Lawrence shared, and it still wasn’t clear to me from the gameplay how much the press component was actually helping the Diplomacy agents (e.g., I wasn’t sure whether the bots were cooperating and backstabbing, or whether they were always set on playing the moves they did regardless of what was being said in the press). In a game with just one human and the rest bots, the human obviously wouldn’t have an advantage if the bots all behaved like No Press bots. I think a mixed game with multiple humans and multiple bots would be more insightful.