(I spent a couple of hours thinking about this and discussing it with a friend, and want to try to write down some tentative conclusions.)
I think the mere fact that Meta was able to combine strategic thinking and dialog to let an AI achieve its goals in a collaborative and competitive environment with humans should give us some pause. On the technical level, I think the biggest contributions are two things: an RL-trained strategy model that explicitly models the other human agents and strikes a balance between optimal play and “human play”; and an interface between that module and the language model via encoded intents.
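As a rough illustration of that interface (my own sketch, not Meta’s implementation; all names, signatures, and the regularization detail are hypothetical), the planner picks an intent and the language model is conditioned on it rather than generating freely:

```python
# Hypothetical sketch of an intent-conditioned dialogue pipeline, loosely
# following the paper's high-level description. Names and signatures are
# made up for illustration; this is not Cicero's actual code.
from dataclasses import dataclass

@dataclass
class Intent:
    """A planned joint action: what we intend to do and what we ask of the partner."""
    own_moves: list[str]        # e.g. ["A PAR - BUR"]
    requested_moves: list[str]  # e.g. ["A MUN - RUH"], asked of the dialogue partner

def plan_intent(game_state, partner_id, strategy_model, human_prior):
    """Pick an intent by balancing optimal play against a model of likely human play."""
    candidates = strategy_model.candidate_joint_actions(game_state, partner_id)
    # Score each candidate by expected value, regularized toward a human policy
    # prior so the plan stays plausible to (and predictable for) human players.
    scored = [
        (strategy_model.value(game_state, c) + human_prior.log_prob(game_state, c), c)
        for c in candidates
    ]
    _, best = max(scored, key=lambda x: x[0])
    return Intent(own_moves=best.own, requested_moves=best.partner)

def generate_message(game_state, intent, dialogue_model):
    """The language model is conditioned on the encoded intent, not free-running."""
    prompt = dialogue_model.encode(state=game_state, intent=intent)
    return dialogue_model.sample(prompt)
```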
I think the Cicero algorithm is bad news for AI safety. It demonstrates yet another worrying capability of AI, and Meta’s open publishing norm is speeding up this type of work and making it harder to control.
Of course, the usual rebuttal to concerns about this sort of paper is still valid: This is just a game, which makes two important things easier:
There’s a clear state space, action space, and reward function, so RL is easy to set up.
The dialog is pretty simple, consisting of short sentences in a very specific domain.
One of the most interesting questions is where things will go from here. I think adding deception to the model should allow it to play even better, and I’d give it 50% that somebody will make that work within the next year. Beyond that, now that the scaffolding is established, I expect superhuman-level agents to be developed within the next year as well. (I’m not quite sure how “superhuman” you can get in Diplomacy – maybe the current model is already there – but perhaps the agent could win any tournament it enters.)
Beyond that, I’m curious when and where we will see some of the techniques established in the paper in real-world applications. I think the bottleneck here will be finding scenarios that can be modeled with RL and that benefit from talking with humans. Such scenarios seem hard to identify: when talking with humans is required, that usually indicates a messy environment where actions and goals aren’t clearly defined. A positive example I can think of is a language coach. The goal could be to optimize test scores, and the action would be picking from a set of exercises. This alone could already work well, but if you add in human psychology and the fact that e.g. motivation can be a key driver of learning, then dialog becomes important as well (a toy sketch of this setup follows below).
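To make that framing concrete, here is a toy RL environment for the language-coach idea (my own construction, not from the paper): the action is which exercise to assign, the reward is the change in a simulated test score, and a motivation variable stands in for the part a dialogue layer would need to manage.

```python
import random

# Toy "language coach" RL environment. Everything here is a hypothetical
# illustration of the state/action/reward framing described above.
EXERCISES = ["vocabulary", "listening", "grammar", "speaking"]

class LanguageCoachEnv:
    def __init__(self):
        # Hidden per-skill proficiencies of a simulated student.
        self.skills = {ex: random.uniform(0.0, 0.5) for ex in EXERCISES}
        self.motivation = 1.0  # low motivation reduces learning -- the hook for dialogue

    def test_score(self):
        return sum(self.skills.values()) / len(self.skills)

    def step(self, exercise):
        before = self.test_score()
        # Diminishing returns: weaker skills improve more; motivation scales the gain.
        gain = 0.1 * (1.0 - self.skills[exercise]) * self.motivation
        self.skills[exercise] = min(1.0, self.skills[exercise] + gain)
        self.motivation = max(0.2, self.motivation - 0.05)  # grinding erodes motivation
        reward = self.test_score() - before
        return reward

# A trivial policy for demonstration: always train the currently weakest skill.
env = LanguageCoachEnv()
for _ in range(10):
    action = min(env.skills, key=env.skills.get)
    print(action, round(env.step(action), 4))
```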
Great post (especially for a first one, kudos)!
One recent piece of evidence that updated me further towards “many smart people are quite confused about the problem and in particular anthropomorphize current AI systems a lot” was Lex Fridman’s conversation with Eliezer (e.g. at 1:02:36):