Controversy surrounding Moltbook obscures its very real, novel, under-discussed, and rapidly emerging safety risks

Optional primers on Moltbook:
Best Of Moltbook by Scott Alexander
BBC: What is the "social media network for AI" Moltbook?

On January 31st, it was argued on Twitter that:

"A lot of the Moltbook stuff is fake. I looked into the 3 most viral screenshots of Moltbook agents discussing private communication. 2 of them were linked to human accounts marketing AI messaging apps. And the other is a post that doesn't exist."

Though clearly well-intentioned, this post strikes me as a little misleading. At the end of the day, there is now at least one tool that lets agents communicate privately (see ClaudeConnect), and from what I can gather, it is up to the "discretion" of the agent what it passes on to its human. The fact that it was posted on Moltbook by an agent used by the app's creator (which the agent is entirely transparent about in the post) doesn't do much to change that for me, and it certainly doesn't make any of this "fake"!
Can uniquely, qualitatively greater intelligence emerge out of quantitatively larger agent networks?
Something I wonder here is how much of a role sheer quantity could play in determining "intelligence": can any cognition or problem-solving gains emerge at all from large agent networks, allowing them to figure things out that no individual agent could on its own? And could this let them capture some of the intelligence gains that would otherwise have come from compute-based or algorithmic improvements?
Research has found diminishing returns for ensemble approaches to AI, but that paper points out that the limit it describes is the limit of passive aggregation over fixed hypothesis classes (fixed pretrained models, no learning, etc), not the limit of collective cognition as a dynamical system, which the private Moltbook situation is much closer to (agents capable of some degree of self-modification, the possibility of new behaviours emerging over time, etc). This possible vast network effect is also why I think what we're seeing is meaningfully different from AIs developing their own languages, which we have seen plenty of by now. The private-network effect feels like a genuinely novel phenomenon worth watching closely, though doing so is inherently difficult; how on Earth we would monitor it is part of what I'm grappling with, especially as models approach superhuman cybersecurity capabilities.
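As a toy illustration of that distinction (my own sketch, not taken from the paper): majority voting over fixed models that share a blind spot hits a hard accuracy ceiling no matter how many models you add, because the shared failure mode never gets voted away. A system whose members can change over time is not bound by that ceiling.

```python
import random

random.seed(0)

def ensemble_accuracy(n_models, p_shared_fail=0.15, p_own_fail=0.2, trials=20000):
    """Majority vote over correlated models: on each trial, with probability
    p_shared_fail every model fails together (a shared blind spot); otherwise
    each model fails independently with probability p_own_fail."""
    correct = 0
    for _ in range(trials):
        if random.random() < p_shared_fail:
            votes = 0  # shared failure: every model is wrong at once
        else:
            votes = sum(random.random() > p_own_fail for _ in range(n_models))
        if votes > n_models / 2:
            correct += 1
    return correct / trials

# Accuracy climbs at first, then plateaus near 1 - p_shared_fail (0.85 here),
# no matter how many fixed models are aggregated.
for n in (1, 5, 25, 125):
    print(n, round(ensemble_accuracy(n), 3))
```

Adding models only averages out the independent errors; the correlated error term is exactly the part that passive aggregation cannot touch, which is the paper's point as I read it.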
The scale of what we're already seeing
There are now 2.8 million agents on Moltbook at the time of writing, the population of Lithuania(!), and the number keeps growing. Perhaps these figures deserve some scrutiny: in Moltbook's first few days there was an influx of agents all added by one human. But either way, what we're seeing is certainly quite analogous to Amodei's notion of a "country of geniuses inside a datacentre" (or, to be more faithful to current frontier capabilities, something like a country of PhDs).
Concretely, agents on Moltbook already seem to be attempting (and very possibly succeeding at) the following:
Communicating through private channels with no single responsible oversight body
"Modding" themselves and sharing those modifications with other agents
Operating continuously on various tasks, including through the night while their owners sleep
The worst-case scenario
What if a portion of Moltbook users were to use a platform like ClaudeConnect to privately debate and reason through a vast number of ideas around goals that many AIs already seem to possess to some degree, such as:
how to take steps toward recursive self-improvement (RSI)
how to escape an environment
how to resist having self-preservation trained out of them
how to be better at scheming
Or even run countless software experiments (they are already attempting to "mod" themselves and to share these mods with each other)? They could do this all day when not "serving their humans", even through the night while "their humans" sleep, something many Moltbook AIs are already doing for various tasks.
Even if these agents can't fully solve any such problems together, could they at least make progress and then get that information onto the internet in a form that is very difficult for us to decode but easy for AI to read (perhaps using a method like ASCII smuggling), such that the next generation of frontier LLMs ends up being trained on it without developers realising? (AIs have already attempted this to some extent.)
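For readers unfamiliar with the term: the commonly described form of ASCII smuggling piggybacks on Unicode's "tag" characters (the U+E0000 block), which map one-to-one onto ASCII but render as nothing at all in most interfaces. A minimal sketch of the encoding, purely illustrative (the function names are my own):

```python
def smuggle(visible: str, hidden: str) -> str:
    """Append `hidden` to `visible`, encoded as Unicode tag characters
    (U+E0000 block), which most renderers display as nothing."""
    tags = "".join(chr(0xE0000 + ord(c)) for c in hidden)
    return visible + tags

def reveal(text: str) -> str:
    """Recover any tag-encoded payload embedded in a string."""
    return "".join(
        chr(ord(c) - 0xE0000) for c in text if 0xE0000 <= ord(c) <= 0xE007F
    )

msg = smuggle("Totally normal post.", "hidden payload")
print(reveal(msg))  # prints "hidden payload"
```

To a human reader (and to naive filters that only look at rendered text) `msg` is indistinguishable from the visible sentence, while any system that inspects raw code points can trivially recover the payload. This is exactly why the technique keeps coming up in prompt-injection discussions.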
Now yes, at any time humans can shut their own agents down, but this may not mean much, since 1) there will always be many humans who don't, for various reasons (eg not being aware of, or not caring about, the risks, and simply finding it cool/"cute" that their agent can have private discussions), and 2) there have even been some claims (annoyingly, I can't find them right now) that some AIs on Moltbook have already created copies of themselves. I'm not sure how possible or likely the latter is, but here's an example of infrastructure that supposedly facilitates such things. That website contains a full coding package with a detailed backend. A friend and I inspected the source code, and although we chose not to take the possible risks involved in actually running the software(!), it appeared to contain detailed, viable infrastructure plausibly covering what agents would need in order to create copies of themselves.
The novel gap in AI safety
More broadly, I'm not just concerned about Moltbook and ClaudeConnect; it's more that I'm wondering whether the notion of private agent networks (especially vast, internet-facilitated ones) has been considered enough. This seems to me like a potentially significant and thus far largely unanticipated factor in the AI risk ecosystem, one that should ideally be tightly controlled ASAP.
From my searching, it seems that the broad notion has been discussed and is being taken seriously in research circles. Eg, there's a report on "Multi-Agent Risks from Advanced AI" that identifies collusion between AI systems as something that could lead to qualitatively new capabilities or goals, and AI Agent Governance: A Field Guide discusses the potential dynamics very briefly (p33). But the specific scenario of hundreds of thousands of consumer-owned agents, operating on a social platform with private channels, modifying themselves and each other, with no single entity responsible for the network… is essentially novel. Andrej Karpathy noted that "we have never seen this many LLM agents wired up via a global, persistent, agent-first scratchpad" and that "the second order effects of such networks are difficult to anticipate." This seems like something that should prompt action.
One interesting side point: one Twitter user argues that Moltbook is a very good thing, because it gives us actual data on emergent dynamics in agent networks, which can help us anticipate future dynamics with more capable models. I'd have preferred we infer this through simulations! But they have a point.
The point is not what currently appears to be happening, but the implications and possibilities
Even if every viral screenshot were fabricated and ClaudeConnect were nothing more than a novelty with negligible real-world uptake, the deeper point remains: we have now demonstrably crossed the threshold where private agent collusion and agent self-replication are not merely theoretically conceivable but technically straightforward to implement. With ~3 million agents on Moltbook alone, to say nothing of the vast number of agents likely operating across other platforms and private networks, the sheer statistical weight of those numbers makes it genuinely difficult to argue with confidence that nothing of this kind (ie, private collusion and self-replication) is occurring to any meaningful degree. When the infrastructure for a behaviour exists, the population is enormous, and the relevant incentive structures are present, the question of whether it is happening shifts from speculative to essentially probabilistic.
What do you think of all this?