Breakthrough in AI agents? (On Devin—The Zvi, linkpost)

It is clear that Devin is a quantum leap over known past efforts in terms of its ability to execute complex multi-step tasks, to adapt on the fly, and to fix its mistakes or be adjusted and keep going.

For once, when we wonder ‘how did they do that, what was the big breakthrough that made this work’ the Cognition AI people are doing not only the safe but also the smart thing and they are not talking.

Here is Claude-3-Opus’s summary:

The Risks and Implications of AI Software Engineers

Devin, an AI system developed by Cognition AI, demonstrates remarkable capabilities in writing complex code and completing software engineering tasks autonomously. This breakthrough in AI technology raises significant questions about the future of software development and the potential risks associated with such powerful AI agents.

Key points:

  1. Devin’s ability to complete [13.8% of] real-world coding tasks on Upwork without human intervention is a quantum leap in AI capabilities.

  2. The use of AI systems like Devin could lead to a rapid accumulation of technical debt and poorly maintained code if not properly managed.

  3. Ensuring the safe use of Devin and similar AI agents is a major challenge, as they require access to sensitive data and the ability to execute arbitrary code.

  4. The full automation of software engineering by AI could lead to recursive self-improvement (RSI) and potentially catastrophic consequences.

  5. AI agents with the ability to plan, overcome obstacles, and seek resources to achieve their goals may pose existential risks if not properly aligned with human values.

The development of AI systems like Devin highlights the urgent need for proactive measures to ensure the safe and responsible deployment of advanced AI technologies.

Personal take: I was really hoping that current architectures could not support fully autonomous agents, and that such agents were still a few years away. I’m very concerned about this development, and afraid that the usual policy cycle is falling even further behind AI progress.