[Question] Strongest real-world examples supporting AI risk claims?

There are some great collections of examples of things like specification gaming, goal misgeneralization, and AI improving AI. But almost all of the examples come from demos or toy environments, rather than from systems that were actually deployed in the world.

There are also some databases of AI incidents which include lots of real-world examples, but the incidents aren’t described in a way that makes it easy to map them onto AI risk claims. (Most of them probably don’t map onto those claims in any case, but I’d guess some do.)

I think collecting real-world examples (particularly in a nuanced way, without overclaiming what the examples show) could be pretty valuable:

  • I think it’s good practice to have a transparent overview of the current state of evidence

  • For many people I think real-world examples will be most convincing

  • I expect there to be more and more real-world examples, so starting to collect them now seems good

What are the strongest real-world examples of AI systems doing things which, if scaled up, might bear on AI risk claims?

I’m particularly interested in whether there are any good real-world examples of:

  • Goal misgeneralization

  • Deceptive alignment (answer: no, but yes to simple deception?)

  • Specification gaming

  • Power-seeking

  • Self-preservation

  • Self-improvement

This feeds into a project I’m working on with AI Impacts, collecting empirical evidence on various AI risk claims. There’s a work-in-progress table here with the main things I’m tracking so far; additions and comments are very welcome.