[Question] Strongest real-world examples supporting AI risk claims?

[Manually cross-posted to LessWrong here.]

There are some great collections of examples of things like specification gaming, goal misgeneralization, and AI improving AI. But almost all of the examples are from demos/toy environments, rather than from systems which were actually deployed in the world.

There are also some databases of AI incidents which include lots of real-world examples, but the incidents aren't tied to specific failure modes in a way that makes it easy to map them onto AI risk claims. (Probably most of them don't map onto such claims in any case, but I'd guess some do.)

I think collecting real-world examples (particularly in a nuanced way, without claiming too much on the basis of the examples) could be pretty valuable:

  • I think it’s good practice to have a transparent overview of the current state of evidence

  • For many people, I think real-world examples will be the most convincing

  • I expect there to be more and more real-world examples, so starting to collect them now seems good

What are the strongest real-world examples of AI systems doing things which, at greater scale, might support AI risk claims?

I’m particularly interested in whether there are any good real-world examples of:

  • Goal misgeneralization

  • Deceptive alignment (answer: no, but yes to simple deception?)

  • Specification gaming

  • Power-seeking

  • Self-preservation

  • Self-improvement

This feeds into a project I’m working on with AI Impacts, collecting empirical evidence on various AI risk claims. There’s a work-in-progress table here with the main things I’m tracking so far—additions and comments very welcome.