Not predictions as such, but lots of current work on AI safety and steering is based pretty directly on paradigms from Yudkowsky and Christiano—from Anthropic’s constitutional AI to ARIA’s Safeguarded AI program. There is also OpenAI’s Superalignment research, which was attempting to build AI that could solve agent foundations—that is, explicitly do the work that theoretical AI safety research had identified. (I’m unclear whether that last effort is still ongoing, given that they managed to alienate most of the people involved.)