Focusing on empirical results:
Learning to summarize from human feedback was good, for several reasons.
I liked the recent paper empirically demonstrating objective robustness failures hypothesized in earlier theoretical work on inner alignment.
nit: link on “reasons” was pasted twice. For others it’s https://www.lesswrong.com/posts/PZtsoaoSLpKjjbMqM/the-case-for-aligning-narrowly-superhuman-models
Also hadn’t seen that paper. Thanks!