I’m not sure whether you’re asking for academic literature on adversarial examples (there is a lot) or for writing on the connection between adversarial examples and alignment (most topics of the form “the link between X and alignment” haven’t been written about a ton). The latter is discussed some in the recent paper Unsolved Problems in ML Safety and in An overview of 11 proposals for building safe advanced AI.