Hi,
Is anyone aware of a reading list of mostly peer-reviewed journal articles and preprints on AI safety/alignment? I would like to start reading and citing more of this literature in my own papers.
Thanks in advance for any help :)
Zak
Hey, I’ve found this list really helpful, and the course that comes with it is great too. I’d suggest watching the course lecture video for a particular topic, then reading a few of the papers it lists. Adversarial robustness and Trojans were the topics I found most interesting: https://course.mlsafety.org/readings/