So I’ll talk about things like specification gaming, negative side effects, robustness, interpretability, testing and evaluation challenges, security against adversaries, and social coordination failures or races to the bottom.
Do you have a list of specific examples of these risks?
Some—see the links at the end of the post.
Would be really helpful to have this front and center.