Great post!
This framing doesn’t seem to capture the concern that even slight misspecification (e.g. a reward function that is a bit off) could lead to x-catastrophe.
I think this is a big part of many people’s concerns, including mine.
This seems somewhat orthogonal to the Saint/Sycophant/Schemer disjunction… or to put it another way, it seems like a Saint that is just not quite right about what your interests actually are (e.g. because they have alien biology and culture) could still be an x-risk.
Thoughts?