Great post!
This framing doesn’t seem to capture the concern that even slight misspecification (e.g. a reward function that is a bit off) could lead to x-catastrophe.
I think this is a big part of many people’s concerns, including mine.
This seems somewhat orthogonal to the Saint/Sycophant/Schemer disjunction… or to put it another way, it seems like a Saint that is just not quite right about what your interests actually are (e.g. because they have alien biology and culture) could still be an x-risk.
Thoughts?