I generally agree with the post that this is a good target. It is effectively equivalent to Coherent Extrapolated Volition[1], which has been proposed as an alignment target. The difference is whether or not you consider extrapolated values by default.
[1] https://www.lesswrong.com/w/coherent-extrapolated-volition-alignment-target