I was imagining Sycophants as an outer alignment failure, assuming the model is trained with naive RL from human feedback.