abergal comments on Why AI alignment could be hard with modern deep learning

abergal 27 Sep 2021 18:04 UTC
Another potential reason for optimism is that we’ll be able to use observations from early in the training runs of systems (before models are very smart) to shape the pool of Saints / Sycophants / Schemers we end up with. I.e., we are effectively “raising” the adults we hire, so we may be able to detect whether 8-year-olds are likely to become Sycophants / Schemers as adults and discontinue or modify their training accordingly.