Daniel_Dewey comments on My current thoughts on MIRI’s “highly reliable agent design” work

Daniel_Dewey 10 Jul 2017 19:30 UTC
0 points
My guess is that the capability is extremely likely, and that the main difficulties are motivation and reliability of learning. In other learning tasks we might be satisfied with lower reliability that improves over time, but when learning human preferences, unreliable learning could result in a lot more harm.