You might think that “learning to reason from humans” doesn’t accomplish (2) because it makes the AI human-limited. If we want an advanced AI to help us create the kind of world that humans would want “if we knew more, thought faster, were more the people we wished we were” etc. then the approval of actual humans might, at some point, cease to be helpful.
A human can spend an hour on a task, and train an AI to do that task in milliseconds.
Similarly, an aligned AI can spend an hour on a task, and train its successor to do that task in milliseconds.
So you could hope to have a sequence of nice AIs, each significantly smarter than the last, eventually reaching the limits of technology while still reasoning in a way that humans would endorse if they knew more and thought faster.
(This is the kind of approach I’ve outlined and am working on, and I think that most work along the lines of “learn from human reasoning” will make a similar move.)
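The train-then-hand-off loop described above can be sketched as a toy program. Everything here is illustrative: the "overseer" stands in for an hour of slow, trusted reasoning, "distillation" is caricatured as caching the overseer's answers, and "amplification" is a placeholder for giving the fast policy extra deliberation time. None of this is a real training setup; it only shows the shape of the iteration.

```python
# Toy sketch of the iterated scheme: a slow, trusted overseer trains a fast
# policy, and that fast policy (plus extra deliberation) becomes the next
# overseer. All names and steps are illustrative, not a real implementation.

def distill(overseer, tasks):
    """'Train' a fast policy to imitate a slow overseer.
    Here training is caricatured as caching the overseer's answers."""
    answers = {t: overseer(t) for t in tasks}
    return lambda t: answers[t]  # fast: O(1) lookup instead of slow reasoning

def amplify(fast_policy):
    """A fast policy given extra deliberation acts as a stronger, slower
    overseer. The extra deliberation is a placeholder here."""
    def overseer(task):
        return fast_policy(task)  # placeholder for decompose/combine steps
    return overseer

def human(task):
    """Stand-in for an hour of human reasoning on a task."""
    return sum(int(d) for d in str(task))

tasks = list(range(100))
policy = distill(human, tasks)      # generation 0 imitates the human
for _ in range(3):                  # each generation trains its successor
    policy = distill(amplify(policy), tasks)
```

The hope expressed in the text is that, unlike this caricature, each real generation would be genuinely smarter than its overseer while remaining endorsed by it, rather than merely copying it.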