There’s a perhaps naive way of reading their plan that leads to this objection:
“Once we have AIs that are human-level AI alignment researchers, it’s already too late. That’s already very powerful and goal-directed general AI, and we’ll be screwed soon after we develop it, either because it’s dangerous in itself or because it zips past that capability level fast, since it’s an AI researcher, after all.”
What do you make of it?