Maybe this doesn’t sound promising, but, without knowing much about AI alignment, outer alignment already sounds to me like aligning human neural networks with an optimizer. Then, for inner alignment, you have to align the optimizer with an artificial neural network. To me this sounds simpler: aligning one type of NN with another.
But maybe it is wrong to think about the problem that way, and the actual problem is easier.