I wasn’t even contrasting “moral alignment” with “aligning to the creator’s specific intent [i.e. his individual coherent extrapolated volition]”, but with just “aligning with what the creator explicitly specified in the first place” (“inner alignment”?). That is implicitly a solved problem in the paperclip maximizer thought experiment, since the premise is that the paperclip company can specify “make as many paperclips as possible” and the maximizer pursues exactly that, and it is very much not a solved problem in LLMs.