I don’t think the orthogonality thesis would have predicted GPT models, which become intelligent by mimicking human language and learn about human values as a byproduct. The orthogonality thesis says that, in principle, any level of intelligence can be combined with any goal; in practice, though, the most intelligent systems we have acquire their intelligence by mimicking human concepts, values included.
On the other hand, once a language model is trained, you can ask it or fine-tune it to pursue any goal you like. It will draw on the human concepts it learned from pretraining on natural language, but the goal itself can be new, as in the sketch below.
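To make that second step concrete, here is a minimal sketch of fine-tuning a pretrained model toward a new goal, assuming the Hugging Face transformers library and PyTorch. The model choice, the toy goal_examples data, and the hyperparameters are all illustrative, not anything the argument above specifies:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Weights from pretraining on human text: this is where the human concepts live.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical goal-specific data: the new objective is expressed as more text.
goal_examples = [
    "Instruction: summarize the user's inbox by urgency.\nResponse: ...",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for text in goal_examples:
    batch = tokenizer(text, return_tensors="pt")
    # Standard language-modeling loss, now applied to goal-specific examples.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Note that nothing about the loss changes between pretraining and fine-tuning; only the data encodes the new goal. That is the sense in which the pretrained human concepts carry over while the goal is swapped out.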