What’s the strongest argument(s) for the orthogonality thesis, understandable to your average EA?
I don’t think the orthogonality thesis would have predicted GPT models, which become intelligent by mimicking human language, and learn about human values as a byproduct. The orthogonality thesis says that, in principle, any level of intelligence can be combined with any goal, but in practice the most intelligent systems we have are trained by mimicking human concepts.
On the other hand, after you train a language model, you can ask it or fine-tune it to pursue any goal you like. It will use human concepts that it learned from pretraining on natural language, but you can give it a new goal.
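To make that concrete, here's a minimal sketch (assuming a recent version of Hugging Face transformers and an instruction-tuned model; the model name, goals, and prompts are just placeholders, not a claim about any particular system): the same pretrained weights, with all their human-derived concepts, get pointed at two unrelated "goals" purely through the system prompt.

```python
# Minimal sketch: one pretrained chat model, two arbitrary goals supplied at prompt time.
# Assumes a recent transformers version with chat-format support in the pipeline.
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

for goal in [
    "maximise the number of paperclips produced this quarter",
    "help the user plan a birthday party",
]:
    messages = [
        {"role": "system", "content": f"Your sole objective: {goal}."},
        {"role": "user", "content": "What should I do first?"},
    ]
    # The pipeline returns the conversation with the assistant's reply appended.
    reply = generator(messages, max_new_tokens=100)[0]["generated_text"][-1]["content"]
    print(goal, "->", reply)
```

Fine-tuning (e.g. RLHF or supervised fine-tuning on goal-specific data) does the same kind of thing more durably, but the point is the same either way: the concepts come from pretraining on human text, while the goal is layered on afterwards.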
The FAQ response from Stampy is quite good here:
https://ui.stampy.ai?state=6568_
This seems like a fairly weak argument for something so core to AGI risk arguments. Can we not get any empirical evidence either way? Also, all the links in the “defence of the thesis” section are broken for me.
Thanks for reporting the broken links. It looks like a problem with the way Stampy is importing the LessWrong tag. Until the Stampy page is fixed, following the links from LessWrong should work.