[Question] How would a language model become goal-directed?

Or what should I read to understand this?

It seems like some people expect descendants of large language models to pose a risk of becoming superintelligent agents. (By ‘descendants’ I mean models produced by adding scale and non-radical architectural changes: GPT-N.)

I accept that there’s no reason in principle that LLM intelligence (performance on tasks) should be capped at the human level.

But I don’t see why I should believe that at some point language models would develop agency / goal-directed behaviour, where they start trying to achieve things in the real world instead of just continuing their ‘output predicted text’ behaviour.