Terminological point: It sounds like you’re using the phrase “instrumental convergence” in an unusual way.
I take it the typical idea is just that there are some instrumental goals an intelligent agent can expect to be useful in the pursuit of a wide range of other goals, whereas you seem to be emphasizing the idea that those instrumental goals would be pursued to extremes destructive of humanity. It seems to me that (1) those two ideas are worth keeping separate, (2) “instrumental convergence” would more accurately label the first idea, and (3) that phrase is in fact usually used to refer to the first idea only.
This occurred to me as I was skimming the post and saw the suggestion that instrumental convergence is not seen in humans, to which my reaction was, “What?! Don’t people like money?”
Part of your question here seems to be, “If we can design a system that understands goals written in natural language, won’t it be very unlikely to deviate from what we really wanted when we wrote the goal?” Regarding that point, I’m not an expert, but I’ll point to some discussion by experts.
There are, as you may have seen, lists of examples where real AI systems have done things completely different from what their designers intended. For example, this talk, in the section on Goodhart’s law, links to such a list. But from what I can tell, those examples never involve the designers specifying goals in natural language. (I’m guessing that specifying goals that way hasn’t seemed even faintly possible until recently, so nobody’s really tried it?)
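To make the Goodhart’s-law failure mode concrete, here’s a toy sketch of my own (not taken from the linked list; the reward function and numbers are invented purely for illustration): an optimizer given a misspecified proxy reward ends up choosing an action the designers would consider terrible.

```python
# Toy illustration of Goodhart's law / reward misspecification.
# The proxy reward tracks the true objective near the intended behavior,
# but has an exploitable spike elsewhere; a naive optimizer finds the spike.

import numpy as np

rng = np.random.default_rng(0)

def true_objective(x):
    # What the designers actually want: stay near x = 1.
    return -((x - 1.0) ** 2)

def proxy_reward(x):
    # The specified reward agrees with the true objective near x = 1,
    # but an unanticipated loophole makes x = 10 look even better.
    return true_objective(x) + 100.0 * np.exp(-((x - 10.0) ** 2))

# Naive optimization of the proxy: random search over candidate actions.
candidates = rng.uniform(-5, 15, size=10_000)
best = candidates[np.argmax(proxy_reward(candidates))]

print(f"action chosen by proxy optimizer: {best:.2f}")   # ~10, not ~1
print(f"proxy reward at that action:      {proxy_reward(best):.2f}")
print(f"true objective at that action:    {true_objective(best):.2f}")  # ~-81
```

The point of the sketch is just that the gap between “the reward we wrote down” and “what we actually wanted” is what gets exploited, which is why the natural-language-goals question matters.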
Here’s a recent paper by academic philosophers that seems to support the intuition behind your question. The authors argue that AGI systems built around large language models would be safer than alternative systems precisely because they could receive goals written in natural language. (See especially the two sections titled “reward misspecification,” though note also the last paragraph, where they suggest it might be better to avoid goal-directed AI altogether.) If you want more detail on whether that suggestion holds up, you might keep an eye on reactions to this paper: there are some comments on the LessWrong post, and I see the paper was submitted for a contest.