You can’t just ask the AI to “be good”, because the whole problem is getting the AI to do what you mean instead of what you literally ask. But what if you asked the AI to “make itself smart”? On the one hand, instrumental convergence implies that the AI should make itself smart. On the other hand, the AI will misunderstand what you mean, and hence will not make itself smart. Can you point the way out of this seeming contradiction?
(Under the background assumptions already being made in the scenario where you can “ask things” to “the AI”:) If you try to tell the AI to be smart, but fail and instead give it some other goal (let’s call it being smart’), then in the process of becoming smart’ it will also try to become smart, because no matter what smart’ actually specifies, becoming smart will still be helpful for that. But if you want it to be good and mistakenly tell it to be good’, it’s unlikely that being good will be helpful for being good’. The asymmetry is exactly instrumental convergence: intelligence is useful for almost any goal, including a misspecified one, whereas goodness is not.