[Question] Half-baked alignment idea

I’m trying to think through various approaches to AI alignment, and so far this is the one I came up with that I like best. I have not read much of the literature, so please do point me to prior discussion if this has come up before.

What if we train an AI agent (i.e., via reinforcement learning) to survive/​thrive in an environment containing a wide variety of agents with wildly different levels of intelligence? In particular, structure it so that pretty much every agent can safely assume it will eventually meet an agent much smarter than itself, and so that the environment rewards tit-for-tat with a significant bias towards cooperation, e.g. by requiring agents to “eat” resources that take cooperation to secure and are primarily non-competitive. The idea is for agents to learn to respect even beings of lesser intelligence, both because they want beings of higher intelligence to respect them, and because in this environment a group of lesser intelligences can gang up and defeat a single higher-intelligence being. We would also, in effect, be training each AI to detect and defeat new AIs that try to disturb this balance. I have not thought this through carefully; curious what you all think.
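
To make the payoff structure slightly more concrete, here is a toy sketch of the kind of environment I mean. Everything in it (the hard-coded generous tit-for-tat strategy, the payoff values, the forgiveness rate) is a placeholder assumption, not a worked-out design; in the actual proposal the policies would be learned by RL agents, and the intelligence differences would be real capability differences rather than a label.

```python
import random

class Agent:
    """Toy stand-in for an RL agent; the strategy here is hard-coded
    generous tit-for-tat rather than learned."""

    def __init__(self, intelligence):
        self.intelligence = intelligence  # wildly varying across the population
        self.score = 0
        self.memory = {}                  # last action each partner took toward us

    def choose(self, partner):
        # Tit-for-tat with a bias towards cooperation: cooperate by default,
        # retaliate after a defection, but forgive it 30% of the time.
        last = self.memory.get(id(partner), "cooperate")
        if last == "defect" and random.random() > 0.3:
            return "defect"
        return "cooperate"


def interact(a, b, coop_reward=3.0, sucker_penalty=-1.0, exploit_reward=1.0):
    """One shared 'resource' that only pays off fully under mutual cooperation
    (exploit_reward < coop_reward encodes the 'primarily non-competitive' part)."""
    act_a, act_b = a.choose(b), b.choose(a)
    if act_a == "cooperate" and act_b == "cooperate":
        a.score += coop_reward
        b.score += coop_reward
    elif act_a == "cooperate":            # b exploited a
        a.score += sucker_penalty
        b.score += exploit_reward
    elif act_b == "cooperate":            # a exploited b
        b.score += sucker_penalty
        a.score += exploit_reward
    # mutual defection: the resource is lost and nobody eats
    a.memory[id(b)] = act_b
    b.memory[id(a)] = act_a


# A population spanning a wide range of intelligence levels, so every agent
# should expect to eventually meet one much smarter than itself.
agents = [Agent(intelligence=random.lognormvariate(0, 1)) for _ in range(100)]
for _ in range(10_000):
    a, b = random.sample(agents, 2)
    interact(a, b)
```

The one design choice doing the work is that exploiting a cooperator pays less than mutual cooperation, so securing a resource properly requires cooperating, which is my rough operationalization of “reward tit-for-tat with a bias towards cooperation.”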
