Thanks, I found this post helpful, especially the diagram.
What (if any) is the overlap of cooperative AI […] and AI safety?
One thing I’ve thought about a little is the possibility of a tension here: making AIs more cooperative in certain ways might raise the chance that advanced collusion between AIs breaks an alignment scheme that would otherwise work.[1]
[1] I’ve not written anything up on this and likely never will; I figure here is as good a place as any to leave a quick comment pointing to the potential problem, appreciating that it’s but a small piece in the overall landscape and probably not the problem of highest priority.
Perhaps this old comment from Rohin Shah could serve as the standard link?
(Note that it’s on the particular case of recommending people do/don’t work at a given org, rather than the general case of praise/criticism, but I don’t think this changes the structure of the argument other than maybe making point 1 less salient.)
Excerpting the relevant part: