I think there’s something interesting to this argument, although it may be relying on a frame where AI systems are natural agents, in particular at this step:
a strategically and philosophically competent AI should seemingly have its own moral uncertainty and pursue its own “option value maximization” rather than blindly serve human interests/values/intent
It’s not clear to me why the key functions couldn’t be more separated, or whether the conflict you’re pointing to persists across such separation. For instance, we might have a mix of:
Systems which competently pursue philosophy research (but do not have a sense of self that they are acting with regard to)
Systems which are strategic (including drawing on the fruits of the philosophy research), on behalf of human institutions or individuals
Systems which are instruction-following tools (which don’t aspire to philosophical competence), rather than independent agents
I mean “not clear to me” very literally here: I think some version of your conflict may well pose a challenge to such setups. But I’m responding with this alternate frame in the hope that it will be useful in advancing the conversation.
I’m not sure. I think there are versions of things here which are definitely not convergence (straightforward acausal trade between people who understand their own values is of this type), but I have some feeling that there might be extra reasons for convergence coming from people observing the host, and having that observation feed into their own reflective process.
(Indeed, I’m not totally sure there’s a clean line between convergence and trade.)