I think there’s something interesting to this argument, although I think it may be relying on a frame where AI systems are natural agents, in particular at this step:
a strategically and philosophically competent AI should seemingly have its own moral uncertainty and pursue its own “option value maximization” rather than blindly serve human interests/values/intent
It’s not clear to me why the key functions couldn’t be more separated, or whether the conflict you’re pointing to persists across such separation. For instance, we might have a mix of:
Systems which competently pursue philosophy research (but do not have a sense of self that they are acting with regard to)
Systems which are strategic (including drawing on the fruits of the philosophy research), on behalf of human institutions or individuals
Systems which are instruction-following tools (which don’t aspire to philosophical competence), rather than independent agents
I mean “not clear to me” very literally here—I think that perhaps some version of your conflict will pose a challenge to such setups. But I’m responding with this alternate frame in the hope that it will be useful in advancing the conversation.