My general take on gradual disempowerment, independent of any other issues raised here, is that it’s a coherent scenario but ultimately very unlikely to arise in practice. It relies on an equilibrium in which the sort of very imperfect alignment needed for human and AI interests to diverge over the long run remains stable, even as the reasons why very spotty/imperfect alignment among humans is stable get knocked out.
In particular, I’m relatively bullish on automated AI alignment conditional on having human-level AI that is misaligned but non-power-seeking/non-sandbagging if we give it reward. So I generally think the situation resolves quite rapidly in one of two ways: either the AI is power-seeking and willing to sandbag/scheme on everything, leading to the classic AI takeover, or the AI is aligned to its principal in such a way that the principal-agent cost becomes essentially 0 over time.
Note that I’m not claiming most humans won’t be dead/disempowered; I’m just saying that I don’t think gradual disempowerment is worth spending much time/money on.
Tom Davidson has a longer post on this here.