Throwing in my 2c on this:
I think EA often comes with a certain kind of ontology (consequentialism, utilitarianism, generally thinking in terms of individuals), which is reflected in the top-level problems given here (from the first list: persuasion, human power concentration, AI character and welfare): not just in which problems are chosen, but in the framing of what each problem even is.
I think there are nearby problems which are best understood from a slightly different ontology (how AI will affect cultural development, the shifting of power from individuals to emergent structures, what the possible shapes of identity for AIs even are), where coming in with too much of a utilitarian perspective could be actively counterproductive.
There's an awkward dance here, where adding a bunch of people who are mostly coming from that perspective to these areas could really warp the discussion, even if everyone is individually pretty reasonable and trying to seek the truth.
To be fair to Will, I’m sort of saying this with my gradual disempowerment hat on, which is something he gives later as an example of a thing that it would be good for people to think about more. But still, speaking as someone who is working on a few of these topics, if I could press a button that doubled the number of people in all these areas but all of the new people skewed consequentialist, I don’t think I’d want to.
I guess the upshot is that if anyone feels like trying to shepherd EAs into working on this stuff, I’d encourage them to spend some time thinking about what common blindspots EAs might have.
My general take on gradual disempowerment, independent of any other issues raised here, is that it's a coherent scenario but very unlikely to arise in practice, because it relies on an equilibrium in which the sort of very imperfect alignment needed for human and AI interests to diverge over the long run stays stable, even as the reasons that alignment in humans is so spotty/imperfect (and stably so) get knocked out.
In particular, I'm relatively bullish on automated AI alignment conditional on having human-level AI that is misaligned but non-power-seeking and non-sandbagging if we give the AIs reward. So I generally think the situation quite rapidly resolves into one of two outcomes: either the AI is power-seeking and willing to sandbag/scheme on everything, leading to the classic AI takeover, or the AI is aligned to its principal in such a way that the principal-agent cost becomes essentially zero over time.
Note I’m not claiming that most humans won’t be dead/disempowered, I’m just saying that I don’t think gradual disempowerment is worth spending much time/money on.
Tom Davidson has a longer post on this here.
Your concern about EA’s consequentialist lens warping these fields resonates with what I found when experimenting with multi-AI deliberation on ethics. I had Claude, ChatGPT, Grok, and Gemini each propose ethical frameworks independently, and each one reflected its training philosophy—Grok was absolutist about truth-seeking, Claude cautious about harm, ChatGPT moderate and consensus-seeking.
The key insight: single perspectives hide their own assumptions. It’s only when you compare multiple approaches that the blindspots become visible.
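For concreteness, here's a minimal sketch of the setup I mean (`query_model` is a hypothetical stand-in, not any particular provider's API; swap in real client calls to run it for real):

```python
# Minimal sketch of the multi-model deliberation setup described above.
# `query_model` is a hypothetical stand-in for whichever chat API each model
# is actually served through; it is stubbed here so the sketch runs end to end.

MODELS = ["claude", "chatgpt", "grok", "gemini"]

FRAMEWORK_PROMPT = (
    "Propose an ethical framework for evaluating AI deployment decisions. "
    "State your core principles and the assumptions behind them."
)


def query_model(model_name: str, prompt: str) -> str:
    """Hypothetical wrapper: call the provider's chat API and return the reply text."""
    return f"[{model_name}'s answer to: {prompt[:40]}...]"  # stubbed for the sketch


def collect_frameworks() -> dict[str, str]:
    # Each model is queried independently, with no shared context, so the
    # answers reflect each model's own "training philosophy".
    return {name: query_model(name, FRAMEWORK_PROMPT) for name in MODELS}


def compare_frameworks(frameworks: dict[str, str]) -> str:
    # The assumptions hidden in any single answer only become visible when
    # the answers are put side by side in a second, comparative pass.
    combined = "\n\n".join(f"## {name}\n{text}" for name, text in frameworks.items())
    return query_model(
        "claude",  # arbitrary choice of judge model for the comparison step
        "Compare these frameworks and list the assumptions each one makes "
        "that the others do not share:\n\n" + combined,
    )


if __name__ == "__main__":
    print(compare_frameworks(collect_frameworks()))
```

The reason to keep the first pass independent is that any cross-talk between the models would smooth over exactly the divergences you're trying to surface.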
This makes your point about EA flooding these areas with one ontology particularly concerning. If we're trying to figure out "AI character" or "gradual disempowerment" through a purely consequentialist framing, we might be encoding that bias into foundational work without realizing it.
Maybe the solution isn’t avoiding EA involvement, but structuring the work to force engagement with different philosophical traditions from the start? Like explicitly pairing consequentialists with virtue ethicists, deontologists, care ethicists, etc. in research teams. Or requiring papers to address “what would critics from X tradition say about this framing?”
Your “gradual disempowerment” example is perfect—this seems like it requires understanding emergent structures and collective identity in ways that individual-focused utilitarian thinking might miss entirely.
Would you say the risk is:
1. EA people not recognizing non-consequentialist framings as valid?
2. EA organizational culture making it uncomfortable to disagree with consequentialist assumptions?
3. Just sheer numbers overwhelming other perspectives in discourse?