A couple more thoughts on this.
Maybe I should write something about cultivating self-skepticism for an EA audience; in the meantime, here’s my old LW post How To Be More Confident… That You’re Wrong. (On reflection I’m pretty doubtful these suggestions actually work well enough. I think my own self-skepticism mostly came from working in cryptography research in my early career, where relatively short feedback cycles, e.g. someone finding a clear flaw in an idea you thought secure or your own attempts to pre-empt this, repeatedly bludgeon overconfidence out of you. This probably can’t be easily duplicated, contrary to what the post suggests.)
I don’t call myself an EA, as I’m pretty skeptical of Singer-style impartial altruism. I’m a bit wary about making EA the hub for working on “making the AI transition go well” for a couple of reasons:
It gives the impression that one needs to be particularly altruistic to find these problems interesting or worth working on.
EA selects for people who are especially altruistic, which from my perspective is a sign of philosophical overconfidence. (I exclude people like Will who have talked explicitly about their uncertainties, but think EA overall probably still attracts people who are too certain about a specific kind of altruism being right.) This is probably fine or even a strength for many causes, but potentially a problem in a field that depends very heavily on making real philosophical progress and having good philosophical judgment.
Throwing in my 2c on this:
I think EA often comes with a certain kind of ontology (consequentialism, utilitarianism, generally thinking in terms of individuals) which is reflected in the top-level problems given here (from the first list: persuasion, human power concentration, AI character and welfare), not just in the focus but in the framing of what the problem even is.
I think there are nearby problems which are best understood from a slightly different ontology—how AI will affect cultural development, the shifting of power from individuals to emergent structures, what the possible shapes of identity for AIs even are—where coming in with too much of a utilitarian perspective could even be actively counterproductive.
There’s an awkward dance here where adding a bunch of people to these areas who are mostly coming from that perspective could really warp the discussion, even if everyone is individually pretty reasonable and trying to seek the truth.
To be fair to Will, I’m sort of saying this with my gradual disempowerment hat on, which is something he gives later as an example of a thing that it would be good for people to think about more. But still, speaking as someone who is working on a few of these topics, if I could press a button that doubled the number of people in all these areas but all of the new people skewed consequentialist, I don’t think I’d want to.
I guess the upshot is that if anyone feels like trying to shepherd EAs into working on this stuff, I’d encourage them to spend some time thinking about what common blindspots EAs might have.
My general take on gradual disempowerment, independent of any other issues raised here, is that it’s a coherent scenario but very unlikely to arise in practice, because it relies on an equilibrium in which the very imperfect alignment needed for human and AI interests to diverge over the long run stays stable, even as the reasons the analogous alignment problem among humans is so spotty/imperfect get knocked out.
In particular, I’m relatively bullish on automated AI alignment conditional on having human-level AI that is misaligned but not power-seeking or sandbagging, so I generally think things quite rapidly resolve one of two ways: either the AI is power-seeking and willing to sandbag/scheme on everything, leading to the classic AI takeover, or the AI becomes aligned to its principal in such a way that the principal-agent cost goes to essentially zero over time.
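To make the bifurcation concrete, here’s a toy sketch (purely my own illustration with made-up parameters, not anything from the gradual disempowerment paper): a misalignment gap either shrinks with each cycle of honest automated alignment work, or persists while a scheming AI’s effective power compounds. Under these assumptions, the stable moderate gap that gradual disempowerment needs never shows up as an outcome:

```python
# Toy sketch (my own illustration; the parameters are arbitrary): a
# misalignment "gap" g evolves over cycles of automated alignment
# work, depending on whether the AI sandbags that work.

def run(g0: float, sandbagging: bool, cycles: int = 50) -> str:
    g, power = g0, 1.0
    for _ in range(cycles):
        if sandbagging:
            power *= 1.2   # scheming AI: gap persists while its power compounds
            if power > 100.0:
                return "classic AI takeover"
        else:
            g *= 0.8       # honest automated alignment steadily shrinks the gap
    if g < 0.01:
        return "principal-agent cost ~ 0"
    return "stable moderate gap (what gradual disempowerment requires)"

print(run(0.3, sandbagging=False))  # -> principal-agent cost ~ 0
print(run(0.3, sandbagging=True))   # -> classic AI takeover
```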
Note I’m not claiming that most humans won’t be dead/disempowered, I’m just saying that I don’t think gradual disempowerment is worth spending much time/money on.
Tom Davidson has a longer post on this here.
Your concern about EA’s consequentialist lens warping these fields resonates with what I found when experimenting with multi-AI deliberation on ethics. I had Claude, ChatGPT, Grok, and Gemini each propose ethical frameworks independently, and each one reflected its training philosophy—Grok was absolutist about truth-seeking, Claude cautious about harm, ChatGPT moderate and consensus-seeking.
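For anyone curious about the mechanics, here’s a minimal sketch of that setup (the query_model wrapper is a hypothetical stand-in for whatever API clients you actually use, and the prompts are illustrative): each model answers the same prompt in isolation first, and only afterwards do they see and critique one another’s frameworks.

```python
# Minimal sketch of the two-round deliberation setup. query_model() is
# a hypothetical placeholder, not a real library call; replace the stub
# with real API calls to each provider.

MODELS = ["claude", "chatgpt", "grok", "gemini"]

FRAMEWORK_PROMPT = (
    "Propose an ethical framework for evaluating AI-related decisions. "
    "Answer from scratch; do not assume any other model's view."
)

def query_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in: send `prompt` to `model` and return its reply."""
    return f"[{model}'s answer to: {prompt[:40]}...]"

def deliberate() -> dict[str, dict[str, str]]:
    # Round 1: each model answers independently, so no framework can
    # anchor the others.
    frameworks = {m: query_model(m, FRAMEWORK_PROMPT) for m in MODELS}

    # Round 2: each model critiques every other framework; this is where
    # the hidden assumptions tend to become visible.
    return {
        critic: {
            author: query_model(
                critic, "What blindspots does this framework have?\n\n" + text
            )
            for author, text in frameworks.items()
            if author != critic
        }
        for critic in MODELS
    }
```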
The key insight: single perspectives hide their own assumptions. It’s only when you compare multiple approaches that the blindspots become visible.
This makes your point about EA flooding these areas with one ontology particularly concerning. If we’re trying to figure out “AI character” or “gradual disempowerment” through purely consequentialist framing, we might be encoding that bias into foundational work without realizing it.
Maybe the solution isn’t avoiding EA involvement, but structuring the work to force engagement with different philosophical traditions from the start? Like explicitly pairing consequentialists with virtue ethicists, deontologists, care ethicists, etc. in research teams. Or requiring papers to address “what would critics from X tradition say about this framing?”
Your “gradual disempowerment” example is perfect—this seems like it requires understanding emergent structures and collective identity in ways that individual-focused utilitarian thinking might miss entirely.
Would you say the risk is:
EA people not recognizing non-consequentialist framings as valid?
EA organizational culture making it uncomfortable to disagree with consequentialist assumptions?
Just sheer numbers overwhelming other perspectives in discourse?
making EA the hub for working on “making the AI transition go well”
I don’t think EA should be THE hub. In an ideal world, loads of people and different groups would be working on these issues. But at the moment, really almost no one is. So the question is whether it’s better if, given that, EA does work on it, and at least some work gets done. I think yes.
(Analogy: was it good or bad that in the earlier days, there was some work on AI alignment, even though that work was almost exclusively done by EA/rationalist types?)