Given this, my worry is that expressing things like “EA aims to be maximizing in the second sense only” may be kind of gaslight-y to some people’s experience (although I agree that other people will think it’s a fair summary of the message they personally understood).
I largely agree with this, but I feel like your tone is too dismissive of the issue here? Like: the problem is that the maximizing mindset (encouraged by EA), applied to the question of how much to apply the maximizing mindset, says to go all in. This isn’t getting communicated explicitly in EA materials, but I think it’s an implicit message which many people receive. And although I think that it’s unhealthy to think that way, I don’t think people are dumb for receiving this message; I think it’s a pretty natural principled answer to reach, and the alternative answers feel unprincipled.
On the types of maximization: I think different pockets of EA are in different places on this. I think it’s not unusual, at least historically, for subcultures to have some degree of lionization of 1). And there’s a natural internal logic to this: if doing some good well is good, surely doing more is better?
On the potential conflicts between ethics and self-interest: I agree that it’s important to be nuanced in how this is discussed.
But:
- I think there’s a bunch of stuff here which isn’t just about those conflicts, and that there is likely potential for improvements which are good on both prudential and impartial grounds.
- Navigating real tensions is tricky, because we want to be cooperative in how we sell the ideas. cf. https://forum.effectivealtruism.org/posts/C665bLMZcMJy922fk/what-is-valuable-about-effective-altruism-implications-for
I really appreciated this post. I don’t agree with all of it, but I think that it’s an earnest exploration of some important and subtle boundaries.
The section of the post that I found most helpful was “EA ideology fosters unsafe judgment and intolerance”. Within that, the point I found most striking was that there’s a tension in how language gets used in ethical frameworks and in mental wellbeing frameworks, and people often aren’t well equipped with the tools to handle those tensions. This … basically just seems correct? And seems like a really good dynamic for people to be tracking.
Something I kind of wish you’d explored a bit more is the ways in which EA may be helpful for people’s mental health. You get at that a bit when talking about how/why it appeals to people, and seem to acknowledge that there are ways in which it can be healthy for people to engage. But I think we’ll get to a better/deeper understanding of the dynamics faster if we look honestly at the ways in which it can be good for people as well as bad, and at what level of tradeoff (in terms of potentially being bad for people) is worth accepting. I think the correct answer will be “a little bit”, in that there’s no way to avoid all harms without just not being in the space at all, and I think that would be a clear mistake for EA; though I’m also inclined to think that the correct answer is “somewhat less than at present”.
Yep.
> it sounds like you see weak philosophical competence as being part of intent alignment, is that correct?
Ah, no, that’s not correct.
I’m saying that weak philosophical competence would:
- Be useful enough for acting in the world, and in principle testable-for, that I expect it to be developed as a form of capability before strong superintelligence
- Be useful for research on how to produce intent-aligned systems

… and therefore that if we’ve been managing to keep things more or less intent aligned up to the point where we have systems which are weakly philosophically competent, it’s less likely that we have a failure of intent alignment thereafter. (Not impossible, but I think a pretty small fraction of the total risk.)
Yeah, I appreciated your question, because I’d also not managed to unpack the distinction I was making here until you asked.
On the minor issue: right, I think that for some particular domain(s), you could surely train a system to be highly competent in that domain without this generalizing to even weak philosophical competence overall. But if you had a system which was strong at both of those domains despite not having been trained on them, and especially if that was also true for say three more comparable domains, I guess I kind of do expect it to be good at the general thing? (I haven’t thought long about that.)
It’s not clear we have too much disagreement, but let me unpack what I meant:
- Let strong philosophical competence mean competence at all philosophical questions, including those like metaethics which really don’t seem to have any empirical grounding
  - I’m not trying to make any claims about strong philosophical competence
  - I might be a little more optimistic than you about getting this by default as a generalization of weak philosophical competence (see below), but I’m still pretty worried that we won’t get it, and I didn’t mean to rely on it in my statements in this post
- Let weak philosophical competence mean competence at reasoning about complex questions which ultimately have empirical answers, where it’s out of reach to test them empirically, but one may get better predictions from finding clear frameworks for thinking about them
- I claim that by the time systems approach strong superintelligence, they’re likely to have a degree of weak philosophical competence, because:
  - It would be useful for many tasks, and this would likely be apparent to mildly superintelligent systems
  - It can be selected for empirically (seeing which training approaches etc. do well at weak philosophical competence in toy settings, where the experimenters have access to the ground truth about the questions they’re having the systems use philosophical reasoning to approach; see the sketch below)
- I further claim that weak philosophical competence is what you need to be able to think about how to build stronger AI systems that are, roughly speaking, safe, or intent aligned
  - Because this is ultimately an empirical question (“would this AI do something an informed version of me / those humans would ultimately regard as terrible?”)
  - I don’t claim that this would extend to being able to think about how to build stronger AI systems that it would be safe to make sovereigns
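One way to picture the “selected for empirically” step above, as a minimal hypothetical sketch: the experimenters hold ground truth the system never sees, and they compare candidate training approaches by how well the resulting systems reason their way to those answers. The names, the exact-match scoring, and the toy setup here are all illustrative assumptions, not anything from the discussion above.

```python
from typing import Callable, Dict, List

# A toy "philosophical reasoning" question: hard for the system to test
# empirically, but with an answer the experimenters know in the toy setting.
ToyQuestion = Dict[str, str]  # {"prompt": ..., "ground_truth": ...}


def evaluate_approach(
    train: Callable[[], Callable[[str], str]],  # returns a trained model: prompt -> answer
    questions: List[ToyQuestion],
) -> float:
    """Train a system with one candidate approach, then score its answers
    against ground truth the experimenters hold but the system never sees.
    (Exact-match scoring is just a crude stand-in for whatever evaluation
    the experimenters actually trust.)"""
    model = train()
    correct = sum(
        1 for q in questions if model(q["prompt"]).strip() == q["ground_truth"]
    )
    return correct / len(questions)


def select_best(
    approaches: Dict[str, Callable[[], Callable[[str], str]]],
    questions: List[ToyQuestion],
) -> str:
    """Pick whichever training approach yields the most 'weak philosophical
    competence' on the toy benchmark."""
    scores = {name: evaluate_approach(fn, questions) for name, fn in approaches.items()}
    return max(scores, key=scores.get)
```

In this framing, weak philosophical competence is treated as just another capability you can benchmark, because the toy questions do have answers the experimenters can check.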
IDK, structurally your argument here reminds me of arguments that we shouldn’t assume animals are conscious, since we can only generalise from human experiences. (In both cases I feel like there’s not nothing to the argument, but I’m overall pretty uncompelled.)
Why do you think the individual is the level at which conscious experience happens?
(I tend to imagine that it happens at a range of scales, including both smaller-than-individual and bigger-than-individual. I don’t see why we should generalise from our experience to the idea that individual organisms are the right boundary to draw. I put some reasonable weight on some small degree of consciousness occurring at very small levels like neurons, although that’s more like “my intuitive concept of consciousness wasn’t expansive enough, and the correct concept extends here”).
Update: I think I’d actually be less positive on it than this if I thought their antagonism might splash back on other people.
I took that not to be a relevant part of the hypothetical, but actually I’m not so sure. I think for people in the community, it’s creating a public good (for the community) to police their mistakes, so I’m not inclined to let error-filled things slide for the sake of the positives. For people outside the community, I’m not so invested in building up the social fabric, so it doesn’t seem worth trying to punish the errors; the right move seems to be something like more straightforwardly looking for the good bits.
I guess I think it’s likely some middle ground? I don’t think he has a clear conceptual understanding of moral credit, but I do think he’s tuning in to ways in which EA claims may be exaggerating the impact people can have. I find it quite easy to believe that’s motivated by some desire to make EA look bad—but so what? If people who want to make EA look bad make for good researchers hunting for (potentially-substantive) issues, so much the better.
I agree that Wenar’s reasoning on this is confused, and that he doesn’t have a clear idea of how it’s supposed to work.
I do think that he’s in some reasonable way gesturing at the core issue, even if he doesn’t say very sensible things about how to address that issue.
And yeah, that’s the rough shape of the steelman position I have in mind. I wrote a little about my takes here; sorry I’ve not got anything more comprehensive: https://forum.effectivealtruism.org/posts/rWoT7mABXTfkCdHvr/jp-s-shortform?commentId=ArPTtZQbngqJ6KSMo
I wonder if the example is weakened by the last sentence:
> Right now I feel like this is a hard question. But it doesn’t feel like an impossibly intractable one. I think if the forum spent a week debating this question you’d get some coherent positions staked out—where after the debate it would still be unreasonable to be very confident in either answer, but it wouldn’t seem crazy to think that the balance of probabilities suggested favouring one course of action over the other.
This makes me notice that the cats and dogs question feels different only in degree, not kind. I think if you had a bunch of good thinkers consider it in earnest for some months, they wouldn’t come out indifferent. I’d hazard that it would probably be worth >$0.01 (in expectation, on longtermist welfarist grounds) to pay to switch which kind of shelter the billions went to. But I doubt it would be worth >$100. And at that point it wouldn’t be worth the analysis to get to the answer.
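To spell out the implicit arithmetic, here is one way to write it down (my gloss, with V and C introduced purely for illustration; the dollar bounds are the rough figures from the comment above, not real estimates):

```latex
% V = expected value (on longtermist welfarist grounds) of switching which kind
%     of shelter the billions go to, assuming the analysis told us the better direction
% C = cost of the careful analysis needed to find that direction
\$0.01 < V < \$100 \quad\text{and}\quad C \gg \$100
\quad\Longrightarrow\quad V - C < 0
% i.e. there probably is a (slightly) better answer, but finding it costs more than it's worth.
```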