Re AI safety vs welfare: You’re right that we could look at other pairings too. But I feel this one warrants specific attention: the same actors (e.g. labs) face both questions at once, often through the same technical choices (e.g. training or modifying an AI affects both safety and welfare); the two fields share a community, funders, and infrastructure; there’s a politicization risk specific to this pairing (e.g. “AI rights vs humans first”); and both are among the highest-stakes issues from a longtermist perspective. I’m not saying there are no other important pairings or sub-pairings with AI welfare, just that AI welfare x safety is among the particularly important ones.
Re broader point: I agree that for almost any action that’s broadly positive, there will be some worldview combinations on which it’s negative. So in a strict sense, perfectly robust positivity is unattainable. That’s why I phrased it as “expected serious harm”, to allow for some residual harm under some assumptions. Though maybe even that doesn’t fully work. So I guess “find robustly good strategies” is best treated as a heuristic that rules out interventions that look good only on a narrow set of assumptions.
Re AI safety vs welfare: Not sure I agree, but the justification does make sense to me.
Re broader point: Then we agree!