Re AI safety vs welfare: I agree with the substantive justification, but I don’t see a good reason to single out AI safety vs. welfare compared to AI welfare vs. AI welfare trade-offs (i.e. within the field itself) or AI welfare vs. some other important ethical goal. I think the same applies to your sociological justification, though I am less sure there.
Re broader point: I am not sure I agree. Here are four statements that seem true to me (maybe to you too?) and perhaps capture most of what’s important here:
(i) There are many different reasonable empirical and ethical assumptions/worldviews that can influence the evaluation of AI welfare interventions.
(ii) The value of many AI welfare interventions will be sensitive to variation in these assumptions.
(iii) It’s almost always a bad idea to just do what’s best on one (or a small set) of these assumptions, rather than considering a wide range of reasonable assumptions.
(iv) There will often be cases where the overall-best intervention (per iii) is bad, perhaps even very bad, on some specific combinations of these assumptions. (Cluelessness worries seem relevant here.)
Re AI safety vs welfare: You’re right that we could look at other pairings too. But I feel this one warrants specific attention: the same actors (e.g. labs) face both questions at once, often through the same technical choices (e.g. training or modifying an AI affects both safety and welfare); the two fields share a community, funders, and infrastructure; there is politicization risk specific to this pairing (e.g. “AI rights vs. humans first”); and both are among the highest-stakes issues from a longtermist perspective. I’m not saying there are no other important pairings or sub-pairings with AI welfare, just that AI welfare x safety is among the particularly important ones.
Re broader point: I agree that for almost any action that’s broadly positive, there will be some worldview combinations on which it’s negative. So in a strict sense, perfectly robust positivity is unattainable. That’s why I phrased it as “expected serious harm”, to allow for some residual harm under some assumptions. Though maybe even that doesn’t fully work. So I guess “find robustly good strategies” is best treated as a heuristic that rules out interventions that look good only on a narrow set of assumptions.
Re AI safety vs welfare: Not sure I agree, but the justification does make sense to me.
Re broader point: Then we agree!