Great post!
Some quick thoughts:
The question of how many digital minds there will plausibly be, and on what timeline, also seems quite important for many ethical and strategic issues.
The question “Do AI safety and welfare conflict?” doesn’t seem that useful to me personally. When you have two related far-reaching issues (e.g. climate change mitigation and air pollution), there will always be a wide variety of tensions as well as complementary agendas. So the general question has a trivial answer (“yes, sometimes”). We can look for specific trade-offs between AI safety and welfare, but I don’t see why the AI-safety-vs.-welfare lens would be more useful than looking for possible adverse effects of our interventions generally.
The way I think about the space, there are two key questions: 1. (as you say) What’s robustly good, under deep uncertainty? 2. Which questions matter most where there is no robustly good action, and what are their answers (e.g. whether prohibiting models with certain features X is good policy)?
Thanks Leonard. This is helpful.
Re digital minds numbers and timelines, I agree this is important and underexplored!
Re AI safety vs welfare: you’re right that the general “do they conflict?” question has a trivial answer, and I’ll rephrase this. But I want to explain why I framed it this way / what I had in mind. The reason is partly substantive and partly sociological/political.
Substantively, I think we should be looking for interventions that are robustly positive across both goals, ideally synergistic. (This is in line with what Rob, Jeff, and Toni argue in their paper.)
Sociologically and politically, AI safety and AI welfare have an unusually overlapping community: shared people, shared funders, shared intellectual lineage. I think it’s really important that these communities continue to work closely together and don’t end up doing things that undermine each other’s goals. I also worry about broader societal dynamics in the coming years where different groups push things they see as good for one goal but bad for the other (e.g. “humans should always dominate AIs” vs “we should grant AIs empowering rights now”).
So a better version of the question might be: “What’s robustly good for both AI safety and welfare?” or “How can AI safety and welfare work support each other, or at least not work against each other?” Thoughts? I will think more and update the post.
Re your broader point (robustly good actions vs forced choices with no robustly good answer): this is interesting and I want to think it through more.
My initial reaction is that the second category is probably smaller than it looks. Before accepting that a question has no robustly good answer, we should think really carefully about whether there might be robustly good options that aren’t obvious, e.g., delaying the decision, keeping options open, or investing in research to get better information. That said, sometimes there really is no action that doesn’t come with expected serious harm. In those cases I agree we should identify the most important ones (e.g. by stakes, irreversibility, timing) and analyze them carefully. Do you agree with this?
Re AI safety vs welfare: I agree with the substantive justification but don’t see a good reason to single out AI safety vs. welfare compared to trade-offs within AI welfare itself, or between AI welfare and some other important ethical goal. I think the same applies to your sociological justification, but I am less sure there.
Re broader point: I am not sure I agree. Here are four statements that seem true to me (maybe to you too?) and perhaps capture most of what’s important here:
(i) There are many different reasonable empirical and ethical assumptions/worldviews that can influence the evaluation of AI welfare interventions.
(ii) The value of many AI welfare interventions will be sensitive to variation in these assumptions.
(iii) It’s almost always a bad idea to just do what’s best on one (or a small set) of these assumptions, rather than considering a wide range of reasonable assumptions.
(iv) There will often be cases where the overall-best intervention (per iii) is bad, perhaps even very bad, on some specific combinations of these assumptions. (Cluelessness worries seem relevant here; see the toy sketch below.)
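To make (iv) concrete, here is a minimal toy sketch in Python; the worldviews, credences, and payoff numbers are all made up purely for illustration. It shows how an intervention can come out best when averaged over reasonable worldviews while still being very bad under one of them:

```python
# Toy illustration of (iv): an intervention that is best in expectation
# across worldviews can still be very bad under one of them.
# All worldviews, credences, and values here are hypothetical.

credences = {"W1": 0.45, "W2": 0.45, "W3": 0.10}  # weight on each worldview

# Value of each intervention under each worldview (made-up numbers).
interventions = {
    "A": {"W1": 5, "W2": 4, "W3": -20},  # strong on W1/W2, terrible on W3
    "B": {"W1": 2, "W2": 2, "W3": 1},    # modest everywhere, never bad
}

for name, values in interventions.items():
    expected = sum(credences[w] * values[w] for w in credences)
    worst = min(values.values())
    print(f"{name}: expected value = {expected:.2f}, worst case = {worst}")

# A: expected value = 2.05, worst case = -20
# B: expected value = 1.90, worst case = 1
```

A “wins” on expected value, yet a robustness heuristic of the kind discussed here would flag it, since its advantage depends entirely on down-weighting W3.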
Re AI safety vs welfare: You’re right that we could look at other pairings too. But I feel this one warrants specific attention: the same actors (e.g. labs) face both questions at once, often through the same technical choices (e.g. training or modifying an AI affects both safety and welfare); the two fields share community, funders, and infrastructure; this pairing carries a specific politicization risk (e.g. “AI rights vs humans first”); and both are among the highest-stakes issues from a longtermist perspective. I’m not saying there are no other important pairings or sub-pairings with AI welfare, just that AI welfare x safety is among the particularly important ones.
Re broader point: I agree that for almost any action that’s broadly positive, there will be some worldview combinations on which it’s negative. So in a strict sense, perfectly robust positivity is unattainable. That’s why I phrased it as “expected serious harm”, to allow for some residual harm under some assumptions. Though maybe even that doesn’t fully work. So I guess “find robustly good strategies” is best treated as a heuristic that rules out interventions that look good only on a narrow set of assumptions.
Re AI safety vs welfare: Not sure I agree, but the justification does make sense to me.
Re broader point: Then we agree!