Thank you for writing this post. An evergreen difficulty in discussing topics of such broad scope is the large number of relevant matters that are difficult to judge and on which one’s judgement (whatever it may be) can reasonably be challenged. I hope to offer a crisper summary of why I am not persuaded.
I understand from this that the primary motivation of MCE (moral circle expansion) is avoiding AI-based dystopias, with the implied causal chain being along the lines of, “If we ensure the humans generating the AI have a broader circle of moral concern, the resulting post-human civilization is less likely to include dystopic scenarios involving great multitudes of suffering sentiences.”
There are two considerations that speak against this being a greater priority than AI alignment research: 1) Back-chaining from AI dystopias leaves relatively few occasions where MCE would make a crucial difference. 2) The current portfolio of ‘EA-based’ MCE is poorly addressed to averting AI-based dystopias.
Re. 1): MCE may prove neither necessary nor sufficient for ensuring AI goes well. On the one hand, AI designers, even if speciesist themselves, might nonetheless provide the right apparatus for value learning such that the resulting AI will not propagate the moral mistakes of its creators. On the other, even if the AI-designers have the desired broad moral circle, they may have other crucial moral faults (maybe parochial in other respects, maybe selfish, maybe insufficiently reflective, maybe some mistaken particular moral judgements, maybe naive approaches to cooperation or population ethics, and so on); and even if they do not, there are manifold ways in the wider environment (e.g. arms races), or in terms of technical implementation, that may incur disaster.
It seems clear to me that, pro tanto, the less speciesist the AI-designer, the better the AI. Yet for this issue to be of such fundamental importance as to be comparable to AI safety research generally, it would have to imply an implausible doctrine of ‘AI immaculate conception’: only by ensuring we ourselves are free from sin can we conceive an AI which will not err in a morally important way.
Re. 2): As Plant notes, MCE does not arise from animal causes alone: global poverty and climate change also act to extend moral circles, as well as to propagate other valuable moral norms. Looking at things the other way, one should expect the animal causes found most valuable from the perspective of avoiding AI-based dystopia to diverge considerably from those picked on face-value animal welfare. Companion animal causes are far inferior from the latter perspective, but from the former it is unclear whether they are a good way of fostering concern for animals; and if the crucial thing is for AI creators, rather than the general population, not to be speciesist, targeted interventions like ‘Start a petting zoo at DeepMind’ look better than broader ones, like the abolition of factory farming.
The upshot is that, even if there are some particularly high-yield interventions in animal welfare from the far future perspective, these should be fairly far removed from typical effective animal advocacy (EAA) activity directed towards having the greatest near-term impact on animals. If this post heralds a pivot of Sentience Institute to directions pretty orthogonal to the principal component of effective animal advocacy, this would be welcome indeed.
Notwithstanding the above, the approach the post outlines has a role to play in some ideal ‘far future portfolio’, and it may be reasonable for some people to prioritise work on this area, if only for reasons of comparative advantage. Yet I aver it should remain a fairly junior member of this portfolio compared to AI-safety work.
Those considerations make sense. I don’t have much more to add for/against than what I said in the post.
On the comparison between different MCE strategies, I’m pretty uncertain which are best. The main reasons I currently favor farmed animal advocacy over your examples (global poverty, environmentalism, and companion animals) are that (1) farmed animal advocacy is far more neglected, and (2) farmed animal advocacy is far more similar to potential far future dystopias, mainly just because it involves vast numbers of sentient beings who are largely ignored by most of society. I’m relatively unworried about, for example, far future dystopias where dog-and-cat-like beings (e.g. small, entertaining AIs kept around for companionship) are suffering in vast numbers. And environmentalism typically advocates for non-sentient beings, which I think is quite different from MCE for sentient beings.
I think the better competitors to farmed animal advocacy are advocating broadly for antispeciesism/fundamental rights (e.g. Nonhuman Rights Project) and advocating specifically for digital sentience (e.g. a larger, more sophisticated version of People for the Ethical Treatment of Reinforcement Learners). There are good arguments against these, however, such as that it would be quite difficult for an eager EA to get much traction with a new digital sentience nonprofit. (We considered founding Sentience Institute with a focus on digital sentience; this was a big reason we didn’t.) By contrast, given the current excitement in the farmed animal space (e.g. the coming release of “clean meat,” real meat grown without animal slaughter), that space seems like a fantastic place for gaining traction.
I’m currently not very excited about “Start a petting zoo at DeepMind” (or similar direct outreach strategies) because I expect it would produce a ton of backlash for seeming too adversarial and aggressive. There are additional considerations for and against (e.g. I worry that it’d be difficult to push a niche demographic like AI researchers very far away from the rest of society, or at least from the rest of their social circles; I also have the same traction concern I have with advocating for digital sentience), but the backlash concern alone seems quite damning.
> The upshot is that, even if there are some particularly high-yield interventions in animal welfare from the far future perspective, these should be fairly far removed from typical EAA activity directed towards having the greatest near-term impact on animals. If this post heralds a pivot of Sentience Institute to directions pretty orthogonal to the principal component of effective animal advocacy, this would be welcome indeed.
I agree this is a valid argument, but given the other arguments (e.g. those above), I still think it’s usually right for EAAs to focus on farmed animal advocacy, including Sentience Institute, at least for the next year or two.
(FYI for readers: Gregory and I also discussed these things before the post was published, when he gave feedback on the draft, so our comments might seem a little rehearsed.)
> The main reasons I currently favor farmed animal advocacy over your examples (global poverty, environmentalism, and companion animals) are that (1) farmed animal advocacy is far more neglected, and (2) farmed animal advocacy is far more similar to potential far future dystopias, mainly just because it involves vast numbers of sentient beings who are largely ignored by most of society.
Wild animal advocacy is far more neglected than farmed animal advocacy, and it involves even larger numbers of sentient beings ignored by most of society. If the superiority of farmed animal advocacy over global poverty along these two dimensions is a sufficient reason for not working on global poverty, why isn’t the superiority of wild animal advocacy over farmed animal advocacy along those same dimensions also a sufficient reason for not working on farmed animal advocacy?
I personally don’t think wild animal suffering (WAS) is as similar to the most plausible far future dystopias, so I’ve been prioritizing it less, even just over the past couple of years. I don’t expect far future dystopias to involve as much naturogenic (nature-caused) suffering, though of course it’s possible (e.g. if humans create large numbers of sentient beings in a simulation but then let the simulation run on its own for a while, the simulation could come to be viewed as naturogenic-ish, and attitudes toward naturogenic suffering could become more relevant).
I think if one wants something very neglected, digital sentience advocacy is basically across-the-board better than WAS advocacy.
That being said, I’m highly uncertain here and these reasons aren’t overwhelming (e.g. WAS advocacy pushes on more than just the “care about naturogenic suffering” lever), so I think WAS advocacy is still, in Gregory’s words, an important part of the ‘far future portfolio.’ And often one can work on it while working on other things, e.g. I think Animal Charity Evaluators’ WAS content (e.g. [guest blog post by Oscar Horta](https://animalcharityevaluators.org/blog/why-the-situation-of-animals-in-the-wild-should-concern-us/)) has helped them be more well-rounded as an organization, and didn’t directly trade off with their farmed animal content.
But humanity/AI is likely to expand to other planets. Won’t those planets need to have complex ecosystems that could involve a lot of suffering? Or do you think it will all be done with some fancy tech that’ll be too different from today’s wildlife for it to be relevant? It’s true that those ecosystems would (mostly?) be non-naturogenic, but I’m not that sure that people would care about them; it’d still be animals, diseases, hunger, etc. hurting animals. Maybe it’d be easier to engineer an ecosystem without predation and diseases, but that is a non-trivial assumption, and suffering could then arise in other ways.
Also, some humans want to spread life to other planets for its own sake, and only relatively few people would need to want that for it to cause a lot of suffering if no one works on preventing it.
This could be less relevant if you think that most of the expected value comes from simulations that won’t involve ecosystems.
Yes, terraforming is a big way in which close-to-WAS scenarios could arise. I do think it’s smaller in expectation than digital environments that develop on their own and thus are close-to-WAS.
I don’t think terraformed ecosystems would be very different from today’s wildlife, e.g. free of predation and disease.
Ultimately I still think the digital, not-close-to-WAS scenarios seem much larger in expectation.
In Stuart Russell’s Human Compatible (2019), he advocates for AGI to follow preference utilitarianism, maximally satisfying the values of humans. As for animal interests, he seems to think that they are sufficiently represented, since he writes that they will be valued by the AI insofar as humans care about them. Reading this from Stuart Russell shifted me toward thinking that moral circle expansion probably does matter for the long-term future. It seems quite plausible (likely?) that AGI will follow this kind of value function, which does not directly care about animals, rather than broadly anti-speciesist values, since AI researchers are not generally anti-speciesists. In this case, moral circle expansion across the general population would be essential.
(Another factor is that Russell’s reward modeling depends on receiving feedback occasionally from humans to learn their preferences, which is much more difficult to do with animals. Thus, under an approach similar to reward modeling, AGI developers probably won’t bother to directly include animal preferences, when that involves all the extra work of figuring out how to get the AI to discern animal preferences. And how many AI researchers want to risk, say, mosquito interests overwhelming human interests?)
In comparison, if an AGI were planned to care only about the interests of people in, say, Western countries, that would instantly be widely decried as racist (at least in today’s Western societies) and would likely not be developed. So while moral circle expansion encompasses caring about people in other countries, I’m less concerned that large groups of humans will not have their interests represented in the AGI’s values than I am about nonhuman animals.
It may be more cost-effective to take a targeted approach of increasing anti-speciesism among AI researchers and doing anti-speciesist AI alignment philosophy/research (e.g., working out in more detail how AI following preference utilitarianism can also intrinsically care about animal preferences, or accounting for the preferences of digital sentiences given that they can easily replicate and dominate preference calculations), but anti-speciesism among the general population still seems to be an important component of reducing the risk of a bad far future.
> AI designers, even if speciesist themselves, might nonetheless provide the right apparatus for value learning such that resulting AI will not propagate the moral mistakes of its creators
This is something I also struggle with in understanding the post. It seems like we need:
1. AI creators can be convinced to expand their moral circle.
2. Despite (1), they do not wish to be convinced to expand their moral circle.
3. The AI follows this second desire to not be convinced to expand their moral circle.
I imagine this happening with certain religious beliefs; e.g. I could imagine someone saying “I wish to think the Bible is true even if I could be convinced that the Bible is false”.
But it seems relatively implausible with regard to MCE?
Particularly given that AI safety talks a lot about things like CEV (coherent extrapolated volition), it is unclear to me whether there is really a strong trade-off between MCE and AI alignment (AIA).
(Note: Jacy and I discussed this via email and didn’t really come to a consensus, so there’s a good chance I am just misunderstanding his argument.)
Hm, yeah, I don’t think I fully understand you here either, and this seems somewhat different from what we discussed via email.
My concern is with (2) in your list. “[T]hey do not wish to be convinced to expand their moral circle” is extremely ambiguous to me. Presumably you mean that, absent MCE advocacy, they wouldn’t put wide-MC* values, or values that lead to a wide MC, into an aligned AI. But I think that is being conflated with “they actively oppose it” or “they would answer ‘no’ if asked, ‘Do you think your values are wrong when it comes to which moral beings deserve moral consideration?’”
I think they don’t actively oppose it, they would mostly answer “no” to that question, and it’s very uncertain whether they will put the wide-MC-leading values into an aligned AI. I don’t think CEV or similar reflection processes reliably lead to wide moral circles. I think they can still be heavily influenced by their initial set-up (e.g. what humanity’s values are when reflection begins).
This leads me to think that you only need (2) to be true in a very weak sense for MCE to matter. I think it’s quite plausible that this is the case.
*Wide-MC meaning an extremely wide moral circle, e.g. one that includes insects and small/weird digital minds.
> I don’t think CEV or similar reflection processes reliably lead to wide moral circles. I think they can still be heavily influenced by their initial set-up (e.g. what humanity’s values are when reflection begins).
Why do you think this is the case?
Do you think there is an alternative reflection process (implemented by an AI, by a human society, or by a combination of both) that could be defined that would reliably lead to wide moral circles? Do you have any thoughts on what it would look like?
If we go through some kind of reflection process to determine our values, I would much rather have a reflection process that wasn’t dependent on whether or not MCE occurred beforehand, and I think failing to lead to a wide moral circle should be considered a serious bug in any definition of a reflection process. It seems to me that working on producing this would be a plausible alternative, or at least a parallel path, to directly performing MCE.
I think that there’s an inevitable tradeoff between wanting a reflection process to have certain properties and worries about this violating goal preservation for at least some people. This blogpost is not about MCE directly, but if you think of “BAAN thought experiment” as “we do moral reflection and the outcome is such a wide circle that most people think it is extremely counterintuitive” then the reasoning in large parts of the blogpost should apply perfectly to the discussion here.
That is not to say that trying to fine tune reflection processes is pointless: I think it’s very important to think about what our desiderata should be for a CEV-like reflection process. I’m just saying that there will be tradeoffs between certain commonly mentioned desiderata that people don’t realize are there because they think there is such a thing as “genuinely free and open-ended deliberation.”
Thanks for commenting, Lukas. I think Lukas, Brian Tomasik, and others affiliated with FRI have thought more about this, and I basically defer to their views here, especially because I haven’t heard any reasonable people disagree with this particular point. Namely, I agree with Lukas that there seems to be an inevitable tradeoff here.
I tend to think of moral values as being pretty contingent and pretty arbitrary, such that what values you start with makes a big difference to what values you end up with even on reflection. People may “imprint” on the values they receive from their culture to a greater or lesser degree.
I’m also skeptical that sophisticated philosophical-type reflection will have significant influence over posthuman values compared with more ordinary political/economic forces. I suppose philosophers have sometimes had big influences on human politics (religions, Marxism, the Enlightenment), though not necessarily in a clean “carefully consider lots of philosophical arguments and pick the best ones” kind of way.
I’d qualify this by adding that the philosophical-type reflection seems to lead in expectation to more moral value (positive or negative, e.g. hedonium or dolorium) than other forces, despite overall having less influence than those other forces.
Thanks for funding this research. Notes:
Ostensibly, much of Sentience Institute’s (SI) current research seems focused on identifying which MCE strategies, among those that have been tried, have historically turned out to be more effective. I think SI as an organization is based on the experience of EA as a movement in having significant success with MCE in a relatively short period of time. Successfully spreading the meme of effective giving, increasing concern for the far future in notable ways, and successful corporate animal welfare campaigns are all dramatic achievements for a young social movement like EA. While these aren’t on the scale of shaping MCE over the course of the far future, these achievements make it seem more possible that EA and allied movements can have an outsized impact by pursuing neglected strategies for values-spreading.
On terminology: to say the focus is on non-human animals, or even on the moral patients that typically come to mind when describing ‘animal-like’ minds (i.e., familiar vertebrates), is inaccurate. “Sentient being”, “moral patient”, and “non-human agents/beings” are terms inclusive of non-human animals and of other types of posited potential moral patients. Admittedly, these aren’t catchy terms.
This is something I also struggle with in understanding the post. It seems like we need:
1. AI creators can be convinced to expand their moral circle
2. Despite (1), they do not wish to be convinced to expand their moral circle
3. The AI follows this second desire to not be convinced to expand their moral circle
I imagine this happening with certain religious beliefs; e.g., I could imagine someone saying “I wish to think the Bible is true even if I could be convinced that the Bible is false”.
But it seems relatively implausible with regard to MCE?
Particularly given that AI safety talks a lot about things like CEV, it is unclear to me whether there is really a strong trade-off between MCE and AIA.
(Note: Jacy and I discussed this via email and didn’t really come to a consensus, so there’s a good chance I am just misunderstanding his argument.)
Hm, yeah, I don’t think I fully understand you here either, and this seems somewhat different than what we discussed via email.
My concern is with (2) in your list. “[T]hey do not wish to be convinced to expand their moral circle” is extremely ambiguous to me. Presumably you mean that, absent MCE advocacy, they wouldn’t put wide-MC* values, or values that lead to wide-MC, into an aligned AI. But I think that is being conflated with “they actively oppose it” or “they would answer ‘no’ if asked, ‘Do you think your values are wrong when it comes to which moral beings deserve moral consideration?’”
I think they don’t actively oppose it, they would mostly answer “no” to that question, and it’s very uncertain whether they will put the wide-MC-leading values into an aligned AI. I don’t think CEV or similar reflection processes reliably lead to wide moral circles. I think they can still be heavily influenced by their initial set-up (e.g. what humanity’s values are when reflection begins).
This leads me to think that you only need (2) to be true in a very weak sense for MCE to matter. I think it’s quite plausible that this is the case.
*Wide-MC meaning an extremely wide moral circle, e.g. one that includes insects and small/weird digital minds.
Why do you think this is the case? Do you think there is an alternative reflection process (either implemented by an AI, by a human society, or a combination of both) that could be defined that would reliably lead to wide moral circles? Do you have any thoughts on what it would look like?
If we go through some kind of reflection process to determine our values, I would much rather have a reflection process that wasn’t dependent on whether or not MCE occurred beforehand, and I think not leading to a wide moral circle should be considered a serious bug in any definition of a reflection process. It seems to me that working on producing this would be a plausible alternative, or at least a parallel path, to directly performing MCE.
I think that there’s an inevitable tradeoff between wanting a reflection process to have certain properties and worries about this violating goal preservation for at least some people. This blogpost is not about MCE directly, but if you think of “BAAN thought experiment” as “we do moral reflection and the outcome is such a wide circle that most people think it is extremely counterintuitive” then the reasoning in large parts of the blogpost should apply perfectly to the discussion here.
That is not to say that trying to fine-tune reflection processes is pointless: I think it’s very important to think about what our desiderata should be for a CEV-like reflection process. I’m just saying that there will be tradeoffs between certain commonly mentioned desiderata that people don’t realize are there because they think there is such a thing as “genuinely free and open-ended deliberation.”
Thanks for commenting, Lukas. I think Lukas, Brian Tomasik, and others affiliated with FRI have thought more about this, and I basically defer to their views here, especially because I haven’t heard any reasonable people disagree with this particular point. Namely, I agree with Lukas that there seems to be an inevitable tradeoff here.
I tend to think of moral values as being pretty contingent and pretty arbitrary, such that what values you start with makes a big difference to what values you end up with even on reflection. People may “imprint” on the values they receive from their culture to a greater or lesser degree.
I’m also skeptical that sophisticated philosophical-type reflection will have significant influence over posthuman values compared with more ordinary political/economic forces. I suppose philosophers have sometimes had big influences on human politics (religions, Marxism, the Enlightenment), though not necessarily in a clean “carefully consider lots of philosophical arguments and pick the best ones” kind of way.
I’d qualify this by adding that philosophical-type reflection seems to lead in expectation to more moral value (positive or negative, e.g. hedonium or dolorium) than other forces do, despite having less influence overall than those other forces.