AI designers, even if speciesist themselves, might nonetheless provide the right apparatus for value learning such that the resulting AI will not propagate the moral mistakes of its creators.
This is something I also struggle with in understanding the post. It seems like we need:
1. AI creators can be convinced to expand their moral circle.
2. Despite (1), they do not wish to be convinced to expand their moral circle.
3. The AI follows this second desire to not be convinced to expand their moral circle.
I imagine this happening with certain religious things; e.g. I could imagine someone saying "I wish to think the Bible is true even if I could be convinced that the Bible is false".
But it seems relatively implausible with regards to MCE?
Particularly given that AI safety talks a lot about things like CEV, it is unclear to me whether there is really a strong trade-off between MCE and AIA.
(Note: Jacy and I discussed this via email and didn't really come to a consensus, so there's a good chance I am just misunderstanding his argument.)
Hm, yeah, I don't think I fully understand you here either, and this seems somewhat different than what we discussed via email.
My concern is with (2) in your list. "[T]hey do not wish to be convinced to expand their moral circle" is extremely ambiguous to me. Presumably you mean that they (without MCE advocacy being done) wouldn't put wide-MC* values, or values that lead to wide-MC, into an aligned AI. But I think it's being conflated with "they actively oppose it" or "they would answer 'no' if asked, 'Do you think your values are wrong when it comes to which moral beings deserve moral consideration?'"
I think they don't actively oppose it, they would mostly answer "no" to that question, and it's very uncertain whether they will put the wide-MC-leading values into an aligned AI. I don't think CEV or similar reflection processes reliably lead to wide moral circles. I think they can still be heavily influenced by their initial set-up (e.g. what the values of humanity are when reflection begins).
This leads me to think that you only need (2) to be true in a very weak sense for MCE to matter. I think it's quite plausible that this is the case.
*Wide-MC meaning an extremely wide moral circle, e.g. one that includes insects and small/weird digital minds.
I don't think CEV or similar reflection processes reliably lead to wide moral circles. I think they can still be heavily influenced by their initial set-up (e.g. what the values of humanity are when reflection begins).
Why do you think this is the case?
Do you think there is an alternative reflection process (whether implemented by an AI, by a human society, or a combination of both) that could be defined that would reliably lead to wide moral circles? Do you have any thoughts on what it would look like?
If we go through some kind of reflection process to determine our values, I would much rather have a reflection process that wasn't dependent on whether or not MCE occurred beforehand, and I think not leading to a wide moral circle should be considered a serious bug in any definition of a reflection process. It seems to me that working on producing this would be a plausible alternative, or at least a parallel path, to directly performing MCE.
I think that there's an inevitable tradeoff between wanting a reflection process to have certain properties and worries about this violating goal preservation for at least some people. This blogpost is not about MCE directly, but if you think of the "BAAN thought experiment" as "we do moral reflection and the outcome is such a wide circle that most people think it is extremely counterintuitive," then the reasoning in large parts of the blogpost should apply perfectly to the discussion here.
That is not to say that trying to fine-tune reflection processes is pointless: I think it's very important to think about what our desiderata should be for a CEV-like reflection process. I'm just saying that there will be tradeoffs between certain commonly mentioned desiderata, tradeoffs that people don't realize are there because they think there is such a thing as "genuinely free and open-ended deliberation."
Thanks for commenting, Lukas. I think Lukas, Brian Tomasik, and others affiliated with FRI have thought more about this, and I basically defer to their views here, especially because I haven't heard any reasonable people disagree with this particular point. Namely, I agree with Lukas that there seems to be an inevitable tradeoff here.
I tend to think of moral values as being pretty contingent and pretty arbitrary, such that what values you start with makes a big difference to what values you end up with even on reflection. People may "imprint" on the values they receive from their culture to a greater or lesser degree.
I'm also skeptical that sophisticated philosophical-type reflection will have significant influence over posthuman values compared with more ordinary political/economic forces. I suppose philosophers have sometimes had big influences on human politics (religions, Marxism, the Enlightenment), though not necessarily in a clean "carefully consider lots of philosophical arguments and pick the best ones" kind of way.
I'd qualify this by adding that the philosophical-type reflection seems to lead in expectation to more moral value (positive or negative, e.g. hedonium or dolorium) than other forces, despite overall having less influence than those other forces.