In addition to the counterpoints mentioned by Gregory Lewis, I think there is a further reason why MCE seems less effective than more targeted interventions to improve the quality of the long-term future: Gains from trade between humans with different values become easier to implement as the reach of technology increases. As long as a non-trivial fraction of humans end up caring about animal wellbeing or digital minds, it seems likely that it would be cheap for other coalitions to offer trades. So whether 10% of future people end up with an expanded moral circle or 100% may not make much of a difference to the outcome: it will be reasonably good either way if people reap the gains from trade.
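To make the intuition concrete, here is a minimal toy calculation with entirely hypothetical numbers (the shares, costs, and valuations are my own illustrative assumptions, not estimates): a small coalition that cares about an issue can compensate a large indifferent coalition for accommodating it, and both end up better off.

```python
# Toy model of gains from trade between two value coalitions.
# All numbers are hypothetical and only illustrate the qualitative point.

caring_share = 0.10       # fraction of resources held by people with an expanded moral circle
indifferent_share = 0.90  # fraction held by everyone else

# Suppose accommodating the caring coalition's concerns (e.g. sparing
# animals or digital minds) costs the indifferent coalition 1% of its
# resources, while the caring coalition values that outcome as much as
# 30% of its own resources.
cost_to_indifferent = 0.01 * indifferent_share
value_to_caring = 0.30 * caring_share

# The caring coalition offers a transfer between the cost and the value,
# here simply splitting the surplus down the middle.
transfer = (cost_to_indifferent + value_to_caring) / 2

caring_gain = value_to_caring - transfer
indifferent_gain = transfer - cost_to_indifferent

print(f"Caring coalition net gain:      {caring_gain:+.4f}")
print(f"Indifferent coalition net gain: {indifferent_gain:+.4f}")
# Both gains come out positive, so the trade goes through even though
# only 10% of people hold the expanded-moral-circle values.
```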
One might object that it is unlikely that humans would be able to cooperate efficiently, given that we don’t see this type of cooperation happening today. However, I think it’s reasonable to assume that staying in control of technological progress beyond the AGI transition requires a degree of wisdom and foresight that is very far from where most societal groups are today. And if humans do stay in control, then finding a good solution for value disagreements may be the easier problem, or at worst a similarly hard one. So it feels to me that, most likely, either we get a future that goes badly for reasons related to lack of coordination and sophistication in the pre-AGI stage, or we get a future where humans set things up wisely enough to actually design an outcome that is nice (or at least not amongst the worst 10% of outcomes) by the lights of nearly everyone.
Brian Tomasik made the point that, conditional on human values staying in control, we may be very unlikely to get something like broad moral reflection. Instead, values could be determined by a very small group of individuals who happened to be in power by the time AGI arrives (as opposed to individuals ending up there because they were unusually foresighted and also morally motivated). This feels possible too, but it does not seem to be the likely default to me, because I suspect that you would almost necessarily need to increase your philosophical sophistication in order to stay in control of AGI, and that probably gives you more pleasant outcomes (a correlational claim). Iterated amplification, for instance, as an approach to AI alignment, relies on humans in several roles: humans are not only where the resulting values come from, but they’re also in charge of keeping the bootstrapping process on track and corrigible. And as this post on factored cognition illustrates, that requires sophistication to set up. So if that’s the bar that AGI creators need to pass before they can determine how “human values” are to be extrapolated, maybe we shouldn’t be too pessimistic about the outcome. It seems rather unlikely that someone would go through all of that only to say, “I’m going to implement my personal best guess about what matters to me, with little further reflection, and no other humans get a say here.” Similarly, it also feels unlikely that people would go through with all that and not find a way to make parts of the population reasonably content about how sentient subroutines are going to be used.
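For readers less familiar with the scheme, here is a deliberately crude sketch of an iterated-amplification-style loop, just to show the two places where humans enter; every name in it (Human, amplify, distill, and so on) is my own placeholder rather than anything from the actual proposal:

```python
# Highly simplified, hypothetical sketch of an iterated-amplification-style
# loop. Not an implementation of the real scheme; all names are placeholders.

class Human:
    """Stands in for the human overseer's judgments."""

    def decompose(self, question):
        # Break a hard question into easier subquestions; an empty list
        # means the human can answer it directly.
        return []

    def answer(self, question):
        return f"human answer to: {question}"

    def combine(self, question, sub_answers):
        return f"answer to {question!r} combined from {len(sub_answers)} parts"

    def looks_corrigible(self, model):
        # Human oversight: judge whether the bootstrapping still seems on track.
        return True


def amplify(question, human, model):
    """The amplified system: the human plus calls to the current model."""
    subquestions = human.decompose(question)
    if not subquestions:
        return human.answer(question)
    sub_answers = [model(q) for q in subquestions]
    return human.combine(question, sub_answers)


def distill(targets):
    """Placeholder for training a faster model to imitate the amplified system."""
    return lambda question: targets.get(question, "unknown")


def iterated_amplification(human, model, training_questions, rounds=3):
    for _ in range(rounds):
        # Role 1: humans supply the training targets via the amplified system.
        targets = {q: amplify(q, human, model) for q in training_questions}
        model = distill(targets)
        # Role 2: humans oversee whether the process stays on track and corrigible.
        if not human.looks_corrigible(model):
            break
    return model
```

Even in this toy form, the human has to know how to decompose questions well and how to judge whether the process is still on track, which is the kind of sophistication I have in mind.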
Now, I feel a bit confused about the feasibility of AI alignment if you were to do it somewhat sloppily and with lower standards. I think there’s a spectrum from “it just wouldn’t work at all and wouldn’t be competitive” (and then people would have to try some other approach) to “it would produce a capable AGI, but one vulnerable to failure modes like adversarial exploits or optimization daemons, and so it would end up with something other than human values.” These failure modes, to the very small degree I currently understand them, sound like they would not be sensitive to whether the human whose approval you tried to approximate had an expanded moral circle or not. I might be wrong about that. If people mostly want sophisticated alignment procedures because they care about preserving the option for philosophical reflection, rather than because they also think you simply run into large failure modes otherwise, then it is not so clear (conditional on some kind of value alignment) whether we get an outcome with broad moral reflection. If it’s technically easier to build value-aligned AI with very parochial values, then MCE could make a relevant difference to these non-reflection outcomes.
But all in all, my argument is that it’s somewhat strange to assume that a group of people could succeed at building an AGI optimized for its creators’ values without having to put in so much thinking about how to get this outcome right that they could hardly help but become reasonably philosophically sophisticated in the process. And sure, philosophically sophisticated people can still have fairly strange values by your own lights, but it seems like there’s more convergence. Plus, I’d at least be optimistic about their propensity to strive towards positive-sum outcomes, given how little scarcity you’d have if the transition does go well.
Of course, maybe value alignment is going to work very differently from what people currently think. The main way I’d criticize my above points is that they’re based on heavy-handed inside-view thinking about how difficult I (and others I’m updating towards) expect the AGI transition to be. If AGI turns out to be more like the Industrial Revolution rather than something that is even more difficult to stay remotely in control of, or if some other technology proves to be more consequential than AGI, then my argument has less force. I mainly see this as yet another reason to caveat that the ex ante plausible-seeming position that MCE can have a strong impact on AGI outcomes starts to feel more and more conjunctive the more you zoom in and try to identify concrete pathways.
Interesting points. :) I think there could be substantial differences in policy between 10% support and 100% support for MCE depending on the costs of appeasing this faction and how passionate it is. Or between 1% and 10% support for MCE applied to more fringe entities.
philosophically sophisticated people can still have fairly strange values by your own lights, but it seems like there’s more convergence.
I’m not sure if sophistication increases convergence. :) If anything, people who think more about philosophy tend to diverge more and more from commonsense moral assumptions.
Yudkowsky and I seem to share the same metaphysics of consciousness and have both thought about the topic in depth, yet we occupy almost antipodal positions on the question of how many entities we consider moral patients. I tend to assume that one’s starting points matter a lot for what views one ends up with.
I agree with this. It seems like the world where Moral Circle Expansion is useful is the world where:
1. The creators of AI are philosophically sophisticated (or persuadable) enough to expand their moral circle if they are exposed to the right arguments or if work is put into persuading them.
2. They are not philosophically sophisticated enough to arrive at the arguments for expanding the moral circle on their own (which seems plausible).
3. They are not philosophically sophisticated enough to realize that they might want to consider the distribution of arguments they could have faced and that could have persuaded them about what is morally right, and to design AI with this in mind (i.e., CEV), or with the goal of achieving a period of reflection during which they can sort out which arguments they would want to consider.
I think I’d prefer pushing on point 3, as it also encompasses a bunch of other potential philosophical mistakes that AI creators could make.