As it is, you seem to have a lot of confidence that human values, upon reflection, would converge onto values that would be far better in expectation than the alternative. However, I’m not a moral realist, and compared to you, I think I don’t have much faith in the value of moral reflection, absent additional arguments.
I’m assuming some level of moral quasi-realism: I care about what I would think is good after reflecting on the situation for a long time and becoming much smarter.
For more on this perspective, see this post by Holden. I think there is a bunch of other discussion elsewhere from Paul Christiano and Joe Carlsmith, but I can’t find the posts immediately.
I think the case for being a moral quasi-realist is very strong and depends on very few claims.
My speculative guess is that part of this argument comes from simply defining “human preferences” as aligned with utilitarian objectives.
Not exactly; I’m just defining “the good” as something like “what I would think was good after following a good reflection process which doesn’t go off the rails in an intuitive sense”. (Aka moral quasi-realism.)
I’m not certain that, after reflection, I would end up at something well described as utilitarian. Something vaguely in the ballpark seems plausible, though.
But as I argued in the post, the vast majority of humans are not total utilitarians, and indeed, anti-total-utilitarian moral intuitions are quite common among humans, which would act against the creation of large amounts of utilitarian value in an aligned scenario.
A reasonable fraction of my view is that many human moral intuitions might mostly be biases which end up not being that important if people decide to reflect thoughtfully. I predict that humans’ views converge more after reflecting and becoming much, much smarter. I don’t know exactly what humans converge towards, but it seems likely that I converge toward a cluster which benefits from copious amounts of resources and which has reasonable support among the views humans hold on reflection.
I’m assuming some level of moral quasi-realism: I care about what I would think is good after reflecting on the situation for a long time and becoming much smarter.
Depending on the structure of this meta-ethical view, I feel like you should be relatively happy to let unaligned AIs do the reflection for you in many plausible circumstances. The intuition here is that if you are happy to defer your reflection to other humans, such as the future humans who will replace us, then you should potentially also be open to deferring your reflection to a large range of potential other beings, including AIs who might initially not share human preferences but would converge to the same ethical views that we’d converge to.
In other words, in contrast to a hardcore moral anti-realist (such as myself) who doesn’t value moral reflection much, you seem happier to defer this reflection process to beings who don’t share your consumption or current ethical preferences. But you seem to think it’s OK to defer to humans but not unaligned AIs, implicitly drawing a moral distinction on the basis of species. By contrast, I’m concerned that if I die and get replaced by either humans or AIs, my goals will not be furthered, including in the very long run.
What is it about the human species exactly that makes you happy to defer your values to other members of that species?
Not exactly; I’m just defining “the good” as something like “what I would think was good after following a good reflection process which doesn’t go off the rails in an intuitive sense”. (Aka moral quasi-realism.)
I think I have a difficult time fully understanding your view because I think it’s a little underspecified. In my view, there seem to be a vast number of different ways that one can “reflect”, and intuitively I don’t think all (or even most) of these processes will converge to roughly the same place. Can you give me intuitions for why you hold this meta-ethical view? Perhaps you can also be more precise about what you see as the central claims of moral quasi-realism.
Depending on the structure of this meta-ethical view, I feel like you should be relatively happy to let unaligned AIs do the reflection for you in many plausible circumstances.
I’m certainly happy if we get to the same place. I think I feel less good about the view the more contingent it is.
In other words, in contrast to a hardcore moral anti-realist (such as myself) who doesn’t value moral reflection much, you seem happier to defer this reflection process to beings who don’t share your consumption or current ethical preferences. But you seem to think it’s OK to defer to humans but not unaligned AIs, implicitly drawing a moral distinction on the basis of species.
I mean, I certainly think you lose some value from it being other humans. My guess is that, from my perspective, deferring to other humans loses more like 5-20x of the value rather than something like 1000x, and that the corresponding loss for unaligned AI is more like 20-100x.
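To make those multipliers concrete (a rough back-of-the-envelope reading, with $V$ standing in for the value of a future shaped by my own reflection and the factors above read as how much value is retained):

$$\text{other humans: } \tfrac{V}{20} \text{ to } \tfrac{V}{5}, \qquad \text{unaligned AI: } \tfrac{V}{100} \text{ to } \tfrac{V}{20}$$

On these guesses a human-steered future retains several times more of the value than an unaligned-AI one, but both retain far more than a 1000x write-off would suggest.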
I think I have a difficult time fully understanding your view because I think it’s a little underspecified. In my view, there seem to be a vast number of different ways that one can “reflect”, and intuitively I don’t think all (or even most) of these processes will converge to roughly the same place. Can you give me intuitions for why you hold this meta-ethical view? Perhaps you can also be more precise about what you see as the central claims of moral quasi-realism.
I think my views about what I converge to are distinct from my views on quasi-realism. I think a weak notion of quasi-realism is extremely intuitive: you would do better things if you thought more about what would be good (at least relative to the current returns; eventually the returns to thinking would saturate). This is partly because there are interesting empirical facts to learn (where did my current biases come from evolutionarily? what are brains doing?). I’m not claiming that quasi-realism implies my conclusions, just that it’s an important part of where I’m coming from.
I separately think that reflection and getting smarter are likely to cause convergence, due to a variety of broad intuitions and some vague historical analysis. I’m not hugely confident in this, but I’m confident enough to think the expected value looks pretty juicy.