I’m assuming some level of moral quasi-realism: I care about what I would think is good after reflecting on the situation for a long time and becoming much smarter.
Depending on the structure of this meta-ethical view, I feel like you should be relatively happy to let unaligned AIs do the reflection for you in many plausible circumstances. The intuition here is that if you are happy to defer your reflection to other humans, such as the future humans who will eventually replace us, then you should potentially also be open to deferring your reflection to a wide range of other potential beings, including AIs who might initially not share human preferences but would converge to the same ethical views that we’d converge to.
In other words, in contrast to a hardcore moral anti-realist (such as myself) who doesn’t value moral reflection much, you seem happier to defer this reflection process to beings who don’t share your current consumption or ethical preferences. But you seem to think it’s OK to defer to humans but not unaligned AIs, implicitly drawing a moral distinction on the basis of species. I, on the other hand, am concerned that if I die and get replaced by either humans or AIs, my goals will not be furthered, including in the very long run.
What is it about the human species exactly that makes you happy to defer your values to other members of that species?
Not exactly; I’m just defining “the good” as something like “what I would think was good after following a good reflection process which doesn’t go off the rails in an intuitive sense”. (Aka moral quasi-realism.)
I think I have a difficult time fully understanding your view because I think it’s a little underspecified. In my view, there seem to be a vast number of different ways that one can “reflect”, and intuitively I don’t think all (or even most) of these processes will converge to roughly the same place. Can you give me intuitions for why you hold this meta-ethical view? Perhaps you can also be more precise about what you see as the central claims of moral quasi-realism.
Depending on the structure of this meta-ethical view, I feel like you should be relatively happy to let unaligned AIs do the reflection for you in many plausible circumstances.
I’m certainly happy if we get to the same place. I think I’d feel less good about the view the more contingent it is.
In other words, in contrast to a hardcore moral anti-realist (such as myself) who doesn’t value moral reflection much, you seem happier to defer this reflection process to beings who don’t share your current consumption or ethical preferences. But you seem to think it’s OK to defer to humans but not unaligned AIs, implicitly drawing a moral distinction on the basis of species.
I mean, I certainly think you lose some value from it being other humans. My guess is that, from my perspective, handing things off to other humans loses more like 5-20x of the value rather than something like 1000x, and that the analogous loss for unaligned AI is more like 20-100x.
I think I have a difficult time fully understanding your view because I think it’s a little underspecified. In my view, there seem to be a vast number of different ways that one can “reflect”, and intuitively I don’t think all (or even most) of these processes will converge to roughly the same place. Can you give me intuitions for why you hold this meta-ethical view? Perhaps you can also be more precise about what you see as the central claims of moral quasi-realism.
I think my views about what I’d converge to are distinct from my views on quasi-realism. I think a weak notion of quasi-realism is extremely intuitive: you would do better things if you thought more about what would be good (at least relative to current returns; eventually the returns to thinking would saturate). This is partly because there are interesting empirical facts to learn (where did my current biases come from evolutionarily? what are brains doing?). I’m not claiming that quasi-realism implies my conclusions, just that it’s an important part of where I’m coming from.
I separately think that reflection and getting smarter are likely to cause convergence, due to a variety of broad intuitions and some vague historical analysis. I’m not hugely confident in this, but I’m confident enough to think the expected value looks pretty juicy.