It seems odd to me that you don’t focus almost entirely on this sort of argument when considering total utilitarian style arguments.
I feel I did consider this argument in detail, including several considerations that touch on the arguments you gave. However, I primarily wanted to survey the main points that people have previously given me, rather than focusing heavily on a small set of arguments that someone like you might consider to be the strongest ones. And I agree that I may have missed some important considerations in this post.
Regarding your specific points, I generally find your arguments underspecified: while reading them, it is difficult for me to identify a concrete mechanism by which alignment with human preferences creates astronomically more value, from a total utilitarian perspective, than the alternative. As it is, you seem to have a lot of confidence that human values, upon reflection, would converge onto values that would be far better in expectation than the alternative. However, I’m not a moral realist, and by comparison to you, I think I don’t have much faith in the value of moral reflection, absent additional arguments.
My speculative guess is that part of this argument comes from simply defining “human preferences” as aligned with utilitarian objectives. For example, you seem to think that aligning AIs would help empower the fraction of humans who are utilitarians, or at least would become utilitarians on reflection. But as I argued in the post, the vast majority of humans are not total utilitarians, and indeed, anti-total utilitarian moral intuitions are quite common among humans, which would act against the creation of large amounts of utilitarian value in an aligned scenario.
These are my general thoughts on what you wrote, although I admit I have not responded in detail to any of your specific arguments, and I think you did reveal a genuine blind spot in the arguments I gave. I may write a comment at some future point that considers your points more thoroughly.
As it is, you seem to have a lot of confidence that human values, upon reflection, would converge onto values that would be far better in expectation than the alternative. However, I’m not a moral realist, and by comparison to you, I think I don’t have much faith in the value of moral reflection, absent additional arguments.
I’m assuming some level of moral quasi-realism: I care about what I would think is good after reflecting on the situation for a long time and becoming much smarter.
For more on this perspective, consider this post by Holden. I think there is a bunch of other discussion elsewhere from Paul Christiano and Joe Carlsmith, but I can’t find the posts immediately.
I think the case for being a moral quasi-realist is very strong and depends on very few claims.
My speculative guess is that part of this argument comes from simply defining “human preferences” as aligned with utilitarian objectives.
Not exactly, I’m just defining “the good” as something like “what I would think was good after following a good reflection process which doesn’t go off the rails in an intuitive sense”. (A.k.a. moral quasi-realism.)
I’m not certain that, after reflection, I would end up at something well described as utilitarian. Something vaguely in the ballpark seems plausible, though.
But as I argued in the post, the vast majority of humans are not total utilitarians, and indeed, anti-total utilitarian moral intuitions are quite common among humans, which would act against the creation of large amounts of utilitarian value in an aligned scenario.
A reasonable fraction of my view is that many human moral intuitions might mostly be biases which end up not being that important if people decide to reflect thoughtfully. I predict that humans converge more after reflecting and becoming much, much smarter. I don’t know exactly what humans converge towards, but it seems likely that I converge toward a cluster of values which benefits from copious amounts of resources and which has reasonable support among the views humans would hold on reflection.
I’m assuming some level of moral quasi-realism: I care about what I would think is good after reflecting on the situation for a long time and becoming much smarter.
Depending on the structure of this meta-ethical view, I feel like you should be relatively happy to let unaligned AIs do the reflection for you in many plausible circumstances. The intuition here is that if you are happy to defer your reflection to other humans, such as the future humans who will replace us, then you should potentially also be open to deferring your reflection to a wide range of other potential beings, including AIs who might initially not share human preferences but would converge to the same ethical views that we’d converge to.
In other words, in contrast to a hardcore moral anti-realist (such as myself) who doesn’t value moral reflection much, you seem happier to defer this reflection process to beings who don’t share your current consumption or ethical preferences. But you seem to think it’s OK to defer to humans but not unaligned AIs, implicitly drawing a moral distinction on the basis of species. By contrast, I’m concerned that if I die and get replaced by either humans or AIs, my goals will not be furthered, including in the very long run.
What is it about the human species exactly that makes you happy to defer your values to other members of that species?
Not exactly, I’m just defining “the good” as something like “what I would think was good after following a good reflection process which doesn’t go off the rails in an intuitive sense”. (A.k.a. moral quasi-realism.)
I think I have a difficult time fully understanding your view because I think it’s a little underspecified. In my view, there seem to be a vast number of different ways that one can “reflect”, and intuitively I don’t think all (or even most) of these processes will converge to roughly the same place. Can you give me intuitions for why you hold this meta-ethical view? Perhaps you can also be more precise about what you see as the central claims of moral quasi-realism.
Depending on the structure of this meta-ethical view, I feel like you should be relatively happy to let unaligned AIs do the reflection for you in many plausible circumstances.
I’m certainly happy if we get to the same place. I think I feel less good about the view the more contingent it is.
In other words, in contrast to a hardcore moral anti-realist (such as myself) who doesn’t value moral reflection much, you seem happier to defer this reflection process to beings who don’t share your current consumption or ethical preferences. But you seem to think it’s OK to defer to humans but not unaligned AIs, implicitly drawing a moral distinction on the basis of species.
I mean, I certainly think you lose some value from it being other humans. My guess is that, from my perspective, deferring to other humans loses more like 5-20x of the value rather than 1000x, and that the corresponding loss for unaligned AI is more like 20-100x.
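To make the multiplicative comparison concrete, here is a minimal sketch of the arithmetic, assuming we normalize the best future by my lights to 1.0 and treat the quoted figures as simple division factors (the framing and the code are illustrative only, not part of the original argument):

```python
# Illustrative sketch only: normalize the best outcome (my values fully
# realized) to 1.0 and apply the multiplicative loss factors quoted above.
loss_factors = {
    "deferring to other humans": (5, 20),             # the "5-20x" loss
    "deferring to unaligned AI": (20, 100),           # the "20-100x" loss
    "near-total loss, for comparison": (1000, 1000),  # the "1000x" figure
}

for scenario, (low, high) in loss_factors.items():
    # Remaining value as a fraction of the best case.
    print(f"{scenario}: {1 / high:.3f} to {1 / low:.3f} of the best case")
```

On these numbers, futures shaped by other humans retain roughly 5-20% of the best-case value and futures shaped by unaligned AI roughly 1-5%, so the human case comes out around 4-5x better rather than both being negligible.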
I think I have a difficult time fully understanding your view because I think it’s a little underspecified. In my view, there seem to be a vast number of different ways that one can “reflect”, and intuitively I don’t think all (or even most) of these processes will converge to roughly the same place. Can you give me intuitions for why you hold this meta-ethical view? Perhaps you can also be more precise about what you see as the central claims of moral quasi-realism.
I think my views about what I converge to are distinct from my views on quasi-realism. I think a weak notion of quasi-realism is extremely intuitive: you would do better things if you thought more about what would be good (at least relative to the current returns; eventually the returns to thinking would saturate). This is partly because, e.g., there are interesting empirical facts to learn (where did my current biases come from evolutionarily? what are brains doing?). I’m not claiming that quasi-realism implies my conclusions, just that it’s an important part of where I’m coming from.
I separately think that reflection and getting smarter are likely to cause convergence, due to a variety of broad intuitions and some vague historical analysis. I’m not hugely confident in this, but I’m confident enough to think the expected value looks pretty juicy.