(Having written this comment and then re-read your comment, I have a sense that I might be sort-of talking past you or totally misunderstanding you, so let me know if that’s the case.)
Responding to your conversation text:
I still find it hard to wrap my head around what the claims or arguments in that conversation would actually mean. Though I might say the same about a lot of other arguments about extremely long-term trajectories, so this comment isn’t really meant as a critique.
Some points where I’m confused about what you mean, or about how to think about it:
What, precisely, do we mean by “value lock-in”?
If we mean something as specific as “a superintelligent AI is created with a particular set of values, and then its values never change and the accessible universe is used however the superintelligent AI decided to use it”, then I think moral advocacy can clearly have a lasting impact without that sort of value lock-in.
Do we mean that some actors’ (e.g., current humans) values are locked in, or that there’s a lock-in of what values will determine how the accessible universe is used?
Do we mean that a specific set of values is locked in, or that something like a particular “trajectory” or “range” of values is locked in? E.g., would we count it as “value lock-in” if we lock in a particular recurring pattern of shifts in values? Or if we just lock in disvaluing suffering, but values could still shift along all other dimensions?
Regarding “if the graph of future moral progress is a sine wave”: do you essentially mean that there’s a recurring pattern of values getting “better” and then later getting “worse”?
And do you mean that this pattern lasts indefinitely, i.e., until something like the heat death of the universe?
Do you see it as plausible that that sort of a pattern could last an extremely long time? If so, what sort of things do you think would drive it?
At first glance, it feels to me like that would be extremely unlikely to happen “by chance”, and that there’s no good reason to believe we’re already stuck with this sort of a pattern happening indefinitely. So it feels like it would have to be the case that something in particular happens (which we currently could still prevent) that causes us to be stuck with this recurring pattern.
If so, I think I’d want to say that this is meaningfully similar to a value lock-in; it seems like a lock-in of a particular trajectory has to occur at a particular point, and that what matters is whether that lock-in occurs, and what trajectory we’re locked into when it occurs. (Though it could be that the lock-in occurs “gradually”, in the sense that it gradually becomes harder and harder to get out of that pattern. I think this is also true for lock-in of a specific set of values.)
I think that considering what might cause us to end up with an indefinite pattern of improving and then worsening moral values would help us work out whether moral advocacy work would just speed us along one part of the pattern, shift the whole pattern, change what pattern we’re likely to end up with, or change whether we end up with such a pattern at all. (For present purposes, I’d say we could call farm animal welfare work “indirect moral advocacy work”, if its ultimate aim is shifting values.)
I also think an argument can be made that, given a few plausible yet uncertain assumptions, a lock-in of major aspects of how the accessible universe is used is practically guaranteed to occur eventually. I’ve drafted a brief outline of this argument and some counterpoints to it, which I’ll hopefully post next month, but could also share on request.