(Having written this comment and then re-read your comment, I have a sense that I might be sort of talking past you or totally misunderstanding you, so let me know if that's the case.)
Responding to your conversation text:
I still find it hard to wrap my head around what the claims or arguments in that conversation would actually mean. Though I might say the same about a lot of other arguments about extremely long-term trajectories, so this comment isn't really meant as a critique.
Some points where I'm confused about what you mean, or about how to think about it:
What, precisely, do we mean by "value lock-in"?
If we mean something as specific as "a superintelligent AI is created with a particular set of values, and then its values never change and the accessible universe is used however that AI decides to use it", then I think moral advocacy can clearly have a lasting impact without that sort of value lock-in.
Do we mean that some actors' values (e.g., current humans') are locked in, or that there's a lock-in of which values will determine how the accessible universe is used?
Do we mean that a specific set of values is locked in, or that something like a particular "trajectory" or "range" of values is locked in? E.g., would we count it as "value lock-in" if we lock in a particular recurring pattern of shifts in values? Or if we just lock in disvaluing suffering, but values could still shift along all other dimensions?
On "if the graph of future moral progress is a sine wave": do you essentially mean that there's a recurring pattern of values getting "better" and then later getting "worse"?
And do you mean that that pattern lasts indefinitely, i.e., until something like the heat death of the universe?
Do you see it as plausible that that sort of pattern could last an extremely long time? If so, what sorts of things do you think would drive it?
At first glance, it feels to me like that would be extremely unlikely to happen "by chance", and that there's no good reason to believe we're already stuck with this sort of pattern recurring indefinitely. So it feels like something in particular would have to happen (which we could currently still prevent) that causes us to be stuck with this recurring pattern.
If so, I think I'd want to say that this is meaningfully similar to a value lock-in; it seems like a lock-in of a particular trajectory has to occur at a particular point, and that what matters is whether that lock-in occurs, and what trajectory we're locked into when it occurs. (Though it could be that the lock-in occurs "gradually", in the sense that it gradually becomes harder and harder to get out of that pattern. I think this is also true for lock-in of a specific set of values.)
I think that thinking about what might cause us to end up with an indefinite pattern of improving and then worsening moral values would help us think about whether moral advocacy work would just speed us along one part of the pattern, shift the whole pattern, change what pattern we're likely to end up with, or change whether we end up with such a pattern at all. (For present purposes, I'd say we could call farm animal welfare work "indirect moral advocacy work", if its ultimate aim is shifting values.)
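To make those four possibilities concrete, here's a toy formalization (entirely my own framing, taking the sine-wave metaphor literally, so all the symbols here are assumptions rather than anything from your conversation). Model the moral-progress graph as a trend plus a recurring oscillation:

$$v(t) = f(t) + A \sin(\omega t + \phi)$$

where $v(t)$ is how "good" values are at time $t$, $f(t)$ is the underlying trend, and $A$, $\omega$, and $\phi$ are the amplitude, frequency, and phase of the recurring pattern. Then the four effects of moral advocacy would correspond to: shifting the phase $\phi$ (speeding us along one part of the pattern), raising $f(t)$ (shifting the whole pattern), changing $A$ or $\omega$ (changing what pattern we end up with), and setting $A = 0$ (changing whether we end up with such a pattern at all).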
I also think an argument can be made that, given a few plausible yet uncertain assumptions, there's practically guaranteed to eventually be a lock-in of major aspects of how the accessible universe is used. I've drafted a brief outline of this argument and some counterpoints to it, which I'll hopefully post next month, but could also share on request.