You don’t need to agree with premise 3 to think that working on MCE is a cost-effective way to reduce s-risks. Below is the text from a conversation I recently had; the paragraphs alternate between the two participants.
The brief conversation below focuses on the idea of premise 3, but I’d also note that the existing psychological evidence for the “secondary transfer effect” is relevant to premise 4. I think that you could make progress on empirically testing premise 4, and I agree that running more tests (focused specifically on farmed animals / wild animals / artificial sentience) would be fairly high priority, both from the perspective of prioritising between different potential “targets” of MCE advocacy and perhaps for deciding how important MCE work is within the longtermist portfolio of interventions. I can imagine this research fitting well within Sentience Institute. If you know anyone who could be interested in applying to our current researcher job opening (closing in ~one week’s time), to work on this or other questions, please do let them know about the opening.
----
“I don’t personally see value lock-in as a prerequisite for farmed animals → artificial sentience far future impact… if the graph of future moral progress is a sine wave, and you increase the whole sine wave by 2 units, then your expected value is still 2*duration-of-everything, even if you don’t increase the value at the lock-in point.”
“It doesn’t seem that likely to me that you would increase the whole sine wave by 2 units, as opposed to just increasing the gradient of one of the upward slopes or something like that.”
“Hm, why do you think increasing the gradient is more likely? If you just add an action into the world that wouldn’t happen otherwise (e.g. donate $100 to an animal rights org), then it seems the default is an increase in the whole sine wave. For that to be simply an increase in upward slope, you’d need to think there’s a fundamental dynamic in society changing the impact of that contribution, such as a limit on short-term progress. But one can also imagine the opposite effect where progress is easier during certain periods, so you could cause >2 units of increase. There are lots of intuitions that can pull the impact up or down, but overall, a +2 increase in the whole wave seems like the go-to assumption.”
“Presumably it depends on the extent to which you think there’s something like a secondary transfer effect, or some other mechanism by which successful advocacy for farmed animals enables advocacy for other sentient beings. E.g. imagine that we have 100% certainty that animal farming will end within 1000 years, and we know that all advocates (apart from us) are interested in farmed animal advocacy specifically, rather than MCE advocacy. Then all that MCE work would be doing is speeding up the end of animal farming. But if we remove those assumptions, then I guess it would have some sort of “increase” effect, rather than just an effect on the slope. Both those assumptions are unreasonable, but presumably you could get something similar if the certainty was close to 100% and most farmed animal advocacy efforts seemed likely to terminate at the end of animal farming, as opposed to being shifted into other forms of MCE advocacy.”
“Yep, that makes sense if you don’t think there’s some diminishing factor on the flow-through from farmed animal advocacy to moral inclusion of AS, as long as you don’t think there are increasing factors that outweigh it.”
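(To make the “+2 to the whole wave” arithmetic concrete, here’s a minimal numerical sketch. The sine wave, the 2-unit lift, and the duration are just the toy assumptions from the dialogue above, not a claim about the actual shape of moral progress.)

```python
import numpy as np

# Toy model from the dialogue: moral progress over time is a sine wave v(t).
# An intervention that lifts the whole wave by c adds c * T to the
# time-integrated value, regardless of what v(t) is at any "lock-in point".
T = 1000.0                        # "duration-of-everything", arbitrary units
t = np.linspace(0.0, T, 200_000)
dt = t[1] - t[0]

baseline = np.sin(t)              # trajectory without the intervention
lifted = np.sin(t) + 2.0          # the whole wave raised by 2 units

delta_ev = np.sum(lifted - baseline) * dt  # Riemann-sum approximation of the integral
print(delta_ev)                   # ~2000, i.e. 2 * T, as the dialogue claims
```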
Thanks for this comment!

“I’d also note that the existing psychological evidence for the ‘secondary transfer effect’ is relevant to premise 4.”
Good point, thanks! Judging only from that paper’s abstract, I’d guess that it’d indeed be useful for work on these questions to draw on evidence and theorising about secondary transfer effects.
“I can imagine this research fitting well within Sentience Institute. If you know anyone who could be interested in applying to our current researcher job opening (closing in ~one week’s time), to work on this or other questions, please do let them know about the opening.”
Yes, I’d agree that this kind of work seems to clearly fit the Sentience Institute’s mission, and that SI seems like it’s probably among the best homes for this kind of work. (Off the top of my head, other candidate “best homes” might be Rethink Priorities or academic psychology. But it’s possible that, in the latter, it’d be hard to sell people on focusing a lot of resources on the most relevant questions.)
So I’m glad you stated that explicitly (perhaps I should’ve too), and mentioned SI’s job opening here, so people interested in researching these questions can see it.
(Having written this comment and then re-read your comment, I have a sense that I might be sort-of talking past you or totally misunderstanding you, so let me know if that’s the case.)
Responding to your conversation text:
I still find it hard to wrap my head around what the claims or arguments in that conversation would actually mean. Though I might say the same about a lot of other arguments about extremely long-term trajectories, so this comment isn’t really meant as a critique.
Some points where I’m confused about what you mean, or about how to think about it:
What, precisely, do we mean by “value lock-in”?
If we mean something as specific as “a superintelligent AI is created with a particular set of values, and then its values never change and the accessible universe is used however the superintelligent AI decided to use it”, then I think moral advocacy can clearly have a lasting impact without that sort of value lock-in.
Do we mean that some actors’ (e.g., current humans) values are locked in, or that there’s a lock-in of what values will determine how the accessible universe is used?
Do we mean that a specific set of values is locked in, or that something like a particular “trajectory” or “range” of values is locked in? E.g., would we count it as “value lock-in” if we lock in a particular recurring pattern of shifts in values? Or if we just lock in disvaluing suffering, but values could still shift along all other dimensions?
“if the graph of future moral progress is a sine wave”—do you essentially mean that there’s a recurring pattern of values getting “better” and then later getting “worse”?
And do you mean that that pattern lasts indefinitely—i.e., until something like the heat death of the universe?
Do you see it as plausible that that sort of a pattern could last an extremely long time? If so, what sort of things do you think would drive it?
At first glance, it feels to me like that would be extremely unlikely to happen “by chance”, and that there’s no good reason to believe we’re already stuck with this sort of a pattern happening indefinitely. So it feels like it would have to be the case that something in particular happens (which we currently could still prevent) that causes us to be stuck with this recurring pattern.
If so, I think I’d want to say that this is meaningfully similar to a value lock-in; it seems like a lock-in of a particular trajectory has to occur at a particular point, and that what matters is whether that lock-in occurs, and what trajectory we’re locked into when it occurs. (Though it could be that the lock-in occurs “gradually”, in the sense that it gradually becomes harder and harder to get out of that pattern. I think this is also true for lock-in of a specific set of values.)
I think that thinking about what might cause us to end up with an indefinite pattern of improving and then worsening moral values would help us think about whether moral advocacy work would just speed us along one part of the pattern, shift the whole pattern, change what pattern we’re likely to end up with, or change whether we end up with such a pattern at all. (For present purposes, I’d say we could call farm animal welfare work “indirect moral advocacy work”, if its ultimate aim is shifting values.)
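(As a toy illustration of why these possibilities differ so much in long-run stakes, still borrowing the dialogue’s sine-wave assumption with made-up numbers: a pure “speeding up” is just a phase shift of a periodic trajectory, so its effect on total value stays bounded however long the future lasts, whereas even a small shift of the whole pattern scales with the duration.)

```python
import numpy as np

T = 1000.0                         # toy duration; the contrast sharpens as T grows
t = np.linspace(0.0, T, 200_000)
dt = t[1] - t[0]
baseline = np.sin(t)

sped_up = np.sin(t + 0.5)          # "speed us along the pattern": a phase advance
lifted = np.sin(t) + 0.1           # "shift the whole pattern": a small uniform lift

print(np.sum(sped_up - baseline) * dt)  # stays small (~0.3 here) no matter how large T is
print(np.sum(lifted - baseline) * dt)   # ~0.1 * T = 100, grows with the duration
```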
I also think an argument can be made that, given a few plausible yet uncertain assumptions, there’s practically guaranteed to eventually be a lock-in of major aspects of how the accessible universe is used. I’ve drafted a brief outline of this argument and some counterpoints to it, which I’ll hopefully post next month, but could also share on request.
“You don’t need to agree with premise 3 to think that working on MCE is a cost-effective way to reduce s-risks.”
Yeah, I agree.
For one thing, if we just cut premise 3 in my statement of that argument, all that the conclusion would automatically lose is the claim of “urgency”, not the claim of “importance”. And it may sometimes be best to work on very important things even if they’re not urgent. (For this reason, and because my key focus was really on premise 4, I’d actually considered leaving premise 3 out of my comment.)
For another thing, which I think is your focus, it seems conceivable that MCE could be urgent even if value lock-in were pretty much guaranteed not to happen anytime soon. That said, I don’t think I’ve heard an alternative argument for MCE being urgent that I’ve understood and been convinced by. Tomorrow, with more sleep under my belt, I plan to have another crack at properly thinking through the dialogue you provided (including trying to think about what it would actually mean for the graph of future moral progress to be like a sine wave, and what could cause that to occur indefinitely).