My position on “AI welfare”
1. If we achieve existential security and launch the von Neumann probes successfully, we will be able to do >>10^80 operations in expectation. We could tile the universe with hedonium or do acausal trade or something, and it’s worth >>10^60 happy human lives in expectation. Digital minds are super important.
2. Short-term AI suffering will be small-scale—less than 10^40 FLOP, and far from optimized for suffering (any suffering would be incidental)—and worth <<10^20 happy human lives (very likely <10^10).
3. 10^20 isn’t even a feather in the scales when 10^60 is at stake (see the rough arithmetic sketch below).
4. “Lock-in” [edit: of “AI welfare” trends on Earth] is very unlikely; potential causes of short-term AI suffering (like training and deploying LLMs) are very different from potential causes of astronomical-scale digital suffering (like tiling the universe with dolorium, the arrangement of matter optimized for suffering). And digital-mind-welfare research doesn’t need to happen yet; there will be plenty of subjective time for it before the von Neumann probes’ goals are set.
5. Therefore, to a first approximation, we should not trade off existential security for short-term AI welfare, and normal AI safety work is the best way to promote long-term digital-mind welfare.
[Edit: the questionable part of this is #4.]
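To make the scale comparison in points 1–3 concrete, here is a rough back-of-envelope sketch in Python. The exponents are the illustrative placeholders from above, and the probability shift is an arbitrary number chosen for illustration, not an estimate:

```python
# Back-of-envelope scope check, using the illustrative exponents above.
# These are order-of-magnitude placeholders, not real welfare estimates.

long_term_stake = 10**60   # happy-human-life-equivalents if the future goes well
short_term_stake = 10**20  # upper bound on near-term AI suffering, same units

# Even a minuscule shift in the probability of the good long-term outcome
# is worth more than the entire short-term stake:
tiny_probability_shift = 1e-30  # arbitrary illustrative number
print(tiny_probability_shift * long_term_stake)  # 1e+30 life-equivalents
print(short_term_stake / long_term_stake)        # 1e-40: "not even a feather"
```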
I basically agree with this with some caveats. (Despite writing a post discussing AI welfare interventions.)
I discuss related topics here and what fraction of resources should go to AI welfare. (A section in the same post I link above.)
The main caveats to my agreement are:
From a deontology-style perspective, I think there is a pretty good case for trying to do something reasonable on AI welfare. Minimally, we should try to make sure that AIs consent to their current overall situation insofar as they are capable of consenting. I don’t put a huge amount of weight on deontology, but enough to care a bit.
As you discuss in the sibling comment, I think various interventions like paying AIs (and making sure AIs are happy with their situation) to reduce takeover risk are potentially compelling, and they are very similar to AI welfare interventions. I also think there is a weak decision theory case that blends in with the deontology case from the prior bullet.
I think that there is a non-trivial chance that AI welfare is a big and important field at the point when AIs are powerful, regardless of whether I push for such a field to exist. In general, I would prefer that important fields related to AI have better, more thoughtful views. (Not with any specific theory of change, just a general heuristic.)
Why does “lock-in” seem so unlikely to you?
One story:
1. Assume AI welfare matters.
2. Aligned AI concentrates power in a small group of humans.
3. AI technology allows them to dictate aspects of the future / cause some “lock in” if they want. That’s because:
   a. These humans control the AI systems that have all the hard power in the world.
   b. Those AI systems will retain all the hard power indefinitely; their wishes cannot be subverted.
   c. Those AI systems will continue to obey whatever instructions they are given indefinitely.
4. Those humans decide to dictate some or all of what the future looks like, and lots of AIs end up suffering in this future because their welfare isn’t considered by the decision makers.

(Also, the decision makers could pick a future which isn’t very good in other ways.)
You could imagine AI welfare work now improving things by putting AI welfare on the radar of those people, so they’re more likely to take AI welfare into account when making decisions.
I’d be interested in which step of this story seems implausible to you—is it about AI technology making “lock in” possible?
I agree this is possible, and I think a decent fraction of the value of “AI welfare” work comes from stuff like this.
Those humans decide to dictate some or all of what the future looks like, and lots of AIs end up suffering in this future because their welfare isn’t considered by the decision makers.

This would be very weird: it requires that either the value-setters are very rushed or that they have lots of time to consult with superintelligent advisors but still make the wrong choice. Both paths seem unlikely.
As an intuition pump: if the Trump administration,[1] or a coalition of governments led by the U.S., is faced all of a sudden—on account of intelligence explosion[2] plus alignment going well—with deciding what to do with the cosmos, will they proceed thoughtfully or kind of in a rush? I very much hope the answer is “thoughtfully,” but I would not bet[3] that way.
What if we end up in a multipolar scenario, which forecasters think is about 50% likely? In that case, I think rushing is the default?
Pausing for a long reflection may be the obvious path to you or me or EAs in general if suddenly in charge of an aligned ASI singleton, but the way we think is very strange compared to most people in the world.[4] I expect that without a good deal of nudging/convincing, the folks calling the shots will not opt for such reflection.[5]
(Note that I don’t consider this a knockdown argument for putting resources towards AI welfare in particular: I only voted slightly in the direction of “agree” for this debate week. I do, however, think that many more EA resources should be going towards ASI governance / setting up a long reflection, as I have written before.)
This would be very weird: it requires that either the value-setters [...] or that they have lots of time to consult with superintelligent advisors but still make the wrong choice.

One thread here that feels relevant: I don’t think it’s at all obvious that superintelligent advisors will be philosophically competent.[6] Wei Dai has written a series of posts on this topic (which I collected here); this is an open area of inquiry that serious thinkers in our sphere are funding. In my model, this thread links up with AI welfare since welfare is in part an empirical problem, which superintelligent advisors will be great at helping with, but also in part a problem of values and philosophy.[7]
the likely U.S. presidential administration for the next four years
in this world, TAI has been nationalized
I apologize to Nuño, who will receive an alert, for not using “bet” in the strictly correct way.
All recent U.S. presidents have been religious, for instance.
My mainline prediction is that decision makers will put some thought towards things like AI welfare—in fact, by normal standards they’ll put quite a lot of thought towards these things—but they will fall short of the extreme thoughtfulness that a scope-sensitive assessment of the stakes calls for. (This prediction is partly informed by someone I know who’s close to national security, and who has been testing the waters there to gauge the level of openness towards something like a long reflection.)
One might argue that this is a contradictory statement, since the most common definition of superintelligence is an AI system (or set of systems) that’s better than the best human experts in all domains. So, really, what I’m saying is that I believe it’s very possible we end up in a situation in which we think we have superintelligence—and the AI we have sure is superhuman at many/most/almost-all things—but, importantly, philosophy is its Achilles heel.
(To be clear, I don’t believe there’s anything special about biological human brains that makes us uniquely suited to philosophy; I don’t believe that philosophically competent AIs are precluded from the space of all possible AIs. Nonetheless, I do think there’s a substantial chance that the “aligned” “superintelligence” we build in practice lacks philosophical competence, to catastrophic effect. (For more, see Wei Dai’s posts.))
Relatedly, if illusionism is true, then welfare is a fully subjective problem.
(Minor point: in an unstable multipolar world, it’s not clear how things get locked in, and for the von Neumann probes in particular, note that if you can launch slightly faster probes a few years later, you can beat rushed-out probes.)
Yeah, I agree that it’s unclear how things get locked in in this scenario. However, my best guess is that solving the technological problem of designing and building probes that travel as fast as allowed by physics—i.e., just shy of light speed[1]—takes less time than solving the philosophical problem of what to do with the cosmos.
If one is in a race, then one is forced into launching probes as soon as one has solved the technological problem of fast-as-physically-possible probes (because delaying means losing the race),[2] and so in my best guess the probes launched will be loaded with values that one likely wouldn’t endorse if one had more time to reflect.[3]
Additionally, if one is in a race to build fast-as-physically-possible probes, then one is presumably putting most of one’s compute toward winning that race, leaving one with little compute for solving the problem of what values to load the probes with.[4]
Overall, I feel pretty pessimistic about a multipolar scenario going well,[5] but I’m not confident.
assuming that new physics permitting faster-than-light travel is ruled out (or otherwise not discovered)
There’s some nuance here: maybe one has a lead and can afford some delay. Also, the prize is continuous rather than discrete—that is, one still gets some of the cosmos if one launches late (although on account of how the probes reproduce exponentially, one does lose out big time by being second).*
*From Carl Shulman’s recent 80k interview:

“you could imagine a state letting loose this robotic machinery that replicates at a very rapid rate. If it doubles 12 times in a year, you have 4,096 times as much. By the time other powers catch up to that robotic technology, if they were, say, a year or so behind, it could be that there are robots loyal to the first mover that are already on all the asteroids, on the Moon, and whatnot. And unless one tried to forcibly dislodge them, which wouldn’t really work because of the disparity of industrial equipment, then there could be an indefinite and permanent gap in industrial and military equipment.”
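To spell out the arithmetic in the quote, here is a small sketch; the monthly doubling and one-year lag are Shulman’s illustrative figures, not independent estimates:

```python
# Sketch of the first-mover gap from the quoted example: self-replicating
# machinery that doubles monthly, with the second mover a year behind.
doublings_per_year = 12
lag_years = 1

head_start_factor = 2 ** (doublings_per_year * lag_years)
print(head_start_factor)  # 4096, matching "4,096 times as much"

# If both sides then grow at the same rate, the ratio never closes:
for years_later in (1, 5, 10):
    first_mover = 2 ** (doublings_per_year * (years_later + lag_years))
    second_mover = 2 ** (doublings_per_year * years_later)
    print(first_mover // second_mover)  # stays 4096 indefinitely
```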
It’s very unclear to me how large this discrepancy is likely to be. Are the loaded values totally wrong according to one’s idealized self? Or are they basically right, such that the future is almost ideal?
There’s again some nuance here, like maybe one believes that the set of world-states/matter-configurations that would score well according to one’s idealized values is very narrow. In this case, the EV calculation could indicate that it’s better to take one’s time even if this means losing almost all of the cosmos, since a single probe loaded with one’s idealized values is worth more to one than a trillion probes loaded with the values one would land on through a rushed reflective process.
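As a toy version of that EV comparison (all numbers purely illustrative, keyed to the “single probe vs. a trillion probes” framing above):

```python
# Toy EV comparison for "rush vs. reflect" (all numbers illustrative).
probes_if_rush = 1e12   # probe-equivalents captured by launching first
probes_if_wait = 1.0    # what's left if one waits and launches late

value_per_probe_rushed = 1.0      # value under rushed values (normalized)
value_per_probe_idealized = 1e13  # assumed value under idealized values

ev_rush = probes_if_rush * value_per_probe_rushed     # 1e12
ev_wait = probes_if_wait * value_per_probe_idealized  # 1e13

# Waiting wins exactly when the value ratio (1e13 here) exceeds the
# share penalty (1e12 here).
print(ev_wait > ev_rush)  # True
```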
There are also decision theory considerations/wildcards, like maybe the parties racing are mostly AI-led rather than human-led (in a way in which the humans are still empowered, somehow), and the AIs—being very advanced, at this point—coordinate in an FDT-ish fashion and don’t in fact race.
On top of race dynamics resulting in suboptimal values being locked in, as I’ve focused on above, I’m worried about very bad, s-risky stuff like threats and conflict, as discussed in this research agenda from CLR.
Interesting!
I think my worry is people who don’t think they need advice about what the future should look like. When I imagine them making the bad decision despite having lots of time to consult superintelligent AIs, I imagine them just not being that interested in making the “right” decision? And therefore their advisors not being proactive in telling them things that are only relevant for making the “right” decision.
That is, assuming the AIs are intent aligned, they’ll only help you in the ways you want to be helped:
Thoughtful people might realise the importance of getting the decision right, and might ask “please help me to get this decision right” in a way that ends up with the advisors pointing out that AI welfare matters and the decision makers will want to take that into account.
But unthoughtful or hubristic people might not ask for help in that way. They might just ask for help in implementing their existing ideas, and not be interested in making the “right” decision or in what they would endorse on reflection.
I do hope that people won’t be so thoughtless as to impose their vision of the future without seeking advice, but I’m not confident.
Briefly + roughly (not precise):
At some point we’ll send out lightspeed probes to tile the universe with some flavor of computronium. The key question (for scope-sensitive altruists) is what that computronium will compute. Will an unwise agent or incoherent egregore answer that question thoughtlessly? I intuit no.
I can’t easily make this intuition legible. (So I likely won’t reply to messages about this.)
Caveats:
I endorse the argument that we should figure out how to use LLM-based systems without accidentally torturing them, because they’re more likely to take catastrophic actions if we’re torturing them.
I haven’t tried to understand the argument that we should try to pay AIs to [not betray us / tell on traitors / etc.], and that working on AI-welfare stuff would help us offer AIs payment more effectively; there might be something there.
I don’t understand the decision theory mumble mumble argument; there might be something there.
(Other than that, it seems hard to tell a story about how “AI welfare” research/interventions now could substantially improve the value of the long-term future.)
(My impression is these arguments are important to very few AI-welfare-prioritizers / most AI-welfare-prioritizers have the wrong reasons.)
FWIW, these motivations seem reasonably central to me personally, though not my only motivations.
Among your friends, I agree; among EA Forum users, I disagree.
Yes, I meant central to me personally, edited the comment to clarify.
This is very similar to my current stance.