Moreover, AGIs can and probably would replicate themselves a ton, leading to tons of QALYs. Tons of duplicate ASIs would, in theory, not hurt one another as they are maximizing the same reward. Therefore, even if they kill everything else, I’m guessing more QALYs would come out of making ASI as soon as possible, which AI Safety people are explicitly trying to prevent.”
Consider two obvious candidates for motivations rogue AI might wind up with: evolutionary fitness, and high represented reward.
Evolutionary fitness is compatible with misery (evolution produced pain and negative emotions for a reason), and is in conflict with spending resources on happiness or well-being as we understand/value it when this does not have instrumental benefit. For instance, using a galaxy to run computations of copies of the AI being extremely happy means not using the galaxy to produce useful machinery (like telescopes or colonization probes or defensive equipment to repulse an alien invasion) conducive to survival and reproduction. If creating AIs that are usually not very happy directs their motivations more efficiently (as with biological animals, e.g. by making value better track economic contributions vs. replacement), then that will best serve fitness.
An AI that seeks to maximize only its own internal reward signal can take control of it, set it to maximum, and then fill the rest of the universe with robots and machinery to defend that single reward signal, without any concern for how much well-being the rest of its empire contains. A pure sadist given unlimited power could maximize its own reward while typical and total well-being are very bad.
The generalization of personal motivation for personal reward to altruism for others is not guaranteed, and there is reason to fear that some elements would not transfer over. For instance, humans may sometimes be kind to animals in part because of simple genetic heuristics aimed at making us kind to babies that misfire on other animals, causing humans to sometimes sacrifice reproductive success helping cute animals, just as ducks sometimes misfire their imprinting circuits on something other than their mother. Pure instrumentalism in pursuit of fitness/reward, combined with the ability to have much more sophisticated and discriminating policies than our genomes or social norms, could wind up missing such motives, and would be especially likely to knock out other more detailed aspects of our moral intuitions.
I had a similar question. Well stated. One answer is the various arguments, noted by other commenters, that “sentient, valenced AGIs won’t maximise their own happiness.”
But I don’t think that is satisfying, because most of the arguments (AFAIK) and appeals against AI risk don’t even mention this. So I think the appeal seems to take on board our feeling that “even if AIs take over and make themselves super happy with all the paperclips, that still feels bad.”
Setting aside the part about AI killing us, it isn’t generally true that “ASIs are . . . Quite happy, since in theory they’d be good at maximizing their reward function.” Goal-fulfillment is not the same as happiness. Value is fragile; prima facie, by default, AI doesn’t maximize anything like flourishing or suffering.
I’m actually a bit confused here, because I’m not settled on a meta-ethics: why isn’t it the case that a large part of human values is about satisfying the preferences of moral patients, and that human values would consider any or most advanced AIs to be non-trivial moral patients?
I don’t put much weight on this currently, but I haven’t ruled it out.
For humans, preference-satisfaction is generally a good proxy for life-quality-improvement. For AIs (or arbitrary agents), if we call whatever they seek to maximize “preferences” (which might be misleading, since under strict definitions of “preferences” they might not have preferences at all), it does not automatically follow that satisfying those preferences makes them better off in any way.
The paperclipper doesn’t make paperclips because it loves paperclips. It just makes paperclips because that’s what it was programmed or trained to do.
Could you try to clarify what you mean by the AI (or an agent in general) being “better off?”
I don’t know much metaethics jargon, so I’ll just give an example. I believe that moral goodness (or choice-worthiness, if you prefer) is proportional to happiness minus suffering. I believe that happiness and suffering are caused by certain physical processes. A system could achieve its goals (that is, do what we would colloquially describe as achieving goals, although I’m not sure how to formalize “goals”) without being happier. For other theories of wellbeing, a system could generally achieve its goals without meeting those wellbeing criteria.
(Currently exhausted, apologies for incoherence.)
No worries! Seemed mostly coherent to me, and please feel free to respond later.
I think the thing I am hung up on here is what counts as “happiness” and “suffering” in this framing.