What’s your credence that humans create a utopia in the alternative? Depending on the strictness of one’s definition, I think a future utopia is quite unlikely either way, whether we solve alignment or not.
It seems you expect future unaligned AIs will either be unconscious or will pursue goals that result in few positive conscious experiences being created. I am not convinced of this myself. At the very least, I think such a claim demands justification.
Given the apparent ubiquity of consciousness in the animal kingdom, and the anticipated sophistication of AI cognition, it is difficult for me to imagine a future essentially devoid of conscious life, even if that life is made of silicon and it does not share human preferences.
I think the position I’m arguing for is basically the standard position among AI safety advocates, so I haven’t really scrutinized it. But roughly: (many) animals evolved to experience happiness because it was evolutionarily useful for them to do so. AIs are not products of evolution, so it seems likely that, by default, they would not be capable of experiencing happiness. This could be wrong—it might be that happiness is a byproduct of some sort of information processing, and that sufficiently complex reinforcement learning agents necessarily experience happiness (or something like that).
Also: According to the standard story where an unaligned AI has some optimization target and then kills all humans in the interest of pursuing that target (e.g. a paperclip maximizer), it seems unlikely that this AI would experience much happiness (granting that it’s capable of happiness) because its own happiness is not the optimization target.
(Note: I realize I am ignoring some parts of your comment; I’m intentionally responding only to the central point so my response doesn’t get too frayed.)
According to the standard story where an unaligned AI has some optimization target and then kills all humans in the interest of pursuing that target (e.g. a paperclip maximizer), it seems unlikely that this AI would experience much happiness (granting that it’s capable of happiness) because its own happiness is not the optimization target.
I agree that this is the standard story regarding AI risk, but I haven’t seen convincing arguments that support this specific model.
In other words, I see no compelling evidence to believe that future AIs will have exclusively abstract, disconnected goals—like maximizing paperclip production—and that such AIs would fail to generate significant amounts of happiness, either as a byproduct of their goals or as an integral part of achieving them.
(Of course, it’s crucial to avoid wishful thinking. A favorable outcome is by no means guaranteed, and I’m not arguing otherwise. Instead, my point is that the core assumption underpinning this standard narrative seems weakly argued and poorly substantiated.)
The scenario I find most plausible is one in which AIs have a mixture of goals, much like humans. Some of these goals will likely be abstract, while others will be directly tied to the AI’s internal experiences and mental states.
Just as humans care about their own happiness but also care about external reality—such as the impact they have on the world or what happens after they’re dead—I expect that many AIs will place value on both their own mental states and various aspects of external reality.
This ultimately depends on how AIs are constructed and trained, of course. However, as you mentioned, there are some straightforward reasons to anticipate parallels between how goals emerge in animals and how they might arise in AIs. For example, robots and some other types of AIs will likely be trained through reinforcement learning. While RL on computers isn’t identical to the processes by which animals learn, it is similar enough in critical ways to suggest that these parallels could have significant implications.