Might wireheaders turn into paperclippers?

Morally valuable post-human futures?

Effective altruism’s expanding circle of moral concern has become wide enough, for many people, to include “post-humans”: beings that may become the dominant agents of the future, rendering human experience as we know it negligible or nonexistent in comparison. These beings may be biological, computational, or a mixture of the two. In this category I include transhumanist scenarios, such as human enhancement and wireheading, certain simulations, and hedonium.

Some EAs view these scenarios as, for the most part, highly net positive. In fact, some people believe that these possible futures should be a goal of effective altruism and existential risk reduction. For example, in one argument for why people should care about existential risk, Nick Bostrom imagines a galactic supercluster being entirely converted into computers that would simulate happy human minds. Even a 1% chance of advancing this scenario by one second, he says, would carry astronomical expected utility. Of course, this Pascal’s-mugging-like argument is not the main argument for existential risk reduction, and I don’t intend to straw-man existential risk researchers. However, these post-human fantasies are popular enough in the EA community that they are worth addressing.

The typical objection to pursuing post-human fantasies is the possibility of post-human nightmares: universes that contain astronomical suffering. These arguments have been well-discussed by other people. However, there is a third option that has been largely ignored: what if post-humans have neither positive nor negative moral value?

Reward functions we don’t care about

How do we know whether a post-human is happy? This may be very difficult if the post-human bears little similarity to us: we couldn’t ask it about its internal experience in a way that would yield a trustworthy answer, much less identify neurotransmitters or brainwaves of value. Some might approximate happiness as the degree to which the post-human’s reward function is satisfied. By this definition, a paperclipper would be capable of pain and pleasure, because it has a reward function that is satisfied when it produces paperclips. However, few people would view a universe full of paperclips as astronomically net positive, even though the paperclipper would be “happy”. If you believe that the paperclipper scenario would be very good, consider an even simpler reward function and ask yourself whether you still care about it. For example, imagine an electrical switch with two positions: one labeled “pleasure” and one labeled “pain”. Would this system be algedonic in a way we’d care about?
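To make the switch thought experiment concrete, here is a toy sketch (my own illustration, not a proposed welfare metric) of a “mind” whose entire reward function is that two-position switch, and which can be fully “satisfied” by flipping a single bit:

```python
# Toy illustration: a "mind" whose entire reward function is a two-position
# switch. Flipping the switch fully satisfies the reward function, yet nothing
# we intuitively care about has happened.

class SwitchMind:
    """A minimal agent whose reward function is satisfied iff its switch is on."""

    def __init__(self) -> None:
        self.switch_on = False  # the "pain" position

    def reward(self) -> float:
        # Reward function: 1.0 when the switch reads "pleasure", 0.0 otherwise.
        return 1.0 if self.switch_on else 0.0

    def maximize_reward(self) -> None:
        # "Wireheading" this mind is trivial: flip the switch.
        self.switch_on = True


mind = SwitchMind()
print(mind.reward())   # 0.0 -- the "pain" position
mind.maximize_reward()
print(mind.reward())   # 1.0 -- "pleasure", but is this a welfare subject?
```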

Thus, maximizing reward functions seems insufficient for maximizing positive value.

Solutions

My best solution to the reductio ad absurdum of valuing paperclippers is to base my moral system on human experience as we know it. The experiences of animals and other entities are valuable to me insofar as they are similar to human experience (see following sections).

I still care about existential risk because of the value of future human generations, but the moral case feels less overwhelming than I previously believed, because I don’t value many possible post-human futures.

While I don’t object to mild forms of human enhancement per se, I am concerned that any post-humans would face selective pressures to become less like humans and more like paperclippers. Certain technological leaps would be irreversible and would send us hurtling toward futures that bear little relationship to human values. I believe that MIRI shares these concerns with regard to AI but hasn’t extended them to the other post-human scenarios I listed, which are widely viewed as positive. Some AI researchers have suggested restrictions on certain types of AI research, a policy I may want applied to other technological developments as well. At the very least, I would avoid actively endorsing or pursuing the creation of post-humans.

Speculation

Note: I’m not very confident in the following, and it’s less central to my main concerns about post-humans.

Consciousness as “similarity to one’s own mind”

I’m not confident that the hard problem of consciousness can be answered empirically. A more tractable question, then, is: “What do people mean when they say ‘consciousness’?” Top-down definitions of consciousness (e.g. “complexity” or “integration”) seem to miss the point: we can always come up with systems that meet these definitions but don’t fit our intuitions about consciousness. For example, some have speculated that corporations may be conscious, but I think few people would be willing to accept this conclusion. So what if we skipped the pretense and started with the intuitions, instead of stretching definitions to fit them? Many people seem to share the following intuitions: humans are certainly conscious; animals are less conscious (and their consciousness decreases as their similarity to humans decreases); plants are not conscious; and computers are conscious only if they behave similarly to human cognitive processes (e.g. by simulating human brains).

Defining consciousness as similarity to one’s own mind captures many of these intuitions, though it leads to some unpleasant conclusions. For example, it would lead to some degree of moral egotism. Additionally, from this perspective, I would view someone in Sub-Saharan Africa as less conscious than a fellow American. I’m very reluctant to accept these conclusions. One consolation is that the effects of this egotism would be negligible, since most humans are very similar to me. Also, these bullets are easier to bite than some others, such as conscious corporations. Finally, these conclusions are somewhat in line with “common-sense” ethics, which accepts slight egotism and strong kin preference.

Furthermore, egotism is implicitly central to discussions of consciousness. Thought experiments about qualia are about one’s personal experience with, for example, the color red. When we ask, “What is it like to be a bat?”, we are asking “What is it like for me to be a bat?”

The spectrum of personal identity

The problem of personal identity has been frequently discussed in the philosophical literature. If I’m constantly changing, down to the cellular level, then, like the ship of Theseus, how can I maintain a constant personal identity? Open individualism dissolves the problem by rejecting separate personal identities entirely and asserting that all consciousness is one; this seems too vague to me. Instead, I view personal identity as a spectrum, based on similarity to my current self. The self of five minutes ago is more “me” than the self of five years ago, who is more “me” than Mike, a middle-aged Republican farmer in Iowa. Mike is far more “me” than Mike’s chicken.
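As a rough illustration of the spectrum view, here is a toy sketch; the “similarity” scores are invented purely for this example, and the mapping from similarity to moral weight is the simplest one that expresses the idea:

```python
# Toy model of the identity/similarity spectrum. The similarity scores are
# invented purely for illustration; only their ordering matters.

def moral_weight(similarity_to_me: float) -> float:
    """Map similarity to my current mind (1.0 = identical, 0.0 = nothing in
    common) onto moral weight. The identity mapping is the simplest choice;
    any monotonically increasing function would express the same idea."""
    return max(0.0, min(1.0, similarity_to_me))

minds = {
    "me, five minutes ago": 0.999,
    "me, five years ago": 0.97,
    "Mike, the Iowa farmer": 0.90,
    "Mike's chicken": 0.30,
    "a paperclipper": 0.01,
}

for name, similarity in minds.items():
    print(f"{name}: weight {moral_weight(similarity):.3f}")
```

On numbers like these, the differences among humans are small compared with the drop-off for non-human minds, echoing the earlier point that the egotism this view implies has negligible practical effect.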

Changing the status quo is death

Transhumanists note status quo bias in common objections to human enhancement: people are reluctant to accept changes to themselves or their societies, even when these changes appear objectively better. However, to some degree this bias can be viewed as rational. If I were to become Mike and lose all trace of my previous self, I would view this as almost as bad as death, even though Mike is happy. Of course, there are tradeoffs between self-improvement and status quo preservation. I don’t want to retain every feature of my current mind for the rest of my life, but I’m sufficiently concerned about small deaths of my personal identity that I avoid substances with long-term mind-altering effects.