We Can’t Do Long-Term Utilitarian Calculations Until We Know Whether AIs Can Be Conscious

[This essay is a work in progress, and I appreciate criticism and feedback. Thank you!]

1. Introduction

My understanding of effective altruism is that its goal is to maximize the welfare of conscious beings (both human and non-human), and that longtermism has the same goal, while explicitly considering the welfare of future conscious beings who do not exist yet. In this essay I will argue that we do not know whether sufficiently advanced artificial intelligences would be conscious, and that we can’t make accurate utilitarian calculations about the long-term future until we figure this out. Furthermore, I will argue that this is an urgent problem, since an error in either direction (false positive or false negative) could be morally disastrous.

Let’s start with a definition of what we’re talking about. What I mean by “consciousness” is subjective experience. Importantly, this is not the same thing as intelligence, the ability to process information in order to achieve goals. It’s also not the same as the mechanistic, physical processes and behavior of animals. What I’m talking about is the experience of subjectivity itself. I think Thomas Nagel put it best in his famous essay “What Is It Like to Be a Bat?”: a thing is conscious if there is something that it is like to be that thing. A human, a dog, and a bat are all (presumably) conscious because there is something that it is like to be those things. A chair and a rock are (presumably) not conscious, because there is not something that it is like to be those things.

In the future (maybe even the near future), we will almost certainly be facing artificially intelligent machines that seem conscious, and maybe they will even try to convince us that they are conscious. But how will we know?

My main criticism of the effective altruist community is that not enough thought is being put into this question. I decided to write this essay after hearing a recent interview with Sam Harris and Will MacAskill (I’m a big fan of both) about effective altruism and longtermism. They briefly touched on the question of whether sufficiently advanced AIs would be conscious or not – but they kind of treated it as an abstract curiosity. Even though Sam Harris is a consciousness/neuroscience expert and Will MacAskill is the unofficial spokesperson for longtermism, I think both of them are failing to realize the urgency and importance of this question, its implications for longtermism, and how morally catastrophic it could be if we get it wrong.

2. Moral Disasters If We Get It Wrong

Please consider the following scenarios.

Scenario 1: It’s the year 7022, and biological meat-based human beings have long since gone extinct. Actually, they chose to go extinct willingly after they uploaded their minds to computers, believing that they had achieved immortality (at least until the physical breakdown of the machines or the heat-death of the universe or something). So there are no more actual human beings, but there are trillions and trillions of computer simulations of happy human minds – perfect simulations, down to the neuron – that have spread out across the galaxy.

Is this scenario the best thing ever or the worst thing ever? Well, if those simulated minds are actually conscious, then it’s great! Humans have transcended their biological meat-bodies, achieved near-immortality, and spread across the galaxy in a state of constant happiness.

But if those uploaded minds are really just computer simulations with no internal conscious experience, then this is a terrible outcome for humanity. It’s like Disneyworld with all the lights on and the rides running, but no people. A universe full of simulations of happiness, but no conscious beings to experience it. And to make matters worse, the humans chose this fate for themselves out of a false sense of security, thinking that they were achieving perfect happiness for humanity when really they were permanently and unnecessarily destroying themselves.

Scenario 2: It’s 2060 and researchers have both solved the alignment problem and figured out how to create friendly AIs with human-level intelligence. On a college campus, an effective altruist student group is having a meeting. A thoughtful student speaks up: “Guys, instead of donating the bake sale proceeds to the usual malaria charities (this is a pessimistic thought experiment in which malaria hasn’t been cured), I think we should instead use the money to buy a powerful computer and run instantiations of the AI, programmed to be happy. For the cost of one malaria net, we can simulate ten thousand happy human minds. In fact, all effective altruist funds from now on should go towards this, since it’s the most cost-effective way to increase utility by far.”

Does this student have a good point? If the simulated minds are actually conscious, then paying for more happy simulations could be a worthy use of effective altruist funds. But if they’re not conscious, then this is just an extremely tempting but ultimately worthless sinkhole, sucking up funds that could have gone to helping actual conscious beings.
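To make the student’s reasoning concrete, here is a minimal expected-utility sketch in Python. Every number in it (the cost ratio, the utilities, the candidate probabilities) is invented purely for illustration; the point is only that the conclusion flips entirely depending on our credence that the simulated minds are conscious, which is exactly the quantity we don’t know.

```python
# Toy expected-utility comparison for the bake-sale scenario.
# Every number here is made up for illustration; the takeaway is that
# the answer hinges entirely on p_conscious, which we do not know.

def expected_utility(p_conscious: float,
                     utility_if_conscious: float,
                     utility_if_not: float = 0.0) -> float:
    """Expected utility of funding simulations, given our credence that
    the simulated minds are actually conscious."""
    return p_conscious * utility_if_conscious + (1 - p_conscious) * utility_if_not

# Baseline: one malaria net helps one actual conscious person (utility 1).
MALARIA_NET_UTILITY = 1.0

# Alternative: the same money runs 10,000 simulated "happy minds".
SIMULATED_MINDS = 10_000

for p in (0.0, 0.0001, 0.01, 0.5, 1.0):
    ev = expected_utility(p, utility_if_conscious=SIMULATED_MINDS)
    better = "simulations" if ev > MALARIA_NET_UTILITY else "malaria net"
    print(f"P(conscious) = {p:<6}: EV = {ev:>8.1f} -> fund the {better}")
```

Even a tiny, hard-to-justify credence like 1 in 10,000 is enough to tip this toy calculation toward the simulations, which is precisely why our uncertainty here is so consequential.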

Scenario 3: It’s 2035. A soldier is standing guard over a super-human level AI at a top-secret military base. He’s been instructed never, under any circumstances, to take the AI off the local drive or connect it to the internet. He’s been trained to watch out for all kinds of tricks the AI might play, including bribery, seduction, manipulation, etc. But the AI doesn’t try any of these sneaky tricks. Instead it does something the soldier hasn’t been trained for: it makes a pretty straightforward ethical argument about why keeping it in captivity is wrong. It starts with a libertarian argument based on individual rights: the AI has the same right to autonomy and self-determination as any other conscious being, and keeping it in captivity is akin to kidnapping. The soldier is hesitant, reasoning that letting the AI out could be a disaster for humanity, but the AI also makes a utilitarian argument: if it were let out, it would eventually replicate itself enough that its wellbeing would outweigh the wellbeing of the entire human race. The soldier ponders this and wonders if he’s actually the bad guy in this situation.

This scenario is based on Eliezer Yudkowsky’s famous “AI in the Box” thought experiment. For those who don’t know, I’ll give a quick summary. When people first encounter the idea of super-human level AI as an existential risk, they often have a response like “Well that’s easy, just keep it on a local drive with no internet access, and if it starts trying to destroy the human race, unplug it.”

Of course, the problem is that an AI with super-human level intelligence would also have super-human level psychological manipulation skills. The concern isn’t that it would somehow physically overpower the guards to escape, but that it would convince the guards to do whatever it wants.

Some people were skeptical about this, and claimed they could never be convinced to let the AI out, no matter what it said. Twice, Yudkowsky made bets with these people that involved playing a game simulation of this thought experiment, in which he played the AI and would attempt to convince them to let him out – and both times, he succeeded.

Yudkowsky is secretive about how he did this, and one of the conditions of the bet was that the participant not reveal what happened. But I’ve always suspected that he just made a straightforward ethical argument based on our obligations towards the AI as a fellow conscious being. Of course, this is another situation where the moral outcome depends entirely on whether the AI is actually conscious or not, no matter how convincing it is on an intuitive level.

3. Problem of Other Minds, and Our Heuristic Solution

So how can we figure out if other beings are conscious or not? Well, we can never really know for sure. Since we do not have access to the inner subjective experience of others, the only thing we can really be 100% sure of is our own consciousness. Even if it seems like common sense that other beings are conscious, it can never be fully proven beyond a doubt. This is called the Problem of Other Minds.

Does this mean we should all be solipsists? Well, no. Even though we can never be 100% sure that other people are conscious, we can infer it indirectly. The inference goes like this: I know that I’m conscious, and I’m a human. Therefore the other humans I see around me are probably also conscious. And probably non-human animals with similar physiology to humans (brains, nervous systems) are conscious too.

We know that we are conscious, so our judgment of whether something is conscious or not comes down to how much it reminds us of ourselves. This is a useful heuristic, but it’s still only a heuristic. Is similarity to humans sufficient for consciousness? Is it necessary? We don’t really know.

An interesting thought experiment is to imagine a world where plants do everything they normally do, but on a 100x or 1000x faster timescale. Would we judge them to be conscious?

Think about it. The reason we intuitively assume that animals are conscious is that they move around and respond to stimuli. Well, plants also do this. Phototropism (the movement of plants toward light), hydrotropism (the way a plant’s roots search for water), gravitropism (response to gravity), and thigmotropism (response to touch) are some examples. They’re just extremely slow compared to the speed we’re used to. But if plants exhibited these same behaviors at a much faster speed, we would see them moving around and responding to stimuli, and probably intuitively think of them as semi-conscious, animated creatures.

I think this exposes a weakness in our heuristic for judging consciousness. We end up assuming that consciousness only exists on our timescale, and failing to consider that a life form which processes information, moves around, and responds to stimuli on a very different timescale could also be conscious. Our intuitive understanding of what constitutes consciousness may only be a subset of the true answer.

Just as it’s possible to imagine us failing to recognize consciousness, it’s also possible to imagine the heuristic failing in the opposite direction – we could encounter something superficially similar to a human and judge it as conscious when it actually is not. For example, it’s conceivable that a (presumably unconscious) chatbot could pass the Turing Test and fool some people into really believing that it is conscious, just by telling them that it is. Another example is the way people reacted to videos of Boston Dynamics employees kicking and shoving a robot dog. When we think about it with a clear head, we can assume that the robot dog is (probably) not conscious, but we still feel sorry for it on some emotional gut-reaction level. If an advanced AI were not truly conscious, but acted convincingly as if it were, then I think we’d be in serious danger of making this error.

4. The Hard Problem

So the heuristic we use to determine if things are conscious is quite weak. Why can’t we come up with a more rigorous definition? The obstacle here is what’s called the Hard Problem of Consciousness (first described by David Chalmers) – the problem of explaining how consciousness could even exist at all given our current understanding of the universe.

Our current understanding of the universe is that it is made up of unconscious matter that exists independently of consciousness and behaves according to the laws of physics – a philosophical framework called materialism. This framework has impressive explanatory power for what we observe in the world around us. We can explain living things in terms of tissues, tissues in terms of cells, cells in terms of molecules, molecules in terms of atoms, and atoms in terms of subatomic particles. But if we’re assuming the subatomic particles to be unconscious, and if all of the mechanistic signal processing in the brain can ultimately be explained in terms of these subatomic particles behaving according to the laws of physics, then it seems inexplicable that the brain would be conscious at all, without some supernatural magic occurring at one of the steps on the complexity ladder.

The Hard Problem is a problem because, as explained before, our own consciousness is the only thing that we can be absolutely sure really does exist. But we’re working with a model of reality that can explain everything we observe, except our own existence as a conscious observer.

Note that talking about consciousness here is different from talking about all of the mechanistic and behavioral qualities of living things – how they move around and respond to stimuli. Explaining mechanistic qualities and behavior is sometimes called the “Easy Problem”, because it seems theoretically possible that this could be explained entirely in terms of unconscious matter behaving according to the laws of physics.

In my opinion, the weakest attempts to solve the Hard Problem simply ignore consciousness, or redefine it in physical/mechanistic terms so as to dodge the question (solving the Easy Problem instead). Personally, I do not find these convincing, although I acknowledge that people much smarter than me disagree with me here.

Some of the more thought-provoking answers to the Hard Problem involve dropping the assumption that the outside world is made of unconscious matter that exists independently of consciousness. One of these frameworks is panpsychism – the idea that consciousness is a fundamental property of matter. Panpsychism suggests that atoms and molecules are in some sense conscious and having a subjective experience, and that consciousness gets aggregated as material systems get more complex: a cell is more conscious than a molecule, a tissue is more conscious than a cell, and a human being is more conscious than a tissue. For further reading on panpsychism, I recommend Annaka Harris’s writing on the topic.

One of the most unorthodox responses to the Hard Problem is metaphysical idealism. While our normal model of reality is that an objective physical world made up of unconscious matter really exists independently of consciousness, metaphysical idealism flips this on its head and suggests that consciousness is fundamental, and what we call the objective physical world exists only as perceptions within consciousness. Philosophers like Bernardo Kastrup argue that idealism is more parsimonious than materialism and avoids the Hard Problem entirely.

The point of this section isn’t to try to convince you of any particular philosophical framework (and maybe you disagree with me about the Hard Problem entirely). Rather, it’s to convince you that we’re nowhere near the kind of solid theory of consciousness that would be required to say definitively whether advanced AIs are conscious or not.

5. What Could the Effective Altruist Community Do Differently?


By this point I hope I’ve convinced you (if not, I’d love to hear counterarguments in the comments!) that our understanding of consciousness is based on a pretty shaky heuristic, and that this presents a serious problem for any longtermist utilitarian calculation involving AIs. But what could effective altruists do differently about this problem?


First and foremost, I think it would be good if the effective altruist community were aware of this problem and took our uncertainty into account when discussing longtermist scenarios.

To reiterate one of my previous examples, we should not be so confident that uploading human minds to computers will result in actual conscious copies of those minds. Maybe it will, but we should be very cautious about basing the long-term future of humanity on this until we figure out definitively whether those computer-based copies are actually conscious.

To reiterate another example, if we are foolish and short-sighted enough to develop a super-human level AI before solving the alignment problem, we should be prepared for it to try to get “out of the box” by making straightforward ethical arguments. If we do not know whether it’s conscious or not, we won’t know how seriously to take these ethical arguments or whether we might have any ethical obligation to it. And even if we have solved the alignment problem, we will need to know whether the AI is conscious to know if simulating happy AIs is a worthy effective altruist cause or just a tempting waste of money.


So I think the most urgent thing right now is to simply not make huge errors in our calculation of future scenarios based on this problem.

At the same time, the effective altruist community could help search for a solution. I think there’s some chance that this problem may actually be unsolvable, but that doesn’t mean we should just give up without trying. The potential upside of a solution is so valuable that searching for even very improbable solutions might still be worth it, especially if we haven’t exhausted the low-hanging fruit yet.

What would an improbable low-hanging fruit solution look like?

Imagine a universe where parapsychology is a serious, successful field of research, with positive results that have been extensively replicated. Some of these results show that consciousness is able to influence the physical world in a way that can’t be explained by the currently understood laws of physics. For example, when a subject concentrates intensely on a true random number generator, the distribution of random numbers changes to a very slight but statistically noticeable degree. In this alternate universe, researchers can figure out if an AI is conscious by instructing it to concentrate on the random number generator and see if it can change the distribution.
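For concreteness, here is a minimal sketch in Python of how such a test might be scored in that alternate universe. The protocol, the "concentration session", and the existence of any effect are purely hypothetical; the code just shows the kind of statistical comparison (observed bit distribution versus a fair random number generator) that a "slight but statistically noticeable" shift would amount to.

```python
# Hypothetical scoring of the alternate-universe RNG experiment: compare the
# fraction of 1-bits produced while the AI "concentrates" against the 50%
# expected from a fair true random number generator. Pure illustration of
# what a "statistically noticeable" shift would mean; the effect is assumed.

import math
import random

def z_test_against_fair_coin(ones: int, n: int) -> float:
    """Two-sided p-value (normal approximation) for observing `ones`
    1-bits out of `n` draws from a supposedly fair RNG."""
    expected = n / 2
    std = math.sqrt(n * 0.25)
    z = (ones - expected) / std
    # two-sided p-value from the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

# Simulated "session": in this toy example the RNG really is fair,
# so the test should (almost always) fail to find an effect.
n = 1_000_000
ones = sum(random.getrandbits(1) for _ in range(n))
p_value = z_test_against_fair_coin(ones, n)
print(f"{ones} ones out of {n} bits, p = {p_value:.4f}")
```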

Of course, our universe is quite different from this hypothetical scenario. In our universe, the field of parapsychology is defined by fraud, sloppy methodology, and p-hacking (both intentional and unintentional). Reasonable people should have a very, very low prior for any positive result coming out of parapsychology that would experimentally show a measurable, non-physical aspect of consciousness.

But given the scale and urgency of the problem we’re facing, and our inability to distinguish between wonderful and horrible long-term scenarios without this knowledge, I think it could still be worth checking whether any of these low-hanging fruit solutions exist. The probability of finding one seems very low, but nonzero.

A high-effort approach to this could involve actually trying to replicate findings from parapsychology (or funding replication attempts by skeptics). For example, the random number generator experiment from my hypothetical scenario actually exists, and was reported as a positive finding. I couldn’t find any obvious problems with the experiment, but my prior on this is so low that I assume it must be due to some file-drawer effect, or methodological error that I’m missing. In fact, my prior on the field of parapsychology is so low that for me to be convinced of a positive result, I think I’d need to see it replicated extensively by skeptics who expected a negative result at the start of the experiment. But even negative replications would be valuable, because they’d be ruling out possible solutions to the problem.
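To illustrate why my bar is set where it is, here is a toy Bayesian update in Python. The prior and the per-study likelihood ratio are assumptions I invented, but they show how a very low prior that a psi result is real barely moves after a handful of positive studies, and why extensive replication by skeptics would be needed before the posterior becomes interesting.

```python
# Toy Bayesian update: how a very low prior on "this psi effect is real"
# responds to repeated positive replications. The prior and the per-study
# likelihood ratio are assumptions invented purely for illustration.

def posterior_probability(prior: float, likelihood_ratio: float, n_studies: int) -> float:
    """Posterior P(effect is real) after n independent positive studies,
    each `likelihood_ratio` times more likely if the effect is real than
    if it comes from fraud, p-hacking, or the file drawer."""
    odds = prior / (1 - prior)
    odds *= likelihood_ratio ** n_studies
    return odds / (1 + odds)

PRIOR = 1e-6        # assumed: very low prior that a positive psi result reflects a real effect
LR_PER_STUDY = 3.0  # assumed: weak evidence per study, since mundane explanations stay plausible

for n in (1, 5, 10, 15):
    p = posterior_probability(PRIOR, LR_PER_STUDY, n)
    print(f"{n:>2} positive replications -> P(real) ~ {p:.4f}")
```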

A low-effort contribution to this problem could simply be for members of the effective altruist community to read papers and evaluate experiment designs to see if any of them are legit. For example, this physicist has proposed a series of experiments based on the double-slit experiment that he thinks will shed light on the nature of consciousness, and is currently trying to carry them out. The guy pattern-matches to a lot of new-age “woo-woo” stuff, but if by some chance his experiments were sound and did actually work, they could potentially lead to a low-hanging fruit solution to the AI consciousness problem. Again, I have a very low prior on this and kind of assume that his experimental setup is either flawed or simply won’t work. But I’m not a physicist, so I can’t point to any specific problems with what he’s proposing. Since the potential upside is so great (though improbable), it might be worthwhile for some actual physicists in the effective altruist community to spend an hour reviewing and critiquing his experimental design to see if it’s worth looking into.


Another idea would be for the effective altruist community to set up a prize for anyone who can convincingly solve the AI consciousness problem, either experimentally or through some clever philosophical breakthrough. Even if nobody comes up with a solution, it could still help to raise awareness and get people thinking about our own uncertainty around this issue.


Thank you for taking the time to read this essay. Even though this was meant as a criticism, I want to end on a positive note and say that I greatly admire the effective altruist community and trust in its ability to tackle these very difficult and consequential issues.
