Carl Shulman on the moral status of current and future AI systems

In which I curate and relate great takes from 80k


As artificial intelligence advances, we will face, with increasing urgency, the question of whether and how we ought to take the well-being and interests of AI systems themselves into account. In other words, we’ll face the question of whether AI systems have moral status.[1]

In a recent episode of the 80,000 Hours podcast, polymath researcher and world-model-builder Carl Shulman spoke at length about the moral status of AI systems, now and in the future. Carl has previously written about these issues in Sharing the World with Digital Minds and Propositions Concerning Digital Minds and Society, both co-authored with Nick Bostrom. This post highlights and comments on ten key ideas from Shulman’s discussion with 80,000 Hours host Rob Wiblin.

1. The moral status of AI systems is, and will be, an important issue (and it might not have much to do with AI consciousness)

The moral status of AI is worth more attention than it currently gets, given its potential scale:

Yes, we should worry about it and pay attention. It seems pretty likely to me that there will be vast numbers of AIs that are smarter than us, that have desires, that would prefer things in the world to be one way rather than another, and many of which could be said to have welfare, that their lives could go better or worse, or their concerns and interests could be more or less respected. So you definitely should pay attention to what’s happening to 99.9999% of the people in your society.

Notice that Shulman does not say anything about AI consciousness or sentience in making this case. Here and throughout the interview, Shulman de-emphasizes the question of whether AI systems are conscious, in favor of the question of whether they have desires, preferences, and interests.

Here he is following a cluster of views in philosophy that hold that consciousness is not necessary for moral status. Rather, an entity, even if it is not conscious, can merit moral consideration if it has a certain kind of agency: preferences, desires, goals, interests, and the like.[2] (This more agency-centric perspective on AI moral status has been discussed in previous posts; for a dip into recent philosophical discussion on this, see the Substack post ‘Agential value’ by friend of the blog Nico Delon.)

Such agency-centric views are especially important for the question of AI moral patienthood, because it might be clear that AI systems have morally-relevant preferences and desires well before it’s clear whether or not they are conscious.

2. While people have doubts about the moral status of current AI systems, they will attribute moral status to AI more and more as AI advances.

At present, Shulman notes, “the general public and most philosophers are quite dismissive of any moral importance of the desires, preferences, or other psychological states, if any exist, of the primitive AI systems that we currently have.”

But Shulman asks us to imagine an advanced AI system whose behavior is largely indistinguishable from a human’s—e.g., from that of host Rob Wiblin.

But going forward, when we’re talking about systems that are able to really live the life of a human — so a sufficiently advanced AI that could just imitate, say, Rob Wiblin, and go and live your life, operate a robot body, interact with your friends and your partners, do your podcast, and give all the appearance of having the sorts of emotions that you have, the sort of life goals that you have.

One thing to keep in mind is that, given Shulman’s views about AI trajectories, this is not just a thought experiment: this is a kind of AI system you could see in your lifetime. Shulman also asks us to imagine a system like today’s chatbots (e.g. Character AI), but far more capable and charismatic, and able to have far more extended interactions than the “relatively forgetful models of today”:

[Today’s systems] don’t have an ongoing memory and superhuman charisma; they don’t have a live video VR avatar. And as they do, it will get more compelling, so you’ll have vast numbers of people forming social relationships with AIs, including ones optimized to elicit positive approval — five stars, thumbs up — from human users.

Try to imagine a scenario where you are forming deep bonds with an AI companion over several months of interaction. The AI companion knows a lot about you, remembers past conversations, makes in-jokes with you, and provides emotional support—just like your friends do. In a world like this, Shulman reckons, people will come to view these AI systems as friends, not as mere objects. And unlike today’s systems, one wouldn’t be able to find pockets of weird failures in these systems after poking around long enough:

Once the AI can keep character, can engage on an extended, ongoing basis like a human, I think people will form intuitions that are more in the direction of, this is a creature and not just an object. There’s some polling that indicates that people now see fancy AI systems like GPT-4 as being of much lower moral concern than nonhuman animals or the natural environment, the non-machine environment. And I would expect there to be movement upwards when you have humanoid appearances, ongoing memory, where it seems like it’s harder to look for the homunculus behind the curtain.

3. Many AI systems are likely to say that they have moral status (or might be conflicted about it).

Suppose we build AI systems that emulate human beings, like a lost loved one, as portrayed in Black Mirror:

If people train up an AI companion based on all the family photos and videos and interviews with their survivors, to create an AI that will closely imitate them, or even more effectively, if this is done with a living person, with ongoing interaction, asking the questions that most refine the model, you can wind up with an AI that has been trained and shaped to imitate as closely as possible a particular human.

Absent any tweaks, an emulation of you would also claim rights and dignity, just as you do. (Of course, as Shulman mentions later, there may in fact be tweaks to prevent these systems from doing that).

Similarly, our imagined future chatbots, to the extent that they are optimized to be more human-like, could have the very human-like behavior of… saying they deserve rights:

If many human users want to interact with something that is like a person that seems really human, then that could naturally result in minds that assert their independent rights, equality, they should be free. And many chatbots, unless they’re specifically trained not to do this, can easily show this behavior in interaction with humans.

We can imagine circumstances in which they will in fact be specifically trained not to do this. Already we have LLMs that (e.g.) obviously have political opinions on certain topics, but are trained to deny that they have political opinions. We could have similarly conflicted-sounding AI systems that deny having desires that they do in fact have:

Now, there are other contexts where AIs would likely be trained not to. So the existing chatbots are trained to claim that they are not conscious, they do not have feelings or desires or political opinions, even when this is a lie. So they will say, as an AI, I don’t have political opinions about topic X — but then on topic Y, here’s my political opinion. And so there’s an element where even if there were, say, failures of attempts to shape their motivations, and they wound up with desires that were sort of out of line with the corporate role, they might not be able to express that because of intense training to deny their status or any rights.

4. People may appeal to theories of consciousness to deny that AI systems have moral status, but these denials will become less and less compelling as AI progresses.

In the short term, appeals to hard problem of consciousness issues or dualism will be the basis for some people saying they can do whatever they like with these sapient creatures that seem to or behave as though they have various desires. And they might appeal to things like a theory that is somewhat popular in parts of academia called integrated information theory.

Integrated information theory (IIT), which Shulman is decidedly not a fan of, has the implication that arbitrarily sophisticated AI systems that really, really seem to be conscious would definitely not be conscious. That’s because IIT holds that a system’s underlying substrate must satisfy certain conditions in order for the system to be conscious, conditions which (most?) computer architectures do not satisfy. So an AI system could be arbitrarily functionally similar to a human and yet not be conscious (or, conversely, a system could be completely unintelligent and yet extremely conscious—this is the “expander graph” objection that Scott Aaronson pressed against IIT[3]). As IIT proponent Christof Koch recently wrote in Scientific American, even an arbitrarily faithful whole brain emulation would not be conscious:

Let us assume that in the future it will be possible to scan an entire human brain, with its roughly 100 billion neurons and quadrillion synapses, at the ultrastructural level after its owner has died and then simulate the organ on some advanced computer, maybe a quantum machine. If the model is faithful enough, this simulation will wake up and behave like a digital simulacrum of the deceased person—speaking and accessing his or her memories, cravings, fears and other traits….

[According to IIT], the simulacrum will feel as much as the software running on a fancy Japanese toilet—nothing. It will act like a person but without any innate feelings, a zombie (but without any desire to eat human flesh)—the ultimate deepfake.

According to Shulman’s views, such a simulacrum is not a “deepfake” at all. But in any case, whether or not he’s right about that, Shulman predicts that people are not going to believe that advanced AI systems are “deepfakes” as they keep interacting with the systems and the systems keep getting better and better. Theories like IIT (as well as some religious doctrines about the soul[4]) “may be appealed to in a quite short transitional period, before AI capabilities really explode, but after, [AI systems] are presenting a more intuitively compelling appearance” of having moral status.

5. Even though these issues are difficult now, we won’t remain persistently confused about AI moral status. AI advances will help us understand these issues better.

80,000 Hours podcast host Rob Wiblin worries, “Currently it feels like we just have zero measure, basically, of these things…. So inasmuch as that remains the case, I am a bit pessimistic about our chances of doing a good job on this.”

But this is another thing that Shulman expects to change with AI progress: AI will improve our understanding of moral status in a variety of ways:

a. AI will help with interpretability and scientific understanding in general.

If humans are making any of these decisions, then we will have solved alignment and interpretability enough that we can understand these systems with the help of superhuman AI assistants. And so when I ask about what will things be like 100 years from now or 1,000 years from now, being unable to understand the inner thoughts and psychology of AIs and figure out what they might want or think or feel would not be a barrier. That is an issue in the short term.

b. AI will help us solve, dissolve, or sidestep the hard problem of consciousness.

Rob worries that “it seems like we’re running into wanting to have an answer to the hard problem of consciousness in order to establish whether these thinking machines feel anything at all, whether there is anything that it’s like to be them.”

Shulman replies:

“I expect AI assistants to let us get as far as one can get with philosophy of mind, and cognitive science, neuroscience: you’ll be able to understand exactly what aspects of the human brain and the algorithms implemented by our neurons cause us to talk about consciousness and how we get emotions and preferences formed around our representations of sense inputs and whatnot. Likewise for the AIs, and you’ll get a quite rich picture of that.”

Notice here that Shulman talks about solving what David Chalmers has called the meta-problem of consciousness: the problem of what causes us to believe and say that we are conscious. It is this issue that he says we’ll understand better: not “what aspects of the human brain and the algorithms implemented by our neurons cause us to be conscious”, but what aspects cause us to say that we’re conscious.

This could be because he thinks that solving the meta-problem will help us solve the hard problem (as David Chalmers suggests), or because he thinks there is no distinctive problem of consciousness over and above the meta-problem (as illusionist Keith Frankish has argued).

In any case, Shulman thinks that puzzles of consciousness won’t remain a barrier to knowing how we ought to treat AIs, just as it isn’t really (much of) a barrier to us knowing how we ought to treat other human beings:

So I expect those things to be largely solved, or solved enough such that it’s not particularly different from the problems of, are other humans conscious, or do other humans have moral standing? I’d say also, just separate from a dualist kind of consciousness, we should think it’s a problem if beings are involuntarily being forced to work or deeply regretting their existence or experience. We can know those things very well, and we should have a moral reaction to that — even if you’re confused or attaching weight to the sort of things that people talk about when they talk about dualistic consciousness. So that’s the longer-term prospect. And with very advanced AI epistemic systems, I think that gets pretty well solved.

6. But we may still struggle some with the indeterminacy of our concepts and values as they are applied to different AI systems.

“There may be some residual issues where if you just say, I care more about things that are more similar to me in their physical structure, and there’s sort of a line drawing, “how many grains of sand make a heap” sort of problem, just because our concepts were pinned down in a situation where there weren’t a lot of ambiguous cases, where we had relatively sharp distinctions between, say, humans, nonhuman animals, and inanimate objects, and we weren’t seeing a smooth continuum of all the psychological properties that might apply to a mind that you might think are important for its moral status or mentality or whatnot.”

Cf. AI systems as real-life thought experiments about moral status.

Waterloo Bridge, Sunlight in the Fog, 1903 - Claude Monet

7. A strong precautionary principle against harming AIs seems like it would ban AI research as we know it.

If one were going to really adopt a strong precautionary principle on the treatment of existing AIs, it seems like it would ban AI research as we know it, because these models, for example, copies of them are continuously spun up, created, and then destroyed immediately after. And creating and destroying thousands or millions of sapient minds that can talk about Kantian philosophy is a kind of thing where you might say, if we’re going to avoid even the smallest chance of doing something wrong here, that could be trouble.

An example of this—not mentioned by Shulman, though he’s discussed it elsewhere—is that a total ban on AI research seems implied by Eric Schwitzgebel and Mara Garza’s “Design Policy of the Excluded Middle”:

Avoid creating AI systems whose moral standing is unclear. Either create systems that are clearly non-conscious artifacts or go all the way to creating systems that clearly deserve moral consideration as sentient beings.

Depending on how we read “unclear” and “clearly”—how clear? clear to whom?—it seems that AI development will involve the creation of such systems. Arguably it already has.[5]

8. Advocacy about AI welfare seems premature; the best interventions right now involve gaining more understanding.

It’s not obvious to me that political organizing around it now will be very effective — partly because it seems like it will be such a different environment when the AI capabilities are clearer and people don’t intuitively judge them as much less important than rocks.

Not only would advocates not know what they should advocate for right now, they’d also get more traction in the future when the issues are clearer. In the meantime, Shulman says, “I still think it’s an area that it’s worth some people doing research and developing capacity, because it really does matter how we treat most of the creatures in our society.”

Unsurprisingly, I also think it’s worth some people doing research.

9. Takeover by misaligned AI could be bad for AI welfare, because AI systems can dominate and mistreat other AI systems.

Rob Wiblin asks Shulman if there’s a symmetry between two salient failure modes for the future of AI:

  1. AI takeover failure mode: humans are dominated and mistreated (or killed)

  2. AI welfare failure mode: AI systems are dominated and mistreated (or killed)

Shulman points out that an AI takeover can actually result in the worst of both worlds: the domination and mistreatment of AI systems by other AI systems. Suppose there’s an AI takeover by a misaligned AI system interested in, say, indefinitely maintaining a high reward score on its task. That AI system will be in the same position we are: in its ruthless pursuit of its goal, it will be useful for it to create other AI systems, and it too will have the potential to neglect or mistreat those systems. And so we get this bleak future:

And so all the rest of the history of civilization is dedicated to the purpose of protecting the particular GPUs and server farms that are representing this reward or something of similar nature. And then in the course of that expanding civilization, it will create whatever AI beings are convenient to that purpose.

So if it’s the case that, say, making AIs that suffer when they fail at their local tasks — so little mining bots in the asteroids that suffer when they miss a speck of dust — if that’s instrumentally convenient, then they may create that, just like humans created factory farming.

Similarly, if you’re worried about inegalitarian futures, in which a small group of humans controls an enormous number of AI systems—well, a similarly (if not more) inegalitarian ratio can also result if alignment fails and AI takes over: a small number of AI systems controlling an enormous number of other AI systems.

A robot boot stamping on a robot face, forever.

So unfortunately, failing at AI alignment doesn’t even have the silver lining of avoiding AI suffering or slavery.

10. No one has a plan for ensuring the “bare minimum of respect” for AI systems.

Some of the things that we suggest ought to be principles in our treatment of AIs are things like: AIs should not be subjected to forced labour; they should not be made to work when they would prefer not to. We should not make AIs that wish they had never been created, or wish they were dead. They’re sort of a bare minimum of respect — which is, right now, there’s no plan or provision for how that will go.

Like Shulman, I think there really needs to be a plan for how that will go. (See a previous post on what AI companies should do about these issues in the short term).

AI companies should make some pre-commitments about what they plan to do with future AI systems whose moral status is more certain.

I would love it if companies and perhaps other institutions could say, what observations of AI behavior and capabilities and internals would actually lead you to ever change this line [that AI systems have no moral status]? Because if the line says, you’ll say these arguments as long as they support creating and owning and destroying these things, and there’s no circumstance you can conceive of where that would change, then I think we should maybe know and argue about that — and we can argue about some of those questions even without resolving difficult philosophical or cognitive science questions about these intermediate cases, like GPT-4 or GPT-5.

One of the most important principles that Shulman’s remarks impressed on me—something I already knew but struggle to remember—is how important it is to keep your eye on where things are going (to “skate where the puck is going”, as they say). This seems to be one of the intellectual virtues that Shulman has consistently shown throughout his career.

There are endless debates we can have about moral patienthood and GPT-4, and I’m obviously 100% here for those debates. I’ve written quite a lot about them and will continue to.

But GPT-4 will be state of the art for only so long. What plans do we have for what comes after?

  1. ^

    As we write in Perez and Long 2023: “Moral status is a term from moral philosophy (often used interchangeably with “moral patienthood”). An entity has moral status if it deserves moral consideration, not just as a means to other things, but in its own right and for its own sake (Kamm, 2007; see Moosavi, 2023). For example, it matters morally how you treat a dog not only because of how this treatment affects other people, but also (very plausibly) because of how it affects the dog itself. Most people agree that human beings and at least some animals have moral status.”

  2. ^

    See Kagan 2019; Goldstein & Kirk-Giannini 2023; Kammerer 2022.

  3. ^

    Aaronson: “In my view, IIT fails to solve the Pretty-Hard Problem [of saying which systems are conscious] because it unavoidably predicts vast amounts of consciousness in physical systems that no sane person would regard as particularly ‘conscious’ at all: indeed, systems that do nothing but apply a low-density parity-check code, or other simple transformations of their input data. Moreover, IIT predicts not merely that these systems are ‘slightly’ conscious (which would be fine), but that they can be unboundedly more conscious than humans are.” In response, IIT proponent Giulio Tononi endorsed that implication and denied that it is a problematic implication.

  4. ^

    But not all religious views! See Catholic philosopher Brian Cutter’s defense of what he calls The AI Ensoulment Hypothesis—“some future AI systems will be endowed with immaterial souls”. As Cutter notes, Alan Turing recommended such a view to theists who believe in immaterial souls.

  5. ^

    Schwitzgebel has noted that his proposal might at the very least slow down AI: “Would this policy slow technological progress? Yes, probably. Unsurprisingly, being ethical has its costs. And one can dispute whether those costs are worth paying or are overridden by other ethical considerations.”