Lucius Caviola’s post mentioned the “happy servants” problem:
If AIs have moral patienthood but don’t desire autonomy, certain interpretations of utilitarian theories would consider it morally justified to keep them captive. After all, they would be happy to be our servants. However, according to various non-utilitarian moral views, it would be immoral to create “happy servant” AIs that lack a desire for autonomy and self-respect (Bales, 2024; Schwitzgebel & Garza, 2015). As an intuition pump, imagine we genetically engineered a group of humans with the desire to be our servants. Even if they were happy, it would feel wrong.
This issue is also mentioned as a key research question in Digital Minds: Importance and Key Research Questions by Mogensen, Saad, and Butlin.
This is just a note to flag that there’s also some discussion of this issue in Carl Shulman’s recent 80,000 Hours podcast episode. (cf. also my post about that episode.)
Rob Wiblin: Yeah. The idea of training a thinking machine to just want to take care of you and to serve your every whim, on the one hand, that sounds a lot better than the alternative. On the other hand, it does feel a little bit uncomfortable. There’s that famous example, the famous story of the pig that wants to be eaten, where they’ve bred a pig that really wants to be farmed and consumed by human beings. This is not quite the same, but I think raises some of the same discomfort that I imagine people might have at the prospect of creating beings that enjoy subservience to them, basically. To what extent do you think that discomfort is justified?
Carl Shulman: So the philosopher Eric Schwitzgebel has a few papers on this subject with various coauthors, and covers that kind of case. He has a vignette, “Passion of the Sun Probe,” where there’s an AI placed in a probe designed to descend into the sun and send back telemetry data, and then there has to be an AI present in order to do some of the local scientific optimisation. And it’s made such that, as it comes into existence, it absolutely loves achieving this mission and thinks this is an incredibly valuable thing that is well worth sacrificing its existence.
And Schwitzgebel finds that his intuitions are sort of torn in that case, because we might well think it sort of heroic if you had some human astronaut who was willing to sacrifice their life for science, and think this is achieving a goal that is objectively worthy and good. And then if it was instead the same sort of thing, say, in a robot soldier or a personal robot that sacrifices its life with certainty to divert some danger that maybe had a 1-in-1,000 chance of killing some human that it was protecting. Now, that actually might not be so bad if the AI was backed up, and valued its backup equally, and didn’t have qualms about personal identity: to what extent does your backup carry on the things you care about in survival, and those sorts of things.
There’s this aspect of, do the AIs pursue certain kinds of selfish interests that humans have as much as we would? And then there’s a separate issue about relationships of domination, where you could be concerned that, maybe if it was legitimate to have Sun Probe, and maybe legitimate to, say, create minds that then try and earn money and do good with it, and then some of the jobs that they take are risky and whatnot. But you could think that having some of these sapient beings being the property of other beings, which is the current legal setup for AI — which is a scary default to have — that’s a relationship of domination. And even if it is consensual, if it is consensual by way of manufactured consent, then it may not be wrong to have some sorts of consensual interaction, but can be wrong to set up the mind in the first place so that it has those desires.
And Schwitzgebel has this intuition that if you’re making a sapient creature, it’s important that it wants to survive individually and not sacrifice its life easily, that it has maybe a certain kind of dignity. So humans, because of our evolutionary history, we value status to differing degrees: some people are really status hungry, others not as much. And we value our lives very much: if we die, there’s no replacing that reproductive capacity very easily.
There are other animal species that are pretty different from that. So there are solitary species that would not be interested in social status in the same kind of way. There are social insects where you have sterile drones that eagerly enough sacrifice themselves to advance the interests of their extended family.
Because of our evolutionary history, we have these concerns ourselves, and then we generalise them into moral principles. So we would therefore want any other creatures to share our same interest in status and dignity, and then to have that status and dignity. And being one among thousands of AI minions of an individual human sort offends that too much, or it’s too inegalitarian. And then maybe it could be OK to be a more autonomous, independent agent that does some of those same functions. But yeah, this is the kind of issue that would have to be addressed.
Rob Wiblin: What does Schwitzgebel think of pet dogs, and our breeding of loyal, friendly dogs?
Carl Shulman: Actually, in his engagement with another philosopher, Steve Petersen — who takes the contrary position that it can be OK to create AIs that wish to serve the interests or objectives that their creators intended — does raise the example of a sheepdog that really loves herding. It’s quite happy herding. It’s wrong to prevent the sheepdog from getting a chance to herd. I think that’s animal abuse, to always keep them inside or not give them anything that they can run circles around and collect into clumps. And so if you’re objecting with the sheepdog, it’s got to be not that it’s wrong for the sheepdog to herd, but it’s wrong to make the sheepdog so that it needs and wants to herd.
And I think this kind of case does make me suspect that Schwitzgebel’s position is maybe too parochial. A lot of our deep desires exist for particular biological reasons. So we have our desires about food and external temperature that are pretty intrinsic. Our nervous systems are adjusted until our behaviours are such that it keeps our predicted skin temperature within a certain range; it keeps predicted food in the stomach within a certain range.
And we could probably get along OK without those innate desires, and then do them instrumentally in service to some other things, if we had enough knowledge and sophistication. The attachment to those in particular seems not so clear. Status, again: some people are sort of power hungry and love status; others are very humble. It’s not obvious that’s such a terrible state. And then on the front of survival that’s addressed in the Sun Probe case and some of Schwitzgebel’s other cases: if minds that are backed up, the position that having all of my memories and emotions and whatnot preserved less a few moments of recent experience, that’s pretty good to carry on, that seems like a fairly substantial point. And the point that the loss of a life that is quickly physically replaced, that it’s pretty essential to the badness there, that the person in question wanted to live, right?
Rob Wiblin: Right. Yeah.
Carl Shulman: These are fraught issues, and I think that there are reasons for us to want to be paternalistic in the sense of pushing that AIs have certain desires, and that some desires we can instil that might be convenient could be wrong. An example of that, I think, would be you could imagine creating an AI such that it willingly seeks out painful experiences. This is actually similar to a Derek Parfit case. So where parts of the mind, maybe short-term processes, are strongly opposed to the experience that it’s undergoing, while other processes that are overall steering the show keep it committed to that.
And this is the reason why just consent, or even just political and legal rights, are not enough. Because you could give an AI self-ownership, you could give it the vote, you could give it government entitlements — but if it’s programmed such that any dollar that it receives, it sends back to the company that created it; and if it’s given the vote, it just votes however the company that created it would prefer, then these rights are just empty shells. And they also have the pernicious effect of empowering the creators to reshape society in whatever way that they wish. So you have to have additional requirements beyond just, is there consent?, when consent can be so easily manufactured for whatever.
True, I should have been more precise—by consciousness I meant phenomenal consciousness. On your (correct) point about Kammerer being open to consciousness more generally, here’s Kammerer (I’m sure he’s made this point elsewhere too):
Illusionists are not committed to the view that our introspective states (such as the phenomenal judgment “I am in pain”) do not reliably track any real and important psychological property. They simply deny that such properties are phenomenal, and that there is something it is like to instantiate them. Frankish suggests calling such properties “quasi-phenomenal properties” (Frankish 2016, p. 15)—purely physico-functional and non-phenomenal properties which are reliably tracked (but mischaracterized as phenomenal) by our introspective mechanisms. For the same reason (Frankish 2016, p. 21), illusionists are not committed to the view that a mature psychological science will not mention any form of consciousness beyond, for example, access-consciousness. After all, quasi-phenomenal consciousness may very well happen to have interesting distinctive features from the point of view of a psychologist.
But on your last sentence:
He could still believe moral status should be about consciousness, just not phenomenal consciousness.
While that position is possible, Kammerer does make it clear that he does not hold it, and that he thinks it untenable for reasons similar to why he thinks moral status is not about phenomenal consciousness (cf. p. 8).
It’s an excellent question! There are two ways to go here:
1. Keep the liberal notion of preferences/desires, one that seems like it would apply to plants and bacteria, and conclude that moral patienthood is very widespread indeed. As you note, few people go for this view (I don’t either). But you can find people bumping up against this view:
Korsgaard: “The difference between the plant’s tropic responses and the animal’s action might even, ultimately, be a matter of degree. In that case, plants would be, in a very elementary sense, agents, and so might be said to have a final good.” (quoted in this lecture on moral patienthood by Peter Godfrey-Smith)
2. Think that for patienthood what’s required is a more demanding notion of “preference”, such that plants don’t satisfy it but dogs and people do. And there are ways of making “preference” more demanding besides “conscious preference”. You might think that morally-relevant preferences/desires have to have some kind of complexity, or some kind of rational structure, or something like that. That’s of course quite hand-wavy—I don’t think anyone has a really satisfying account.
Here’s a remark from Francois Kammerer, who thinks that moral status cannot be about consciousness (which he thinks does not exist), argues that it should instead be about desire, and lays out nicely the ‘scale’ of desires at various levels of demandingness:
On the one extreme, we can think of the most basic way of desiring: a creature can value negatively or positively certain state of affairs, grasped in the roughest way through some basic sensing system. On some views, entities as simple as bacteria can do that (Lyon & Kuchling, 2021). On the other hand, we can think of the most sophisticated ways of desiring. Creatures such as, at least, humans, can desire for a thing to thrive in what they take to be its own proper way to thrive and at the same time desire their own desire for this thing to thrive to persist – an attitude close to what Harry Frankfurt called “caring” (Frankfurt, 1988). Between the two, we intuitively admit that there is some kind of progressive and multidimensional scale of desires, which is normatively relevant – states of caring matter more than the most basic desires. When moving towards an ethic without sentience, we would be wise to ground our ethical system on concepts that we will treat as complex and degreed, and even more as “complexifiable” as the study of human, animal and artificial minds progresses.
Small correction: Jonathan Birch is at LSE, not QMUL. Lars Chittka, the co-lead of the project, is at QMUL.
You’re correct, Fai—Jeff is not a co-author on the paper. The other participants—Patrick Butlin, Yoshua Bengio, and Grace Lindsay—are.
What’s something about you that might surprise people who only know your public, “professional EA” persona?
I suggest that “why I don’t trust pseudonymous forecasters” would be a more appropriate title. When I saw the title I expected an argument that would apply to all/most forecasting, but this worry is only about a particular subset.
Unsurprisingly, I agree with a lot of this! It’s nice to see these principles laid out clearly and concisely:
You write:
AI welfare is potentially an extremely large-scale issue. In the same way that the invertebrate population is much larger than the vertebrate population at present, the digital population has the potential to be much larger than the biological population in the future.
Do you know of any work that estimates these sizes? There are various places where people have estimated the ‘size of the future’, including potential digital moral patients in the long run, but do you know of anything that estimates how many AI moral patients there could be by (say) 2030?
Hi Timothy! I agree with your main claim that “assumptions [about sentience] are often dubious as they are based on intuitions that might not necessarily ‘track’ sentience”, shaped as they are by potentially unreliable evolutionary and cultural factors. I also think it’s a very important point! I commend you for laying it out in a detailed way.
I’d like to offer a piece of constructive criticism if I may. I’d add more to the piece that answers, for the reader:
what kind of piece am I reading? What is going to happen in it?
why should I care about the central points? (as indicated, I think there are many reasons to care, and could name quite a few myself)
how does this piece relate to what other people say about this topic?
While getting ‘right to the point’ is a virtue, I feel like more framing and intro would make this piece more readable, and help prospective readers decide if it’s for them.
[meta-note: if other readers disagree, please do of course vote ‘disagree’ on this comment!]
Hi Brian! Thanks for your reply. I think you’re quite right to distinguish between your flavor of panpsychism and the flavor I was saying doesn’t entail much about LLMs. I’m going to update my comment above to make that clearer, and sorry for running together your view with those others.
Ah, thanks! Well, even if it wasn’t appropriately directed at your claim, I appreciate the opportunity to rant about how panpsychism (and related views) don’t entail AI sentience :)
The Brian Tomasik post you link to considers the view that fundamental physical operations may have moral weight (call this view “Physics Sentience”).
[Edit: see Tomasik’s comment below. What I say below is true of a different sort of Physics Sentience view like constitutive micropsychism, but not necessarily of Brian’s own view, which has somewhat different motivations and implications]
But even if true, [many versions of] Physics Sentience [but not necessarily Tomasik’s] doesn’t have straightforward implications about which high-level systems, like organisms and AI systems, also constitute a sentient subject of experience. Consider: a human being touching a stove is experiencing pain on Physics Sentience; but a pan touching a stove is not experiencing pain. On Physics Sentience, the pan is made up of sentient matter, but this doesn’t mean that the pan qua pan is also a moral patient, another subject of experience that will suffer if it touches the stove.
To apply this to the LLMs case:
(1) Physics Sentience will hold that the hardware on which LLMs run is sentient—after all, it’s a bunch of fundamental physical operations.
(2) But Physics Sentience will also hold that the hardware on which a giant lookup table is running is sentient, to the same extent and for the same reason.
(3) Physics Sentience is silent on whether there’s a difference between (1) and (2), in the way that there’s a difference between the human and the pan.
The same thing holds for other panpsychist views of consciousness, fwiw. Panpsychist views on which fundamental matter is conscious don’t tell us anything, by themselves, about which animals or AI systems are sentient. They just say that those systems are made of conscious (or proto-conscious) matter.
I like it! I think one thing the post itself could have been clearer on is that reports could be indirect evidence for sentience, in that they are evidence of certain capabilities that are themselves evidence of sentience. To give an example (though it’s still abstract): the ability of LLMs to fluently mimic human speech —> evidence for capability C —> evidence for sentience. You can imagine the same thing for parrots: ability to say “I’m in pain” —> evidence of learning and memory —> evidence of sentience. But they aren’t, themselves, reports of sentience.
So maybe, at the beginning, the claim should be that they aren’t “strong evidence” or “straightforward evidence”.
Thanks for the comment. A couple replies:
I want to clarify that these are examples of self-reports about consciousness and not evidence of consciousness in humans.
Self-report is evidence of consciousness in the Bayesian sense (and in common parlance): in a wide range of scenarios, if a human says they are conscious of something, you should have a higher credence that they are than if they do not say so. And it is evidence in the scientific sense: it’s commonly and appropriately taken as evidence in scientific practice; here is Chalmers’s “How Can We Construct a Science of Consciousness?” on the practice of using self-reports to gather data about people’s conscious experiences:
Of course our access to this data depends on our making certain assumptions: in particular, the assumption that other subjects really are having conscious experiences, and that by and large their verbal reports reflect these conscious experiences. We cannot directly test this assumption; instead, it serves as a sort of background assumption for research in the field. But this situation is present throughout other areas of science. When physicists use perception to gather information about the external world, for example, they rely on the assumption that the external world exists, and that perception reflects the state of the external world. They cannot directly test this assumption; instead, it serves as a sort of background assumption for the whole field. Still, it seems a reasonable assumption to make, and it makes the science of physics possible. The same goes for our assumptions about the conscious experiences and verbal reports of others. These seem to be reasonable assumptions to make, and they make the science of consciousness possible.
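To put the Bayesian point semi-formally (this is my own gloss of the evidential claim, not something Chalmers writes): let R be “the subject reports being conscious of X” and C be “the subject is conscious of X”. Then R is evidence for C just in case

$$P(C \mid R) > P(C), \quad \text{equivalently (for non-extreme priors)} \quad P(R \mid C) > P(R \mid \neg C),$$

and that condition plausibly holds in the wide range of scenarios mentioned above, even if the likelihood ratio is modest.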
I suppose it’s true that self-reports can’t budge someone from the hypothesis that other actual people are p-zombies, but few people (if any) think that. From the SEP:
Few people, if any, think zombies actually exist. But many hold that they are at least conceivable, and some that they are possible.... The usual assumption is that none of us is actually a zombie, and that zombies cannot exist in our world. The central question, however, is not whether zombies can exist in our world, but whether they, or a whole zombie world (which is sometimes a more appropriate idea to work with), are possible in some broader sense.
So yeah: my take is that no one, including anti-physicalists who discuss p-zombies like Chalmers, really thinks that we can’t use self-report as evidence, and correctly so.
Agree, that’s a great pointer! For those interested, here is the paper and here is the podcast episode.
[Edited to add a nit-pick: the term ‘meta-consciousness’ is not used; it’s the ‘meta-problem of consciousness’, which is the problem of explaining why people think and talk the way they do about consciousness]
Thank you!
I enjoyed this excerpt and the pointer to the interview, thanks. It might be helpful to say in the post who Jim Davies is.
That may be right—an alternative would be to taboo the word in the post, and just explain that they are going to use people with an independent, objective track record of being good at reasoning under uncertainty.
Of course, some people might be (wrongly, imo) skeptical of even that notion, but I suppose there’s only so much one can do to get everyone on board. It’s a tricky balance of making it accessible to outsiders while still just saying what you believe about how the contest should work.
I think that the post should explain briefly, or even just link to, what a “superforecaster” is. And if possible explain how and why this serves an independent check.
The superforecaster panel is imo a credible signal of good faith, but people outside of the community may think “superforecasters” just means something arbitrary and/or weird and/or made up by FTX.
(The post links to Tetlock’s book, but not in the context of explaining the panel)
As one of the philosophers in question, I will now say there’s a very high chance this contains something worthwhile! And even if it’s not entirely novel (I’m not sure), I’m having trouble finding any papers that are obviously about this topic / concept, so it’s still very worth laying out.
And another literature pointer: Integrated Information Theory (IIT) specifies an “amount” of consciousness that a given system has. Adam Pautz criticizes IIT’s notion of “amount” as being ambiguous and potentially incoherent. Interestingly, Pautz’s list of potential ways in which experiences can be degreed does not (as far as I can tell) contain anything corresponding to your “size” notion.