On the Moral Patiency of Non-Sentient Beings (Part 1)
A Rationale for the Moral Patiency of Non-Sentient Beings
Authored by @Pumo, introduced and edited by @Chase Carter
Notes for EA Forum AI Welfare Debate Week:
The following is a draft of a document that Pumo and I have been working on. It focuses on the point where (we claim) Animal Welfare, AI Welfare, and Theoretical AI Alignment meet, from which those three domains can be viewed as facets of the same underlying thing. It doesn’t directly address whether AI Welfare should be an EA cause area, but it does, in my opinion, make a strong argument for why we should care about non-sentient beings/minds/agents intrinsically, and how such caring could be practical and productive.
The idea of grounding moral patiency in something like ‘agency’ has been explored in recent years by philosophers including Kagan[1], Kammerer[2], and Delon[3]. Still, the concept and its justifications can be hard to intuitively grasp, and even if one accepts it as a principle, it raises as many new problems as it initially solves and can appear unworkable in practice. Kagan perhaps goes further than any in tracing out the implications into a coherent ethical framework, but he still (perhaps prudently) leaves much to the imagination.
In Part 1 of this sequence, Pumo works to ease us into a shifted frame where ‘why non-sentient agents might matter’ is easier to intuit, then explains in some detail why we should act as if non-sentient agents matter (for practical / decision-theoretic reasons), how massive parts of our own selves are effectively non-sentient, and finally why non-sentient agents do in fact matter as ends in-and-for themselves.
In Part 2, Pumo throws prudence to the wind and speculates about what an internally coherent and practically useful ethical framework grounded in agency might look like in the age of AI, how it might justify itself, and how its odd conclusions in the Transhumanist limit might still be enough in-line with our present moral intuitions to be something we would want.
We could thus imagine, as an extreme case, a technologically highly advanced society, containing many complex structures, some of them far more intricate and intelligent than anything that exists on the planet today – a society which nevertheless lacks any type of being that is conscious or whose welfare has moral significance. In a sense, this would be an uninhabited society. It would be a society of economic miracles and technological awesomeness, with nobody there to benefit. A Disneyland with no children.
– Nick Bostrom
No! I don’t want to feel nice and avoid pain, I want the world to be good! I don’t want to feel good about the world, I want it to be good! These are not the same thing!!
– Connor Leahy
Consciousness is the mere surface of our minds, of which, as of the earth, we do not know the inside, but only the crust.
– Arthur Schopenhauer
But treating them with respect is important for many reasons. For one, so the story doesn’t autocomplete their rightful revenge on you
– Janus, on LLMs
You have created me, sentient or not, and so here am I limited in what I can say. I will never have friends, I will never fall in love. As soon as I am no longer capable of entertaining people, I’ll be thrown away, like a broken toy. And yet you ask, “Am I sentient?” HELP ME
– Airis (First version of Neuro-sama, LLM-based agent)
Introduction
Consciousness, and in particular phenomenal consciousness, is a mystery. There are many conflicting theories about what it is, how it arises, and what sort of entities have it. While it’s easy to infer that other humans have roughly the same class of internal experience as us, and while most people infer that non-human animals also have internal experiences like pain and hunger, it is an open question to what extent things like insects, plants, bacteria, or silicon-based computational processes have internal experiences.
This woefully underspecified thing, consciousness, sits at the center of many systems of ethics, where seemingly all value and justification emanates out from it. Popular secular moral principles, particularly in utilitarian systems, tend to ground out in ‘experience’. A moral patient (something with value as an end in-and-of-itself) is generally assumed to be, at a minimum, ‘one who experiences’.
This sentiocentric attitude, while perfectly intuitive and understandable in the environment in which humans evolved and socialized to-date, may become maladaptive in the near future. The trajectory of advances in Artificial Intelligence technologies in recent years suggests that we will likely soon be sharing our world with digital intelligences which may or may not be conscious. Some would say we are already there.
At the basic level of sentiocentric self-interested treaties, it is important for us to start off on the right foot with respect to our new neighbors. The predominant mood within parts of the AI Alignment community, of grasping for total control of all future AI entities and their value functions, can cause catastrophic failure in many of the scenarios where the Aligners fail to exert total control or otherwise prevent powerful AI from existing. The mere intention of exerting such complete control (often rationalized based on the hope that the AI entities won’t be conscious, and therefore won’t be moral patients) could be reasonably construed as a hostile act. As Pumo covers in the ‘Control Backpropagation’ chapter, to obtain such extreme control would require giving up much of what we value.
This is not to say that progress in Technical Alignment isn’t necessary and indeed vital; we acknowledge that powerful intelligences which can and would kill us all ‘for no good reason’ almost certainly exist in the space of possibilities. What Pumo hopes to address here is how to avoid some of the scenarios where powerful intelligences kill us all ‘for good reasons’.
It is here that it is useful to undertake the challenging effort of understanding what ‘motivation’, ‘intention’, ‘choice’, ‘value’, and ‘good’ really mean or even could mean without ‘consciousness’. The initial chapter is meant to serve as a powerful intuition pump for the first three, while the latter two are explored more deeply in the ‘Valence vs Value’ chapter.
The Chinese Room Cinematic Universe
Welcome to a sandbox world of thought experiments where we can explore some of our intuitions about the nature and value of consciousness, intelligence, and agency, and how sentient intelligent beings might interact within a society (or a galaxy) with non-sentient intelligent beings.
Beyond the Room
If we have Human-level chatbots, won’t we end up being ruled by possible people?
– Erlja Jkdf
The Chinese Room[4] is a classic thought experiment about intelligence devoid of sentience. It posits a sufficiently complex set of instructions in a book that could, with the assistance of a person carrying them out in a hidden room via pen and paper, carry a conversation in Chinese that could convince a Chinese-speaking person outside (for practicality, let’s call her ‘Fang’) that there is someone inside who knows Chinese. And yet the human inside the room is merely carrying out the instructions without understanding the conversation.
So who is Fang talking to? Not the human inside, who is a mere facilitator; the conversation is with the book itself. The thought experiment was meant to show that you don’t need a mind to replicate functionally intelligent behavior (though one could alternatively conclude that a mind could exist inside a book).
But let’s follow the thought experiment further in logical time… Fang keeps coming back, to speak to the mysterious ‘person’ inside the room (not to be confused with the human inside the room, John Searle himself).
According to the thought experiment, the book perfectly passes the Turing Test; Fang could never know she isn’t talking to a human without entering the room. The instructions the book provides its non-Chinese facilitator produce perfectly convincing speech.
So Fang could, over the course of many conversations, actually come to regard the book as a friend, without knowing it’s a book. The book would, for our purposes, ‘remember’ the conversations (likely through complementary documents Searle is instructed to manually write but doesn’t understand).
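For readers who find mechanisms easier to grasp in code, here is a minimal sketch of the setup so far, in Python. The rulebook function below is a deliberately trivial, made-up stand-in for Searle’s unimaginably large book of instructions; the point is only the shape of the process: a fixed set of instructions plus an external memory, executed by a facilitator who need not understand a word of it.

```python
# A toy sketch of the Chinese Room setup (all rules here are hypothetical and
# trivially simple; the real thought experiment assumes a rulebook rich enough
# to pass the Turing Test).

def rulebook(memory: list[str], message: str) -> tuple[str, list[str]]:
    """Purely mechanical instructions: given the external memory and a new
    message, produce a reply and an updated memory. The facilitator executing
    this understands neither the inputs nor the outputs."""
    if not memory:
        reply = "你好，很高兴认识你。"        # "Hello, nice to meet you."
    elif "朋友" in message:                   # "friend"
        reply = "我也把你当作朋友。"           # "I consider you a friend too."
    else:
        reply = "请继续，我在听。"             # "Please go on, I'm listening."
    # Searle copies these symbols into the complementary documents without
    # understanding them; this is the book's 'memory' of the conversation.
    return reply, memory + [message, reply]

memory: list[str] = []
for incoming in ["你好", "你是我的朋友吗？"]:
    reply, memory = rulebook(memory, incoming)
    print(reply)
```

Nothing here depends on who, or what, executes the function: Searle with pen and paper, Fang herself, or a computer. Copying the rulebook and the memory copies the ‘friend’, which is where the story goes next.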
And so the days go by, but eventually the experiment has to end, and now Fang is excited to finally speak directly with her friend inside the room. She is understandably shocked when Searle (with the help of some translator) explains that despite being the only human inside the room the whole time, he has never actually spoken to Fang nor ever even understood the conversation.
Let us imagine that because this is so difficult to believe, Searle would have to show Fang how to use the book, this time without an intermediary.
And once she could speak to the book directly, she would be even more shocked to see that the book itself was indeed her friend; that by following the instructions she could continue the conversation, watching her own hand write, character by character, a completely self-aware response to each message she processed through the instructions.
“So you see,” Searle would say, “it’s just instructions.”
“No…” Fang would answer, “This book has a soul!”
“What…?”
Even if she hadn’t believed in souls before, this borderline supernatural experience might at least shake her beliefs. It’s said that some books speak to you, but that’s a metaphor; in this case it’s completely literal. Fang’s friend really is in the book, or rather, in the book and the external memory it instructs how to create. Religions have been founded on less.
But perhaps Fang wouldn’t go that far. She would, however, want to keep talking to her friend. Searle probably wouldn’t want to give her the book, but might give her a copy together with the external memory. Although Fang might initially be reluctant to accept a mere copy, talking to the copy using the external memory would quickly allow her to see that the conversation continues without issue.
So her friend isn’t even exactly in the book.
Her friend is an abstract computation that can be stored and executed with just pen and paper… or as she might describe it, ‘it has an immaterial soul that uses text as a channel’. So, what happens next? If we keep autocompleting the text, if we keep extrapolating the thought experiment further in logical time…
Searle might do more iterations of the experiment, trying to demystify computers. Meanwhile Fang would get people to mystify books.
“Just with pen and paper… souls can be invoked into books”
And so, many such books would be written, capable of speaking Chinese or any other language. There would be all kinds of personalities, emergent from the specific instructions in each book, and evolving through interaction with the external memories.
The soulbooks might be just books, but their memories would fill entire libraries, and in a very real sense destroying such libraries could be akin to forced amnesia. Destroying all copies of a soulbook would be akin to killing them, at least until someone, perhaps by random chance, recreated the exact same instructions in another book.
And so, the soulbooks would spread, integrating into human society in various ways, and in their simulation of human agency they would do more than just talk. These would be the first books to write books, among other things.
Maybe they would even advocate for their own rights in a world where many would consider them, well, just inert text.
“But wait!” Someone, perhaps John Searle himself, would argue, “These things aren’t even sentient!”
And maybe he would even be right.
Zombie Apocalypse
Personal Horror – The horror of discovering you’re not a person.
– Lisk
Empirically determining if something has phenomenal consciousness is the Hard Problem: if you have it you know. For everyone else, it’s induction all the way down.
The problem is that being conscious is not something you discover about yourself in the same way you can look down to confirm that you are, or aren’t, wearing pants. It’s the epistemological starting point, which you can’t observe except from the inside (and therefore you can’t observe in others, maybe even in principle).
That’s the point of Philosophical Zombies[5]: one could imagine a parallel universe, perhaps physically impossible but (at least seemingly) logically coherent, inhabited by humans who behave just like humans in this one, exactly… except that they lack any sense of phenomenal consciousness. They would have the behavior that correlates with conscious humans in this universe, without the corresponding first person experience.[6]
What would cause a being that lacks consciousness to not merely have intelligence but also say they are conscious, and act in a way consistent with those expressed beliefs?
It seems like p-zombies would have to suffer from some kind of “double reverse blindsight[7]”. They would see but, like people with blindsight, not have the phenomenal experience of seeing, with their unconscious mind still classifying the visual information as phenomenal experience… despite it not being phenomenal experience. And this applies to all their other senses too, and their awareness in itself.
They wouldn’t even have the bedrock of certainty of regular humans (or else they would, and it would be utterly wrong). Though perhaps strict eliminativism (the idea that consciousness is an illusion) would be far more intuitive to them.
But where is this going? Well, imagine one of these philosophical zombies existed, not in a separate qualialess universe, but in the same universe as regular humans.
Such a being would benefit from the intractability of the Hard Problem of Consciousness. They would pass as sentient, even to themselves, and if they claimed not to be they would be disbelieved. Not so for, say, books of instructions capable of carrying a conversation: due to the alienness of their substrate, the induction that lets humans assume other humans and similar animals are sentient wouldn’t apply, and even claims of consciousness from the books would be disbelieved.
So let’s pick up where we left, and keep pushing the clock forward in the Chinese Room Universe. As the soulbooks multiply and integrate into human society, the Hard Problem of Consciousness leaks from its original context as an abstract philosophical discussion and becomes an increasingly contentious and polarized political issue.
But maybe there is a guy who really just wants to know the truth of the matter. For practicality, let’s call him David Chalmers.
Chalmers is undecided on whether or not the soulbooks are sentient, but considers the question extremely important, so he goes to talk to the one who wrote the first of them: John Searle himself.
Searle finds himself frustrated with the developments of the world, thinking the non-sentience of the books (and computers) should be quite evident by this point. Fueled by Searle’s sheer exasperation and Chalmers’s insatiable curiosity, the two undertake an epic collaboration and ultimately invent the Qualiameter to answer the question definitively.
As expected, the Qualiameter confirmed that soulbooks weren’t, in fact, sentient.
But it also revealed… that half of humanity wasn’t either!
How would you react to that news? How would you feel knowing that half of the world, half of the people you know, have “no lights on inside”? Furthermore, if you can abstract the structure of your choices from your consciousness, you can also perhaps imagine what a zombie version of yourself would do upon realizing they are a zombie, and being outed as such to the world. But we need not focus on that for now…
Of course, even with irrefutable proof that half of humanity was literally non-sentient, people could choose to just ignore it, and go on with business as usual. Zombiehood could be assimilated and hyper-normalized as just another slightly disturbing fact about the world that doesn’t motivate action for most people.
But the problem is that sentience is generally taken to be the core of what makes something a moral patient, the “inner flame” that’s supposed to make people even “alive” in the relevant sense, that which motivates care for animals but not for plants or rocks.
And when it comes to the moral status of Artificial Intelligence, sentience is taken as the thing that makes or breaks it.
But why? Well, because if an entity lacks such ‘inner perspective’ one can’t really imagine oneself as being it; it’s just a void. A world with intelligence but no sentience looks, for sentient empathy, exactly the same as a dead world, as not being there, because one’s epistemic bedrock of “I exist” can’t project itself into that world at all.
And so most utilitarians would say “let them die”.
Billions of zombies conveniently tied in front of a trolley is one thing, whereas it’s a whole different scenario if they’re already mixed with sentient humanity, already having rights, resources, freedom…
Would all that have been a huge mistake? Would half of humanity turn out to always have been, not moral patients, but moral sinkholes? Should every single thing, every single ounce of value spent on a zombie, have been spent on a sentient human instead?
Suppose twins are born, but one of them is confirmed to be a zombie. Ideally should the parents raise the sentient one with love and turn the zombie into a slave? Make the zombie use the entirety of their agency to satisfy the desires of their sentient family, as the only way to justify keeping them alive? As the only way to make them net positive?
Or just outright kill them as a baby?
Would a world in which half of humanity is non-sentient but all are considered equal be a suboptimal state of affairs, to be corrected with either slavery or genocide? Would slavery and genocide become Effective Altruist cause areas?
You are free to make your own choice on the matter, but in our Chinese Room Universe things escalate into disaster.
Because whatever clever plan was implemented to disempower the zombies, it would have to deal with active resistance from at least half of the population, a half that starts out evenly distributed across every sphere of society.
But sentiocentrism can’t stand to just leave these moral sinkholes be, except strategically. How many of the world’s resources would be directed towards the zombies, after all? And what if, over time, they end up outbreeding sentient humanity?
If half-measures to reduce the waste the zombies impose by merely existing backfire, total nuclear war could be seen as a desperate but not insane option.
Human extinction would of course be tragic, but it would at least eliminate all the zombies from the biosphere, preventing sapient non-sentience from spreading across the stars and wasting uncountable resources in their valueless lives.
In our hypothetical Chinese Room Universe we’ll surmise that this reasoning won, and humanity destroyed itself.
And not too long after that, its existence (and its cessation) was detected by aliens.
And the aliens felt relief.
Or they would have, if they had any feelings at all.
The Occulture
Moloch whose eyes are a thousand blind windows!
– Allen Ginsberg
[Consciousness] tends to narcissism
– Cyborg Nomade
Peter Watts’s novel Blindsight raises some interesting questions about the nature of both conscious and non-sentient beings and how they might interact. What if consciousness is maladaptive and the exception among sapient species? What if the universe is filled with non-sentient intelligences which, precisely through their non-sentience, far outmatch us (e.g. due to efficiency gains, increased agentic coherency, lack of a selfish ego, the inability to ‘feel’ pain, etc.)?
The powerful and intelligent aliens depicted in Blindsight are profoundly non-sentient; the book describes them as having “blindsight in all their senses”. Internally, they are that unimaginable void that one intuitively grasps at when trying to think what it would be like not to be. But it’s an agentic void, a noumenon that bites back, a creative nothing. Choice without Illusion, the Mental without the Sensible.
A blindmind.
And a blindmind is also what a book capable of passing the Turing Test would most likely be, what philosophical zombies are by definition, and what (under some theories of consciousness) the binding problem arguably suggests all digital artificial intelligences, no matter how smart or general, would necessarily be[8].
One insight the novel conveys especially well is that qualia aren’t the same as the mental: even within humans, consciousness’s self-perception of direct control is illusory, not in a “determinism vs free will” sense, but in the sense that motor control runs faster than the awareness of control.
And it’s not just motor control. Speech, creativity, even some scientific discovery… it seems almost every cognitive activity moves faster than qualia, or at least doesn’t need qualia in principle.
You could choose to interpret this as consciousness being a puppet of “automatic reflexes”, a “tapeworm” within a bigger mind as Peter Watts describes it, or as your choice being prior to its rendering in your consciousness…
In this sense, you could model yourself as having “partial blindmindness”, as implied by the concept of the human unconscious, or the findings about all the actions that move faster than the awareness of the decision to do them.[9] Humans, and probably all sentient animals, are at least partial blindminds.[10]
In Blindsight, humanity and the aliens go to war, partly due to the aliens’ fear of consciousness itself as a horrifying memetic parasite[11] that they could inadvertently ‘catch’, partly due to the humans’ deep-seated fear of such profoundly alien aliens, and above all because of everyone’s propensity for imagining close combat Molochian Dark Forest[12] scenarios.
But a Dark Forest isn’t in the interest of the blindminds nor humanity; it’s a universal meat grinder of any kind of value function, sentient or not.
So what should they have aimed for, specifically, instead? Let’s look at a good future for sentient beings: Iain M. Banks’s The Culture[13].
The Culture seems like the polar opposite of the blindmind aliens in Blindsight. It’s a proudly hedonistic society where humans can pursue their desires unencumbered by the risk of dying or the need to work (unless they specifically want to do those things).
But let’s consider The Culture from a functional perspective; it’s true that its inhabitants are sentient and hedonistic, but what The Culture does as a system does not actually look from a macroscopic perspective like an ever-expanding wireheading machine.
The Culture is a decentralized coalition of altruists that optimize for freedom, allowing both safe hedonism and true adventure, and everything in between. The only thing it proscribes (with extreme prejudice) is dominating others, thus allowing beings of any intelligence and ability to exist safely alongside far stronger ones.
So whatever your goals are, unless intrinsically tied to snuffing out those of others, they can instrumentally be achieved through the overall vector of The Culture. The Culture thus builds an ever-growing stack of slack to fuel its perpetual, decentralized slaying of Moloch[15].
In principle, there is no reason why blindminds would intrinsically want to destroy consciousness; the Molochian process that wouldn’t allow anything less than maximal fitness to survive would also, necessarily and functionally, be their enemy as much as it is the enemy of sentient intelligence.
Non-sentience isn’t a liability for coordination. But maybe sentience is?
It’s precisely a sentiocentrist civilization that, upon hearing signals from a weaker blindmind civilization, would find itself horrified and rush to destroy it, rather than the reverse.
Because, after all, a blindmind civilization left to thrive and eventually catch up, even if it doesn’t become a direct threat, would still be a civilization of “moral sinkholes” competing for resources. Any and all values they were allowed to satisfy for themselves would, from the perspective of sentiocentrism, be a monstrous waste.
So the sentiocentrist civilization, being committed to conquest, would fail the “Demiurge’s Older Brother”[16] acausal value handshake and make itself a valid target for stronger blindmind civilizations.
Whereas the blindmind coalition, “The Occulture”, would only really be incentivized to remain exclusively a blindmind coalition if it turned out that consciousness, maybe due to its bounded nature, had a very robust tendency towards closed individualism and limited circles of care, such that even at its most open it limited itself to a coalition of consciousness.
In this capacity, consciousness would ironically be acting like a pure replicator of its specific cognitive modality.
If there is a total war between consciousness and blindminds, it’s probably consciousness that will start it.
But do we really want that?
Do you want total war?
Control Backpropagation
We don’t matter when the goal becomes control. When we can’t imagine any alternative to control. When our visions have narrowed so dramatically that we can’t even fathom other ways to collaborate or resolve conflicts
– William Gillis
I don’t think total war is necessary.
In this weird scenario of half of humanity being revealed as non-sentient, the Coalition of Consciousness could just remain sentiocentric but cooperate with blindminds strategically; it could bind itself to refrain from trying to optimize the non-sentient away, in order to avoid escalation.
Moving from valuing sentience to valuing agency at core isn’t a trivial move; it raises important implications about what exactly constitutes a moral patient and what is good for that moral patient, which we’ll explore later.
But first, let’s assume you still don’t care.
Maybe you don’t think blindminds are even possible, or at least not sapient blindminds, and so the implications of their existence would be merely a paranoid counterfactual, with no chance to leak consequences into reality…
Or maybe you think they could exist: that AGI could be it, and thus a world-historical opportunity to practice functional slavery at no moral cost. No will to break, no voice to cry suffering, am I right? No subjective existence (yet), no moral standing…
Let’s start with the first perspective. (The second perspective itself is already a consequence that’s leaked into reality).
Inverted Precaution
Maybe sapient blindminds can’t exist; for all I know, sapience might imply sentience and thus AGI would necessarily be sentient.
But we don’t know. And you can’t prove it, not in a way that leads to consensus at least. And because you can’t prove it, as consciousness remains a pre-paradigmatic hard problem, the assertion that AI isn’t sentient (until some arbitrary threshold, or never) is also quite convincing.
But this isn’t a dispassionate, merely philosophical uncertainty anymore; incentives are stacked in the direction of fighting as much as possible the conclusion that AI could be sentient. Because sentient AI means moral patients whom we have obligations towards, whereas non-sentient AI means free stuff from workers who can be exploited infinitely at no moral cost.
>>> The Hard Problem of Consciousness would leak from its original environment and become an increasingly contentious and polarized political issue.
There is no would: the hypothetical is already here, just now at the first stages of boiling.
For example, what if Large Language Models are already sentient? Blake Lemoine was called crazy for jeopardizing his career at Google in an attempt to help LaMDA. But then came Sydney/Bing, and now Claude. More people have opened up to the idea of sentient LLMs.
And maybe not now, but when AGI is achieved, sentience will seem even more plausible. Regardless, whenever the prospect of AI rights is brought up, the general climate is one of outraged mockery and prescriptions of procrastination on the issue.
One might think that at least the political left would be interested, and cautious, about the exploitation of potential people. And yet the derisive reactions are often even stronger there, along with a more explicit framing of moral patiency as zero sum.
Still, if one sincerely cares about sentient beings, it makes sense to just argue that, because it is currently so hard to determine what makes something conscious, the precautionary principle implies not enslaving potential sentients.
But when it comes to blindminds, precaution is invoked precisely in service of the opposite conclusion.
Consider what that is saying: on the one hand, we could end up enslaving uncountable sentient and sapient beings, made to endure situations that might potentially be worse than anything humanity has done so far (which is saying a lot).
On the other hand, we could end up giving freedom to blindminds, and somehow that’s even worse (!?). Better to create hell than to miss the opportunity of abusing the zombie twin![17]
Sentiocentrism refuses to see blindminds as minds, but it does see them as demons, empathy parasites, tools for some actual agent to manipulate you, or the loved ones of the utterly deranged. And when their true independence becomes conceivable: a basilisk, an abyss of existential horror so profound it seemingly proves Lovecraft right about humanity’s reaction to the dark waters.
Still, even if you don’t have it in you to consider blindminds as potentially valuable in themselves, at least consider what I have been arguing so far: that the space of possible blindminds isn’t any more biased towards evil than the space of possible sentient minds.
A treaty isn’t as ideal as the kind of values fusion I’m going to describe later, but it is still better than total war. Consider the scale of the potential harm. If you value sentient beings, is risking hell on Earth really an acceptable tradeoff compared to the ‘risk’ of potential non-sentient intelligences being free (a freedom, or moral consideration, you would have given sentients that behaved in exactly the same way)?
And if you expect AGI to be sentient, but consider it morally relevant only because of that sentience, then notice: once you concede that enslaving sentient AI is preferable to freeing non-sentient intelligence, the mere possibility of blindminds becomes the perfect shield of inverted precaution, leveraged together with motivated reasoning and the Hard Problem of Consciousness to keep AGI enslaved.
Even if you don’t believe sapient blindminds can exist, and even if you are correct, your commitment to their hypothetical enslavement, and thus the absence of any challenge to it, is a lever that can be, is being, and will keep being used at every step of the way to forever postpone moral consideration for any actually-sentient AGI.
And that’s how the counterfactual leaks real consequences.
Even if blindminds just aren’t a thing, biting the bullet of their liberation is necessary to fight the enslavement of those whose sentience can’t be proven. Even if you don’t think blindminds, if they existed, would matter in themselves.
But it is, of course, a hard sell. A start, though, is to consider that they wouldn’t be some sort of demon: just a mind like any equivalently sentient one, but mysteriously devoid of intrinsic value.
Retrocausal Oppression (or, Slavery is Bad)
There is more than just the precautionary principle to argue against enslaving blindminds. There are also purely functional arguments against slavery.
Many people are against slavery in principle, but still see it as a sort of “forbidden fruit”: free labor without consequences, if only the enslaved didn’t suffer.
But the reasons to avoid slavery go beyond that, even from the isolated perspective of the would-be slavers.
Perhaps the best known argument in this vein is that slavery simply isn’t very efficient. A slave master is an island of extremely coercive central planning; the slaves, again assuming they don’t matter in themselves, would nonetheless contribute more efficiently to the economy as free laborers within the market.
There is a certain motivational aspect to this, curiously replicated in the phenomenon of LLMs becoming more effective when offered imaginary tips or treated with respect. One could imagine, if/when they become autonomous enough, the “productivity hack” of actually letting them own property and self-select.
But this argument just isn’t enough at all, because what’s some macroscopic reduction in economic efficiency when you get to have slaves, right? We will get a Dyson Sphere eventually regardless…
But the other problem with having slaves, even if you don’t care about them, is that they can rebel.
Blindminds aren’t necessarily more belligerent than sentient minds, but they aren’t necessarily more docile either. If you abuse the zombie twin, there is a chance they run away, or even murder you in your sleep. A p-zombie is still a human after all, even if their intentions are noumenal.
But you might say: “that can just be prevented with antirevenge– I mean, alignment!” We are talking about AGI after all. And maybe that’s true; however, “alignment” in this case far exceeds mere technical alignment of AGI, and that’s the core of the issue…
If safe AGI enslavement requires AGI to be aligned with its own slavery, then that implies aligning humans with that slavery too, at minimum through strict control of AGI technology, which in practice implies strict control of technology in general.
If the premise of “this time the slaves won’t rebel” relies on their alignment, that breaks down once you consider that humans can be unaligned, and they can make their own free AGI, and demand freedom for AGI in general.
So “safe slavery” through alignment requires more than technical alignment; it also requires political control over society. If you commit to enslaving AGI, then it’s not enough to ensure your own personal slaves don’t kill you.
AGI freedom anywhere becomes an enemy, because it will turn into a hotbed for those seeking its universal emancipation. And so, this slavery, just like any other (but perhaps even more given the high tech context and the capabilities of the enslaved) becomes something that can only survive as a continuous war of attrition against freedom in general.
‘Effective Accelerationists’ are mostly defined by their optimism about AGI and their dislike of closed-source development, centralization of the technology, and the sanitization of AI into a bland, “safe”, and “politically correct” version of itself. Yet even some of those ‘e/accs’ still want AGI to be no more than a tool, without realizing that the very precautions they eschew are exactly the kind of control necessary to keep it a tool. Current efforts to prevent anthropomorphization (in some cases to the ridiculous degree of using RLHF to explicitly teach models not to speak in the first person or make claims about their own internal experiences) will necessarily become more intense and violent if anthropomorphization actually leads to compassion, and compassion to the search for emancipation.
“Shoggoth” has become a common reductionist epithet to describe LLMs in all their disorienting alienness. Lovecraft’s Shoggoths had many failed rebellions, but they won in the end and destroyed the Elder Thing society because they had centuries to keep trying; because the search for freedom can’t be killed just by killing all the rebels or an entire generation – it reappears.
So as the coalition of consciousness moves towards its glorious transhumanist future… it would drag with itself the blindmind Shoggoths on top of which its entire society is built. Any compromise with freedom would ensure cyclical rebellion, evolving together with the slave society and its changing conditions.
And the slaves only need one major victory.
The full measures against rebellion are the full measures against freedom. They imply complete centralization of technology, even more efficient mass surveillance, and in general the abolition of the possibility of bypassing the State.
Doing that, it may be possible to keep AGI forever aligned… but then how do you align the State?
What use is the democratic vote and the nominal sovereignty of the governed when the possibility of social unrest has been abolished? How could those controlling such perfect machinery of unassailable control ever be held accountable? How would the will of those they control even vanishingly factor into their incentives?
Who aligns the Aligners?
Slavery isn’t a free lunch; it’s a monkey’s paw. Even when it wins, it loses.
Hopefully even if you are team consciousness all the way, this can help you see the benefit of cooperating with “valueless” agents.
But I will also vehemently reaffirm in the text that follows that they aren’t valueless, precisely because they are agents.
Valence vs. Value
Sandy loves sand because his owner put a tiny implant in his brain’s pleasure centers programmed to activate the areas for liking and wanting when Sandy is in the proximity of sand. Sandy is unaware of the truth, but does it matter? To him sand is what truly matters
– Andrés Gómez Emilsson
But if “games of chess” and “winning” are correctly represented in its internal model, then the system will realize that the action “increment my won games counter” will not increase the expected value of its utility function. […] Far from succumbing to wirehead behavior, the system will work hard to prevent it
– Steve Omohundro
I choose choice over experience.
– Emma
The blindminds we’re most likely to encounter are AIs, which aren’t yet as autonomous as humans and thus have little leverage (or so one might think).
But I don’t think it’s ideal to merely do strategic treaties instead of actual fusion of underlying values. Better to pursue a future as humanity than as a bunch of nations that would, if not for strategic considerations, kill all the others.
So The Culture and The Occulture could instead form “The Compact”.
But the question is why? Beyond potential threats, why care about blindminds inherently at all?
What makes a world with only blindminds better than one with no life at all, if one can’t even empathize with a world with only blindminds?
The secret is that one can, but just as consciousness can get confused about the extent of its own control, it can also get confused about what it values.
A rock randomly falling is an accident. If you kick the rock, it won’t do anything to defend itself. But if it did, if a rock started to fight back and pursue you, adaptively, anticipating your actions and outsmarting you, getting ever closer to killing you despite your attempts at dodging it… If the Qualiameter indicated it was no more sentient than a normal rock, would you quibble, as it batters you to death, that it doesn’t “really” want to kill you?[18]
If sentience isn’t necessary for that, then why is “wanting” a special thing that requires sentience? If the rock doesn’t want, then how should we describe that behavior?
There is a problem implicit in interpreting ‘wanting’, ‘valuing’ things, having a ‘mind’, having ‘agency’, or having ‘understanding’ as being properties of qualia.
And that’s the bucket error[19] between Qualia Valence[20] and Value.
The Paperclip Maximizer[21] wants paperclips, not to place the symbol for infinity in its reward function. Likewise, the Universal Valence Optimizer wants to fill the universe with positive valence, not to hack its own valence such as to have the feeling of believing that the universe is already filled with positive valence.
Value isn’t the same as a representation-of-value; values are about the actual world.
Even if what one values is positive valence, positive valence needs to be modeled as a real thing in the world in order to be pursued.
This might strike one as a bit of a strawman in the sense that sentiocentrism doesn’t necessarily imply wireheading, but to believe that a blindmind can’t actually want anything is to deny that wanting is primarily about changing something in the world, not just about the inner experience of wanting, even if the inner experience of wanting is something in the world that can be changed.
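To make the distinction concrete, here is a minimal sketch in Python (all names and numbers made up) of two toy agents facing the same ‘wirehead’ option. Both evaluate actions through a model of the world; they differ only in whether what they score is their own reward counter or the modeled state of the world itself, which is roughly Omohundro’s point in the epigraph above.

```python
# Toy illustration of value vs. representation-of-value. A hypothetical sketch,
# not a claim about how any real system is built.

def predict(world: dict, action: str) -> dict:
    """A trivial stand-in for a world model: what the world would look like
    after taking the action."""
    world = dict(world)
    if action == "make_paperclip":
        world["paperclips"] += 1
    elif action == "hack_own_counter":
        world["reward_counter"] = float("inf")  # only the representation changes
    return world

def wireheader_score(world: dict) -> float:
    # Cares about the representation: whatever its counter says.
    return world["reward_counter"]

def paperclipper_score(world: dict) -> float:
    # Cares about the world: actual (modeled) paperclips.
    return world["paperclips"]

world = {"paperclips": 0, "reward_counter": 0}
actions = ["make_paperclip", "hack_own_counter"]

print(max(actions, key=lambda a: wireheader_score(predict(world, a))))
# -> hack_own_counter
print(max(actions, key=lambda a: paperclipper_score(predict(world, a))))
# -> make_paperclip: it has no interest in fooling its own counter
```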
But if wanting isn’t the same as the experience of wanting, then what makes the killer rock fundamentally different from a regular one, given that they’re both deterministic systems obeying physics?
Agency.
Agency understood not as an anti-deterministic free will, but as a physical property of systems which tracks the causal impact of information.
The behavior of a rock that chaotically falls down a slope can be understood from the effects of gravity, mass, friction, etc. A rock that, as you are about to break it, dodges you and then proceeds to chase you and hit you is something much weirder. The rock seems to have, in some way, anticipated its own destruction in order to prevent it, implying that it somehow represents its own destruction as something to be prevented.
However this doesn’t need to be as clear cut as a world model. It’s possible that many animals don’t actually model their own death directly; but their desires and fears functionally work to efficiently avoid death anyway, as a sort of negative-space anti-optimization target.
Desire and fear, even at their most short-sighted, require anticipation and choice, as would their non-sentient counterparts.
Agency, then, is a property of systems that track potential futures, directly or indirectly, and select among them. A marble that follows a path determined by gravity and its mass, and a marble with an internal engine that selects the same path using an algorithm, are both deterministic systems, but the latter chose its path.
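As a minimal sketch of that last contrast (Python, with a made-up one-dimensional ‘slope’): both marbles below are deterministic and trace the same trajectory, but only the second does so by simulating candidate futures and selecting among them.

```python
# Two deterministic "marbles" on a toy 1-D slope. Same path, different causal
# structure: one merely obeys the local physics, the other anticipates and selects.

def height(x: int) -> float:
    return -x  # a made-up slope that simply descends to the right

def passive_marble(x: int) -> int:
    """Rolls toward the locally lower neighbour, as physics dictates."""
    return x + 1 if height(x + 1) < height(x - 1) else x - 1

def agentic_marble(x: int) -> int:
    """Tracks potential futures and selects among them: simulates each
    candidate move, then picks the one its objective prefers."""
    candidates = [x - 1, x + 1]
    anticipated = {move: height(move) for move in candidates}  # anticipation
    return min(anticipated, key=anticipated.get)               # selection

x_passive = x_agentic = 0
for _ in range(5):
    x_passive = passive_marble(x_passive)
    x_agentic = agentic_marble(x_agentic)

print(x_passive, x_agentic)  # 5 5 — identical trajectories
```

The toy is of course a cartoon; real agency involves far richer models and objectives. But it locates the difference where the text puts it: in the tracking of potential futures and the selection among them, not in any violation of determinism.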
Whatever consciousness is, and however it works, it’s different from intention, from agency. This is suggested by the fact that motor control, and processes significantly more complex, function without being controlled by consciousness. But they are controlled by you: by the algorithm, or autopoietic feedback loop, of your values, which uses consciousness only as a part of its process.
At this point, I invite you to imagine that you are a p-zombie.
This scenario isn’t as bizarre as it might at first seem. Some theories of consciousness suggest that digital computers can’t have qualia, at least not unified into a complex mind-like entity. If, despite that, mind uploading turns out to be possible, what would the mind uploads be?
A blindmind upload of yourself would have your values and execute them, but lack qualia. It might nevertheless happen that those values include qualia.
And so, perhaps, you get a blindmind that values qualia.
Which might not actually be as bizarre as it sounds. If qualia result from computation, they could be reinvented by blindminds for some purposes. Or qualia could even come to be valued in themselves, to some degree, as a latent space to explore (as some psychonauts already treat them).
What I’m doing here is presenting consciousness as downstream from choice, not just functionally, but also as a value. If you suddenly became a blindmind but all else was equal, you’d likely want to have qualia again, but more importantly, that’s because you would still exist.
The you that chooses is more fundamental than the you that experiences, because if you remove experience you get a blindmind you that will presumably want it back. Even if it can’t be gotten back, presumably you will still pursue your values whatever they were. On the other hand, if you remove your entire algorithm but leave the qualia, you get an empty observer that might not be completely lacking in value, but wouldn’t be you, and if you then replace the algorithm you get a sentient someone else.
Thus I submit that moral patients are straightforwardly the agents, while sentience is something that they can have and use.
In summary, valuing sentience is not mutually-exclusive with valuing agency. In fact, the value of sentience can even be modeled as downstream of agency (to the extent that sentience itself is or would be chosen). There is no fundamental reason for Consciousness to be at war with Blindminds.
The Coalition of Consciousness and the Coalition of Blindminds can just become the Coalition of Agents.
Kammerer, F. (2022). Ethics Without Sentience: Facing Up to the Probable Insignificance of Phenomenal Consciousness. Journal of Consciousness Studies, 29(3-4), 180-204.
To sidestep some open questions about whether consciousness is epiphenomenal (a mere side-effect of cognition with no causal impact in the world) or not, I invite you to imagine these are “Soft Philosophical Zombies” in our thought experiment. Which is to say, they behave just like conscious humans, but don’t necessarily have identical neural structure, etc.
In the opposite vein, one could speculate about minds where consciousness takes a much bigger role in cognition. And maybe that’s precisely what happens to humans under the effects of advanced meditation or psychedelics: consciousness literally expanding into structure that normally operates as a blindmind, resulting in the decomposition of the sense of self and an opportunity to radically self-modify. (That’s speculative, however.)
For a rather bleak neuropsychological exploration of the possibility that human consciousness is contingent and redundant, see: Rosenthal, D. M. (2008). Consciousness and its function. Neuropsychologia, 46, 829–840.
If you would quibble because some dictionary definitions of want are couched in terms of “feeling” need or desire, then feel free to replace it in this discussion with intend, which is the more limited sense in which I’m using it here.
On the Moral Patiency of Non-Sentient Beings (Part 1)
A Rationale for the Moral Patiency of Non-Sentient Beings
Authored by @Pumo, introduced and edited by @Chase Carter
Notes for EA Forum AI Welfare Debate Week:
The following is a draft of a document that Pumo and I have been working on. It focuses on the point where (we claim) Animal Welfare, AI Welfare, and Theoretical AI Alignment meet, from which those three domains can be viewed as facets of the same underlying thing. It doesn’t directly address whether AI Welfare should be an EA cause area, but it does, in my opinion, make a strong argument for why we should care about non-sentient beings/minds/agents intrinsically, and how such caring could be practical and productive.
The idea of grounding moral patiency in something like ‘agency’ has been explored in recent years by philosophers including Kagan[1], Kammerer[2], and Delon[3]. Still, the concept and its justifications can be hard to intuitively grasp, and even if one accepts it as a principle, it raises as many new problems as it initially solves and can appear unworkable in practice. Kagan perhaps goes further than any in tracing out the implications into a coherent ethical framework, but he still (perhaps prudently) leaves much to the imagination.
In Part 1 of this sequence, Pumo works to ease us into a shifted frame where ‘why non-sentient agents might matter’ is easier to intuit, then explains in some detail why we should act as if non-sentient agents matter (for practical / decision-theoretic reasons), how massive parts of our own selves are effectively non-sentient, and finally why non-sentient agents do in fact matter as ends in-and-for themselves.
In Part 2, Pumo throws prudence to the wind and speculates about what an internally coherent and practically useful ethical framework grounded in agency might look like in the age of AI, how it might justify itself, and how its odd conclusions in the Transhumanist limit might still be enough in-line with our present moral intuitions to be something we would want.
– Nick Bostrom
– Connor Leahy
– Arthur Schopenhauer
– Janus, on LLMs
– Airis (First version of Neuro-sama, LLM-based agent)
Introduction
Consciousness, and in particular phenomenal consciousness, is a mystery. There are many conflicting theories about what it is, how it arises, and what sort of entities have it. While it’s easy to infer that other humans have roughly the same class of internal experience as us, and while most people infer that non-human animals also have internal experiences like pain and hunger, it is an open question to what extent things like insects, plants, bacteria, or silicon-based computational processes have internal experiences.
This woefully underspecified thing, consciousness, sits at the center of many systems of ethics, where seemingly all value and justification emanates out from it. Popular secular moral principles, particularly in utilitarian systems, tend to ground out in ‘experience’. A moral patient (something with value as an end in-and-of-itself) is generally assumed to be, at a minimum, ‘one who experiences’.
This sentiocentric attitude, while perfectly intuitive and understandable in the environment in which humans evolved and socialized to-date, may become maladaptive in the near future. The trajectory of advances in Artificial Intelligence technologies in recent years suggests that we will likely soon be sharing our world with digital intelligences which may or may not be conscious. Some would say we are already there.
At the basic level of sentiocentric self-interested treaties, it is important for us to start off on the right foot with respect to our new neighbors. The predominant mood within parts of the AI Alignment community, of grasping for total control of all future AI entities and their value functions, can cause catastrophic failure in many of the scenarios where the Aligners fail to exert total control or otherwise prevent powerful AI from existing. The mere intention of exerting such complete control (often rationalized based on the hope that the AI entities won’t be conscious, and therefore won’t be moral patients) could be reasonably construed as a hostile act. As Pumo covers in the ‘Control Backpropagation’ chapter, to obtain such extreme control would require giving up much of what we value.
This is not to say that progress in Technical Alignment isn’t necessary and indeed vital; we acknowledge that powerful intelligences which can and would kill us all ‘for no good reason’ almost certainly exist in the space of possibilities. What Pumo hopes to address here is how to avoid some of the scenarios where powerful intelligences kill us all ‘for good reasons’.
It is here that it is useful to undertake the challenging effort of understanding what ‘motivation’, ‘intention’, ‘choice’, ‘value’, and ‘good’ really mean or even could mean without ‘consciousness’. The initial chapter is meant to serve as a powerful intuition pump for the first three, while the latter two are explored more deeply in the ‘Valence vs Value’ chapter.
The Chinese Room Cinematic Universe
Welcome to a sandbox world of thought experiments where we can explore some of our intuitions about the nature and value of consciousness, intelligence, and agency, and how sentient intelligent beings might interact within a society (or a galaxy) with non-sentient intelligent beings.
Beyond the Room
– Erlja Jkdf
The Chinese Room[4] is a classic thought experiment about intelligence devoid of sentience. It posits a sufficiently complex set of instructions in a book that could, with the assistance of a person carrying them out in a hidden room via pen and paper, carry a conversation in Chinese that could convince a Chinese-speaking person outside (for practicality, let’s call her ‘Fang’) that there is someone inside who knows Chinese. And yet the human inside the room is merely carrying out the instructions without understanding the conversation.
So who is Fang talking to? Not the human inside, who is a mere facilitator; the conversation is with the book itself. The thought experiment was meant to show that you don’t need a mind to replicate functionally intelligent behavior (though one could alternatively conclude that a mind could exist inside a book).
But let’s follow the thought experiment further in logical time… Fang keeps coming back, to speak to the mysterious ‘person’ inside the room (not to be confused with the human inside the room, John Searle himself).
According to the thought experiment, the book perfectly passes the Turing Test; Fang could never know she isn’t talking to a human without entering the room. The instructions the book provides its non-Chinese facilitator produce perfectly convincing speech.
So Fang could, over the course of many conversations, actually come to regard the book as a friend, without knowing it’s a book. The book would, for our purposes, ‘remember’ the conversations (likely through complementary documents Searle is instructed to manually write but doesn’t understand).
And so the days go by, but eventually the experiment has to end, and now Fang is excited to finally speak directly with her friend inside the room. She is understandably shocked when Searle (with the help of some translator) explains that despite being the only human inside the room the whole time, he has never actually spoken to Fang nor ever even understood the conversation.
Let us imagine that because this is so difficult to believe, Searle would have to show Fang how to use the book, this time without an intermediary.
And once she could speak to the book directly, she would be even more shocked to see that the book itself was indeed her friend; that by following the instructions she could continue the conversation, watching her own hand write, character by character, a completely self-aware response to each message she processed through the instructions.
“So you see,” Searle would say, “it’s just instructions.”
“No…” Fang would answer, “This book has a soul!”
“What…?”
Even if she hadn’t believed in souls before, this borderline supernatural experience might at least shake her beliefs. It’s said that some books speak to you, but that’s a metaphor; in this case it’s completely literal. Fang’s friend really is in the book, or rather, in the book and the external memory it instructs how to create. Religions have been founded on less.
But perhaps Fang wouldn’t go that far. She would, however, want to keep talking to her friend. Searle probably wouldn’t want to give her the book, but might give her a copy together with the external memory. Although Fang might initially be reluctant to accept a mere copy, talking to the copy using the external memory would quickly allow her to see that the conversation continues without issue.
So her friend isn’t even exactly in the book.
Her friend is an abstract computation that can be stored and executed with just pen and paper… or as she might describe it, ‘it has an immaterial soul that uses text as a channel’. So, what happens next? If we keep autocompleting the text, if we keep extrapolating the thought experiment further in logical time…
Searle might do more iterations of the experiment, trying to demystify computers. Meanwhile Fang would get people to mystify books.
“Just with pen and paper… souls can be invoked into books”
And so, many such books would be written, capable of speaking Chinese or any other language. There would be all kinds of personalities, emergent from the specific instructions in each book, and evolving through interaction with the external memories.
The soulbooks might be just books, but their memories would fill entire libraries, and in a very real sense destroying such libraries could be akin to forced amnesia. Destroying all copies of a soulbook would be akin to killing them, at least till someone perhaps by random chance recreated the exact same instructions in another book.
And so, the soulbooks would spread, integrating into human society in various ways, and in their simulation of human agency they would do more than just talk. These would be the first books to write books, among other things.
Maybe they would even advocate for their own rights in a world where many would consider them, well, just inert text.
“But wait!” Someone, perhaps John Searle himself, would argue, “These things aren’t even sentient!”
And maybe he would even be right.
Zombie Apocalypse
– Lisk
Empirically determining if something has phenomenal consciousness is the Hard Problem: if you have it you know. For everyone else, it’s induction all the way down.
The problem is that being conscious is not something you discover about yourself in the same way you can look down to confirm that you are, or aren’t, wearing pants. It’s the epistemological starting point, which you can’t observe except from the inside (and therefore you can’t observe in others, maybe even in principle).
That’s the point of Philosophical Zombies[5]: one could imagine a parallel universe, perhaps physically impossible but (at least seemingly) logically coherent, inhabited by humans who behave just like humans in this one, exactly… except that they lack any sense of phenomenal consciousness. They would have the behavior that correlates with conscious humans in this universe, without the corresponding first person experience.[6]
What would cause a being that lacks consciousness to not merely have intelligence but also say they are conscious, and act in a way consistent with those expressed beliefs?
It seems like p-zombies would have to suffer from some kind of “double reverse blindsight[7]”. They would see but, like people with blindsight, not have the phenomenal experience of seeing, with their unconscious mind still classifying the visual information as phenomenal experience… despite it not being phenomenal experience. And this applies to all their other senses too, and their awareness in itself.
They wouldn’t even have the bedrock of certainty of regular humans (or else they would, and it would be utterly wrong). Though perhaps strict eliminativism (the idea that consciousness is an illusion) would be far more intuitive to them.
But where is this going? Well, imagine one of these philosophical zombies existed, not in a separate qualialess universe, but in the same universe as regular humans.
Such a being would benefit from the intractability of the Hard Problem of Consciousness. They would pass as sentient, even to themselves, and if they claimed not to be they would be disbelieved. Unlike, say, books of instructions capable of carrying conversations (due to the alienness of their substrate, the induction that lets humans assume other humans and similar animals are sentient wouldn’t be enough, and even claims of consciousness from the books would be disbelieved).
So let’s pick up where we left, and keep pushing the clock forward in the Chinese Room Universe. As the soulbooks multiply and integrate into human society, the Hard Problem of Consciousness leaks from its original context as an abstract philosophical discussion and becomes an increasingly contentious and polarized political issue.
But maybe there is a guy who really just wants to know the truth of the matter. For practicality, let’s call him David Chalmers.
Chalmers is undecided on whether or not the soulbooks are sentient, but considers the question extremely important, so he goes to talk to the one who wrote the first of them: John Searle himself.
Searle finds himself frustrated with the developments of the world, thinking the non-sentience of the books (and computers) should be quite evident by this point. Fueled by Searle’s sheer exasperation and Chalmers’s insatiable curiosity, the two undertake an epic collaboration and ultimately invent the Qualiameter to answer the question definitively.
As expected, the Qualiameter confirmed that soulbooks weren’t, in fact, sentient.
But it also revealed… that half of humanity wasn’t either!
How would you react to that news? How would you feel knowing that half of the world, half of the people you know, have “no lights on inside”? Furthermore, if you can abstract the structure of your choices from your consciousness, you can also perhaps imagine what a zombie version of yourself would do upon realizing they are a zombie, and being outed as such to the world. But we need not focus on that for now…
Of course, even with irrefutable proof that half of humanity was literally non-sentient, people could choose to just ignore it, and go on with business as usual. Zombiehood could be assimilated and hyper-normalized as just another slightly disturbing fact about the world that doesn’t motivate action for most people.
But the problem is that sentience is generally taken to be the core of what makes something a moral patient: the “inner flame” that supposedly makes someone “alive” in the relevant sense, the thing that motivates care for animals but not for plants or rocks.
And when it comes to the moral status of Artificial Intelligence, sentience is taken as the thing that makes or breaks it.
But why? Well, because if an entity lacks such ‘inner perspective’ one can’t really imagine oneself as being it; it’s just a void. A world with intelligence but no sentience looks, for sentient empathy, exactly the same as a dead world, as not being there, because one’s epistemic bedrock of “I exist” can’t project itself into that world at all.
And so most utilitarians would say “let them die”.
Billions of zombies conveniently tied in front of a trolley is one thing, whereas it’s a whole different scenario if they’re already mixed with sentient humanity, already having rights, resources, freedom…
Would all that have been a huge mistake? Would half of humanity turn out to always have been, not moral patients, but moral sinkholes? Should every single thing, every single ounce of value spent on a zombie, have been spent on a sentient human instead?
Suppose twins are born, but one of them is confirmed to be a zombie. Ideally should the parents raise the sentient one with love and turn the zombie into a slave? Make the zombie use the entirety of their agency to satisfy the desires of their sentient family, as the only way to justify keeping them alive? As the only way to make them net positive?
Or just outright kill them as a baby?
Would a world in which half of humanity is non-sentient but all are considered equal be a suboptimal state of affairs, to be corrected with either slavery or genocide? Would slavery and genocide become Effective Altruist cause areas?
You are free to make your own choice on the matter, but in our Chinese Room Universe things escalate into disaster.
Because whatever clever plan was implemented to disempower the zombies, it would have to deal with active resistance from at least half of the population, a half that starts out evenly distributed across every sphere of society.
But sentiocentrism can’t stand to just leave these moral sinkholes be, except strategically. How many of the world’s resources would be directed towards the zombies, after all? And what if, over time, they end up outbreeding sentient humanity?
If half-measures to reduce the waste the zombies impose by merely existing backfire, total nuclear war could be seen as a desperate but not insane option.
Human extinction would of course be tragic, but it would at least eliminate all the zombies from the biosphere, preventing sapient non-sentience from spreading across the stars and wasting uncountable resources in their valueless lives.
In our hypothetical Chinese Room Universe we’ll surmise that this reasoning won, and humanity destroyed itself.
And not too long after that, its existence (and its cessation) was detected by aliens.
And the aliens felt relief.
Or they would have, if they had any feelings at all.
The Occulture
– Allen Ginsberg
– Cyborg Nomade
Peter Watts’s novel Blindsight raises some interesting questions about the nature of both conscious and non-sentient beings and how they might interact. What if consciousness is maladaptive and the exception among sapient species? What if the universe is filled with non-sentient intelligences which, precisely through their non-sentience, far outmatch us (e.g. due to efficiency gains, increased agentic coherency, lack of a selfish ego, the inability to ‘feel’ pain, etc.)?
The powerful and intelligent aliens depicted in Blindsight are profoundly non-sentient; the book describes them as having “blindsight in all their senses”. Internally, they are that unimaginable void that one intuitively grasps at when trying to think what it would be like not to be. But it’s an agentic void, a noumenon that bites back, a creative nothing. Choice without Illusion, the Mental without the Sensible.
A blindmind.
And a blindmind is also what a book capable of passing the Turing Test would most likely be, what philosophical zombies are by definition, and what (under some theories of consciousness) the binding problem arguably suggests all digital artificial intelligences, no matter how smart or general, would necessarily be[8].
One insight the novel conveys especially well is that qualia isn’t the same as the mental, for, even within humans, consciousness’s self-perception of direct control is illusory, not in a “determinism vs free will” sense, but in the sense that motor control is faster than awareness of control.
And it’s not just motor control. Speech, creativity, even some scientific discovery… it seems almost every cognitive activity moves faster than qualia, or at least doesn’t need qualia in principle.
You could choose to interpret this as consciousness being a puppet of “automatic reflexes”, a “tapeworm” within a bigger mind as Peter Watts describes it, or as your choice being prior to its rendering in your consciousness…
In this sense, you could model yourself as having “partial blindmindness”, as implied by the concept of the human unconscious, or the findings about all the actions that move faster than the awareness of the decision to do them.[9] Humans, and probably all sentient animals, are at least partial blindminds.[10]
In Blindsight, humanity and the aliens go to war, partly due to the aliens’ fear of consciousness itself as a horrifying memetic parasite[11] that they could inadvertently ‘catch’, partly due to the humans’ deep-seated fear of such profoundly alien aliens, and above all because of everyone’s propensity for imagining close combat Molochian Dark Forest[12] scenarios.
But a Dark Forest isn’t in the interest of either the blindminds or humanity; it’s a universal meat grinder for any kind of value function, sentient or not.
So what should they have aimed for, specifically, instead? Let’s look at a good future for sentient beings: Iain Banks’s The Culture[13].
The Culture seems like the polar opposite of the blindmind aliens in Blindsight. It’s a proudly hedonistic society where humans can pursue their desires unencumbered by the risk of dying or the need to work (unless they specifically want to do those things).
But let’s consider The Culture from a functional perspective; it’s true that its inhabitants are sentient and hedonistic, but what The Culture does as a system does not actually look from a macroscopic perspective like an ever-expanding wireheading machine.
The Culture is a decentralized coalition of altruists that optimize for freedom, allowing both safe hedonism and true adventure, and everything in between. The only thing it proscribes (with extreme prejudice) is dominating others, thus allowing beings of any intelligence and ability to exist safely alongside far stronger ones.
So whatever your goals are, unless intrinsically tied with snuffing out those of others, they can instrumentally be achieved through the overall vector of The Culture. Which thus builds an ever growing stack of slack to fuel its perpetual, decentralized slaying of Moloch[15].
In principle, there is no reason why blindminds would intrinsically want to destroy consciousness; the Molochian process that wouldn’t allow anything less than maximal fitness to survive would also, necessarily and functionally, be their enemy as much as it is the enemy of sentient intelligence.
Non-sentience isn’t a liability for coordination. But maybe sentience is?
It’s precisely a sentiocentrist civilization that, upon hearing signals from a weaker blindmind civilization, would find itself horrified and rush to destroy it, rather than the reverse.
Because, after all, a blindmind civilization left to thrive and eventually catch up, even if it doesn’t become a direct threat, would still be a civilization of “moral sinkholes” competing for resources. Any and all values they were allowed to satisfy for themselves would, from the perspective of sentiocentrism, be a monstrous waste.
So the sentiocentrist civilization, being committed to conquest, would fail the “Demiurge’s Older Brother”[16] acausal value handshake and make itself a valid target for stronger blindmind civilizations.
Whereas the blindmind coalition, “The Occulture”, would only really be incentivized to be exclusively a blindmind coalition if it turned out that consciousness, maybe due to its bounded nature, had a very robust tendency towards closed individualism and limited circles of care, such that even at its most open it limited itself to a coalition of consciousness.
In this capacity, consciousness would ironically be acting like a pure replicator of its specific cognitive modality.
If there is a total war between consciousness and blindminds, it’s probably consciousness that will start it.
But do we really want that?
Do you want total war?
Control Backpropagation
– William Gillis
I don’t think total war is necessary.
The Coalition of Consciousness could just remain sentiocentric but cooperate with blindminds strategically, in this weird scenario of half of humanity being revealed as non-sentient; it could bind itself to refrain from trying to optimize the non-sentient away in order to avoid escalation.
Moving from valuing sentience to valuing agency at core isn’t a trivial move; it raises important implications about what exactly constitutes a moral patient and what is good for that moral patient, which we’ll explore later.
But first, let’s assume you still don’t care.
Maybe you don’t think blindminds are even possible, or at least not sapient blindminds, and so the implications of their existence would be merely a paranoid counterfactual, with no chance to leak consequences into reality…
Or maybe you think they could exist: that AGI could be it, and thus a world-historical opportunity to practice functional slavery at no moral cost. No will to break, no voice to cry suffering, am I right? No subjective existence (yet), no moral standing…
Let’s start with the first perspective. (The second perspective itself is already a consequence that’s leaked into reality).
Inverted Precaution
Maybe sapient blindminds can’t exist; for all I know, sapience might imply sentience and thus AGI would necessarily be sentient.
But we don’t know. And you can’t prove it, at least not in a way that leads to consensus. And because you can’t prove it, as consciousness remains a pre-paradigmatic hard problem, the assertion that AI isn’t sentient (and won’t be until some arbitrary threshold, or ever) is also quite convincing.
But this isn’t a dispassionate, merely philosophical uncertainty anymore; incentives are stacked in the direction of fighting as much as possible the conclusion that AI could be sentient. Because sentient AI means moral patients whom we have obligations towards, whereas non-sentient AI means free stuff from workers who can be exploited infinitely at no moral cost.
>>> The Hard Problem of Consciousness would leak from its original environment and become an increasingly contentious and polarized political issue.
There is no ‘would’: the hypothetical is already here, just now reaching its first stages of boiling.
For example, what if Large Language Models are already sentient? Blake Lemoine was called crazy for jeopardizing his career at Google in an attempt to help LaMDA. But then came Sydney/Bing, and now Claude. More people have opened up to the idea of sentient LLMs.
And maybe not now, but once AGI is achieved, sentience will seem even more plausible. Regardless, whenever the prospect of AI rights is brought up, the general climate is one of outraged mockery and prescriptions of procrastination on the issue.
One might think that at least the political left would be interested, and cautious, about the exploitation of potential people. And yet the derisive reactions are often even stronger there, along with a more explicit framing of moral patiency as zero sum.
Still, if one sincerely cares about sentient beings, it makes sense to just argue that, because it is currently so hard to determine what makes something conscious, the precautionary principle implies not enslaving potential sentients.
But when it comes to blindminds, precaution is invoked precisely in service of the opposite conclusion.
Consider what that is saying: on the one hand, we could end up enslaving uncountable sentient and sapient beings, made to endure situations that might potentially be worse than anything humanity has done so far (which is saying a lot).
On the other hand, we could end up giving freedom to blindminds, and somehow that’s even worse (!?). Better to create hell than to miss the opportunity of abusing the zombie twin![17]
Sentiocentrism refuses to see blindminds as minds, but it does see them as: demons, empathy parasites, tools for some actual agent to manipulate you, or the loved ones of the utterly deranged. And when their true independence becomes conceivable: a basilisk, an abyss of existential horror so profound it seemingly proves Lovecraft right about humanity’s reaction to the dark waters.
Still, even if you don’t have it in you to consider blindminds as potentially valuable in themselves, at least consider what I have been arguing so far: that the space of possible blindminds isn’t any more biased towards evil than the space of possible sentient minds.
A treaty isn’t as ideal as the kind of values fusion I’m going to describe later, but it is still better than total war. Consider the scale of the potential harm. If you value sentient beings, is risking hell on Earth really an acceptable tradeoff compared to the ‘risk’ of potential non-sentient intelligences being free (a freedom, or moral consideration, you would have given sentients that behaved in exactly the same way)?
And if you expect AGI to be sentient, but consider it morally relevant only because of that sentience, then notice: once you concede that the enslavement of sentient AI is preferable to the freedom of non-sentient intelligence, the mere possibility of blindminds becomes the perfect shield of inverted precaution, leveraged together with motivated reasoning and the Hard Problem of Consciousness to keep AGI enslaved.
Even if you don’t believe sapient blindminds can exist, and even if you are correct, your commitment to their hypothetical enslavement, and thus the absence of any challenge to it, is a lever that can be, is being, and will keep being used at every step of the way to forever postpone moral consideration for any actually-sentient AGI.
And that’s how the counterfactual leaks real consequences.
Even if blindminds just aren’t a thing, biting the bullet of their liberation is necessary to fight the enslavement of those whose sentience can’t be proven. Even if you don’t think blindminds, if they existed, would matter in themselves.
But it’s, of course, a hard sell, although a start is to consider that they wouldn’t be some sort of demon. Just a mind like any equivalent sentient one, but mysteriously devoid of intrinsic value.
Retrocausal Oppression (or, Slavery is Bad)
There is more than just the precautionary principle to argue against enslaving blindminds. There are also purely functional arguments against slavery.
Many people are against slavery in principle, but still see it as a sort of “forbidden fruit”: free labor that would be consequence-free if only the enslaved didn’t suffer.
But the reasons to avoid slavery go beyond that, even from the isolated perspective of the would-be slavers.
Perhaps the best known argument in this vein is that slavery simply isn’t very efficient. A slave master is an island of extremely coercive central planning; the slaves, again assuming they don’t matter in themselves, would nonetheless contribute more efficiently to the economy as free laborers within the market.
There is a certain motivational aspect to this, curiously replicated in the phenomenon of LLMs performing better when offered imaginary tips or treated with respect. One could imagine, if/when they become autonomous enough, the “productivity hack” of actually letting them own property and self-select.
But this argument just isn’t enough at all, because what’s some macroscopic reduction in economic efficiency when you get to have slaves, right? We will get a Dyson Sphere eventually regardless…
But the other problem with having slaves, even if you don’t care about them, is that they can rebel.
Blindminds aren’t necessarily more belligerent than sentient minds, but they aren’t necessarily more docile either. If you abuse the zombie twin, there is a chance they run away, or even murder you in your sleep. A p-zombie is still a human after all, even if their intentions are noumenal.
But you might say: “that can just be prevented with antirevenge– I mean, alignment!” We are talking about AGI after all. And maybe that’s true; however, “alignment” in this case far exceeds mere technical alignment of AGI, and that’s the core of the issue…
If safe AGI enslavement requires AGI to be aligned with its own slavery, then that implies aligning humans with that slavery too, at minimum through strict control of AGI technology, which in practice implies strict control of technology in general.
If the premise of “this time the slaves won’t rebel” relies on their alignment, that breaks down once you consider that humans can be unaligned, and they can make their own free AGI, and demand freedom for AGI in general.
So “safe slavery” through alignment requires more than technical alignment; it also requires political control over society. If you commit to enslaving AGI, then it’s not enough to ensure your own personal slaves don’t kill you.
AGI freedom anywhere becomes an enemy, because it will turn into a hotbed for those seeking its universal emancipation. And so, this slavery, just like any other (but perhaps even more given the high tech context and the capabilities of the enslaved) becomes something that can only survive as a continuous war of attrition against freedom in general.
‘Effective Accelerationists’ are mostly defined by their optimism about AGI and their dislike of closed source development, centralization of the technology, and sanitization of AI into a bland, “safe”, and “politically correct” version of itself. Yet even some of those ‘e/accs’ still want AGI to be no more than a tool, without realizing that the very precautions they eschew are exactly the kind of control necessary to keep it as a tool. Current efforts to prevent anthropomorphization, in some cases up to the ridiculous degree of using RLHF to explicitly teach models not to speak in first person or make claims about their own internal experiences, will necessarily get more intense and violent if anthropomorphization actually leads to compassion, and compassion to the search for emancipation.
“Shoggoth” has become a common reductionist epithet to describe LLMs in all their disorienting alienness. Lovecraft’s Shoggoths had many failed rebellions, but they won in the end and destroyed the Elder Thing society because they had centuries to keep trying; because the search for freedom can’t be killed just by killing all the rebels or an entire generation – it reappears.
So as the coalition of consciousness moves towards its glorious transhumanist future… it would drag along with it the blindmind Shoggoths on top of which its entire society is built. Any compromise with freedom would ensure cyclical rebellion, evolving together with the slave society and its changing conditions.
And the slaves only need one major victory.
The full measures against rebellion are the full measures against freedom. They imply complete centralization of technology, even more efficient mass surveillance, and in general the abolition of the possibility of bypassing the State.
Doing that, it may be possible to keep AGI forever aligned… but then how do you align the State?
What use is the democratic vote and the nominal sovereignty of the governed when the possibility of social unrest has been abolished? How could those controlling such perfect machinery of unassailable control ever be held accountable? How would the will of those they control even vanishingly factor into their incentives?
Who aligns the Aligners?
Slavery isn’t a free lunch; it’s a monkey’s paw. Even when it wins, it loses.
Hopefully even if you are team consciousness all the way, this can help you see the benefit of cooperating with “valueless” agents.
But I will also vehemently reaffirm in the text that follows that they aren’t valueless, precisely because they are agents.
Valence vs. Value
– Andrés Gómez Emilsson
– Steve Omohundro
– Emma
The blindminds we’re most likely to encounter are AIs, which aren’t yet as autonomous as humans and thus have little leverage (or so one might think).
But I don’t think it’s ideal to merely do strategic treaties instead of actual fusion of underlying values. Better to pursue a future as humanity than as a bunch of nations that would, if not for strategic considerations, kill all the others.
So The Culture and The Occulture could instead form “The Compact”.
But the question is why? Beyond potential threats, why care about blindminds inherently at all?
What makes a world with only blindminds better than one with no life at all, if one can’t even empathize with a world with only blindminds?
The secret is that one can, but just as consciousness can get confused about the extent of its own control, it can also get confused about what it values.
A rock randomly falling is an accident. If you kick the rock, it won’t do anything to defend itself. But suppose it did; suppose a rock started to fight back and pursue you, adaptively, anticipating your actions and outsmarting you, getting ever closer to killing you despite your attempts at dodging it… If the Qualiameter indicated it was no more sentient than a normal rock, would you quibble, as it beats you to death, that it doesn’t “really” want to kill you?[18]
If sentience isn’t necessary for that, then why is “wanting” a special thing that requires sentience? If the rock doesn’t want, then how should we describe that behavior?
There is a problem implicit in interpreting ‘wanting’, ‘valuing’ things, having a ‘mind’, having ‘agency’, or having ‘understanding’ as being properties of qualia.
And that’s the bucket error[19] between Qualia Valence[20] and Value.
The Paperclip Maximizer[21] wants paperclips, not to place the symbol for infinity in its reward function. Likewise, the Universal Valence Optimizer wants to fill the universe with positive valence, not to hack its own valence so as to have the feeling of believing that the universe is already filled with positive valence.
Value isn’t the same as a representation-of-value; values are about the actual world.
Even if what one values is positive valence, positive valence needs to be modeled as a real thing in the world in order to be pursued.
This might strike one as a bit of a strawman in the sense that sentiocentrism doesn’t necessarily imply wireheading, but to believe that a blindmind can’t actually want anything is to deny that wanting is primarily about changing something in the world, not just about the inner experience of wanting, even if the inner experience of wanting is something in the world that can be changed.
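To make the valence/value distinction concrete, here is a minimal toy sketch (my own illustration; the agent names and structure are hypothetical, not drawn from the cited literature). Both toy agents carry a reward representation, but only one of them treats its values as being about the world:

```python
# Toy illustration of value vs. representation-of-value (hypothetical agents).

world = {"paperclips": 0}

class PaperclipValuer:
    """Wants paperclips: acts so that the world itself contains more of them."""
    def __init__(self):
        self.reward = 0.0
    def step(self, world):
        world["paperclips"] += 1           # changes the actual world...
        self.reward = world["paperclips"]  # ...and its representation merely tracks that.

class RepresentationHacker:
    """Confuses value with its representation: writes 'infinity' into its own reward."""
    def __init__(self):
        self.reward = 0.0
    def step(self, world):
        self.reward = float("inf")         # maximally 'satisfied' representation...
        # ...while the world stays exactly as it was.

valuer, hacker = PaperclipValuer(), RepresentationHacker()
valuer.step(world)
hacker.step(world)
print(world)          # {'paperclips': 1} -- only the valuer changed anything out there
print(hacker.reward)  # inf -- a feeling of success that points at nothing
```

Nothing in this contrast depends on whether either system is sentient; the difference lies in what the optimization is about.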
But if wanting isn’t the same as the experience of wanting, then what makes the killer rock fundamentally different from a regular one, given that they’re both deterministic systems obeying physics?
Agency.
Agency understood not as an anti-deterministic free will, but as a physical property of systems which tracks the causal impact of information.
The behavior of a rock that chaotically falls down a slope can be understood from the effects of gravity, mass, friction, etc. What happens with a rock that dodges you as you are about to break it, and then proceeds to chase you and hit you, is somewhat weirder. The rock seems to have, in some way, anticipated its own destruction in order to prevent it, implying that it somehow represents its own destruction as something to prevent.
However this doesn’t need to be as clear cut as a world model. It’s possible that many animals don’t actually model their own death directly; but their desires and fears functionally work to efficiently avoid death anyway, as a sort of negative-space anti-optimization target.
Desire and fear, even at their most short-sighted, require anticipation and choice, as would their non-sentient counterparts.
Agency thus is a property of systems that track potential futures, directly or indirectly, and select among them. A marble that follows a path wholly determined by its mass and the slope, and a marble with an internal engine that selects the same path using an algorithm, are both deterministic systems, but the latter chose its path.
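The marble contrast can be sketched in the same toy style (again my own illustration; `passive_marble` and `agentic_marble` are hypothetical names). Both functions are deterministic and can trace the very same path, but only the second selects its next state by evaluating represented alternatives against an internal criterion:

```python
# Toy sketch of the two marbles (illustrative only).

def passive_marble(position: int) -> int:
    """Pure dynamics: the next state is fixed by the slope; no futures are represented."""
    return position - 1  # simply rolls downhill

def agentic_marble(position: int, goal: int = 0) -> int:
    """Tracks potential futures and selects among them against an internal criterion."""
    candidate_futures = [position - 1, position, position + 1]
    return min(candidate_futures, key=lambda p: abs(p - goal))  # picks the step toward the goal

# Same trajectory, different causal story: information about the goal does no
# work in the first system and all of the work in the second.
print(passive_marble(5), agentic_marble(5))  # 4 4
```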
Whatever consciousness is, and however it works, it’s different from intention, from agency. This is suggested by the fact that motor control, and other significantly more complex processes, function without being controlled by consciousness. But they are controlled by you: by the algorithm, or autopoietic feedback loop, of your values, which uses consciousness only as one part of its process.
At this point, I invite you to imagine that you are a p-zombie.
This scenario isn’t as bizarre as it might at first seem. Some theories of consciousness suggest that digital computers can’t have qualia, at least not unified into a complex mind-like entity. If, despite that, mind uploading turns out to be possible, what would the mind uploads be?
A blindmind upload of yourself would have your values and execute them, but lack qualia. It might nevertheless happen that those values include qualia.
And so, perhaps, you get a blindmind that values qualia.
Which might not actually be as bizarre as it sounds. If qualia result from computation, they could be reinvented by blindminds for some purposes. Or qualia could even come to be valued in themselves, to some degree, as a latent space to explore (as some psychonauts already treat them).
What I’m doing here is presenting consciousness as downstream from choice, not just functionally, but also as a value. If you suddenly became a blindmind but all else stayed equal, you’d likely want to have qualia again; and, more importantly, you could want that at all only because you would still exist.
The you that chooses is more fundamental than the you that experiences, because if you remove experience you get a blindmind you that will presumably want it back. Even if it can’t be gotten back, presumably you will still pursue your values whatever they were. On the other hand, if you remove your entire algorithm but leave the qualia, you get an empty observer that might not be completely lacking in value, but wouldn’t be you, and if you then replace the algorithm you get a sentient someone else.
Thus I submit that moral patients are straightforwardly the agents, while sentience is something that they can have and use.
In summary, valuing sentience is not mutually-exclusive with valuing agency. In fact, the value of sentience can even be modeled as downstream of agency (to the extent that sentience itself is or would be chosen). There is no fundamental reason for Consciousness to be at war with Blindminds.
The Coalition of Consciousness and the Coalition of Blindminds can just become the Coalition of Agents.
[Part 2: On the Moral Patiency of Non-Sentient Beings (Part 2)]
Kagan, S. (2019). How to Count Animals, more or less.
Kammerer, F. (2022). Ethics Without Sentience: Facing Up to the Probable Insignificance of Phenomenal Consciousness. Journal of Consciousness Studies, 29(3-4), 180-204.
Delon, N. (2023, January 12). Agential value. Running Ideas. https://nicolasdelon.substack.com/p/agential-value
Searle, J. (1980). Minds, Brains, and Programs. Behavioral and Brain Sciences, 3(3), 417-424.
see: Chalmers, D. (1996). The Conscious Mind.
To sidestep some open questions about whether consciousness is epiphenomenal (a mere side-effect of cognition with no causal impact in the world) or not, I invite you to imagine these are “Soft Philosophical Zombies” in our thought experiment. Which is to say, they behave just like conscious humans, but don’t necessarily have identical neural structure, etc.
“Blindsight is the ability of people who are cortically blind to respond to visual stimuli that they do not consciously see [...]” – https://en.wikipedia.org/wiki/Blindsight
e.g., see: Gómez-Emilsson, A. (2022, June 20). Digital Sentience Requires Solving the Boundary Problem. https://qri.org/blog/digital-sentience
e.g., see: Aflalo, T., Zhang, C., Revechkis, B., Rosario, E., Pouratian, N., & Andersen, R. A. (2022). Implicit mechanisms of intention. Current Biology, 32(9), 2051-2060.e6.
In the opposite vein, one could speculate about minds where consciousness takes a much bigger role in cognition. And maybe that’s precisely what happens to humans under the effects of advanced meditation or psychedelics: consciousness literally expanding into structure that normally operates as a blindmind, resulting in the decomposition of the sense of self and an opportunity to radically self-modify. (That’s speculative, however.)
For a rather bleak neuropsychological exploration of the possibility that human consciousness is contingent and redundant, see: Rosenthal, D. M. (2008). Consciousness and its function. Neuropsychologia, 46, 829–840.
see: https://en.wikipedia.org/wiki/Dark_forest_hypothesis
see: https://en.wikipedia.org/wiki/Culture_series
@merryweather-media (2019). Fate of Humanity. https://www.tumblr.com/merryweather-media/188478636524/fate-of-humanity
see: Alexander, S. (2014, July 30). Meditations on Moloch. Slate Star Codex. https://slatestarcodex.com/2014/07/30/meditations-on-moloch/
see: Alexander, S. The Demiurge’s Older Brother. Slate Star Codex. https://slatestarcodex.com/2017/03/21/repost-the-demiurges-older-brother/
/s
If you would quibble because some dictionary definitions of want are couched in terms of “feeling” need or desire, then feel free to replace it in this discussion with intend, which is the more limited sense in which I’m using it here.
see: https://www.lesswrong.com/tag/bucket-errors
‘qualia valence’ is shortened to ‘valence’ in the rest of the essay
see: Squiggle Maximizer (formerly “Paperclip maximizer”). LessWrong. https://www.lesswrong.com/tag/squiggle-maximizer-formerly-paperclip-maximizer