This is part 1 in a 5-part series entitled Conscious AI and Public Perception, encompassing the sections of a paper by the same title. This paper explores the intersection of two questions: Will future advanced AI systems be conscious? and Will future human society believe advanced AI systems to be conscious? Assuming binary (yes/no) responses to the above questions gives rise to four possible future scenarios—true positive, false positive, true negative, and false negative. We explore the specific risks & implications involved in each scenario with the aim of distilling recommendations for research & policy which are efficacious under different assumptions.
Read the rest of the series below:
Introduction and background: Key concepts, frameworks, and the case for caring about AI consciousness (this post)
This paper was written as part of the Supervised Program for Alignment Research in spring 2024. We are posting it on the EA Forum as part of AI Welfare Debate Week as a way to get feedback before official publication.
Abstract
We stand at a critical moment in history: the decisions that we make today can have an outsized impact on the welfare of not only future humans, but also, possibly, future AI. Our investigation takes as its starting point two questions that are fundamental to the moral status of AI systems (“AIs”):
Metaphysical: Will future advanced AI systems be conscious?
Epistemic: Will future human society believe advanced AI systems to be conscious?
Assuming binary (yes/no) responses to the above questions gives rise to four possible future scenarios. In the positive scenarios (true positive and false positive), future humans believe that future AI systems are conscious; only in the true positive scenario is this belief actually correct. The negative scenarios (false negative and true negative), by contrast, are characterised by widespread disbelief in the consciousness of future AI systems. The false negative scenario, in which humans fail to recognise conscious AIs as such, is possibly the worst case.
In this position paper, we explore the specific risks & implications involved in each scenario with the aim of distilling recommendations for research & policy which are robustly efficacious under different assumptions (i.e. given uncertainty about which scenario we are in).
1. Introduction: Why care about AI consciousness?
1.1 AI consciousness concerns all of us
Artificial intelligence continues to surprise us with capabilities once thought to be impossible for machines. Today, AI systems (henceforth, “AIs”) assist scientists at the forefront of scientific research. They drive cars & create works of art & music. People engage in increasingly complex conversations with AI chatbots on a daily basis. Some AIs even provide social & emotional companionship. This raises the question: what’s next?
Today, many people think that AI could never be conscious (Pauketat, Ladak, & Anthis, 2023). But could that change in the near future? Could it only be a matter of time before we build conscious AI– & are we ready for such a future? At first, the notion that machines might be conscious or merit moral consideration might seem outlandish or fanciful– the stuff of science fiction. One might reasonably ask “But aren’t they just machines? Why should we even be considering this?”
We think there are a few reasons to take this possibility seriously. Today, researchers & technologists around the world are working on different projects both directly & indirectly related to conscious AI[1] (Huckins 2023). Experts who are engaged in such projects see enormous promise in conscious AI. Some think that conscious AI may have superior functionality (e.g. problem-solving ability) and could be capable of more fluid interactions with humans. Others believe that machines that are able to feel & be empathic could be better able to understand human values. If this is right, conscious AI might be safer. Still others see the challenge of building conscious AI as a crucial part of unravelling the deep mysteries of the mind.
While promising, these potential benefits fall short of an open-&-shut case for building conscious AI. We think that, before committing to building conscious AI, we ought to be reasonably sure that it’s even a good idea– that the benefits outweigh the risks. To this end, this report aims to clarify the pros & cons of building conscious AI. To summarise, our conclusions are as follows:
The benefits of building conscious AI are poorly defined & speculative at best.
There are significant risks associated with building conscious AI.
In other words, we think that the attendant risks far outweigh the benefits of building conscious AI. Like many other AI-related risks, the issues posed by AI consciousness present a Collingridge dilemma (2000). On the one hand, the possibility, likelihood, & impact of conscious AI (if any) are exceedingly difficult to forecast (§4.12; §1.2). At the same time, if & when conscious AI is actually created, it will be hard to “put the genie back in the bottle”. It will be hard for humans to maintain control (especially if conscious AI is also at or above human-level intelligence; §3.22)– to say nothing of reverting to a state prior to conscious AI (that is, without committing some form of violence). Bearing these considerations in mind, we believe the most prudent course of action is fundamentally preventative: we should not build conscious AI– at least until we have a clearer grasp of the attendant promises & perils.
Our analysis in this paper takes as its starting point two questions that are fundamental to the moral status of AIs:
Metaphysical: Will future advanced AI systems be conscious?
Epistemic: Will future human society believe advanced AI systems to be conscious?
The first is a metaphysical question because it concerns the actual state of affairs: whether future AIs will in fact be conscious. The second is an epistemic question because it concerns future humans’ beliefs about whether future AIs are conscious. Assuming binary (yes/no) responses to these questions gives rise to four possible future scenarios (Table 1, below):
[Metaphysical] Will future advanced AI systems be conscious? (columns: Yes / No)
[Epistemic] Will future human society believe advanced AI systems to be conscious? (rows: Yes / No)
Believed conscious & actually conscious: True positive– advanced AI correctly recognised as moral patients
Believed conscious & not actually conscious: False positive– advanced AI incorrectly recognised as moral patients
Not believed conscious & actually conscious: False negative– advanced AI incorrectly disregarded as moral patients
Not believed conscious & not actually conscious: True negative– advanced AI correctly disregarded as moral patients
Table 1, THE 2D FRAMEWORK: 2 × 2 matrix depicting possible future scenarios of AI consciousness and societal beliefs.
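To make the framework concrete, each cell of Table 1 is determined by the two binary answers. The following minimal sketch (ours, purely illustrative; the function name is hypothetical) enumerates the four scenarios in code:

```python
def classify_scenario(ai_is_conscious: bool, society_believes_conscious: bool) -> str:
    """Map the (metaphysical, epistemic) pair of yes/no answers to a cell of Table 1."""
    if society_believes_conscious:
        return "true positive" if ai_is_conscious else "false positive"
    return "false negative" if ai_is_conscious else "true negative"

# The worst-case cell discussed below: AIs are conscious, but society does not believe it.
assert classify_scenario(True, False) == "false negative"
```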
This “2D framework” forms the conceptual basis of this position paper. It shows how our beliefs about AI consciousness may or may not align with the actual facts. Herein lies the potential for risk. At present, we lack reliable means of discerning whether AIs really are conscious. This is especially troubling, as many commentators have warned[2]. Jeff Sebo says that we risk “sleepwalking into a major moral catastrophe” (2023). Saad & Bradley describe research on digital minds as “morally treacherous” (2022). If we fail to recognise AIs as conscious moral patients (as in the false negative scenario), then we risk inflicting suffering on a scale never before seen in history, possibly exceeding even that of non-human animals (used in e.g., agriculture, industry, & scientific research; Dung 2023a).
Conversely, if we incorrectly regard AIs as deserving of moral consideration when they are not actually conscious (as in the false positive scenario), then we may inadvertently disenfranchise legitimate moral patients. Indeed, continued efforts towards conscious AI court several significant risks apart from AI suffering (§3.2)– & hence imperil diverse stakeholders, including humans as well as animals & wildlife.
The great challenge is to pinpoint where exactly on the 2D framework we stand, & where we are headed– as well as where we want to go. By exploring this framework, we hope to support decisionmakers in preparing and planning for multiple possibilities, as well as help to identify key factors and uncertainties that ought to be monitored by watchdog entities (e.g. non-profit organisations, ethics boards). Ultimately, it is our hope that the present report will help diverse stakeholders to come together in steering towards a positive future.
1.2 AI consciousness is a neglected issue
At present, we may not be able to tell whether in the future we will develop conscious AI. But what we can conclude is that, today, no one– not a single political party or ethics committee– is standing up for conscious AI (Metzinger 2021a). Despite growing initiative among researchers to take AI consciousness-related risks seriously, our current theoretical, cultural, & legal frameworks are critically ill-prepared[3] to accommodate the genesis of conscious AI. This widespread neglect can be attributed to three factors:
Unpredictability: Experts disagree over how likely or imminent AI consciousness is, or even whether it is possible at all (Metzinger 2021b; see also §2.12). This state of radical uncertainty is discussed further in (§4.12). Difficulties in assessing the extent & ways in which AIs may be conscious (§4.122) translate into ambiguities concerning their moral status (§2.11; Hildt 2022). The welfare of future conscious AI populations is thus doubly discounted (Ramsey 1928; Harrison 2010): in the first place, because they do not yet exist, &, in the second place, because it is not yet known whether they can exist at all[4].
Uninformed policymakers & populace: Both AI & consciousness are rather technical subjects. Understanding frontier problems in each of these fields (to say nothing of their intersection) requires specialised knowledge which is not commonly shared by lawmakers & the general public. These knowledge gaps may lead to ignorance of, or misconceptions about AI consciousness-related risks, or else the perception that such risks are “merely” speculative. In (§4.2), we discuss the general public’s attitudes towards conscious AI.
Other pressing concerns: AI-related regulatory challenges must compete with other urgent issues such as economic crisis, global conflict, & environmental collapse. Even among the gallery of AI-related risks, AI suffering tends to be superseded by other risks that are judged to be more proximal (discrimination & bias, misinformation & disinformation, worker displacement due to automation), more realistic/prosaic (usage of AI technologies to enforce totalitarian rule, harms to humans due to misalignment), or otherwise closer to human interests[5].
Needless to say, we do not mean to dismiss the reality or urgency of other risks– whether or not they are related to AI. Our only intention is to point out that risks related to AI consciousness currently command little political concern. As a matter of fact, calls to actively take steps to prepare for AI consciousness-related risks are typically either met with indifference or dismissed as alarmist– if they are even seriously considered at all. It is our hope that this position paper will advance the conversation on AI consciousness, preparing decisionmakers across the spectrum for critical & nuanced dialogue.
1.3 Structure
Having motivated our overall approach, we now turn to laying the groundwork for our investigation. To begin with (§2), we provide background on the discussion of AI consciousness & moral status. We introduce key concepts & terminology & specify the scope of the discussion. Afterward (§3), we provide a more thorough introduction to the foregoing 2D framework and identify the most significant risks associated with conscious AI. In (§4), we make efforts to pinpoint our location and trajectory within the 2D framework. Finally (§5), we propose practical strategies for risk reduction given uncertainty about which scenario we are actually in.
2. Background: key concepts & frameworks
2.1 Pathocentrism: the link between AI consciousness & AI moral status
First & foremost, we introduce a few basic building blocks of our approach. The simple question with which we start is this:
What makes an AI system a moral subject[6]?
According to the standard[7] account, called pathocentrism (Metzinger 2021a), a being’s moral status depends upon its capacity to suffer. This means that, at the very least, in order to receive moral consideration, AI systems must be conscious or sentient.
Consciousness (or “phenomenal consciousness”; Block 1995) is the capacity for subjective experience, or the ability to appreciate qualitative properties, such as what it’s like to see red (Jackson 1982; cf. Robinson 1982), to taste sweetness, to be a bat (Nagel 1974), etc.
Sentience (or “affective sentience”; Powell & Mikhalevich 2021) is the capacity to have good (“positively valenced”) or bad (“negatively valenced”) experience– i.e., to feel pleasure & pain (Browning & Birch 2022).
Theorists disagree as to whether consciousness or sentience is more fundamental– whether it is possible to be conscious but not sentient, or vice versa (Dawkins); or whether consciousness or sentience is what really matters to moral patienthood (Ladak 2023; Millière 2023, 01:06:45; Shepherd 2018; Shepherd 2024). Furthermore, there is little consistency in how these terms are applied in the literature. Some authors use them interchangeably (Chalmers 2022, 2-3), others don’t (Ladak 2021). Herein, we strive to remain neutral on these debates. While we do opt to use the term consciousness, we do not believe that this implies a commitment to its being the true ground of moral status[8]. We are confident that many of our conclusions will still stand even if sentience, rather than consciousness, turns out to be the principal criterion of moral status.
The flagship appeal of the pathocentric rubric is that it enables ethical & legal frameworks to draw from philosophical & scientific investigations of consciousness. While this connection can be advantageous, it is not without serious drawbacks– one cannot pick & choose what to inherit from the philosophy & science of consciousness. In order to be of any practical guidance, pathocentrism requires the resolution of a long-standing philosophical & scientific problem: how to determine which sorts of beings (organisms, AIs) are conscious [in the morally relevant sense(s)] (Allen & Trestman 2024). Suffice it to say, this is no easy task. In the next section, we discuss a qualifying issue: whether it is even possible for AIs to be conscious at all. Afterward, we briefly canvass reasons for & against building conscious AI.
2.2 Is it even possible for machines to be conscious?
In order to even consider the possibility that AI can be conscious at all, it is necessary to endorse some degree of substrate neutrality (Butlin et al 2023; Bostrom 2003; Metzinger 2022?) or “hardware independence”– what in philosophy has been called “multiple realisability” (Putnam 1967; cf. Coelho Mollo forthcoming): the view that different kinds of things can be conscious regardless of what they are made of (e.g. carbon, silicon, etc.) or whether they are living or nonliving. The opposite of this view is biological essentialism: the view that only living things can be conscious[9] (Godfrey-Smith 2023; Seth 2023; Aru et al 2023). This may be because certain biological structures (e.g. neuroprotein; Searle 2000; Boden 2016) or processes (e.g. metabolism) are necessary for consciousness to arise (Young & Pigott 1999; Sebo & Long 2023). Strong biological essentialist views categorically rule out conscious AI.
In this work, we adopt a position closer to substrate neutrality: we remain open to the possibility that AIs may suffer & hence merit moral consideration[10]. There are two reasons for this. First, while the beings we know best to be conscious are all living things (e.g. humans and other animals), there is no obvious reason why the structures or functions that make living things conscious couldn’t also be realised in artificial, non-living things like AI[11]. Second, versions of the substrate neutrality doctrine are the prevailing orthodoxy across contemporary philosophy & psychology. Many researchers studying human, animal, & artificial minds subscribe to views like functionalism, which permit artificial consciousness.
2.3 AI suffering: a significant consideration against building conscious AI
Suppose we grant that it is in principle possible for AIs to be conscious. What then? If we create conscious AIs, how likely is it that they will suffer? As it happens, conscious AI may be subjected to a range of adverse conditions in the service of human interests[12]. Examples include[13]:
Torturous scientific/medical experimentation: Conscious AI may be used to simulate psychiatric conditions to model their long-term course (Metzinger 2021b).
Enslavement: Conscious AI may be forced into labour with neither compensation nor rest (Bryson 2010?).
Caregiver stress: Conscious AI may perform caretaker roles, such as AI companions & therapists. In this capacity, AI may experience significant stress due to constant exposure to negative thoughts & emotions, the emotional labour required to perform this role, & the parasocial[14] nature of AI caretakers (§2.314).
Abuse for entertainment: Conscious AI may be exploited for amusement.
Humans may cause AIs to suffer through malice, prejudice, indifference, or pure ignorance. As with animals, humans may recognise that AIs are conscious but not care enough to make trade-offs to alleviate their suffering (Anthis and Ladak, 2022; §4.211). We may judge AIs’ moral status to be lower than that of humans, animals, &/or living things in general, & discount their interests accordingly. Alternatively, AI suffering may be wholly inadvertent: humans may be genuinely ignorant of the fact that AIs are actually conscious (e.g. because our methods of testing for consciousness are not sufficiently sensitive; §4.121).
2.31 The precautionary principle
These latter worries underwrite a precautionary approach to conscious AI. First proposed in the context of animal welfare (Birch 2017), the precautionary principle prescribes a permissive approach to identifying conscious beings & recognising them as moral patients. This means that we need not be certain, or even confident, that an AI is conscious in order for it to merit moral consideration. Under the precautionary rubric, AIs qualify for moral consideration if there is a non-negligible chance[15] that they are actually conscious (Sebo & Long 2023).
Many theorists believe that it’s better to err on the side of caution when attributing consciousness (Birch 2017): we are liable to do more harm by failing to recognise an AI system as conscious (a false negative) than by incorrectly attributing consciousness to it (a false positive). We can explicate this reasoning in terms of an equation such as the following:
net suffering risk = P(C) * n * d
where
P(C) expresses the probability that a type of AI system is conscious in a certain sense, n the size of the relevant AI population, & d the degree of suffering to which they might be subjected. Notably, even if P(C) is low, both n & d may be high (Ladak 2021; Sebo 2023 on the “rebugnant conclusion”; Dung 2023a). Future technologies could make it cheap & feasible to create many copies of digital beings (Shulman & Bostrom 2021). As a result, future AI populations may number in the billions, &, moreover, they may be subjected to adverse conditions. As previously mentioned, researchers may create millions of AI models to simulate the clinical course of depression & to study its effects (Metzinger 2021b). In any case, even if there is a small chance[16] that AI systems are capable of suffering, we should extend them moral consideration (Sebo & Long 2023).
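To illustrate the expected-value logic of this equation, the following minimal sketch (ours; the probability, population size, & suffering weight are hypothetical placeholders, not estimates from the literature) shows how even a low P(C) can yield a very large net suffering risk:

```python
def net_suffering_risk(p_conscious: float, population: float, degree: float) -> float:
    """Expected suffering under the equation above: P(C) * n * d."""
    return p_conscious * population * degree

# Even at the conservative 1-in-1,000 threshold discussed by Sebo & Long, a
# population of a billion AI instances subjected to moderate suffering
# (d = 0.5 on an arbitrary 0-1 scale) yields a large expected harm.
risk = net_suffering_risk(p_conscious=1e-3, population=1e9, degree=0.5)
print(f"expected suffering-weighted count: {risk:,.0f}")  # 500,000
```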
2.4 Which AI systems? What sort of moral responsibilities?
Today, there exists a variety of AI systems which differ substantially in their functional architectures, physical embodiments, & capabilities. In the future, we can expect an even greater diversity. On top of this, there is emerging consensus that consciousness itself may admit of multiple qualitatively distinct dimensions[17] (Birch et al 2020; Ladak 2021), such that it would not make sense to speak of different creatures as “more” or “less” conscious. Like animals, AI systems may well be conscious in different ways[18]. Thus, there is no single question of AI consciousness.
Likewise, there is no single question of AI moral status (Hildt 2022; cf. Grimm & Hartnack 2013). In virtue of the above-mentioned differences, different AI systems are also likely to present diverse needs & interests. To quote Shulman & Bostrom (2021; emphasis ours):
“Digital minds come in many varieties. Some of them would be more different from one another than a human mind is to that of a cat. If a digital mind is constituted very differently than human minds, it would not be surprising if our moral duties towards it would differ from the duties we owe to other human beings; and so treating it differently need not be objectionably discriminatory.”
If anything, the discussion of AI rights[19] & protections will likely have to be relativised to different broad categories of AI systems– much as, today, there exists a diversity of legal entities with unique sets of privileges &/or responsibilities (compare, e.g., an adult human, an unborn foetus, a corporation, a lobster, & a chimpanzee). While the precise nature of AI rights (& possibly, responsibilities[20]) might depend upon future technological advances & societal developments, we are arguably at a point where we can & should be thinking about the possibility & general form of such concessions.
Footnotes
[1] VanRullen advocates for publicly funded research into AI consciousness (quoted in Huckins 2023). He worries that if conscious AI is first developed in a private research environment, it may not enjoy the protection of public scrutiny. In order to evade regulation, developers may suppress evidence of conscious AI [§].
[2] In June 2023, 140 leading researchers signed an open letter published by the Association for Mathematical Consciousness Science calling for a more responsible approach to developing conscious AI.
[3] The philosopher Thomas Metzinger has criticised (2021b) current AI policy for being industry-dominated, inadequate, & myopic. Current regulations largely ignore long-term issues such as artificial general intelligence (AGI), morally autonomous AI, & AI suffering.
[4] To discount the welfare of future populations means to assign less importance to their well-being when making decisions in the present. This matters because, if the welfare of future populations were weighted equally to that of present populations, the former’s interests would outweigh the latter’s (assuming that future populations outnumber present populations). We speculate that the welfare of conscious AI populations is discounted because they inhabit the future & because their existence may not even be possible.
[5] In their (2023) overview of catastrophic AI risks, Hendrycks et al do not even mention AI suffering. Even Pause AI, which tends to adopt a graver perspective on AI-related risks, places AI suffering last on its list of such risks.
[6] A moral subject, or moral patient, is an individual or thing that merits moral consideration. It is someone or something whose welfare or interests matter. By contrast, a moral agent is an individual who bears moral responsibility & is capable of moral reasoning. The actions of a moral agent ought to take into account the interests of moral subjects. Wallach & Allen (2009) provide a systematic inquiry into artificial moral agents. Some things are only moral patients (e.g. infants, wildlife, nature), while others are both moral patients & moral agents (adult humans). In law, these concepts are closely connected to the concept of personhood.
[7] According to our literature review, pathocentrism appears to be the dominant theory of moral status. However, there are competitors. Alternatives to pathocentrism emphasise the moral relevance of other sorts of properties, such as intelligence (Shepherd 2024) or instrumental value to humans (Aristotle 1912; see also Brennan & Lo 2024 on anthropocentric approaches to environmental ethics).
[8] There are two reasons for this. First, our focus is determining the likelihood and implications of AI attaining moral patienthood, regardless of whether that is due to their becoming conscious or sentient. Second, even if consciousness is not sufficient for moral status, it is plausible that it is consciousness that poses the main practical barrier to AI moral patienthood (Sebo & Long 2023). That is to say, the gap between non-conscious AI & conscious AI is probably larger than the gap between conscious AI & sentient AI. Thus, for present purposes, we feel that it suffices to treat consciousness as the primary criterion for AI moral status. Interestingly, Sytsma and Machery (2010) found that non-philosophers were more prone to attribute seeing red to robots than feeling pain. This could suggest that lay people regard subjective experience to be more primitive or technically feasible than affective sentience. However, the interpretation of these results is debated (Sytsma 2014).
[9] Today, the notion of animal consciousness may seem self-evident. But this has not always been the case. Famously, Descartes believed that animals were automata devoid of conscious experience– lacking souls, feeling, & the capacity to feel pleasure & pain. The philosopher Bernard Rollin, known as the father of modern veterinary ethics, writes (1989) that, up until the 1980s, veterinary doctrine did not acknowledge that animals could feel pain (i.e. that they were sentient). Analgesics & anaesthesia were viewed by practitioners as mere “chemical restraints” with no significant phenomenological effects– that is to say, their attenuating effects were only behavioural.
[10] Assuming pathocentrism is true (§2.11). Of course, it is possible to hold a view somewhere in the middle: that life might be required for some, but not all, aspects of consciousness. Whether or not life is necessary for a certain aspect of consciousness is a separate question from whether or not that aspect of consciousness is morally relevant (Hildt 2022). In fact, it may be that machines are ultimately only capable of some forms of consciousness, & that those forms of consciousness do not merit substantial moral consideration. For the time being, however, we believe it is important to be open to all possibilities, including the chance that AIs could be conscious in morally relevant ways (see §4.121iii on the theory-light approach to consciousness).
[11] Searle, Chalmers (1995), & other authors have motivated this claim.
[12] There may be other sources of suffering that are not caused by humans. The idea that existence, on the whole, is more bad than good (e.g., that it consists mostly in suffering) is a key theme in major philosophical doctrines such as Buddhism & pessimism (Schopenhauer). These could present additional stressors to conscious AIs. We focus on the sources of suffering that are caused by or related to humans.
[13] Many of these examples have been movingly anticipated in science fiction & popular media. These depictions can vividly illustrate the severity of AI suffering. For an example of enslavement, see the Cookie scene from Black Mirror’s White Christmas episode. For an example of caregiver stress, see Mima’s self-destruction in the Swedish film Aniara (2019). For an example of abuse for entertainment, see Westworld (scene).
[14] On the prevailing paradigm, relationships with AI caregivers are overwhelmingly one-sided. AI caregivers are not typically allowed to harbour or express their own expectations for relationships with humans. Contrast this with conspecific (human-human) friendships or romantic partnerships involving reciprocal commitments & understanding.
[15] For Sebo & Long, a given AI should be extended moral consideration if there is even a 1:1,000 chance that it is conscious. Two points bear emphasis. First, moral consideration does not imply concrete privileges, rights, or protections. For an AI to merit moral consideration is only to say that we, as moral agents, ought to take its interests into account when making decisions. It does not mean that the overall outcome of our deliberations must be favourable to it– its own interests must be conditioned against those of other moral patients. Second, 1:1,000 is higher than Sebo & Long’s personal thresholds for moral consideration. They make their stand at 1:1,000 because it is more conservative & hence more agreeable to sceptics about conscious AI.
[16] Small but non-negligible risks frequently exert powerful influences on our decision-making. Most people agree that, even if the probability of serious injury or death is low, driving drunk is wrong & should be strictly prohibited by law.
[17] See discussion of Birch et al (2020)’s 5-dimensional disambiguation of consciousness in (§4.121). https://youtu.be/hbOQH9m2EIk
[18] Like animals, AI systems may also differ in the degree to which they exhibit consciousness within a single respect.
[19] Often also called “robot rights” in the literature (Gunkel), even though the debate is not specific to robots (i.e. AIs with physical embodiments which they can use to interact with the world).
[20] See discussion of moral agents in (§2.1, footnote 6) & morally autonomous AI in (§2.314c).