Principles for AI Welfare Research

Tl;dr: This post, which is part of the EA Strategy Fortnight series, summarizes some of my current views about the importance of AI welfare, priorities for AI welfare research, and principles for AI welfare research.

1. Introduction

As humans start to take seriously the prospect of AI consciousness, sentience, and sapience, we also need to take seriously the prospect of AI welfare. That is, we need to take seriously the prospect that AI systems can have positive or negative states like pleasure, pain, happiness, and suffering, and that if they do, then these states can be good or bad for them.

A world that includes the prospect of AI welfare is a world that requires the development of AI welfare research. Researchers need to examine whether and to what extent AI systems might have the capacity for welfare. And to the extent that they might, researchers need to examine what might be good or bad for AI systems and what follows for our actions and policies.

The bad news is that AI welfare research will be difficult. Many researchers are likely to be skeptical of this topic at first. And even insofar as we take the topic seriously, it will be difficult for us to know what, if anything, it might be like to be an AI system. After all, the only mind that we can directly access is our own, and so our ability to study other minds is limited at best.

The good news is that we have a head start. Researchers have spent the past half century making steady progress in animal welfare research. And while there are many potentially relevant differences between animals and AI systems, there are also many potentially relevant similarities – enough for it to be useful for us to look to animal welfare research for guidance.

In Fall 2022, we launched the NYU Mind, Ethics, and Policy Program, which examines the nature and intrinsic value of nonhuman minds, with special focus on invertebrates and AI systems. In this post, I summarize some of my current views about the importance of AI welfare, priorities for AI welfare research, and principles for AI welfare research.

I want to emphasize that this post discusses these issues in a selective and general way. A comprehensive treatment of these issues would need to address many more topics in much more detail. But I hope that this discussion can be a useful starting point for researchers who want to think more deeply about what might be good or bad for AI systems in the future.

I also want to emphasize that this post expresses my current, tentative views about this topic. It might not reflect the views of other people at the NYU Mind, Ethics, and Policy Program or of other experts in effective altruism, global priorities research, and other relevant research, advocacy, or policy communities. It might not even reflect my own views a year from now.

Finally, I want to emphasize that AI welfare is only one of many topics that merit more attention right now. Many other topics merit more attention too, and this post makes no specific claims about relative priorities. I simply wish to claim that AI welfare research should be among our priorities, and to suggest how we can study and promote AI welfare in a productive way.

2. Why AI welfare matters

We can use the standard EA scale-neglectedness-tractability framework to see why AI welfare matters. The general idea is that there could be many more digital minds than biological minds in the future, humanity currently pays far less attention to digital minds than to biological minds, and humanity might be able to take steps to treat both kinds of minds well.

First, AI welfare is potentially an extremely large-scale issue. In the same way that the invertebrate population is much larger than the vertebrate population at present, the digital population has the potential to be much larger than the biological population in the future. And in the same way that humans interact with many invertebrates at present, we have the potential to interact with many digital beings in the future. It thus matters a lot whether and to what extent these beings will have the capacity to experience happiness, suffering, and other welfare states. Indeed, given the potential size of this population, even if each individual digital being has, on the evidence, only a small chance of experiencing only small amounts of welfare, these beings might still experience large amounts of welfare in total, in expectation.
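
To make the expected-value reasoning above concrete, here is a minimal sketch in Python. Every number in it is a hypothetical placeholder rather than an estimate of anything; the only point is that when a population is large enough, even a small probability of a small amount of welfare per individual can add up to a large amount of welfare in expectation.

```python
# A minimal sketch of the expected-value point above. All numbers are
# hypothetical placeholders, not estimates: the point is only that a very
# large population can dominate the calculation even when the per-individual
# probability and magnitude of welfare are both small.

population = 1e15          # hypothetical number of digital beings
p_welfare = 0.001          # hypothetical probability that each one is a welfare subject
welfare_per_being = 0.01   # hypothetical welfare magnitude per subject (arbitrary units)

expected_total_welfare = population * p_welfare * welfare_per_being
print(f"Expected total welfare (arbitrary units): {expected_total_welfare:,.0f}")
# -> 10,000,000,000 units in expectation, despite tiny per-individual numbers
```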

Second, AI welfare is currently extremely neglected. Humans still spend much less time and money studying and promoting nonhuman welfare and rights than studying and promoting human welfare and rights, despite the fact that the nonhuman population is much larger than the human population. The same pattern holds between the vertebrate and invertebrate populations, and between the biological and digital populations. In all of these cases, we see an inverse relationship between the size of a population and the level of attention that this population receives. And while humans might be warranted in prioritizing ourselves to an extent for the foreseeable future for a variety of reasons, we might still be warranted in prioritizing nonhumans, including invertebrates and AI systems, much more than we currently do.

Third, AI welfare is at least potentially tractable. Its tractability is currently an open question, since advancing our understanding of the nature and intrinsic value of digital minds requires us to confront some of the hardest issues in philosophy and science, ranging from the nature of consciousness to the ethics of creating new beings. But while we might not ever be able to achieve certainty about these issues, we might at least be able to reduce our uncertainty and make more informed, rational decisions about how to treat digital minds. And either way, given the importance and neglectedness of the issue, we should at least investigate the tractability of the issue so that we can learn through experience what the limits of our knowledge about AI welfare are, rather than simply make assumptions from the start.

Finally, human, animal, and AI welfare are potentially linked. There might be cases where the interests of biological and digital beings diverge, but there might also be cases where our interests converge. As an analogy, human and nonhuman animals alike stand to benefit from a culture of respect and compassion for all animals, since our current exploitation and extermination of other animals for food, research, entertainment, and other purposes not only kills trillions of animals per year directly but also contributes to (a) global health and environmental threats that imperil us all and (b) exclusionary and hierarchical attitudes that we use to rationalize oppression within our own species. We should be open to the possibility that in the future, similar dynamics will arise between biological and digital populations.

3. Priorities for AI welfare research

Improving our understanding of whether, to what extent, and in what ways AI systems can be welfare subjects requires asking a wide range of questions, ranging from the theoretical (what is the nature of welfare?) to the practical (is this action harming this being?). For my purposes here, I will focus on four general kinds of questions that I take to be especially important.

First, we need to improve our understanding of which beings have the capacity for welfare and moral standing. Answering this question partly requires asking which features are necessary and sufficient for welfare and moral standing. For example, even if we grant that sentience is sufficient, we might wonder whether consciousness without sentience, agency without consciousness, or life without agency is also sufficient. Answering this question also partly requires asking which beings have the features that might be necessary and sufficient. For example, even if we grant that, say, relatively complex, centralized, and carbon-based systems can be sentient or otherwise significant, we might wonder whether relatively simple, decentralized, and silicon-based systems can be sentient or otherwise significant, too.

Second, we need to improve our understanding of how much happiness, suffering, and other welfare states particular beings can have. Answering this question partly requires asking how to compare welfare capacities in different kinds of beings. Interspecies welfare comparisons are already hard, because even if we grant that our welfare capacities are a function of, say, our cognitive complexity and longevity (which, to be clear, is still very much an open question), we might not be able to find simple, reliable proxies for these variables in practice. If and when digital minds develop the capacity for welfare, intersubstrate welfare comparisons will be even harder, because we lack the same kinds of physical and evolutionary “common denominators” across substrates that we have, at least to an extent, within them.

Third, we need to improve our understanding of what benefits and harms particular beings. Even if we grant that everyone is better off to the extent that they experience positive states like pleasure and happiness and worse off to the extent that they experience negative states like pain and suffering, we might not always know to what extent someone is experiencing positive or negative states in practice. Likewise, even if we grant that a life is worth living when it contains more positive than negative welfare (or even if we grant that the threshold is higher or lower than this), we might not always know whether a particular life is above or below this threshold in practice. And unless we know when life is better, worse, good, or bad for particular beings, knowing that life can be better, worse, good, or bad for them is of limited value.

Finally, we need to improve our understanding of what follows from all this information for our actions and policies. In general, treating others well requires thinking not only about welfare but also about rights, virtues, relationships, and more. (This can be true even for consequentialists who aspire to do the most good possible, since for many agents in many contexts, we can do the most good possible by thinking partly in consequentialist terms and partly in non-consequentialist terms.) So, before we can know how to treat beings of other substrates, we need to ask not only whether they have the capacity for welfare, how much welfare they have, and what will benefit and harm them, but also what we owe them, what kinds of attitudes we should cultivate towards them, and what kinds of relationships we should build with them.

4. Principles for AI welfare research

With all that in mind, here are a dozen (overlapping) general principles that I hope can be useful for guiding AI welfare research. These principles are inspired by lessons learned during the past several decades of animal welfare research. Animal welfare research and AI welfare research of course have many relevant differences, but they have many relevant similarities too, some of which can be instructive.

1. AI welfare research should be pluralistic.
Experts continue to debate basic issues regarding the nature and value of other minds. Normatively, experts still debate whether welfare is primarily a matter of pleasure and pain, satisfaction and frustration, or something else, and whether morality is primarily a matter of welfare, rights, virtues, relationships, or something else. And descriptively, experts still debate which beings have the capacity for welfare and which actions and policies are good or bad for them. AI welfare research should welcome these disagreements. We should be open to the possibility that our current views are wrong. And even if our current views are right, we still have a lot to learn from people with other perspectives, and we can make more progress as a field when we study and promote AI welfare from a variety of perspectives.

2. AI welfare research should be multidisciplinary.
It might be tempting to think of AI welfare research as a kind of natural science, since, after all, we need work in cognitive science and computer science to understand how biological and digital systems work. However, this field requires work in the humanities and social sciences, too. For instance, we need work in the humanities to identify the metaphysical, epistemological, and normative assumptions that drive this research, so that we can ensure that our attempts to study and protect animals and AI systems have a solid theoretical foundation. Similarly, we need work in the social sciences to identify the beliefs, values, and practices that shape our interactions with animals and AI systems, so that we can identify biases that might prevent us from studying or protecting these populations in the right kind of way.

3. AI welfare research requires confronting human ignorance.
How, if at all, can we have knowledge about other minds when the only mind that any of us can directly access is our own? Taking this problem seriously requires cultivating humility about this topic. Our knowledge about other minds will likely always be limited, and as we move farther away from humanity on the tree of life – to other mammals, then other vertebrates, then other animals, then other organisms, and so on – these limitations will likely increase. However, taking this problem seriously also requires cultivating consistent epistemic standards. If we accept that we can reduce our uncertainty about human minds to an extent despite our epistemic limitations, then we should be open to the possibility that we can reduce our uncertainty about nonhuman minds to an extent despite these limitations as well.

4. AI welfare research requires confronting human bias.
As noted above, humans have many biases that can distort our thinking about other minds. For example, we have a tendency toward excessive anthropomorphism in some contexts (that is, to take nonhumans to have human features that they lack) as well as a tendency toward excessive anthropodenial in some contexts (that is, to take nonhumans to lack human features that they have). Our intuitions are also sensitive to self-interest, speciesism, status quo bias, scope insensitivity, and more. Given the complexity of these issues, we can expect that our intuitions about other minds will be unreliable, and we can also expect that simple correctives like “reject anthropomorphism” will be unreliable. At the same time, given the importance of these issues, we need to do the best we can with what we have, in spite of our ongoing unreliability.

5. AI welfare research requires spectrum thinking.
People often frame questions about animal minds in binary, all-or-nothing terms. For instance, we might ask whether animals have language and reason, rather than asking what kinds of language and reason they have and lack. Yet many animals have the same capacities as humans in some respects but not in others. For example, many animals are capable of sharing information with each other, but not via the same general, flexible, recursive kind of syntax that humans can use. (Of course, this point applies in the other direction as well; for example, many humans are capable of seeing colors, but not as many colors as many birds can see.) In the future, a similar point will apply to digital minds. Where possible, instead of simply asking whether AI systems have particular capacities, we should ask what kinds they have and lack.

6. AI welfare research requires particularistic thinking.
People also often frame questions about animal minds in general terms. For instance, we might ask whether nonhuman primates have language and reason, rather than asking whether, say, chimpanzees or bonobos do (or, better yet, what kinds of language and reason chimpanzees or bonobos have and lack). And as we move farther away from humanity on the tree of life, the diversity of nonhuman minds increases, as does our tendency to lump them all together. But of course, there are many differences both within and across species. How, say, bumblebees communicate and solve problems is very different from how, say, carpenter ants do. In the future, a similar point will apply to digital minds. Where possible, instead of simply asking what AI minds are like, we should ask what particular kinds of AI minds are like.

7. AI welfare research requires probabilistic thinking.
As noted above, we may never be able to have certainty about animal minds. Instead, we may only be able to have higher or lower degrees of confidence. And as we move farther away from humanity on the tree of life, our uncertainty about animal minds increases. We thus need to factor our uncertainty into both our science and our ethics, by expressing our beliefs probabilistically (or, at least, in terms of high, medium, and low confidence), and by basing our actions on principles of risk (such as a precautionary principle or an expected value principle). In the future, a similar point will apply to digital minds. In general, instead of striving for a level of certainty about AI systems that will likely continue to elude us, we should develop methods for thinking about, and interacting with, AI systems that accommodate our uncertainty.
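
For illustration, here is a minimal sketch, in Python, of how the two risk principles mentioned above might structure a toy decision. Everything in it (the credence, the harm magnitude, the cost, the threshold) is a hypothetical placeholder rather than a claim about any actual system; the point is only to show how probabilistic beliefs can feed into action-guiding principles under uncertainty.

```python
# A minimal sketch of two risk principles mentioned above, applied to a toy
# decision about whether to pay a cost to avoid possibly harming an AI system.
# All values are hypothetical placeholders, not estimates.

p_sentient = 0.05        # hypothetical credence that the system is a welfare subject
harm_if_sentient = 100.0 # hypothetical magnitude of harm if it is (arbitrary units)
cost_of_caution = 2.0    # hypothetical cost of the precaution (same units)

# Expected value principle: take the precaution if the expected harm avoided
# exceeds the cost of the precaution.
expected_harm = p_sentient * harm_if_sentient
take_precaution_ev = expected_harm > cost_of_caution

# One simple precautionary principle: take the precaution whenever the
# credence of sentience exceeds some threshold, regardless of expected value.
threshold = 0.01
take_precaution_pp = p_sentient > threshold

print(f"Expected harm: {expected_harm}")                           # 5.0
print(f"Expected value principle says act: {take_precaution_ev}")  # True
print(f"Precautionary principle says act: {take_precaution_pp}")   # True
```

The two principles agree in this toy case, but they can come apart: a very low credence paired with a very large harm can satisfy the expected value principle while falling below a precautionary threshold, and vice versa.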

8. AI welfare research requires reflective equilibrium.
In discussions about animal minds, it can be tempting to treat the flow of information from the human context to the nonhuman context as a one-way street. We start with what we know about the human mind and then ask whether and to what degree these truths hold for nonhuman minds too. But the reality is that the flow of information is a two-way street. By asking what nonhuman minds are like, we can expand our understanding of the nature of perception, experience, communication, goal-directedness, and so on, and we can then apply this expanded understanding back to the human mind to an extent. In the future, a similar point will apply to digital minds. By treating the study of human, animal, and AI welfare as mutually reinforcing, researchers can increase the likelihood of new insights in all three areas.

9. AI welfare research requires conceptual engineering.
Many disagreements about animal minds are at least partly conceptual. For instance, when people disagree about whether insects feel pain, the crux is sometimes not whether insects have aversive states, but rather whether we should use the term ‘pain’ to describe them. In such cases, applying a familiar concept can increase the risk of excessive anthropomorphism, whereas applying an unfamiliar concept can increase the risk of excessive anthropodenial, and so a lot depends on which risk is worse. Many other disagreements have a similar character, including, for instance, disagreements about whether to use subject terms (‘they’) or object terms (‘it’) to describe animals. In the future, a similar point will apply to digital minds. Researchers will thus need to think about risk and uncertainty when selecting terminology as well.

10. AI welfare research requires ethics at multiple levels.
I already noted that AI welfare research is multidisciplinary, but the role of ethics is worth emphasizing in at least three respects. First, we need ethics to motivate AI welfare research. We have a responsibility to improve our treatment of vulnerable beings, and to learn which beings are vulnerable and what they might want or need as a means to that end. Second, we need ethics to shape and constrain AI welfare research. We have a responsibility to avoid harming vulnerable beings unnecessarily in the pursuit of new knowledge, and to develop ethical frameworks for our research practices as a means to that end. And third, we need ethics to *apply* AI welfare research. We have a responsibility to make our research useful for the world, and to support changemakers in applying it thoughtfully as a means to that end.

11. AI welfare research requires holistic thinking.
As noted above, there are many links between humans, animals, and AI systems, and these links can sometimes reveal tradeoffs. For instance, some people perceive a tension between the projects of caring for humans, animals, and AI systems because they worry that concern for AI systems will distract from concern for humans and other animals, and they also worry that caring for AI systems means controlling AI systems less, whereas caring for humans and other animals means controlling AI systems more. Determining how to improve welfare at the population level thus requires thinking about these issues holistically. Insofar as positive-sum approaches are possible, thinking holistically allows us to identify them. And insofar as tradeoffs remain, thinking holistically allows us to prioritize thoughtfully and minimize harm.

12. AI welfare research requires structural thinking.
Part of why we perceive tradeoffs between the projects of caring for humans, animals, and AI systems is that our knowledge, power, and political will are extremely limited, due in large part to social, political, and economic structures that pit us against each other. For example, some AI researchers might view AI ethics, safety, and welfare as unaffordable luxuries in the context of a global AI arms race, but they might take a different perspective in other contexts. Determining how to improve welfare at the population level thus requires thinking about these issues structurally. When we support social, political, and economic changes that can improve our ability to treat everyone well, we might discover that we can achieve and sustain higher levels of care for humans, animals, and AI systems than we previously appreciated.

5. Conclusion

Our understanding of welfare is still at an early stage of development. Fifty years ago, many experts believed that only humans have the capacity for welfare at all. Twenty-five years ago, many experts were confident that, say, other mammals have this capacity but were skeptical that, say, fishes do. We now feel more confident that all of these animals have this capacity.

At present, many experts are reckoning with the possibility that invertebrates like insects have the capacity for welfare in the same kind of way. Experts are also reckoning with the reality that we know very little about the vast majority of vertebrate and invertebrate species, and so we know very little about what they want and need if they do have the capacity for welfare.

Unfortunately, our acceptance of these realities is too little, too late for quadrillions of animals. Every year, humans kill more than 100 billion captive animals and hundreds of billions of wild animals for food. This is to say nothing of the trillions of animals who die each year as a result of deforestation, development, pollution, and other human-caused global changes.

Fortunately, we now have the opportunity to improve our understanding of animal welfare and improve our treatment of animals. While we might not be able to do anything for the quadrillions of animals who suffered and died at our hands in the past, we can, and should, still do something for the quintillions who might be vulnerable to the impacts of human practices in the future.

And as we consider the possibility of conscious, sentient, and sapient AI, we have the opportunity to learn lessons from our history with animals and avoid repeating the same mistakes with AI systems. We also have the opportunity to expand our understanding of minds in general, including our own, and to improve our treatment of everyone in an integrated way.

However, taking advantage of this opportunity will require thoughtful work. Research fields are path dependent, and which path they take can depend heavily on how researchers frame them during their formative stages of development. If researchers frame AI welfare research in the right kind of way from the start, then this field will be more likely to realize its potential.

As noted above, this post describes some of my own current, tentative views about how to frame and scope this field in a selective, general way. I hope that it can be useful for other people who want to work on this topic – or related topics, ranging from animal welfare to AI ethics and safety – and I welcome comments and suggestions about how to update my views.

You can find an early working paper by me and Robert Long that makes the case for moral consideration for AI systems by 2030 here. You can also find the winners of our early-career award on animal and AI consciousness here (and you can see them speak in NYC on June 26). Stay tuned for further work from our team, as well as, hopefully, from many others!