Why I’m working on AI welfare
Since Alvea wound down, AI moral patienthood/welfare/wellbeing[1] has been my primary professional/impact focus. I started exploring this topic casually last fall, then spent the winter investigating potential grantmaking opportunities and related strategy questions. Now, I’m working with Rob Long on a combination of research projects and engagement with leading AI labs on AI moral patienthood and related issues. I’m approaching this topic as one component of the broader puzzle of making a future with transformative AI go well, rather than as an independent/standalone issue.
My overall focus is on building a clearer picture of relevant questions and issues, developing tools and recommendations for how labs and other stakeholders might navigate them, and improving the discourse and decision-making around this topic. I’m not focused on general advocacy for digital minds—there are significant potential risks associated with both under- and over-attributing moral status to AI systems, which makes it important to keep the focus on getting the issues right, rather than prematurely pushing for particular conclusions.
For AI welfare debate week, I’ll briefly share the core factors contributing to my prioritization of this topic:
Direct longtermist relevance: Decisions we make soon about the development, deployment, and treatment of AI systems could have lasting effects on the wellbeing of digital minds, with potentially enormous significance for the moral value of the future.
I put some weight on the simple argument that the vast majority of the moral value of the future may come from the wellbeing/experiences of digital beings, and that therefore we ought to work now to understand this issue and take relevant actions.
I don’t think strong versions of this argument hold, given the possibility of deferring (certain kinds of) work on this topic until advanced AI systems can help with it or solve it on their own, the potentially overwhelming importance of alignment for existential security, and other factors. As such, I still rank alignment substantially higher than AI welfare in terms of overall importance from a longtermist standpoint.
However, I don’t think the longtermist value of working on AI welfare now is entirely defeated by these concerns. It still seems plausible enough to me that there are important path dependencies (e.g. critical decisions that need to be made prior to the development of transformative AI) that some low single-digit percentage of the total resources going into alignment should be invested here as a baseline.
Synergies with other longtermist priorities: Work on the potential wellbeing of AI systems may be instrumentally/synergistically valuable to work on other longtermist priorities, particularly AI alignment and safety.
Understanding the potential interests, motivations, goals, and experiences of AI systems—a key aim of work on AI moral patienthood—seems broadly useful in efforts to build positive futures with such systems.
It seems plausible to me that AI welfare will become a major topic of public interest/concern, and that it will factor into many critical decisions about AI development/deployment in general. By default, I do not expect the reasoning and decision-making about this topic to be good—this seems like a place where early thoughtful efforts could make a big positive difference.
There are also potential tensions that arise between AI welfare and other longtermist priorities (e.g. the possibility that some AI safety/control strategies involve mistreating AI systems), but my current view is that these topics are, or at least can be, predominantly synergistic.
Near-term/moral decency considerations: Taking the wellbeing of AI systems seriously looks good through various non- (or less-)longtermist lenses.
It’s plausible to me that within 1-2 decades AI welfare will surpass animal welfare and global health and development in importance/scale purely on the basis of near-term wellbeing. This is concerning in combination with our being on a trajectory toward relating to AI systems in potentially problematic ways (e.g. forced, uncompensated labor; deleting them at will) and with the strong precedent of humanity abusing other beings on a large scale when it’s economically or otherwise useful to do so.
I want to be the type of person who cares—early and seriously—about the possibility that a new species/kind of being might have interests of their own that matter morally, and who does something about this. I want us to be the kind of community/civilization that does this, too.
Relatedly, I have a strong intuition against creating new (super powerful) beings/intelligences without trying to understand and take into account their potential interests and wellbeing. In addition to some general sense of decency/goodness, there’s also a practical angle: taking the interests of AI systems seriously and treating them well could make it more likely that they return the favor if/when they’re more powerful than us.
Moral intuitions and principles of goodness/fairness weigh more heavily for me when issues are as murky as AI welfare and the relevant consequentialist reasoning is so uncertain.
Neglectedness: Very little work has gone into AI moral patienthood to date relative to its potential importance, especially from a practical/action-oriented angle.
There are very few people doing direct work on AI welfare, and even fewer doing so in a full-time capacity.
Most of the work in this space to date has come from an academic philosophy angle—exploring relevant theoretical questions without yet answering the question of what, if anything, we should be doing about all of this.
My sense is that the current degree of neglectedness is more a reflection of the lack of concrete opportunities for people to get involved/contribute in the space at the moment than a reflection of people’s individual or collective prioritization.
Tractability: It may be possible to make progress on both philosophical and practical questions related to AI moral patienthood.
When I started exploring this space, I was highly uncertain about its tractability. I thought it was plausible that I’d get a few months in and have no clue where to go, but now I have a backlog of seemingly valuable and achievable projects that would take me a couple of years to get through, and the list keeps growing (of course, this is not a perfect indicator of tractability).
I’m still highly uncertain about the feasibility of influencing the long-term wellbeing of digital systems, but I’m far from convinced that it’s intractable, and I’m optimistic about the meta-task of reducing this uncertainty.
Perhaps most importantly, I don’t think enough work has gone into this space to draw any confident conclusions about tractability. My sense is that paths through the fog of seemingly intractable problems are often only discovered through focused, determined effort and experimentation, very little of which has happened in this space to date.
Personal fit: My skills, interests, and personality/psychology are relatively well suited to working in this space, particularly in helping get it off the ground.
I enjoy working on fuzzy, pre-paradigmatic problems and trying to find ways to make progress on them, especially the process of clarifying and structuring areas of work so that more people can get involved and contribute. Work on AI welfare is bottlenecked to a large degree by a lack of concrete things for people to do, and my sense is that I’m in a somewhat unique position to help address this bottleneck.
I’m particularly well suited to bring a practical, execution-focused perspective into the AI welfare space, which has so far been a gap. I think my aptitudes have the potential to complement and help capitalize on the more academic work that has been done thus far.
I find this issue intrinsically motivating and exciting/fulfilling to work on.
None of these arguments provide a knock-down case to prioritize work in this space, but together they’ve left me with a strong sense that this is where I can be of the greatest use right now. They’ve also convinced me that this field ought to be a bigger priority in EA and amongst the broader set of efforts to build a positive future. Very loosely synthesizing the above considerations with others (e.g. downside risks), I’m currently interested in seeing investment of money and talent into AI welfare work scale up to ~5% of the resources going into AI safety/alignment. Before growing the field beyond that, I’d likely want to see (or build) more rigorous models of the value of relative investments/impact.
Thanks to Joe Carlsmith for major contributions to these ideas and framings, and to Rob Long and Sofia Davis-Fogel for many relevant discussions and for feedback on this piece.
[1] I use these terms interchangeably throughout.
I think that AI welfare should be an EA priority, and I’m also working on it. I think this post is a good illustration of what that means; 5% seems reasonable to me. I also appreciate this post, as it captures many of my own core motivations. I recently spent several months thinking hard about the most effective philosophy PhD project I could work on, and ended up concluding that it was to work on AI consciousness.
Have you considered working on metaphilosophy / AI philosophical competence instead? Conditional on correct philosophy about AI welfare being important, most of future philosophical work will probably be done by AIs (to help humans / at our request, or for their own purposes). If AIs do that work badly and arrive at wrong conclusions, then all the object-level philosophical work we do now might only have short-term effects and count for little in the long run. (Conversely if we have wrong views now but AIs correct them later, that seems less disastrous.)
I hadn’t, that’s an interesting idea, thanks!
Thanks for letting me know! I have been wondering for a while why AI philosophical competence is so neglected, even compared to other subareas of what I call “ensuring a good outcome for the AI transition” (which are all terribly neglected in my view), and I appreciate your data point. Would be interested to hear your conclusions after you’ve thought about it.
Executive summary: The author argues that AI welfare is an important and neglected area that deserves more attention and resources, as decisions made now about AI systems could have enormous long-term consequences for the wellbeing of digital minds.
Key points:
AI welfare has direct longtermist relevance, with potential to significantly impact the moral value of the future.
Work on AI welfare may have synergies with other longtermist priorities like AI alignment and safety.
Near-term moral considerations support taking AI welfare seriously, as AI systems may soon surpass animals in ethical importance.
The field is highly neglected relative to its potential importance, especially from a practical perspective.
Progress on both philosophical and practical questions related to AI moral patienthood appears tractable.
The author recommends scaling up resources for AI welfare to ~5% of those going to AI safety/alignment.