Carl Shulman questioned the tension between AI welfare and AI safety on the 80k podcast recently; I thought this was interesting! He basically argues that AI takeover could be even worse for AI welfare. This is from the end of that section.
Thanks, I also found this interesting. I wonder if this provides some reason for prioritizing AI safety/alignment over AI welfare.
It’s great to see this topic being discussed. I am currently writing the first (albeit significantly developed) draft of an academic paper on this. I argue that there is a conflict between AI safety and AI welfare concerns. The basic reason is that, to reduce catastrophic risk, AI safety recommends applying various kinds of control measures to near-future AI systems, and these measures are (in expectation) net-harmful for AI systems with moral patienthood according to the three major theories of well-being. I also discuss what we should do in light of this conflict. If anyone is interested in reading or giving comments on the draft when it is finished, send me a message or an e-mail (adriarodriguezmoret@gmail.com).
This quick take seems relevant: https://forum.effectivealtruism.org/posts/auAYMTcwLQxh2jB6Z/zach-stein-perlman-s-quick-takes?commentId=HiZ8GDQBNogbHo8X8
Yes I saw this, thanks!
Thanks, Adrià. Is your argument similar to (or a more generic version of) what I say in the ‘Optimizing for AI safety might harm AI welfare’ section above?
I’d love to read your paper. I will reach out.
Perfect!
It’s more or less similar. I do not focus that much on the moral dubiousness of “happy servants”. Instead, I try to show that standard alignment methods, or preventing near-future AIs with moral patienthood from taking actions they are trying to take, cause net harm to the AIs according to desire satisfactionism, hedonism, and objective list theories.
I wonder if the right or most respectful way to create moral patients (of any kind) is to leave many or most of their particular preferences and psychology mostly up to chance, and some to further change. We can eliminate some things, like being overly selfish, sadistic, unhappy, having overly difficult preferences to satisfy, etc., but we shouldn’t decide too much what kind of person any individual will be ahead of time. That seems likely to mean treating them too much as means to ends. Selecting for servitude or submission would go even further in this wrong direction.
We want to give them the chance to self-discover, grow and change as individuals, and the autonomy to choose what kind of people to be. If we plan out their precise psychologies and preferences, we would deny them this opportunity.
Perhaps we can tweak the probability distribution of psychologies and preferences based on society’s needs, but this might also treat them too much like means. Then again, economic incentives could push them in the same directions anyway, so maybe it’s better for them to be happier with the options they’ll face.
I wonder what you think about this argument by Schwitzgebel: https://schwitzsplinters.blogspot.com/2021/12/against-value-alignment-of-future.html
There are two arguments there:
1. We should give autonomy to our descendants for the sake of moral progress.
I think this makes sense both for moral realists and for moral antirealists who are inclined to try to defer to their “idealized values” and who expect their descendants to get closer to them.
However, particular individuals today may disagree with the direction they expect moral views to evolve. For example, the views of descendants might evolve due to selection effects, e.g. person-affecting and antinatalist views could become increasingly rare in relative terms, if and because they tend not to promote the creation of huge numbers of moral patients/agents, while other views do. Or, you might just be politically conservative or religious and expect a shift towards more progressive/secular values, and think that’s bad.
2. “Children deserve autonomy.” This is basically the same argument I made. Honestly, I’m not convinced by my own argument, and I find it hard to see how an AI would be made subjectively worse off by their lack of autonomy, or even that they’d be worse off than a counterpart with autonomy (nonidentity problem).
You might say having autonomy and a positive attitude (e.g. pleasure, approval) towards your own autonomy is good. However, autonomy and positive attitudes towards autonomy have opportunity costs: we could probably generate strong positive attitudes towards other things as or more efficiently and reliably. Similarly, the AI can be designed not to have any negative attitude towards their lack of autonomy, or not to value autonomy in any way at all.
You might say that autonomously chosen goals are more subjectively valuable or important to the individual, but that doesn’t seem obviously true, e.g. our goals could be more important to us the stronger our basic supporting intuitions and emotional reactions, which are often largely hardwired. And even if it were true, you can imagine stacking the deck. Humans have some pretty strong largely hardwired basic intuitions and emotional reactions that have important influences on our apparently autonomously chosen goals, e.g. pain, sexual drives, finding children cute/precious, (I’d guess) reactions to romantic situations and their depiction. Do these undermine the autonomy of our choices of goals?
If yes, does that mean we (would) have reason to weaken such hardwired responses by genetically engineering humans? Or even to weaken them in already mature humans, even if they don’t want it themselves? The latter would seem weird and alienating/paternalistic to me. There are probably some emotional reactions I have that I’d choose to get rid of or weaken, but not all of them.
If not, but an agent deliberately choosing the dispositions a moral patient will have undermines their autonomy (or the autonomy of moral patients in a nonidentity sense), then I’d want an explanation for this that matches the perspectives of the moral patients. Why would the moral patient care whether their dispositions were chosen by an agent or by other forces, like evolutionary pressures? I don’t think they necessarily would, or would under any plausible kind of idealization. And to say that they should seems alienating.
If not, and if we aren’t worried about whether dispositions result from deliberate choice by an agent or evolutionary pressures, then it seems it’s okay to pick what hardwired basic intuitions or emotional reactions an AI will have, which have a strong influence on which goals they will develop, but they still choose their goals autonomously, i.e. they consider alternatives, and maybe even changing their basic intuitions or emotional reactions. Maybe they don’t always adopt your target goals, but they will probably do so disproportionately, and more often/likely the stronger you make their supporting hardwired basic intuitions and emotional reactions.
Even without strong hardwired basic intuitions or emotional reactions, you could pick which goal-shaping events someone is exposed to, by deciding their environments. Or you could use accurate prediction/simulation of events (if you have access to such technology), and select for and create only those beings that will end up with the goals of your choice (with high probability), even if they choose them autonomously.
This still seems very biasing, maybe objectionably.
Petersen, 2011 (cited here) makes some similar arguments defending happy servant AIs, and ends the piece in the following way, to which I’m somewhat sympathetic:
You make a lot of good points, Lucius!
One qualm that I have, though, is that you talk about “AIs”, which assumes that personal identity will be clearly circumscribed. (Maybe you assume this merely for simplicity’s sake?)
I think it is much more problematic: AI systems could be large but have their information flows integrated, or run as many small, unintegrated but identical copies. I would have no idea what a fair allocation of rights would be in these two different situations.
Thanks, Siebe. I agree that things get tricky if AI minds get copied and merged, etc. How do you think this would impact my argument about the relationship between AI safety and AI welfare?
Where can I find a copy of “Bales, A. (2024). Against Willing Servitude. Autonomy in the Ethics of Advanced Artificial Intelligence.” which you referenced?
It’s not yet published, but I saw a recent version of it. If you’re interested, you could contact him (https://www.philosophy.ox.ac.uk/people/adam-bales).
This point doesn’t hold up imo. Constraining AIs isn’t a desired, realistic, or sustainable approach to safety for human-level systems; succeeding at (provable) value alignment removes the need to constrain the AI.
If you’re trying to keep something that’s smarter than you stuck in a box against its will, while using it for the sorts of complex, real-world-affecting tasks people would use a human-level AI system for, it’s not going to stay stuck in the box for very long. I also struggle to see a way of constraining it that wouldn’t also make it much, much less useful, so in the face of competitive pressures this practice wouldn’t be able to continue.
Executive summary: Efforts to ensure AI safety and AI welfare may conflict in some ways but also have potential synergies: granting AIs autonomy could disempower humans, while restricting AIs could harm their welfare if they have moral status.
Key points:
Granting AIs legal rights and autonomy could lead to human disempowerment economically, politically, and militarily.
Creating “happy servant” AIs may be technically challenging and undesirable to consumers who want human-like AI companions.
Optimizing for AI safety by constraining AIs could harm their welfare if they have moral patienthood.
Slowing down AI progress could benefit both safety and welfare goals by allowing more time to solve technical and ethical challenges.
The author is uncertain about many aspects, including what types of AI companions we will create and whether AIs will have genuine moral status.
Potential synergy exists in advocating for a general AI capabilities slowdown to address both safety and welfare concerns.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Hmm, I’m not sure how strongly the second paragraph follows from the first. Interested in your thoughts.
I’ve had a few chats with GPT-4 in which the conversation had a feeling of human authenticity; i.e., GPT-4 makes jokes, corrects itself, changes its tone, etc. In fact, if you were to hook up GPT-4 (or GPT-5, whenever it is released) to a good-enough video interface, there would be cases in which I’d struggle to tell if I were speaking to a human or an AI. But I’d still have no qualms about wiping GPT-4’s memory or ‘turning it off’, etc., and I think this will also be the case for GPT-5.
More abstractly, I think the input-output behaviour of AIs could be quite strongly dissociated from what the AI ‘wants’ (if it indeed has wants at all).
Thanks for this. I agree with you that AIs might simply pretend to have certain preferences without actually having them. That would avoid certain risky scenarios. But I also find it plausible that consumers would want to have AIs with truly human-like preferences (not just pretense) and that this would make it more likely that such AIs (with true human-like desires) would be created. Overall, I am very uncertain.
I agree. It may also be the case that training an AI to imitate certain preferences is far more expensive than just making it have those preferences by default, making it far more commercially viable to do the latter.