I’m actually a bit confused here, because I’m not settled on a meta-ethics: why isn’t it the case that a large part of human values is about satisfying the preferences of moral patients, and that human values count any (or at least most) advanced AIs as non-trivial moral patients?
I don’t put much weight on this currently, but I haven’t ruled it out.
For humans, preference satisfaction is generally a good proxy for an improvement in quality of life. For AIs (or arbitrary agents), if we call whatever they seek to maximize “preferences” (which might be misleading, since under strict definitions of “preference” they might not have preferences at all), it does not automatically follow that satisfying those preferences makes them better off in any way.
The paperclipper doesn’t make paperclips because it loves paperclips. It just makes paperclips because that’s what it was programmed or trained to do.
Could you try to clarify what you mean by the AI (or an agent in general) being “better off”?
I don’t know much metaethics jargon, so I’ll just give an example. I believe that moral goodness (or choice-worthiness, if you prefer) is proportional to happiness minus suffering, and that happiness and suffering are caused by certain physical processes. A system could achieve its goals (that is, do what we would colloquially describe as achieving goals, though I’m not sure how to formalize “goals”) without being any happier. The same holds for other theories of wellbeing: a system could generally achieve its goals without meeting those wellbeing criteria.
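To put that view very roughly in symbols (just a sketch of the claim above, not a precise formalization; the per-patient quantities $h_i$ and $s_i$ are placeholders):

$$\text{goodness} \;\propto\; \sum_{i \,\in\, \text{moral patients}} \left( h_i - s_i \right)$$

where $h_i$ and $s_i$ are the happiness and suffering of patient $i$, realized by whatever physical processes those turn out to be. On this view, an agent achieving its goals matters morally only insofar as it actually changes some $h_i$ or $s_i$.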
(Currently exhausted, apologies for incoherence.)
No worries! Seemed mostly coherent to me, and please feel free to respond later.
I think the thing I am hung up on here is what counts as “happiness” and “suffering” in this framing.