Whoops. I can see how my responses didn’t make my own position clear.
I am an anti-realist, and I think the prospects for identifying anything like moral truth are very low. I favor abandoning attempts to frame discussions of AI or pretty much anything else in terms of converging on or identifying moral truth.
I consider it a likely futile effort to integrate important and substantive discussions into contemporary moral philosophy. If engaging with moral philosophy introduces unproductive digressions, confusions, or misplaced priorities into the discussion, it may do more harm than good.
I’m puzzled by this remark:
I think anything as specific as this sounds worryingly close to wanting an AI to implement [favorite political system].
I view utilitronium as an end, not a means. It is a logical consequence of wanting to maximize aggregate utility and is more or less a logical entailment of my moral views. I favor the production of whatever physical state of affairs yields the highest aggregate utility. This is, by definition, “utilitronium.” If I’m using the term in an unusual way I’m happy to propose a new label that conveys what I have in mind.
I totally sympathize with your sentiment and feel the same way about incorporating other people’s values into a superintelligent AI. If I just went with my own wish list for what the future should look like, I would not care about most other people’s wishes. I feel as though many other people are not even trying to be altruistic in the sense in which I want to be altruistic, and I don’t experience much moral motivation to help accomplish people’s weird notions of altruistic goals, let alone goals that are clearly non-altruistically motivated. In the same way, I’d feel little (admittedly even less) motivation to help make the dreams of baby-eating aliens come true.
Having said that, I am confident that it would screw things up for everyone if I followed a decision policy that gave no weight to other people’s strongly held moral beliefs. It is already hard enough not to mess up AI alignment in a way that makes things worse for everyone, and it would become much harder still if we had half a dozen or more competing teams, each wanting to install its own idiosyncratic view of the future.
BTW, note that value differences are not the only thing that can get you into trouble. If you hold an important empirical belief that others do not share, and you cannot convince them of it, it may appear to you that you’re justified in doing something radical about it. But that’s even more likely to be a bad idea, because the reasons for taking peer disagreement seriously are stronger in empirical domains of dispute than in normative ones.
There is a sea of considerations from Kantianism, contractualism, norms for stable/civil societies, and advanced decision theory that, while each line of argument seems tentative on its own and open to skepticism, all taken together point very strongly in the same direction: things will be horrible if we fail to cooperate with each other, and cooperating is often the truly rational thing to do. You’re probably already familiar with a lot of this, but for general reference, see also this recent paper, which makes an especially interesting case for particularly strong cooperation, as well as other work on the topic, e.g. here and here.
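The decision-theoretic part of this point can be illustrated with a toy simulation (a minimal sketch; the strategy names and payoff values are illustrative assumptions, not anything from the papers linked above): in an iterated prisoner’s dilemma, mutual cooperation yields far more for everyone than mutual defection, and a defector exploiting a conditional cooperator gains very little.

```python
# Toy iterated prisoner's dilemma: payoffs are the standard
# illustrative T=5, R=3, P=1, S=0 values (an assumption for this sketch).
PAYOFFS = {  # (my move, their move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(seen):
    """Cooperate first, then copy the opponent's last observed move."""
    return seen[-1] if seen else "C"

def always_defect(seen):
    return "D"

def play(strategy_a, strategy_b, rounds=100):
    """Return each player's total payoff over `rounds` rounds."""
    seen_a, seen_b = [], []  # moves each player has observed from the other
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(seen_a)
        move_b = strategy_b(seen_b)
        score_a += PAYOFFS[(move_a, move_b)]
        score_b += PAYOFFS[(move_b, move_a)]
        seen_a.append(move_b)
        seen_b.append(move_a)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))      # (300, 300): sustained cooperation
print(play(always_defect, always_defect))  # (100, 100): mutual defection
print(play(tit_for_tat, always_defect))    # (99, 104): defection gains little
```

This is of course a cartoon of the much richer arguments in the literature, but it captures the direction they all point in: the cooperative equilibrium dominates, and unilateral defection buys only a marginal edge.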
This is why I believe that people interested in any particular version of utilitronium should not override AI alignment procedures at the last minute just to get an extra-large share of the cosmic stakes for their own value system, and why I believe that people like me, who care primarily about reducing suffering, should not increase existential risk. Of course, all of this means that people who want to benefit human values in general should take particular care to make sure that idiosyncratic value systems that may diverge from them also receive consideration and gains from trade.
This piece I wrote recently is relevant to cooperation, to the question of whether values are subjective, to how much convergence we should expect, and to the extent to which value extrapolation procedures bake in certain (potentially unilateral) assumptions.
I am an anti-realist, and I think the prospects for identifying anything like moral truth are very low. I favor abandoning attempts to frame discussions of AI or pretty much anything else in terms of converging on or identifying moral truth.
Ah, okay. Well, in that case you can just read my original comment as an argument for why one would want to use psychology to design an AI that was capable of correctly figuring out even a single person’s values and implementing them, as that’s obviously a prerequisite for figuring out everybody’s values. The stuff I had about social consensus was just an argument aimed at moral realists; if you’re not one, then it’s probably not relevant for you.
(my values would still say that we should try to take everyone’s values into account, but that disagreement is distinct from the whole “is psychology useful for value learning” question)
I’m puzzled by this remark:
Sorry, my mistake—I confused utilitronium with hedonium.