Even if we found the most agreeable available set of moral principles, the people who accept it may turn out not to constitute the vast majority. It may not even be a majority at all. It is possible that there simply is no moral theory that is acceptable to most people.
It’s certainly possible that this is the case, but looking for the kind of solution that would satisfy as many people as possible seems like the thing we should try first, only giving it up if it turns out to be impossible, no?
More importantly, it is unclear whether or not I have any rational or moral obligation to care about the outputs of this exercise. I do not want to implement the moral system that most people find agreeable.
Well, the ideal case would be that the AI would show you a solution which it had found, and upon inspecting it and thinking it through you’d be convinced that this solution really does satisfy all the things you care about—and all the things that most other people care about, too.
From a more pragmatic perspective, you could try to insist on an AI which implemented your values specifically—but then everyone else would also have a reason to fight to get an AI which fulfilled their values specifically, and if it was you versus everyone else in the world, it seems like a pretty high probability that somebody else would win. Which means that your values would have a much higher chance of getting shafted than if everyone had agreed to go for a solution which tried to take everyone’s preferences into account.
And of course, in the context of AI, everyone insisting on their own values and their values only means that we’ll get arms races, meaning a higher probability of a worse outcome for everyone.
See also Gains from Trade Through Compromise.
It’s certainly possible that this is the case, but looking for the kind of solution that would satisfy as many people as possible seems like the thing we should try first, only giving it up if it turns out to be impossible, no?
Sure. That isn’t my primary objection, though. My main objection is that even if we pursue this project, it does not achieve the heavy metaethical lifting you were alluding to earlier. It neither demonstrates nor provides any particularly good reason to regard the outputs of this process as moral truth.
Well, the ideal case would be that the AI would show you a solution which it had found, and upon inspecting it and thinking it through you’d be convinced that this solution really does satisfy all the things you care about—and all the things that most other people care about, too.
I want to convert all matter in the universe to utilitronium. Do you think it is likely that an AI that factored in the values of all humans would yield this as its solution? I do not. Since I think the expected utility of most other likely solutions, given what I suspect about other people’s values, is far less than this, I would view almost any scenario other than imposing my values on everyone else as a cosmic disaster.
My main objection is that even if we pursue this project, it does not achieve the heavy metaethical lifting you were alluding to earlier. It neither demonstrates nor provides any particularly good reason to regard the outputs of this process as moral truth.
Well, what alternative would you propose? I don’t see how it would even be possible to get any stronger evidence for the moral truth of a theory than the failure of everyone to come up with convincing objections to it even after extended investigation. Nor do I see a strategy for testing its truth which wouldn’t at some point reduce to “test X gives us reason to disagree with the theory”.
I would understand your disagreement if you were a moral antirealist, but your comments seem to imply that you do believe that a moral truth exists and that it’s possible to get information about it, and that it’s possible to do “heavy metaethical lifting”. But how?
I want to convert all matter in the universe to utilitronium.
I think anything as specific as this sounds worryingly close to wanting an AI to implement favoritepoliticalsystem.
What the first communist revolutionaries thought would happen, as the empirical consequence of their revolution, was that people’s lives would improve: laborers would no longer work long hours at backbreaking labor and make little money from it. This turned out not to be the case, to put it mildly. But what the first communists thought would happen, was not so very different from what advocates of other political systems thought would be the empirical consequence of their favorite political systems. They thought people would be happy. They were wrong.

Now imagine that someone should attempt to program a “Friendly” AI to implement communism, or libertarianism, or anarcho-feudalism, or favoritepoliticalsystem, believing that this shall bring about utopia. People’s favorite political systems inspire blazing suns of positive affect, so the proposal will sound like a really good idea to the proposer.

We could view the programmer’s failure on a moral or ethical level—say that it is the result of someone trusting themselves too highly, failing to take into account their own fallibility, refusing to consider the possibility that communism might be mistaken after all. But in the language of Bayesian decision theory, there’s a complementary technical view of the problem. From the perspective of decision theory, the choice for communism stems from combining an empirical belief with a value judgment. The empirical belief is that communism, when implemented, results in a specific outcome or class of outcomes: people will be happier, work fewer hours, and possess greater material wealth. This is ultimately an empirical prediction; even the part about happiness is a real property of brain states, though hard to measure. If you implement communism, either this outcome eventuates or it does not. The value judgment is that this outcome satisfices or is preferable to current conditions. Given a different empirical belief about the actual real-world consequences of a communist system, the decision may undergo a corresponding change.

We would expect a true AI, an Artificial General Intelligence, to be capable of changing its empirical beliefs (or its probabilistic world-model, et cetera). If somehow Charles Babbage had lived before Nicolaus Copernicus, and somehow computers had been invented before telescopes, and somehow the programmers of that day and age successfully created an Artificial General Intelligence, it would not follow that the AI would believe forever after that the Sun orbited the Earth. The AI might transcend the factual error of its programmers, provided that the programmers understood inference rather better than they understood astronomy. To build an AI that discovers the orbits of the planets, the programmers need not know the math of Newtonian mechanics, only the math of Bayesian probability theory.

The folly of programming an AI to implement communism, or any other political system, is that you’re programming means instead of ends. You’re programming in a fixed decision, without that decision being re-evaluable after acquiring improved empirical knowledge about the results of communism. You are giving the AI a fixed decision without telling the AI how to re-evaluate, at a higher level of intelligence, the fallible process which produced that decision.
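To make the decision-theoretic point in the passage above concrete, here is a minimal illustrative sketch, not part of the quoted text; the policy names, outcomes, and probabilities are invented for illustration. An agent given ends (a utility function over outcomes) re-evaluates its choice whenever its empirical beliefs about what a policy produces are updated, whereas an agent given a fixed means keeps executing it regardless of what it learns.

# Illustrative sketch only: invented outcomes, policies, and probabilities.

def expected_utility(policy, beliefs, utility):
    # E[U | policy] under the current belief P(outcome | policy).
    return sum(p * utility[outcome] for outcome, p in beliefs[policy].items())

def choose(beliefs, utility):
    # An "ends-programmed" agent: pick whichever policy currently looks best.
    return max(beliefs, key=lambda policy: expected_utility(policy, beliefs, utility))

# The value judgment (the ends), held fixed throughout.
utility = {"people_flourish": 1.0, "people_suffer": -1.0}

# The programmer's initial empirical belief: communism leads to flourishing.
beliefs = {
    "communism":   {"people_flourish": 0.9, "people_suffer": 0.1},
    "alternative": {"people_flourish": 0.6, "people_suffer": 0.4},
}
print(choose(beliefs, utility))  # -> communism

# Improved empirical knowledge revises the belief about communism's results...
beliefs["communism"] = {"people_flourish": 0.2, "people_suffer": 0.8}

# ...and the decision changes, because only the ends were fixed:
print(choose(beliefs, utility))  # -> alternative

# A "means-programmed" agent would instead hard-code policy = "communism"
# and never re-evaluate it, which is the failure mode described above.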
Whoops. I can see how my responses didn’t make my own position clear.
I am an anti-realist, and I think the prospects for identifying anything like moral truth are very low. I favor abandoning attempts to frame discussions of AI or pretty much anything else in terms of converging on or identifying moral truth.
I consider it a likely futile effort to integrate these important and substantive discussions into contemporary moral philosophy. If engaging with moral philosophy introduces unproductive digressions, confusions, or misplaced priorities into the discussion, it may do more harm than good.
I’m puzzled by this remark:
I think anything as specific as this sounds worryingly close to wanting an AI to implement favoritepoliticalsystem.
I view utilitronium as an end, not a means. It is a logical consequence of wanting to maximize aggregate utility and is more or less a logical entailment of my moral views. I favor the production of whatever physical state of affairs yields the highest aggregate utility. This is, by definition, “utilitronium.” If I’m using the term in an unusual way I’m happy to propose a new label that conveys what I have in mind.
I totally sympathize with your sentiment and feel the same way about incorporating other people’s values in a superintelligent AI. If I just went with my own wish list for what the future should look like, I would not care about most other people’s wishes. I feel as though many other people are not even trying to be altruistic in the relevant sense in which I want to be altruistic, and I don’t experience a lot of moral motivation to help accomplish people’s weird notions of altruistic goals, let alone any goals that are clearly non-altruistically motivated. In the same way, I’d feel no strong motivation (admittedly, even less) to help make the dreams of baby-eating aliens come true.
Having said that, I am confident that it would screw things up for everyone if I followed a decision policy that did not give weight to other people’s strongly held moral beliefs. It is already hard enough not to mess up AI alignment in a way that makes things worse for everyone, and it would become much harder still if we had half a dozen or more competing teams who each wanted to get their own idiosyncratic view of the future installed.
BTW, note that value differences are not the only thing that can get you into trouble. If you hold an important empirical belief that others do not share, and you cannot convince them of it, then it may appear to you as though you’re justified in doing something radical about it, but that’s even more likely to be a bad idea, because the reasons for taking peer disagreement seriously are stronger in empirical domains of dispute than in normative ones.
There is a sea of considerations from Kantianism, contractualism, norms for stable/civil societies, and advanced decision theory that, while each line of argument seems tentative on its own and open to skepticism, all taken together point very strongly in the same direction: things will be horrible if we fail to cooperate with each other, and cooperating is often the truly rational thing to do. You’re probably already familiar with a lot of this, but for general reference, see also this recent paper, which makes a particularly interesting case for especially strong cooperation, as well as other work on the topic, e.g. here and here.
This is why I believe that people interested in any particular version of utilitronium should not override AI alignment procedures at the last minute just to get an extra-large share of the cosmic stakes for their own value system, and why I believe that people like me, who care primarily about reducing suffering, should not increase existential risk. Of course, all of this means that people who want to benefit human values in general should take particular care to make sure that idiosyncratic value systems that may diverge from them also receive consideration and gains from trade.
This piece I wrote recently is relevant to cooperation, to the question of whether values are subjective or not, to how much convergence we should expect, and to the extent to which value extrapolation procedures bake in certain (potentially unilateral) assumptions.
I am an anti-realist, and I think the prospects for identifying anything like moral truth are very low. I favor abandoning attempts to frame discussions of AI or pretty much anything else in terms of converging on or identifying moral truth.
Ah, okay. Well, in that case you can just read my original comment as an argument for why one would want to use psychology to design an AI capable of correctly figuring out just a single person’s values and implementing them, as that’s obviously a prerequisite for figuring out everybody’s values. The stuff I had about social consensus was just an argument aimed at moral realists; if you’re not one, then it’s probably not relevant to you.
(my values would still say that we should try to take everyone’s values into account, but that disagreement is distinct from the whole “is psychology useful for value learning” question)
I’m puzzled by this remark:
Sorry, my mistake—I confused utilitronium with hedonium.