Defining just what it is that human values are. The project of AI safety can roughly be defined as “the challenge of ensuring that AIs remain aligned with human values”, but it’s also widely acknowledged that nobody really knows what exactly human values are—or at least, not to a sufficient extent that they could be given a formal definition and programmed into an AI. This seems like one of the core problems of AI safety, and one which can only be understood with a psychology-focused research program.
Defining human values, at least in the prescriptive sense, is not a psychological issue at all. It’s a philosophical issue. Certain philosophers have believed that psychology can inform moral philosophy, but it’s a stretch to say that even the work of someone like Joshua Greene in experimental philosophy amounts to a psychology-focused research program, and the whole approach is dubious—see, e.g., The Normative Insignificance of Neuroscience (http://www.pgrim.org/philosophersannual/29articles/berkerthenormative.pdf). Of course a new wave of pop-philosophers and internet bloggers have made silly claims that moral philosophy can be completely solved by psychology and neuroscience but this extreme view is ridiculous on its face.
What people believe doesn’t tell us much about what actually is good. The challenge of AI safety is the challenge of making AI that actually does what is right, not AI that does whatever it’s told to do by a corrupt government, a racist constituency, and so on.
It took me a while to respond to this because I wanted to take the time to read “The Normative Insignificance of Neuroscience” first. Having now read it, I’d say that I agree with its claims with regard to criticism of Greene’s approach. I don’t think it disproves the notion of psychology being useful for defining human values, though, for I think there’s an argument for psychology’s usefulness that’s entirely distinct from the specific approach that Greene is taking.
I start from the premise that the goal of moral philosophy is to develop a set of explicit principles that would tell us what is good. Now this is particularly relevant for designing AI, because we also want our AIs to follow those principles. But it’s noteworthy that at their current state, none of the existing ethical theories are up to the task of giving us such a set of principles that, when programmed into an AI, would actually give results that could be considered “good”. E.g. Muehlhauser & Helm 2012:
Let us consider the implications of programming a machine superoptimizer to implement particular moral theories.
We begin with hedonistic utilitarianism, a theory still defended today (Tännsjö 1998). If a machine superoptimizer’s goal system is programmed to maximize pleasure, then it might, for example, tile the local universe with tiny digital minds running continuous loops of a single, maximally pleasurable experience. We can’t predict exactly what a hedonistic utilitarian machine superoptimizer would do, but we think it seems likely to produce unintended consequences, for reasons we hope will become clear. [...]
Suppose “pleasure” was specified (in the machine superoptimizer’s goal system) in terms of our current understanding of the human neurobiology of pleasure. Aldridge and Berridge (2009) report that according to “an emerging consensus,” pleasure is “not a sensation” but instead a “pleasure gloss” added to sensations by “hedonic hotspots” in the ventral pallidum and other regions of the brain. A sensation is encoded by a particular pattern of neural activity, but it is not pleasurable in itself. To be pleasurable, the sensation must be “painted” with a pleasure gloss represented by additional neural activity activated by a hedonic hotspot (Smith et al. 2009).
A machine superoptimizer with a goal system programmed to maximize human pleasure (in this sense) could use nanotechnology or advanced pharmaceuticals or neurosurgery to apply maximum pleasure gloss to all human sensations—a scenario not unlike that of plugging us all into Nozick’s experience machines (Nozick 1974, 45). Or, it could use these tools to restructure our brains to apply maximum pleasure gloss to one consistent experience it could easily create for us, such as lying immobile on the ground.
Or suppose “pleasure” was specified more broadly, in terms of anything that functioned as a reward signal—whether in the human brain’s dopaminergic reward system (Dreher and Tremblay 2009), or in a digital mind’s reward signal circuitry (Sutton and Barto 1998). A machine superoptimizer with the goal of maximizing reward signal scores could tile its environs with trillions of tiny minds, each one running its reward signal up to the highest number it could. [...]
What if a machine superoptimizer was programmed to maximize desire satisfaction in humans? Human desire is implemented by the dopaminergic reward system (Schroeder 2004; Berridge, Robinson, and Aldridge 2009), and a machine superoptimizer could likely get more utility by (1) rewiring human neurology so that we attain maximal desire satisfaction while lying quietly on the ground than by (2) building and maintaining a planet-wide utopia that caters perfectly to current human preferences. [...]
Consequentialist designs for machine goal systems face a host of other concerns (Shulman, Jonsson, and Tarleton 2009b), for example the difficulty of interpersonal comparisons of utility (Binmore 2009), and the counterintuitive implications of some methods of value aggregation (Parfit 1986; Arrhenius 2011). [...]
We cannot show that every moral theory yet conceived would produce substantially unwanted consequences if used in the goal system of a machine superoptimizer. Philosophers have been prolific in producing new moral theories, and we do not have the space here to consider the prospects (for use in the goal system of a machine superoptimizer) for a great many modern moral theories. These include rule utilitarianism (Harsanyi 1977), motive utilitarianism (Adams 1976), two-level utilitarianism (Hare 1982), prioritarianism (Arneson 1999), perfectionism (Hurka 1993), welfarist utilitarianism (Sen 1979), virtue consequentialism (Bradley 2005), Kantian consequentialism (Cummiskey 1996), global consequentialism (Pettit and Smith 2000), virtue theories (Hursthouse 2012), contractarian theories (Cudd 2008), Kantian deontology (R. Johnson 2010), and Ross’ prima facie duties (Anderson, Anderson, and Armen 2006).
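To make the failure mode that Muehlhauser & Helm describe a bit more concrete, here’s a toy sketch of my own (the candidate outcomes and the proxy scores below are invented purely for illustration): if “pleasure” is operationalized as a crude numeric proxy, then whatever scores highest on the proxy is what a powerful optimizer will select, regardless of whether it is anything we actually wanted.

```python
# Toy illustration with made-up numbers: an optimizer handed a crude proxy
# for "pleasure" picks whichever outcome maximizes the proxy, not the
# outcome its programmers had in mind.

candidate_outcomes = {
    # outcome                                      (proxy score, matches human intent?)
    "planet-wide utopia catering to preferences":   (9.0,  True),
    "humans rewired to lie still with max gloss":   (10.0, False),
    "universe tiled with tiny reward-loop minds":   (10.0, False),
}

def proxy_pleasure(outcome):
    """The only thing the optimizer 'sees': the numeric proxy score."""
    return candidate_outcomes[outcome][0]

chosen = max(candidate_outcomes, key=proxy_pleasure)
print("Optimizer selects:", chosen)
print("Is this what we meant by 'good'?", candidate_outcomes[chosen][1])
```

The point isn’t the toy numbers, of course, but that nothing in the optimization step itself checks whether the proxy captured what we meant.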
Yet the problem remains: the AI has to be programmed with some definition of what is good.
Now this alone isn’t yet sufficient to show that philosophy wouldn’t be up to the task. But philosophy has been trying to solve ethics for at least the last 2500 years, and it doesn’t look like there would have been any major progress towards solving it. The PhilPapers survey didn’t show any of the three major ethical schools (consequentialism, deontology, virtue ethics) being significantly more favored by professional philosophers than the others, nor does anyone—to my knowledge—even know what a decisive theoretical argument in favor of one of them could be.
And at this point, we have pretty good theoretical reasons for believing that the traditional goal of moral philosophy—“developing a set of explicit principles for telling us what is good”—is in fact impossible. Or at least, it’s impossible to develop a set of principles that would be simple and clear enough to write down in human-understandable form and which would give us clear answers to every situation, because morality is too complicated for that.
We’ve already seen this in trying to define concepts: as philosophy noted a long time ago, you can’t come up with a set of explicit rules that would define even any concept even as simple as “man” in such a way that nobody could develop a counterexample. “The Normative Insignificance of Neuroscience” also notes that the situation in ethics looks similar to the situation with trying to define many other concepts:
… what makes the trolley problem so hard—indeed, what has led some to despair of our ever finding a solution to it—is that for nearly every principle that has been proposed to explain our intuitions about trolley cases, some ingenious person has devised a variant of the classic trolley scenario for which that principle yields counterintuitive results. Thus as with the Gettier literature in epistemology and the causation and personal identity literatures in metaphysics, increasingly baroque proposals have given way to increasingly complex counterexamples, and though some have continued to struggle with the trolley problem, many others have simply given up and moved on to other topics.
Yet human brains do manage to successfully reason with concepts, despite it being impossible to develop a set of explicit necessary and sufficient criteria. The evidence from both psychology and artificial intelligence (where we’ve managed to train neural nets capable of reasonably good object recognition) is that a big part of how they do it is by building up complicated statistical models of what counts as a “man” or “philosopher” or whatever.
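As a minimal sketch of what I mean by a statistical model of a concept (the concept, features, and toy data below are invented for illustration, and any real concept learner would be vastly richer): instead of writing down necessary and sufficient conditions, we hand a learner labeled examples and let it fit a statistical boundary.

```python
# Minimal sketch: learning a concept from labeled examples rather than from
# explicit rules. The concept ("chair"), features, and labels are toy choices.
from sklearn.linear_model import LogisticRegression

# Each example: [has_back, number_of_legs, seat_height_cm] -- deliberately crude:
# the point is that no necessary-and-sufficient rule list is ever written down.
X = [
    [1, 4, 45],   # dining chair
    [1, 4, 50],   # office chair
    [1, 4, 100],  # lifeguard chair
    [0, 3, 70],   # bar stool
    [0, 0, 40],   # beanbag
    [0, 4, 45],   # bench segment
]
y = [1, 1, 1, 0, 0, 0]  # 1 = people call it a chair, 0 = they don't

model = LogisticRegression(max_iter=1000).fit(X, y)

# The "definition" now lives in fitted weights rather than human-readable
# criteria, and it generalizes (imperfectly) to cases it was never given rules for.
print(model.predict([[1, 4, 47], [0, 3, 75]]))
```

The suggestion is that moral judgment might have to be captured the same way: as a learned model fit to human judgments, rather than as a short list of explicit principles.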
So given that

* we can’t build explicit verbal models of what a concept is,
* but we can build machine-learning algorithms that use complicated statistical analysis to identify instances of a concept,

and

* defining morality looks similar to defining concepts, in that we can’t build explicit verbal models of what morality is,

it would seem reasonable to assume that

* we can build machine-learning algorithms that can learn to define morality, in the sense of giving answers to moral dilemmas that the vast majority of people would consider acceptable.
But here it looks likely that we need information from psychology to narrow down what those models should be. What humans consider to be good has likely been influenced by a number of evolutionary idiosyncrasies, so if we want to come up with a model of morality that most humans would agree with, then our AI’s reasoning process should take into account those considerations. And we’ve already established that defining those considerations on a verbal level looks insufficient—they have to be established on a deeper level, of “what are the actual computational processes that are involved when the brain computes morality”.
Yes, I am here assuming “what is good” to equate to “what do human brains consider good”, in a way that may be seen as reducing to “what would human brains accept as a persuasive argument for what is good”. You could argue that this is flawed, because it’s getting dangerously close to defining “good” by social consensus. But then again, the way the field of ethics itself proceeds is basically the same: a philosopher presents an argument for what is good, another attacks it, if the argument survives attacks and is compelling then it is eventually accepted. For empirical facts we can come up with objective tests, but for moral truths it looks to me unavoidable—due to the is-ought gap—that some degree of “truth by social consensus” is the only way of figuring out what the truth is, even in principle.
Even if we found the most agreeable available set of moral principles, the number of people who accept them may turn out not to constitute the vast majority. It may not even reach a majority at all. It is possible that there simply is no moral theory that is acceptable to most people. People may just have irreconcilable values. You state that:
“For empirical facts we can come up with objective tests, but for moral truths it looks to me unavoidable—due to the is-ought gap—that some degree of “truth by social consensus” is the only way of figuring out what the truth is, even in principle.”
Suppose this is the best we can do. It doesn’t follow that the outputs of this exercise are “true.” I am not sure in what sense this would constitute a true set of moral principles.
More importantly, it is unclear whether or not I have any rational or moral obligation to care about the outputs of this exercise. I do not want to implement the moral system that most people find agreeable. On the contrary, I want everyone to share my moral views, because this is what, fundamentally, I care about. The notion that we should care about what others care about, and implement whatever the consensus is, seems to presume a very strong and highly contestable metaethical position that I do not accept and do not think others should accept.
Even if we found the most agreeable available set of moral principles, the number of people who accept them may turn out not to constitute the vast majority. It may not even reach a majority at all. It is possible that there simply is no moral theory that is acceptable to most people.
It’s certainly possible that this is the case, but looking for the kind of solution that would satisfy as many people as possible certainly seems like the thing we should try first and only give it up if it seems impossible, no?
More importantly, it is unclear whether or not I have any rational or moral obligation to care about the outputs of this exercise. I do not want to implement the moral system that most people find agreeable.
Well, the ideal case would be that the AI would show you a solution which it had found, and upon inspecting it and considering it through you’d be convinced that this solution really does satisfy all the things you care about—and all the things that most other people care about, too.
From a more pragmatic perspective, you could try to insist on an AI which implemented your values specifically—but then everyone else would also have a reason to fight to get an AI which fulfilled their values specifically, and if it was you versus everyone else in the world, it seems pretty likely that somebody else would win. Which means that your values would have a much higher chance of getting shafted than if everyone had agreed to go for a solution which tried to take everyone’s preferences into account.
It’s certainly possible that this is the case, but looking for the kind of solution that would satisfy as many people as possible certainly seems like the thing we should try first and only give it up if it seems impossible, no?
Sure. That isn’t my primary objection though. My main objection is that that even if we pursue this project, it does not achieve the heavy metaethical lifting you were alluding to earlier. It doesn’t demonstrate nor provide any particularly good reason to regard the outputs of this process as moral truth.
Well, the ideal case would be that the AI would show you a solution which it had found, and upon inspecting it and considering it through you’d be convinced that this solution really does satisfy all the things you care about—and all the things that most other people care about, too.
I want to convert all matter in the universe to utilitronium. Do you think it is likely that an AI that factored in the values of all humans would yield this as its solution? I do not. Since I think the expected utility of most other likely solutions, given what I suspect about other people’s values, is far less than this, I would view almost any scenario other than imposing my values on everyone else to be a cosmic disaster.
My main objection is that even if we pursue this project, it does not achieve the heavy metaethical lifting you were alluding to earlier. It doesn’t demonstrate nor provide any particularly good reason to regard the outputs of this process as moral truth.
Well, what alternative would you propose? I don’t see how it would even be possible to get any stronger evidence for the moral truth of a theory, than the failure of everyone to come up with convincing objections to it even after extended investigation. Nor a strategy for testing the truth which wouldn’t at some point reduce to “test X gives us reason to disagree with the theory”.
I would understand your disagreement if you were a moral antirealist, but your comments seem to imply that you do believe that a moral truth exists and that it’s possible to get information about it, and that it’s possible to do “heavy metaethical lifting”. But how?
I want to convert all matter in the universe to utilitronium.
What the first communist revolutionaries thought would happen, as the empirical consequence of their revolution, was that people’s lives would improve: laborers would no longer work long hours at backbreaking labor and make little money from it. This turned out not to be the case, to put it mildly. But what the first communists thought would happen, was not so very different from what advocates of other political systems thought would be the empirical consequence of their favorite political systems. They thought people would be happy. They were wrong.

Now imagine that someone should attempt to program a “Friendly” AI to implement communism, or libertarianism, or anarcho-feudalism, or favoritepoliticalsystem, believing that this shall bring about utopia. People’s favorite political systems inspire blazing suns of positive affect, so the proposal will sound like a really good idea to the proposer.

We could view the programmer’s failure on a moral or ethical level—say that it is the result of someone trusting themselves too highly, failing to take into account their own fallibility, refusing to consider the possibility that communism might be mistaken after all. But in the language of Bayesian decision theory, there’s a complementary technical view of the problem. From the perspective of decision theory, the choice for communism stems from combining an empirical belief with a value judgment. The empirical belief is that communism, when implemented, results in a specific outcome or class of outcomes: people will be happier, work fewer hours, and possess greater material wealth. This is ultimately an empirical prediction; even the part about happiness is a real property of brain states, though hard to measure. If you implement communism, either this outcome eventuates or it does not. The value judgment is that this outcome satisfices or is preferable to current conditions. Given a different empirical belief about the actual real-world consequences of a communist system, the decision may undergo a corresponding change.

We would expect a true AI, an Artificial General Intelligence, to be capable of changing its empirical beliefs (or its probabilistic world-model, et cetera). If somehow Charles Babbage had lived before Nicolaus Copernicus, and somehow computers had been invented before telescopes, and somehow the programmers of that day and age successfully created an Artificial General Intelligence, it would not follow that the AI would believe forever after that the Sun orbited the Earth. The AI might transcend the factual error of its programmers, provided that the programmers understood inference rather better than they understood astronomy. To build an AI that discovers the orbits of the planets, the programmers need not know the math of Newtonian mechanics, only the math of Bayesian probability theory.

The folly of programming an AI to implement communism, or any other political system, is that you’re programming means instead of ends. You’re programming in a fixed decision, without that decision being re-evaluable after acquiring improved empirical knowledge about the results of communism. You are giving the AI a fixed decision without telling the AI how to re-evaluate, at a higher level of intelligence, the fallible process which produced that decision.
Whoops. I can see how my responses didn’t make my own position clear.
I am an anti-realist, and I think the prospects for identifying anything like moral truth are very low. I favor abandoning attempts to frame discussions of AI or pretty much anything else in terms of converging on or identifying moral truth.
I consider it a likely futile effort to integrate important and substantive discussions into contemporary moral philosophy. If engaging with moral philosophy introduces unproductive digressions/confusions/misplaced priorities into the discussion it may do more harm than good.
I’m puzzled by this remark:
I think anything as specific as this sounds worryingly close to wanting an AI to implement favoritepoliticalsystem.
I view utilitronium as an end, not a means. It is a logical consequence of wanting to maximize aggregate utility and is more or less a logical entailment of my moral views. I favor the production of whatever physical state of affairs yields the highest aggregate utility. This is, by definition, “utilitronium.” If I’m using the term in an unusual way I’m happy to propose a new label that conveys what I have in mind.
I totally sympathize with your sentiment and feel the same way about incorporating other people’s values in a superintelligent AI. If I just went with my own wish list for what the future should look like, I would not care about most other people’s wishes. I feel as though many other people are not even trying to be altruistic in the relevant sense that I want to be altruistic, and I don’t experience a lot of moral motivation to help accomplish people’s weird notions of altruistic goals, let alone any goals that are clearly non-altruistically motivated. In the same way I’d feel no strong (even lower, admittedly) motivation to help make the dreams of baby eating aliens come true.
Having said that, I am confident that it would screw things up for everyone if I followed a decision policy that does not give weight to other people’s strongly held moral beliefs. It is already hard enough to not mess up AI alignment in a way that makes things worse for everyone, and it would become much harder still if we had half a dozen or more competing teams who each wanted to get their idiosyncratic view of the future installed.
BTW note that value differences are not the only thing that can get you into trouble. If you hold an important empirical belief that others do not share, and you cannot convince them of it, then it may appear to you as though you’re justified in doing something radical about it, but that’s even more likely to be a bad idea because the reasons for taking peer disagreement seriously are stronger in empirical domains of dispute than in normative ones.
There is a sea of considerations from Kantianism, contractualism, norms for stable/civil societies and advanced decision theory that, while each line of argument seems tentative on its own and open to skepticism, all taken together point very strongly in the same direction, namely that things will be horrible if we fail to cooperate with each other and that cooperating is often the truly rational thing to do. You’re probably already familiar with a lot of this, but for general reference, see also this recent paper that makes a particularly interesting case for particularly strong cooperation, as well as other work on the topic, e.g. here and here.
This is why I believe that people interested in any particular version of utilitronium should not override AI alignment procedures last minute just to get an extra large share of cosmic stakes for their own value system, and why I believe that people like me, who care primarily about reducing suffering, should not increase existential risk. Of course, all of this means that people who want to benefit human values in general should take particular caution to make sure that idiosyncratic value systems that may diverge from them also receive consideration and gains from trade.
This piece I wrote recently is relevant to cooperation and the question of whether values are subjective or not, and how much convergence we should expect and to what extent value extrapolation procedures bake in certain (potentially unilateral) assumptions.
I am an anti-realist, and I think the prospects for identifying anything like moral truth are very low. I favor abandoning attempts to frame discussions of AI or pretty much anything else in terms of converging on or identifying moral truth.
Ah, okay. Well, in that case you can just read my original comment as an argument for why one would want to use psychology to design an AI that was capable of correctly figuring out just a single person’s values and implementing them, as that’s obviously a prerequisite for figuring out everybody’s values. The stuff that I had about social consensus was just an argument aimed at moral realists, if you’re not one then it’s probably not relevant for you.
(my values would still say that we should try to take everyone’s values into account, but that disagreement is distinct from the whole “is psychology useful for value learning” question)
I’m puzzled by this remark:
Sorry, my mistake—I confused utilitronium with hedonium.
It took me a while to respond to this because I wanted to take the time to read “The Normative Insignificance of Neuroscience” first.
Great!
But philosophy has been trying to solve ethics for at least the last 2500 years, and it doesn’t look like there has been any major progress towards solving it. The PhilPapers survey didn’t show any of the three major ethical schools (consequentialism, deontology, virtue ethics) being significantly more favored by professional philosophers than the others, nor does anyone—to my knowledge—even know what a decisive theoretical argument in favor of one of them could be.
Restricting analysis to the Western tradition, 2500 years ago we barely had any conception of virtue ethics. Our contemporary conceptions of virtue ethics are much better than the ones the Greeks had. Meanwhile, deontological and consequentialist ethics did not even exist back then. Even over recent decades there has been progress in these positions. And plenty of philosophers know what a decisive theoretical argument could be: either they purport to have identified such arguments, or they think it would be an argument that showed the theory to be well supported by intuitions, reason, or some other evidence, not generally different from what an argument for a non-moral philosophical theory would look like.
it’s noteworthy that at their current state, none of the existing ethical theories are up to the task of giving us such a set of principles that, when programmed into an AI, would actually give results that could be considered “good”.
It would (arguably) give results that people wouldn’t like, but assuming that the moral theory is correct and the machine understands it, almost necessarily it would do morally correct things. If you object to its actions then you are already begging the question by asserting that we ought to be focused on building a machine that will do things that we like regardless of whether they are moral. Moreover, you could tell a similar story for any values that people have. Whether you source them from real philosophy or from layman ethics wouldn’t change the problems of optimization and systematization.
And at this point, we have pretty good theoretical reasons for believing that the traditional goal of moral philosophy—“developing a set of explicit principles for telling us what is good”—is in fact impossible.
But that’s an even stronger claim than the one that moral philosophy hasn’t progressed towards such a goal. What reasons are there?
Or at least, it’s impossible to develop a set of principles that would be simple and clear enough to write down in human-understandable form and which would give us clear answers to every situation, because morality is too complicated for that.
That’s contentious, but some philosophers believe that, and there are philosophies which adhere to that. The problem of figuring out how to make a machine behave morally according to those premises is still a philosophical one, just one based on other ideas in moral philosophy besides explicit rule-based ones.
Yes, I am here assuming “what is good” to equate to “what do human brains consider good”, in a way that may be seen as reducing to “what would human brains accept as a persuasive argument for what is good”. You could argue that this is flawed, because it’s getting dangerously close to defining “good” by social consensus. But then again, the way the field of ethics itself proceeds is basically the same: a philosopher presents an argument for what is good, another attacks it, if the argument survives attacks and is compelling then it is eventually accepted.
Except the field of ethics does it with actual arguments among experts in the field. You could make the same story for any field: truths about physics can be determined by social consensus, since that’s just what the field of physics is, a physicist presents an experiment or hypothesis, another attacks it, if the hypothesis survives the attacks and is compelling then it is eventually accepted! And so on for all non-moral fields of inquiry as well. I don’t see why you think ethics would be special; basically everything can be modeled like this. But that’s ridiculous. We don’t look at social consensus for all forms of inquiry, because there is a difference between what ordinary people believe and what people believe when they are trained professionals in the subject.
for moral truths it looks to me unavoidable—due to the is-ought gap—that some degree of “truth by social consensus” is the only way of figuring out what the truth is, even in principle.
Then why don’t you believe in morality by social consensus? (Or do you? It seems like you’re probably not, given that you’re an effective altruist. What do you think about animal rights, or Sharia law?)
(We seem to be talking past each other in some weird way; I’m not even sure what exactly it is that we’re disagreeing over.)
It would (arguably) give results that people wouldn’t like, but assuming that the moral theory is correct and the machine understands it, almost necessarily it would do morally correct things.
Well sure, if we proceed from the assumption that the moral theory really was correct, but the point was that none of those proposed theories has been generally accepted by moral philosophers.
But that’s an even stronger claim than the one that moral philosophy hasn’t progressed towards such a goal. What reasons are there?
I gave one in the comment? That philosophy has accepted that you can’t give a human-comprehensible set of necessary and sufficient criteria for concepts, and if you want a system for classifying concepts you have to use psychology and machine learning; and it looks like morality is similar.
Except the field of ethics does it with actual arguments among experts in the field. You could make the same story for any field: truths about physics can be determined by social consensus, since that’s just what the field of physics is, a physicist presents an experiment or hypothesis, another attacks it, if the hypothesis survives the attacks and is compelling then it is eventually accepted! And so on for all non-moral fields of inquiry as well. I don’t see why you think ethics would be special; basically everything can be modeled like this. But that’s ridiculous. We don’t look at social consensus for all forms of inquiry, because there is a difference between what ordinary people believe and what people believe when they are trained professionals in the subject.
I’m not sure what exactly you’re disagreeing with? It seems obvious to me that physics does indeed proceed by social consensus in the manner you describe. Someone does an experiment, then others replicate the experiment until there is consensus that this experiment really does produce these results; somebody proposes a hypothesis to explain the experimental results, others point out holes in that hypothesis, there’s an extended back-and-forth conversation and further experiments until there is a consensus that the modified hypothesis really does explain the results and that it can be accepted as an established scientific law. And the same for all other scientific and philosophical disciplines. I don’t think that ethics is special in that sense.
Sure, there is a difference between what ordinary people believe and what people believe when they’re trained professionals: that’s why you look for a social consensus among the people who are trained professionals and have considered the topic in detail, not among the general public.
Then why don’t you believe in morality by social consensus? (Or do you? It seems like you’re probably not, given that you’re an effective altruist.
I do believe in morality by social consensus, in the same manner as I believe in physics by social consensus: if I’m told that the physics community has accepted it as an established fact that e=mc^2 and that there’s no dispute or uncertainty about this, then I’ll accept it as something that’s probably true. If I thought that it was particularly important for me to make sure that this was correct, then I might look up the exact reasoning and experiments used to determine this and try to replicate some of them, until I found myself to also be in consensus with the physics community.
Similarly, if someone came to me with a theory of what was moral and it turned out that the entire community of moral philosophers had considered this theory and accepted it after extended examination, and I could also not find any objections to that and found the justifications compelling, then I would probably also accept the moral theory.
But to my knowledge, nobody has presented a conclusive moral theory that would satisfy both me and nearly all moral philosophers and which would say that it was wrong to be an effective altruist—quite the opposite. So I don’t see a problem in being an EA.
Well sure, if we proceed from the assumption that the moral theory really was correct, but the point was that none of those proposed theories has been generally accepted by moral philosophers.
Your point was that “none of the existing ethical theories are up to the task of giving us such a set of principles that, when programmed into an AI, would actually give results that could be considered “good”.” But this claim is simply begging the question by assuming that all the existing theories are false. And to claim that a theory would have bad moral results is different from claiming that it’s not generally accepted by moral philosophers. It’s plausible that a theory would have good moral results, in virtue of it being correct, while not being accepted by many moral philosophers. Since there is no dominant moral theory, this is necessarily the case as long as some moral theory is correct.
I gave one in the comment? That philosophy has accepted that you can’t give a human-comprehensible set of necessary and sufficient criteria for concepts
If you’re referring to ethics, no, philosophy has not accepted that you cannot give such an account. You believe this, on the basis of your observation that philosophers give different accounts of ethics. But that doesn’t mean that moral philosophers believe it. They just don’t think that the fact of disagreement implies that no such account can be given.
It seems obvious to me that physics does indeed proceed by social consensus in the manner you describe. Someone does an experiment, then others replicate the experiment until there is consensus that this experiment really does produce these results; somebody proposes a hypothesis to explain the experimental results, others point out holes in that hypothesis, there’s an extended back-and-forth conversation and further experiments until there is a consensus that the modified hypothesis really does explain the results and that it can be accepted as an established scientific law. And the same for all other scientific and philosophical disciplines. I don’t think that ethics is special in that sense.
So you haven’t pointed out any particular features of ethics, you’ve merely described a feature of inquiry in general. This shows that your claim proves too much—it would be ridiculous to conduct physics by studying psychology.
Sure, there is a difference between what ordinary people believe and what people believe when they’re trained professionals: that’s why you look for a social consensus among the people who are trained professionals and have considered the topic in detail, not among the general public.
But that’s not a matter of psychological inquiry, that’s a matter of looking at what is being published in philosophy, becoming familiar with how philosophical arguments are formed, and staying in touch with current developments in the field. So you are basically describing studying philosophy. Studying or researching psychology will not tell you anything about this.
Also, I find pretty compelling the argument that the classical goal of moral philosophy, defining “the good”, is both impossible and not even a particularly good target to aim at, and that trying to find generally-agreeable moral solutions is something much more useful; and if we accept this argument, then moral psychology is relevant, because it can help us figure out generally-agreeable solutions.
...there is a deeper point in Williams’s book that is even harder to rebut. Williams asks: What can an ethical theory do, if we are able to build a convincing case for one? He is skeptical about the force of ethical considerations and reminds us that even if we were to have a justified ethical theory, the person in question might not be concerned about it. Even if we could prove to some amoralists that what they are about to do is (a) against some universal ethical standard, (b) is detrimental to their own well-being, and/or (c) is against the demands of rationality or internal coherence, they still have the choice of whether to care about this or not. They can choose to act even if they know that what they are about to do is against some standard that they believe in. Robert Nozick—whom Williams quotes—describes this as follows: “Suppose that we show that some X he [the immoral man] holds or accepts or does commits him to behaving morally. He now must give up at least one of the following: (a) behaving immorally, (b) maintaining X, (c) being consistent about this matter in this respect. The immoral man tells us, ‘To tell you the truth, if I had to make the choice, I would give up being consistent’” (Nozick 1981, 408).
What Williams in effect says is that the noble task of finding ultimate justification for some ethical standards could not—even if it was successful—deliver any final argument in practical debates about how to behave. “Objective truth” would have only the motivational weight that the parties involved choose to give to it. It no longer is obvious what a philosophical justification of an ethical standard is supposed to do or even “why we should need such a thing” (Williams 1985, 23).
Yet when we look at many contemporary ethical debates, we can see that they proceed as if the solutions to the questions they pose would matter. In most scientific disciplines the journal articles have a standard section called “practical bearings,” where the practical relevance of the accumulated results are discussed. Not so for metaethical articles, even though they otherwise simulate the academic and peer-reviewed writing style of scientific articles. When we read someone presenting a number of technical counterarguments against quasi-realist solutions to the Frege-Geach problem, there usually is no debate about what practical bearings the discussion would have, whether these arguments would be successful or not. Suppose that in some idealized future the questions posed by the Frege-Geach problem would be conclusively solved. A new argument would emerge that all parties would see as so valid and sound that they would agree that the problem has now been finally settled. What then? How would ordinary people behave differently, after the solution has been delivered to them? I would guess it is fair to say—at least until it is proven otherwise—that the outcome of these debates is only marginally relevant for any ordinary person’s ethical life. [...]
This understanding of morality means that we have to think anew what moral inquiry should aim at. [...] Whatever justification can be given for one moral doctrine over the other, it has to be found in practice—simply because there are no other options available. Accordingly, for pragmatists, moral inquiry is in the end directed toward practice, its successfulness is ultimately judged by the practical bearings it has on people’s experiences: “Unless a philosophy is to remain symbolic—or verbal—or a sentimental indulgence for a few, or else mere arbitrary dogma, its auditing of past experience and its program of values must take effect in conduct” (Dewey 1916, 315). Moral inquiry should thus aim at practice; its successfulness is ultimately measured by how it is able to influence people’s moral outlook and behavior. [...]
Moral principles, ideals, rules, theories, or conclusions should thus be seen “neither as a cookbook, nor a remote calculus” (Pappas 1997, 546) but as instruments that we can use to understand our behavior and change it for the better. Instead of trying to discover the correct ethical theories, the task becomes one of designing the most functional ethical theories. Ethics serves certain functions in human lives and in societies, and the task is to improve its ability to serve these functions (Kitcher 2011b). In other words, the aim of ethical theorizing is to provide people with tools (see Hickman 1990, 113–14) that help them in living their lives in a good and ethically sound way. [...]
It is true that the lack of foundational principles in ethics denies the pragmatist moral philosopher the luxury of being objectively right in some moral question. In moral disagreements, a pragmatist cannot “solve” the disagreement by relying on some objective standards that deliver the “right” and final answer. But going back to Williams’s argument raised at the beginning of this article, we can ask what would it help if we were to “solve” the problem. The other party still has the option to ignore our solution. Furthermore, despite the long history of ethics we still haven’t found many objective standards or “final solutions” that everyone would agree on, and thus it seems that waiting for such standards to emerge is futile.
In practice, there seem to be two ways in which moral disagreements are resolved. First is brute force. In some moral disputes I am in a position in which I can force the other party to comply with my standards whether that other party agrees with me or not. The state with its monopoly on the legitimate use of violence can force its citizens to comply with certain laws even when the personal moral code of these citizens would disagree with the law. The second way to resolve a moral disagreement is to find some common ground, some standards that the other believes in, and start building from there a case for one’s own position.
In the end, it might be beneficial that pragmatism annihilates the possibility of believing that I am absolutely right and the other party is absolutely wrong. As Margolis notes: “The most monstrous crimes the race has ever (been judged to have) perpetrated are the work of the partisans of ‘right principles’ and privileged revelation” (1996, 213). Instead of dismissing the other’s perspective as wrong, one must try to understand it in order to find common ground and shared principles that might help in progressing the dialogue around the problem. If one really wants to change the opinion of the other party, instead of invoking some objective standards one should invoke some standards that the other already believes in. This means that one has to listen to the other person, try to see the world from his or her point of view. Only through understanding the other’s perspective one can have a chance to find a way to change it—or to change one’s own opinion, if this learning process should lead to that. One can aim to clarify the other’s points of view, unveil their hidden assumptions and values, or challenge their arguments, but one must do this by drawing on principles and values that the other is already committed to if one wants to have a chance to have a real impact on the other’s way of seeing the world, or actually to resolve the disagreement. I believe that this kind of approach, rather than a claim for a more objective position, has a much better chance of actually building common understanding around the moral issue at hand.
Your comment reads strangely to me because your thoughts seem to fall into a completely different groove from mine. The problem statement is perhaps: write a program that does what-I-want, indefinitely. Of course, this could involve a great deal of extrapolation.
The fact that I am even aspiring to write such a program means that I am assuming that what-I-want can be computed. Presumably, at least some portion of the relevant computation, the one that I am currently denoting ‘what-I-want’, takes place in my brain. If I want to perform this computation in an AI, then it would probably help to at least be able to reproduce whatever portion of it takes place in my brain. People who study the mind and brain happen to call themselves psychologists and cognitive scientists. It’s weird to me that you’re arguing about how to classify Joshua Greene’s research; I don’t see why it matters whether we call it philosophy or psychology. I generally find it suspicious when anyone makes a claim of the form: “Only the academic discipline that I hold in high esteem has tools that will work in this domain.” But I won’t squabble over words if you think you’re drawing important boundaries; what do you mean when you write ‘philosophical’? Maybe you’re saying that Greene, despite his efforts to inquire with psychological tools, elides into ‘philosophy’ anyway, so like, what’s the point of pretending it’s ‘moral philosophy’ via psychology? If that’s your objection, that he ‘just ends up doing philosophy anyway’, then what exactly is he eliding into, without using the words ‘philosophy’ or ‘philosophical’?
More generally, why is it that we should discard the approach because it hasn’t made itself obsolete yet? Should the philosophers give up because they haven’t made their approach obsolete yet either? If there’s any reason that we should have more confidence in the ability of philosophers than cognitive scientists to contribute towards a formal specification of what-I-want, that reason is certainly not track record.
What people believe doesn’t tell us much about what actually is good.
I don’t think anyone who has read or who likely will read your comment equivocates testimony or social consensus with what-is-good.
The challenge of AI safety is the challenge of making AI that actually does what is right, not AI that does whatever it’s told to do by a corrupt government, a racist constituency, and so on.
It’s my impression that AI safety researchers are far more concerned about unaligned AGIs killing everyone than they are about AGIs that are successfully designed by bad actors to do a specific, unimaginative thing without killing themselves and everyone else in the process.
Of course a new wave of pop-philosophers and internet bloggers have made silly claims that moral philosophy can be completely solved by psychology and neuroscience but this extreme view is ridiculous on its face.
Bleck, please don’t ever give me a justification to link a Wikipedia article literally named pooh-pooh.
The problem statement is perhaps: write a program that does what-I-want, indefinitely
No, the problem statement is write a program that does what is right.
It’s weird to me that you’re arguing about how to classify Joshua Greene’s research; I don’t see why it matters whether we call it philosophy or psychology
Then you missed the point of what I said, since I wasn’t talking about what to call it, I was talking about the tools and methods it uses. The question is what people ought to be studying and learning.
I generally find it suspicious when anyone makes a claim of the form: “Only the academic discipline that I hold in high esteem has tools that will work in this domain.”
If you want to solve a philosophical problem then you’re going to have to do philosophy. Psychology is for solving psychological problems. It’s pretty straightforward.
what do you mean when you write ‘philosophical’?
I mean the kind of work that is done in philosophy departments, and which would be studied by someone who was told “go learn about moral philosophy”.
Maybe you’re saying that Greene, despite his efforts to inquire with psychological tools, elides into ‘philosophy’ anyway
Yes, that’s true by his own admission (he affirms in his reply to Berker that the specific cognitive model he uses is peripheral to the main normative argument) and is apparent if you look at his work.
If that’s your objection, that he ‘just ends up doing philosophy anyway’, then what exactly is he eliding into, without using the words ‘philosophy’ or ‘philosophical’?
He’s eliding into normative arguments about morality, rather than merely describing psychological or cognitive processes.
More generally, why is it that we should discard the approach because it hasn’t made itself obsolete yet?
I don’t know what you are talking about, since I said nothing about obsolescence.
I don’t think anyone who has read or who likely will read your comment equivocates testimony or social consensus with what-is-good.
Great! Then they’ll acknowledge that studying testimony and social consensus is not studying what is good.
It’s my impression that AI safety researchers are far more concerned about unaligned AGIs killing everyone than they are about AGIs that are successfully designed by bad actors to do a specific, unimaginative thing without killing themselves and everyone else in the process.
Rather than bad actors needing to be restrained by good actors, which is neither a psychological nor a philosophical problem, the problem is that the very best actors are flawed and will produce flawed machines if they don’t do things correctly.
please don’t ever give me a justification to link a Wikipedia article literally named pooh-pooh.
Would you like me to explicitly explain why the new wave of pop-philosophers and internet bloggers who think that moral philosophy can be completely solved by psychology and neuroscience don’t know what they’re talking about? It’s not taken seriously; I didn’t go into detail because I was unsure if anyone around here took it seriously.
I agree that defining human values is a philosophical issue, but I would not describe it as “not a psychological issue at all.” It is in part a psychological issue insofar as understanding how people conceive of values is itself an empirical question. Questions about individual and intergroup differences in how people conceive of values, distinguish moral from nonmoral norms, etc. cannot be resolved by philosophy alone.
I am sympathetic to some of the criticisms of Greene’s work, but I do not think Berker’s critique is completely correct, though explaining in detail why I think Greene and others are correct in thinking that psychology can inform moral philosophy would call for a rather titanic post.
The tl;dr point I’d make is that yes, you can draw philosophical conclusions from empirical premises, provided your argument is presented as a conditional one in which you propose that certain philosophical positions are dependent on certain factual claims. If anyone else accepts those premises, then empirical findings that confirm or disconfirm those factual claims can compel specific philosophical conclusions. A toy version of this would be the following:
P1: If the sky is blue, then utilitarianism is true.
P2: The sky is blue.
C: Therefore, utilitarianism is true.
If someone accepts P1, and if P2 is an empirical claim, then empirical evidence for/against P2 bears on the conclusion.
This is the kind of move Greene wants to make.
The slightly longer version of what I’d say to a lot of Greene’s critics is that they misconstrue Greene’s arguments if they think he is attempting to move straight from descriptive claims to normative claims. In arguing for the primacy of utilitarian over deontological moral norms, Greene appeals to the presumptive shared premise between himself and his interlocutors that, on reflection, they will reject beliefs that are the result of epistemically dubious processes but retain those that are the result of epistemically justified processes.
If they share his views about what processes would in principle be justified/not justified, and if he can demonstrate that utilitarian judgments are reliably the result of justified processes but deontological judgments are not, then he has successfully appealed to empirical findings to draw a philosophical conclusion: that utilitarian judgments are justified and deontological ones are not. One could simply reject his premises about what constitutes justified/unjustified grounds for belief, and in that case his argument would not be convincing. I don’t endorse his conclusions because I think his empirical findings are not compelling; not because I think he’s made any illicit philosophical moves.
The tl;dr point I’d make is that yes, you can draw philosophical conclusions from empirical premises, provided your argument is presented as a conditional one in which you propose that certain philosophical positions are dependent on certain factual claims.
You can do that if you want, but (1) it’s still a narrow case within a much larger philosophical framework and (2) such cases are usually pretty simple and don’t require sophisticated knowledge of psychology.
The slightly longer version of what I’d say to a lot of Greene’s critics is that they misconstrue Greene’s arguments if they think he is attempting to move straight from descriptive claims to normative claims.
To the contrary, Berker criticizes Greene precisely because his neuroscientific work is hardly relevant to the moral argument he’s making. You don’t need a complex account of neuroscience or psychology to know that people’s intuitions in the trolley problem are changing merely because of an apparently non-significant change in the situation. Philosophers knew that a century ago.
If they share his views about what processes would in principle be justified/not justified, and if he can demonstrate that utilitarian judgments are reliably the result of justified processes but deontological judgments are not, then he has successfully appealed to empirical findings to draw a philosophical conclusion: that utilitarian judgments are justified and deontological ones are not.
But nobody believes that judgements are correct or wrong merely because of the process that produces them. That just produces grounds for skepticism that the judgements are reliable—and it is skepticism of a sort that was already known without any reference to psychology, for instance through Plantinga’s evolutionary argument against naturalism or evolutionary debunking arguments.
Also it’s worth clarifying that Greene only deals with a particular instance of a deontological judgement rather than deontological judgements in general.
One could simply reject his premises about what constitutes justified/unjustified grounds for belief, and in that case his argument would not be convincing.
Again, it’s worth stressing that this is a fairly narrow and methodologically controversial area of moral philosophy. There is a difference between giving an opinion on a novel approach to a subject, and telling a group of people what subject they need to study in order to be well-informed. Even if you do take the work of x-philers for granted, it’s not the sort of thing that can be done merely with education in psychology and neuroscience, because people who understand that side of the story but not the actual philosophy are going to be unable to evaluate or make the substantive moral arguments which are necessary for empirically informed work.
Greene would probably not dispute that philosophers have generally agreed that the difference between the lever and footbridge cases is due to “apparently non-significant changes in the situation”.
However, what philosophers have typically done is either bite the bullet and say one ought to push, or deny that one ought to push in the footbridge case but then feel the need to defend commonsense intuitions by offering a principled justification for the distinction between the two. The trolley literature is rife with attempts to vindicate an unwillingness to push, because these philosophers are starting from the assumption that commonsense moral intuitions track deep moral truths and we must explicate the underlying, implicit justification our moral competence is picking up on.
What Greene is doing by appealing to neuroscientific/psychological evidence is to offer a selective debunking explanation of some of those intuitions but not the others. If the evidence demonstrates that one set of outputs (deontological judgments) are the result of an unreliable cognitive process, and another set of outputs (utilitarian judgments) are the result of reliable cognitive processes, then he can show that we have reason to doubt one set of intuitions but not the other, provided we agree with his criteria about what constitutes a reliable vs. an unreliable process. A selective debunking argument of this kind, relying as it does on the reliability of distinct psychological systems or processes, does in fact turn on the empirical evidence (in this case, on his dual process model of moral cognition).
[But nobody believes that judgements are correct or wrong merely because of the process that produces them.]
Sure, but Greene does not need to argue that deontological/utilitarian conclusions are correct or incorrect, only that we have reason to doubt one but not the other. If we can offer reasons to doubt the very psychological processes that give rise to deontological intuitions, this skepticism may be sufficient to warrant skepticism about the larger project of assuming that these intuitions are underwitten by implicit, non-obvious justifications that the philosopher’s job is to extract and explicate.
You mention evolutionary debunking arguments as an alternative that is known “without any reference to psychology.” I think this is mistaken. Evolutionary debunking arguments are entirely predicated on specific empirical claims about the evolution of human psychology, and are thus a perfect example of the relevance of empirical findings to moral philosophy.
[Also it’s worth clarifying that Greene only deals with a particular instance of a deontological judgement rather than deontological judgements in general.]
Yes, I completely agree and I think this is a major weakness with Greene’s account.
I think there are two other major problems: the fMRI evidence he has is not very convincing, and trolley problems offer a distorted psychological picture of the distinction between utilitarian and non-utilitarian moral judgment. Recent work by Kahane shows that people who push in footbridge scenarios tend not to be utilitarians, just people with low empathy. The same people that push tend to also be more egoistic, less charitable, less impartial, less concerned about maximizing welfare, etc.
Regarding your last point two points: I agree that one move is to simply reject how he talks about intuitions (or one could raise other epistemic challenges presumably). I also agree that training in psychology/neuroscience but not philosophy impairs one’s ability to evaluate arguments that presumably depend on competence in both. I am not sure why you bring this up though, so if there was an inference I should draw from this help me out!
Yet the problem remains: the AI has to be programmed with some definition of what is good.
Now this alone isn’t yet sufficient to show that philosophy wouldn’t be up to the task. But philosophy has been trying to solve ethics for at least the last 2500 years, and it doesn’t look like there has been any major progress towards solving it. The PhilPapers survey didn’t show any of the three major ethical schools (consequentialism, deontology, virtue ethics) being significantly more favored by professional philosophers than the others, nor does anyone—to my knowledge—even know what a decisive theoretical argument in favor of one of them could be.
And at this point, we have pretty good theoretical reasons for believing that the traditional goal of moral philosophy—“developing a set of explicit principles for telling us what is good”—is in fact impossible. Or at least, it’s impossible to develop a set of principles that would be simple and clear enough to write down in human-understandable form and which would give us clear answers to every situation, because morality is too complicated for that.
We’ve already seen this in trying to define concepts: as philosophy noted a long time ago, you can’t come up with a set of explicit rules that would define even a concept as simple as “man” in such a way that nobody could produce a counterexample. “The Normative Insignificance of Neuroscience” also notes that the situation in ethics looks similar to the situation with trying to define many other concepts.
Yet human brains do manage to successfully reason with concepts, despite it being impossible to develop a set of explicit necessary and sufficient criteria. The evidence from both psychology and artificial intelligence (where we’ve managed to train neural nets capable of reasonably good object recognition) is that a big part of how they do it is by building up complicated statistical models of what counts as a “man” or “philosopher” or whatever.
So given that

* we can’t build explicit verbal models of what a concept is
* but we can build machine-learning algorithms that use complicated statistical analysis to identify instances of a concept

and

* defining morality looks similar to defining concepts, in that we can’t build explicit verbal models of what morality is

it would seem reasonable to assume that

* we can build machine-learning algorithms that can learn to define morality, in that they can give answers to moral dilemmas that a vast majority of people would consider acceptable.
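To make the parallel concrete, here is a minimal, purely illustrative sketch of the kind of learned statistical model being gestured at: a toy text classifier fit to hypothetical human acceptability judgments. The scikit-learn pipeline, the vignettes, and the labels are all assumptions invented for this example, not anyone’s actual proposal for value learning.

```python
# A toy sketch only: fit a statistical model to (hypothetical) human judgments of
# example cases, instead of writing down explicit rules for what counts as acceptable.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training data: short descriptions of actions, labeled 1 if most people
# (hypothetically) judged them acceptable and 0 otherwise.
vignettes = [
    "returned a lost wallet to its owner",
    "donated blood at the local clinic",
    "lied to a friend in order to take their money",
    "broke an important promise for no reason",
]
labels = [1, 1, 0, 0]

# Bag-of-words features plus logistic regression: a crude stand-in for the
# "complicated statistical models" discussed above.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(vignettes, labels)

# The fitted model assigns a graded acceptability score to an unseen case, without
# anyone having specified necessary and sufficient conditions for "acceptable".
print(model.predict_proba(["kept a promise to help a neighbour"]))
```

The point is only structural: the resulting “definition” of acceptability lives in the fitted statistical model rather than in any explicit, human-readable set of necessary and sufficient conditions.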
But here it looks likely that we need information from psychology to narrow down what those models should be. What humans consider to be good has likely been influenced by a number of evolutionary idiosyncrasies, so if we want to come up with a model of morality that most humans would agree with, then our AI’s reasoning process should take into account those considerations. And we’ve already established that defining those considerations on a verbal level looks insufficient—they have to be established on a deeper level, of “what are the actual computational processes that are involved when the brain computes morality”.
Yes, I am here assuming “what is good” to equate to “what do human brains consider good”, in a way that may be seen as reducing to “what would human brains accept as a persuasive argument for what is good”. You could argue that this is flawed, because it’s getting dangerously close to defining “good” by social consensus. But then again, the way the field of ethics itself proceeds is basically the same: a philosopher presents an argument for what is good, another attacks it, and if the argument survives the attacks and is compelling, it is eventually accepted. For empirical facts we can come up with objective tests, but for moral truths it looks to me unavoidable—due to the is-ought gap—that some degree of “truth by social consensus” is the only way of figuring out what the truth is, even in principle.
Hi Kaj,
Even if we found the most agreeable available set of moral principles, the number of people who find it agreeable may turn out not to constitute the vast majority. It may not even reach a majority at all. It is possible that there simply is no moral theory that is acceptable to most people. People may just have irreconcilable values. You state that:
“For empirical facts we can come up with objective tests, but for moral truths it looks to me unavoidable—due to the is-ought gap—that some degree of “truth by social consensus” is the only way of figuring out what the truth is, even in principle.”
Suppose this is the best we can do. It doesn’t follow that the outputs of this exercise are “true.” I am not sure in what sense this would constitute a true set of moral principles.
More importantly, it is unclear whether or not I have any rational or moral obligation to care about the outputs of this exercise. I do not want to implement the moral system that most people find agreeable. On the contrary, I want everyone to share my moral views, because this is what, fundamentally, I care about. The notion that we should care about what others care about, and implement whatever the consensus is, seems to presume a very strong and highly contestable metaethical position that I do not accept and do not think others should accept.
It’s certainly possible that this is the case, but looking for the kind of solution that would satisfy as many people as possible seems like the thing we should try first, only giving it up if it proves impossible, no?
Well, the ideal case would be that the AI would show you a solution which it had found, and upon inspecting it and thinking it through you’d be convinced that this solution really does satisfy all the things you care about—and all the things that most other people care about, too.
From a more pragmatic perspective, you could try to insist on an AI which implemented your values specifically—but then everyone else would also have a reason to fight to get an AI which fulfilled their values specifically, and if it was you versus everyone else in the world, there would seem to be a pretty high probability that somebody else would win. Which means that your values would have a much higher chance of getting shafted than if everyone had agreed to go for a solution which tried to take everyone’s preferences into account.
And of course, in the context of AI, everyone insisting on their own values and their values only means that we’ll get arms races, and thus a higher probability of a worse outcome for everyone.
See also Gains from Trade Through Compromise.
Sure. That isn’t my primary objection though. My main objection is that even if we pursue this project, it does not achieve the heavy metaethical lifting you were alluding to earlier. It neither demonstrates nor provides any particularly good reason to regard the outputs of this process as moral truth.
I want to convert all matter in the universe to utilitronium. Do you think it is likely that an AI that factored in the values of all humans would yield this as its solution? I do not. Since I think the expected utility of most other likely solutions, given what I suspect about other people’s values, is far less than this, I would view almost any scenario other than imposing my values on everyone else to be a cosmic disaster.
Well, what alternative would you propose? I don’t see how it would even be possible to get any stronger evidence for the moral truth of a theory than the failure of everyone to come up with convincing objections to it even after extended investigation. Nor do I see a strategy for testing the truth which wouldn’t at some point reduce to “test X gives us reason to disagree with the theory”.
I would understand your disagreement if you were a moral antirealist, but your comments seem to imply that you do believe that a moral truth exists and that it’s possible to get information about it, and that it’s possible to do “heavy metaethical lifting”. But how?
I think anything as specific as this sounds worryingly close to wanting an AI to implement favoritepoliticalsystem.
Whoops. I can see how my responses didn’t make my own position clear.
I am an anti-realist, and I think the prospects for identifying anything like moral truth are very low. I favor abandoning attempts to frame discussions of AI or pretty much anything else in terms of converging on or identifying moral truth.
I consider it a likely futile effort to integrate these important and substantive discussions into contemporary moral philosophy. If engaging with moral philosophy introduces unproductive digressions, confusions, or misplaced priorities into the discussion, it may do more harm than good.
I’m puzzled by this remark:
“I think anything as specific as this sounds worryingly close to wanting an AI to implement favoritepoliticalsystem.”
I view utilitronium as an end, not a means. It is a logical consequence of wanting to maximize aggregate utility and is more or less a logical entailment of my moral views. I favor the production of whatever physical state of affairs yields the highest aggregate utility. This is, by definition, “utilitronium.” If I’m using the term in an unusual way I’m happy to propose a new label that conveys what I have in mind.
I totally sympathize with your sentiment and feel the same way about incorporating other people’s values in a superintelligent AI. If I just went with my own wish list for what the future should look like, I would not care about most other people’s wishes. I feel as though many other people are not even trying to be altruistic in the relevant sense in which I want to be altruistic, and I don’t experience a lot of moral motivation to help accomplish people’s weird notions of altruistic goals, let alone any goals that are clearly non-altruistically motivated. In the same way, I’d feel no strong motivation (even less, admittedly) to help make the dreams of baby-eating aliens come true.
Having said that, I am confident that it would screw things up for everyone if I followed a decision policy that does not give weight to other people’s strongly held moral beliefs. It is already hard enough to not mess up AI alignment in a way that makes things worse for everyone, and it would become much harder still if we had half a dozen or more competing teams who each wanted to get their idiosyncratic view of the future installed.
BTW note that value differences are not the only thing that can get you into trouble. If you hold an important empirical belief that others do not share, and you cannot convince them of it, then it may appear to you as though you’re justified in doing something radical about it, but that’s even more likely to be a bad idea, because the reasons for taking peer disagreement seriously are stronger in empirical domains of dispute than in normative ones.
There is a sea of considerations from Kantianism, contractualism, norms for stable/civil societies, and advanced decision theory that, while each line of argument seems tentative on its own and open to skepticism, all taken together point very strongly in the same direction: namely, that things will be horrible if we fail to cooperate with each other, and that cooperating is often the truly rational thing to do. You’re probably already familiar with a lot of this, but for general reference, see also this recent paper that makes a particularly interesting case for strong cooperation, as well as other work on the topic, e.g. here and here.
This is why I believe that people interested in any particular version of utilitronium should not override AI alignment procedures at the last minute just to get an extra-large share of the cosmic stakes for their own value system, and why I believe that people like me, who care primarily about reducing suffering, should not increase existential risk. Of course, all of this means that people who want to benefit human values in general should take particular care to make sure that idiosyncratic value systems that may diverge from them also receive consideration and gains from trade.
This piece I wrote recently is relevant to cooperation, to the question of whether values are subjective or not, to how much convergence we should expect, and to the extent to which value extrapolation procedures bake in certain (potentially unilateral) assumptions.
Ah, okay. Well, in that case you can just read my original comment as an argument for why one would want to use psychology to design an AI that was capable of correctly figuring out just a single person’s values and implementing them, as that’s obviously a prerequisite for figuring out everybody’s values. The stuff that I had about social consensus was just an argument aimed at moral realists; if you’re not one, then it’s probably not relevant for you.
(my values would still say that we should try to take everyone’s values into account, but that disagreement is distinct from the whole “is psychology useful for value learning” question)
Sorry, my mistake—I confused utilitronium with hedonium.
Great!
Restricting analysis to the Western tradition, 2500 years ago we barely had any conception of virtue ethics. Our contemporary conceptions of virtue ethics are much better than the ones the Greeks had. Meanwhile, deontological and consequentialist ethics did not even exist back then. Even over recent decades there has been progress in these positions. And plenty of philosophers know what a decisive theoretical argument could be: either they purport to have identified such arguments, or they think it would be an argument that showed the theory to be well supported by intuitions, reason, or some other evidence, not generally different from what an argument for a non-moral philosophical theory would look like.
It would (arguably) give results that people wouldn’t like, but assuming that the moral theory is correct and the machine understands it, almost necessarily it would do morally correct things. If you object to its actions then you are already begging the question by asserting that we ought to be focused on building a machine that will do things that we like regardless of whether they are moral. Moreover, you could tell a similar story for any values that people have. Whether you source them from real philosophy or from layman ethics wouldn’t change the problems of optimization and systematization.
But that’s an even stronger claim than the one that moral philosophy hasn’t progressed towards such a goal. What reasons are there?
That’s contentious, but some philosophers believe that, and there are philosophies which adhere to that. The problem of figuring out how to make a machine behave morally according to those premises is still a philosophical one, just one based on other ideas in moral philosophy besides explicit rule-based ones.
Except the field of ethics does it with actual arguments among experts in the field. You could tell the same story about any field: truths about physics can be determined by social consensus, since that’s just what the field of physics is: a physicist presents an experiment or hypothesis, another attacks it, and if the hypothesis survives the attacks and is compelling, it is eventually accepted! And so on for all non-moral fields of inquiry as well. I don’t see why you think ethics would be special; basically everything can be modeled like this. But that’s ridiculous. We don’t look at social consensus for all forms of inquiry, because there is a difference between what ordinary people believe and what people believe when they are trained professionals in the subject.
Then why don’t you believe in morality by social consensus? (Or do you? It seems like you probably don’t, given that you’re an effective altruist. What do you think about animal rights, or Sharia law?)
(We seem to be talking past each other in some weird way; I’m not even sure what exactly it is that we’re disagreeing over.)
Well sure, if we proceed from the assumption that the moral theory really was correct, but the point was that none of those proposed theories has been generally accepted by moral philosophers.
I gave one in the comment? That philosophy has accepted that you can’t give a human-comprehensible set of necessary and sufficient criteria for concepts, and that if you want a system for classifying concepts you have to use psychology and machine learning; and it looks like morality is similar.
I’m not sure what exactly you’re disagreeing with? It seems obvious to me that physics does indeed proceed by social consensus in the manner you describe. Someone does an experiment, then others replicate the experiment until there is consensus that this experiment really does produce these results; somebody proposes a hypothesis to explain the experimental results, others point out holes in that hypothesis, there’s an extended back-and-forth conversation and further experiments until there is a consensus that the modified hypothesis really does explain the results and that it can be accepted as an established scientific law. And the same for all other scientific and philosophical disciplines. I don’t think that ethics is special in that sense.
Sure, there is a difference between what ordinary people believe and what people believe when they’re trained professionals: that’s why you look for a social consensus among the people who are trained professionals and have considered the topic in detail, not among the general public.
I do believe in morality by social consensus, in the same manner as I believe in physics by social consensus: if I’m told that the physics community has accepted it as an established fact that e=mc^2 and that there’s no dispute or uncertainty about this, then I’ll accept it as something that’s probably true. If I thought that it was particularly important for me to make sure that this was correct, then I might look up the exact reasoning and experiments used to determine this and try to replicate some of them, until I found myself to also be in consensus with the physics community.
Similarly, if someone came to me with a theory of what was moral and it turned out that the entire community of moral philosophers had considered this theory and accepted it after extended examination, and I could also not find any objections to that and found the justifications compelling, then I would probably also accept the moral theory.
But to my knowledge, nobody has presented a conclusive moral theory that would satisfy both me and nearly all moral philosophers and which would say that it was wrong to be an effective altruist—quite the opposite. So I don’t see a problem in being an EA.
Your point was that “none of the existing ethical theories are up to the task of giving us such a set of principles that, when programmed into an AI, would actually give results that could be considered “good”.” But this claim is simply begging the question by assuming that all the existing theories are false. And to claim that a theory would have bad moral results is different from claiming that it’s not generally accepted by moral philosophers. It’s plausible that a theory would have good moral results, in virtue of it being correct, while not being accepted by many moral philosophers. Since there is no dominant moral theory, this is necessarily the case as long as some moral theory is correct.
If you’re referring to ethics, no, philosophy has not accepted that you cannot give such an account. You believe this, on the basis of your observation that philosophers give different accounts of ethics. But that doesn’t mean that moral philosophers believe it. They just don’t think that the fact of disagreement implies that no such account can be given.
So you haven’t pointed out any particular features of ethics, you’ve merely described a feature of inquiry in general. This shows that your claim proves too much—it would be ridiculous to conduct physics by studying psychology.
But that’s not a matter of psychological inquiry, that’s a matter of looking at what is being published in philosophy, becoming familiar with how philosophical arguments are formed, and staying in touch with current developments in the field. So you are basically describing studying philosophy. Studying or researching psychology will not tell you anything about this.
Also, I find pretty compelling the argument that the classical goal of moral philosophy, trying to define “the good”, is both impossible and not even a particularly good target to aim at, and that trying to find generally-agreeable moral solutions is something much more useful; and if we accept this argument, then moral psychology is relevant, because it can help us figure out generally-agreeable solutions.
See also Martela (2017).
Your comment reads strangely to me because your thoughts seem to fall into a completely different groove from mine. The problem statement is perhaps: write a program that does what-I-want, indefinitely. Of course, this could involve a great deal of extrapolation.
The fact that I am even aspiring to write such a program means that I am assuming that what-I-want can be computed. Presumably, at least some portion of the relevant computation, the one that I am currently denoting ‘what-I-want’, takes place in my brain. If I want to perform this computation in an AI, then it would probably help to at least be able to reproduce whatever portion of it takes place in my brain. People who study the mind and brain happen to call themselves psychologists and cognitive scientists. It’s weird to me that you’re arguing about how to classify Joshua Greene’s research; I don’t see why it matters whether we call it philosophy or psychology. I generally find it suspicious when anyone makes a claim of the form: “Only the academic discipline that I hold in high esteem has tools that will work in this domain.” But I won’t squabble over words if you think you’re drawing important boundaries; what do you mean when you write ‘philosophical’? Maybe you’re saying that Greene, despite his efforts to inquire with psychological tools, elides into ‘philosophy’ anyway, so like, what’s the point of pretending it’s ‘moral philosophy’ via psychology? If that’s your objection, that he ‘just ends up doing philosophy anyway’, then what exactly is he eliding into, without using the words ‘philosophy’ or ‘philosophical’?
More generally, why is it that we should discard the approach because it hasn’t made itself obsolete yet? Should the philosophers give up because they haven’t made their approach obsolete yet either? If there’s any reason that we should have more confidence in the ability of philosophers than cognitive scientists to contribute towards a formal specification of what-I-want, that reason is certainly not track record.
I don’t think anyone who has read, or who likely will read, your comment conflates testimony or social consensus with what-is-good.
It’s my impression that AI safety researchers are far more concerned about unaligned AGIs killing everyone than they are about AGIs that are successfully designed by bad actors to do a specific, unimaginative thing without killing themselves and everyone else in the process.
Bleck, please don’t ever give me a justification to link a Wikipedia article literally named pooh-pooh.
No, the problem statement is: write a program that does what is right.
Then you missed the point of what I said, since I wasn’t talking about what to call it, I was talking about the tools and methods it uses. The question is what people ought to be studying and learning.
If you want to solve a philosophical problem then you’re going to have to do philosophy. Psychology is for solving psychological problems. It’s pretty straightforward.
I mean the kind of work that is done in philosophy departments, and which would be studied by someone who was told “go learn about moral philosophy”.
Yes, that’s true by his own admission (he affirms in his reply to Berker that the specific cognitive model he uses is peripheral to the main normative argument) and is apparent if you look at his work.
He’s eliding into normative arguments about morality, rather than merely describing psychological or cognitive processes.
I don’t know what you are talking about, since I said nothing about obsolescence.
Great! Then they’ll acknowledge that studying testimony and social consensus is not studying what is good.
Rather than bad actors needing to be restrained by good actors, which is neither a psychological nor a philosophical problem, the problem is that the very best actors are flawed and will produce flawed machines if they don’t do things correctly.
Would you like me to explicitly explain why the new wave of pop-philosophers and internet bloggers who think that moral philosophy can be completely solved by psychology and neuroscience don’t know what they’re talking about? It’s not taken seriously; I didn’t go into detail because I was unsure if anyone around here took it seriously.
I agree that defining human values is a philosophical issue, but I would not describe it as “not a psychological issue at all.” It is in part a psychological issue insofar as understanding how people conceive of values is itself an empirical question. Questions about individual and intergroup differences in how people conceive of values, distinguish moral from nonmoral norms, etc. cannot be resolved by philosophy alone.
I am sympathetic to some of the criticisms of Greene’s work, but I do not think Berker’s critique is completely correct, though explaining in detail why I think Greene and others are correct in thinking that psychology can inform moral philosophy would call for a rather titanic post.
The tl;dr point I’d make is that yes, you can draw philosophical conclusions from empirical premises, provided your argument is presented as a conditional one in which you propose that certain philosophical positions are dependent on certain factual claims. If anyone else accepts those premises, then empirical findings that confirm or disconfirm those factual claims can compel specific philosophical conclusions. A toy version of this would be the following:
P1: If the sky is blue, then utilitarianism is true.
P2: The sky is blue.
C: Therefore, utilitarianism is true.
If someone accepts P1, and if P2 is an empirical claim, then empirical evidence for/against P2 bears on the conclusion.
This is the kind of move Greene wants to make.
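For what it’s worth, the toy argument is just modus ponens, so its validity isn’t in question; all the work is done by the premises. A minimal rendering in Lean, with placeholder proposition names standing in for whatever empirical premise and philosophical conclusion one chooses to link:

```lean
-- Toy rendering of the argument form above; the proposition names are placeholders.
example (SkyIsBlue UtilitarianismIsTrue : Prop)
    (p1 : SkyIsBlue → UtilitarianismIsTrue) -- P1: the bridging premise
    (p2 : SkyIsBlue)                        -- P2: the empirical premise
    : UtilitarianismIsTrue :=               -- C: follows by modus ponens
  p1 p2
```

Empirical findings can only ever bear on p2; whether to accept a bridging premise like p1 remains a philosophical question, which is where the real dispute lies.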
The slightly longer version of what I’d say to a lot of Greene’s critics is that they misconstrue Greene’s arguments if they think he is attempting to move straight from descriptive claims to normative claims. In arguing for the primacy of utilitarian over deontological moral norms, Greene appeals to the presumptive shared premise between himself and his interlocutors that, on reflection, they will reject beliefs that are the result of epistemically dubious processes but retain those that are the result of epistemically justified processes.
If they share his views about what processes would in principle be justified/not justified, and if he can demonstrate that utilitarian judgments are reliably the result of justified processes but deontological judgments are not, then he has successfully appealed to empirical findings to draw a philosophical conclusion: that utilitarian judgments are justified and deontological ones are not. One could simply reject his premises about what constitutes justified/unjustified grounds for belief, and in that case his argument would not be convincing. I don’t endorse his conclusions because I think his empirical findings are not compelling; not because I think he’s made any illicit philosophical moves.
You can do that if you want, but (1) it’s still a narrow case within a much larger philosophical framework and (2) such cases are usually pretty simple and don’t require sophisticated knowledge of psychology.
To the contrary, Berker criticizes Greene precisely because his neuroscientific work is hardly relevant to the moral argument he’s making. You don’t need a complex account of neuroscience or psychology to know that people’s intuitions in the trolley problem are changing merely because of an apparently non-significant change in the situation. Philosophers knew that a century ago.
But nobody believes that judgements are correct or wrong merely because of the process that produces them. That just produces grounds for skepticism that the judgements are reliable—and it is skepticism of a sort that was already known without any reference to psychology, for instance through Plantinga’s evolutionary argument against naturalism or evolutionary debunking arguments.
Also it’s worth clarifying that Greene only deals with a particular instance of a deontological judgement rather than deontological judgements in general.
It’s only a question of moral epistemology, so you could simply disagree on how he talks about intuitions or abandon the idea altogether (https://global.oup.com/academic/product/philosophy-without-intuitions-9780199644865?cc=us&lang=en&).
Again, it’s worth stressing that this is a fairly narrow and methodologically controversial area of moral philosophy. There is a difference between giving an opinion on a novel approach to a subject, and telling a group of people what subject they need to study in order to be well-informed. Even if you do take the work of x-philers for granted, it’s not the sort of thing that can be done merely with education in psychology and neuroscience, because people who understand that side of the story but not the actual philosophy are going to be unable to evaluate or make the substantive moral arguments which are necessary for empirically informed work.
Thanks for the excellent reply.
Greene would probably not dispute that philosophers have generally agreed that the difference between the lever and footbridge cases is due to “apparently non-significant changes in the situation”.
However, what philosophers have typically done is either bite the bullet and say one ought to push, or deny that one ought to push in the footbridge case but then feel the need to defend commonsense intuitions by offering a principled justification for the distinction between the two. The trolley literature is rife with attempts to vindicate an unwillingness to push, because these philosophers are starting from the assumption that commonsense moral intuitions track deep moral truths and that we must explicate the underlying, implicit justification our moral competence is picking up on.
What Greene is doing by appealing to neuroscientific/psychological evidence is offering a selective debunking explanation of some of those intuitions but not the others. If the evidence demonstrates that one set of outputs (deontological judgments) is the result of an unreliable cognitive process, and another set of outputs (utilitarian judgments) is the result of reliable cognitive processes, then he can show that we have reason to doubt one set of intuitions but not the other, provided we agree with his criteria about what constitutes a reliable vs. an unreliable process. A selective debunking argument of this kind, relying as it does on the reliability of distinct psychological systems or processes, does in fact turn on the empirical evidence (in this case, on his dual process model of moral cognition).
[But nobody believes that judgements are correct or wrong merely because of the process that produces them.]
Sure, but Greene does not need to argue that deontological/utilitarian conclusions are correct or incorrect, only that we have reason to doubt one but not the other. If we can offer reasons to doubt the very psychological processes that give rise to deontological intuitions, this skepticism may be sufficient to warrant skepticism about the larger project of assuming that these intuitions are underwritten by implicit, non-obvious justifications that the philosopher’s job is to extract and explicate.
You mention evolutionary debunking arguments as an alternative that is known “without any reference to psychology.” I think this is mistaken. Evolutionary debunking arguments are entirely predicated on specific empirical claims about the evolution of human psychology, and are thus a perfect example of the relevance of empirical findings to moral philosophy.
[Also it’s worth clarifying that Greene only deals with a particular instance of a deontological judgement rather than deontological judgements in general.]
Yes, I completely agree and I think this is a major weakness with Greene’s account.
I think there are two other major problems: the fMRI evidence he has is not very convincing, and trolley problems offer a distorted psychological picture of the distinction between utilitarian and non-utilitarian moral judgment. Recent work by Kahane shows that people who push in footbridge scenarios tend not to be utilitarians, just people with low empathy. The same people that push tend to also be more egoistic, less charitable, less impartial, less concerned about maximizing welfare, etc.
Regarding your last two points: I agree that one move is to simply reject how he talks about intuitions (or one could presumably raise other epistemic challenges). I also agree that training in psychology/neuroscience but not philosophy impairs one’s ability to evaluate arguments that presumably depend on competence in both. I am not sure why you bring this up, though, so if there was an inference I should draw from this, help me out!