Cognitive Science/Psychology As a Neglected Approach to AI Safety
All of the advice on getting into AI safety research that I’ve seen recommends studying computer science and mathematics: for example, the 80,000 Hours AI safety syllabus provides a computer science-focused reading list, and mentions that “Ideally your undergraduate degree would be mathematics and computer science”.
There are obvious good reasons for recommending these two fields, and I agree that anyone wishing to make an impact in AI safety should have at least a basic proficiency in them. However, I find it a little concerning that cognitive science/psychology are rarely even mentioned in these guides. I believe that it would be valuable to have more people working in AI safety whose primary background is from one of cogsci/psych, or who have at least done a minor in them.
Here are examples of four lines of research into AI safety which I think could benefit from such a background:
The psychology of developing an AI safety culture. Besides the technical problem of “how can we create safe AI”, there is the social problem of “how can we ensure that the AI research community develops a culture where safety concerns are taken seriously”. At least two existing papers draw on psychology to consider this problem: Eliezer Yudkowsky’s “Cognitive Biases Potentially Affecting Judgment of Global Risks” uses cognitive psychology to discuss why people might misjudge the probability of risks in general, and Seth Baum’s “On the promotion of safe and socially beneficial artificial intelligence” uses social psychology to discuss the specific challenge of motivating AI researchers to choose beneficial AI designs.
Developing better analyses of “AI takeoff” scenarios. Currently humans are the only general intelligence we know of, so any analyses of what “expertise” consists of and how it can be acquired would benefit from the study of humans. Eliezer Yudkowsky’s “Intelligence Explosion Microeconomics” draws on a number of fields to analyze the possibility of a hard takeoff, including some knowledge of human intelligence differences as well as the history of human evolution, whereas my “How Feasible is the Rapid Development of Artificial Superintelligence?” draws extensively on the work of a number of psychologists to make the case that, based on what we know of human expertise, scenarios with AI systems becoming major actors within timescales on the order of mere days or weeks seem to remain within the range of plausibility.
Defining just what it is that human values are. The project of AI safety can roughly be defined as “the challenge of ensuring that AIs remain aligned with human values”, but it’s also widely acknowledged that nobody really knows what exactly human values are—or at least, not to a sufficient extent that they could be given a formal definition and programmed into an AI. This seems like one of the core problems of AI safety, and one which can only be understood with a psychology-focused research program. Luke Muehlhauser’s article “A Crash Course in the Neuroscience of Human Motivation” examined human values from the perspective of neuroscience, and my “Defining Human Values for Value Learners” sought to provide a preliminary definition of human values in a computational language, drawing from the intersection of artificial intelligence, moral psychology, and emotion research. Both of these are very preliminary papers, and it would take a full research program to pursue this question in more detail.
Better understanding multi-level world-models. MIRI defines the technical problem of “multi-level world-models” as “How can multi-level world-models be constructed from sense data in a manner amenable to ontology identification?”. In other words, suppose that we had built an AI to make diamonds (or anything else we care about) for us. How should that AI be programmed so that it could still accurately estimate the number of diamonds in the world after it had learned more about physics, and after it had learned that the things it calls “diamonds” are actually composed of protons, neutrons, and electrons? While I haven’t seen any papers that would explicitly tackle this question yet, a reasonable starting point would seem to be the question of “well, how do humans do it?”. There, psych/cogsci may offer some clues. For instance, in the book Cognitive Pluralism, the philosopher Steven Horst offers an argument for believing that humans have multiple different, mutually incompatible mental models / reasoning systems—ranging from core knowledge systems to scientific theories—that they flexibly switch between depending on the situation. (Unfortunately, Horst approaches this as a philosopher, so he’s mostly content to make the argument for this being the case in general, leaving it up to actual cognitive scientists to work out how exactly this works.) I previously also offered a general argument along these lines in my article World-models as tools, suggesting that at least part of the choice of a mental model may be driven by reinforcement learning in the basal ganglia. But this isn’t saying much, given that all human thought and behavior seems to be at least in part driven by reinforcement learning in the basal ganglia. Again, this would take a dedicated research program.
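To make the idea of reinforcement-learned switching between world-models slightly more concrete, here is a toy sketch in Python. The contextual-bandit framing, the two made-up “models”, and everything else in the code are simplifying assumptions of mine for illustration only; none of it is taken from Horst’s book, MIRI’s problem statement, or the actual neuroscience.

```python
import random

# Toy sketch: an agent that holds several different predictive world-models and
# learns, from reward alone, which model to deploy in which kind of situation.
# This is a contextual-bandit caricature of "model selection via reinforcement
# learning", not a proposal for how brains or AIs actually do it.

CONTEXTS = ["everyday_objects", "subatomic_scale"]

def folk_model(context):
    # Crude stand-in for a "core knowledge" style model: works well for everyday scenes.
    return 1.0 if context == "everyday_objects" else 0.1

def physics_model(context):
    # Crude stand-in for a "scientific theory" style model: works well at small scales.
    return 1.0 if context == "subatomic_scale" else 0.4

MODELS = {"folk": folk_model, "physics": physics_model}

# Learned estimate of how useful each model has been in each context.
values = {(c, m): 0.0 for c in CONTEXTS for m in MODELS}
counts = {(c, m): 0 for c in CONTEXTS for m in MODELS}

def choose_model(context, epsilon=0.1):
    """Epsilon-greedy choice of which world-model to reason with right now."""
    if random.random() < epsilon:
        return random.choice(list(MODELS))
    return max(MODELS, key=lambda m: values[(context, m)])

def update(context, model, reward):
    """Incrementally average the reward obtained by using this model in this context."""
    counts[(context, model)] += 1
    values[(context, model)] += (reward - values[(context, model)]) / counts[(context, model)]

random.seed(0)
for _ in range(2000):
    context = random.choice(CONTEXTS)
    model = choose_model(context)
    reward = MODELS[model](context)  # proxy for how well the chosen model's predictions worked
    update(context, model, reward)

for context in CONTEXTS:
    best = max(MODELS, key=lambda m: values[(context, m)])
    print(f"In '{context}', the agent has learned to reach for the '{best}' model.")
```

This obviously says nothing about how to solve ontology identification itself; the point is only that “which model should I reason with here?” can itself be treated as a learnable decision, and data on how humans make that choice could help constrain how we’d want an AI to make it.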
From these four special cases, you could derive more general use cases for psychology and cognitive science within AI safety:
Psychology, as the study and understanding of human thought and behavior, helps guide actions that are aimed at understanding and influencing people’s behavior in a more safety-aligned direction (related example: the psychology of developing an AI safety culture)
The study of the only general intelligence we know about may provide information about the properties of other general intelligences (related example: developing better analyses of “AI takeoff” scenarios)
A better understanding of how human minds work may help figure out how we want the cognitive processes of AIs to work so that they end up aligned with our values (related examples: defining human values, better understanding multi-level world-models)
Here I would ideally offer reading recommendations, but the fields are so broad that any given book can only give a rough idea of the basics; and for instance, the topic of world-models that human brains use is just one of many, many subquestions that the fields cover. Thus my suggestion to have some safety-interested people who’d actually study these fields as a major or at least a minor.
Still, if I had to suggest a couple of books, with the main idea of getting a basic grounding in the mindsets and theories of the fields so that it would be easier to read more specialized research… on the cognitive psychology/cognitive science side I’d suggest Cognitive Science by Jose Luis Bermudez (haven’t read it, but Luke Muehlhauser recommends it and it looked good to me based on the table of contents; see also Luke’s follow-up recommendations behind that link); Cognitive Psychology: A Student’s Handbook by Michael W. Eysenck & Mark T. Keane; and maybe Sensation and Perception by E. Bruce Goldstein. I’m afraid that I don’t know of any good introductory textbooks on the social psychology side.
I am a psychology PhD student with a background in philosophy/evolutionary psychology. My current research focuses on two main areas: effective altruism, and the nature of morality, in particular the psychology of metaethics. My motivation for pursuing the former should be obvious, but my rationale for pursuing the latter is in part self-consciously about the third bullet point, “Defining just what it is that human values are.” More basic than even defining what those values are, I am interested in what people take values themselves to be. For instance, we do not actually have good data on the degree to which people regard their own moral beliefs as objective/relative, how common noncognitivist or error theoretic beliefs are in lay populations, etc.
Related to the first point, about developing an AI safety culture, there is also the matter of what we can glean psychologically about how the public is likely to receive AI developments. Understanding how people generally perceive AI, and technological change more broadly, could provide insight that helps us anticipate emerging social issues resulting from advances in AI, and improve our ability to raise awareness of, and receptivity to, concerns about AI risk among nonexperts, policymakers, the media, and the public. Cognitive science has more direct value than areas like mine (social psychology/philosophy), but my areas of study could serve a valuable auxiliary function to AI safety.
I think these are all points that many people have considered privately or publicly in isolation, but that thus far no one has explicitly written them down and drawn a connection between them. In particular, lots of people have independently made the observation that ontological crises in AIs are apparently similar to existential angst in humans, that ontology identification seems philosophically difficult, and that studying ontology identification in humans is therefore plausibly a promising route to understanding ontology identification for arbitrary minds. So, thank you for writing this up; it seems like something that quite badly needed to be written.
Some other problems that might be easier to tackle from this perspective include mind crime, nonperson predicates, and suffering risk, especially subproblems like suffering in physics.
Strong agreement. Considerations from cognitive science might also help us to get a handle on how difficult the problem of general intelligence is, and the limits of certain techniques (e.g. reinforcement learning). This could help clarify our thinking on AI timelines as well as the constraints which any AGI must satisfy. Misc. topics that jump to mind are the mental modularity debate, the frame problem, and insight problem solving.
This is a good article on AI from a cog sci perspective: https://arxiv.org/pdf/1604.00289.pdf
Yay, correctly guessed which article that was before clicking on the link. :-)
Also, have you seen this AI Impacts post and the interview it links to? I would expect so, but it seems worth asking. Tom Griffiths makes similar points to the ones you’ve made here.
I’d seen that, but re-reading it was useful. :)
An effort has recently been started to improve the pipeline for getting people up to speed with AGI safety. I’m trying to champion a broad view of AGI safety, one that includes psychology.
Would anyone be interested in providing digested content? It would also be good to have an exit for the pipeline for psychology people interested in AGI. Would that be FHI? Who else would be good to talk to about what is required?
Excellent post; as a psych professor I agree that psych and cognitive science are relevant to AI safety, and it’s surprising that our insights from studying animal and human minds for the last 150 years haven’t been integrated into mainstream AI safety work.
The key problem, I think, is that AI safety seems to assume that there will be some super-powerful deep learning system attached to some general-purpose utility function connected to a general-purpose reward system, and we have to get the utility/reward system exactly aligned with our moral interests.
That’s not the way any animal mind has ever emerged in evolutionary history. Instead, minds emerge as large numbers of domain-specific mental adaptations to solve certain problems, and they’re coordinated by superordinate ‘modes of operation’ called emotions and motivations. These can be described as implementing utility functions, but that’s not their function—promoting reproductive success is. Some animals also evolve some ‘moral machinery’ for nepotism, reciprocity, in-group cohesion, norm-policing, and virtue-signaling, but those mechanisms are also distinct and often at odds.
Maybe we’ll be able to design AGIs that deviate markedly from this standard ‘massively modular’ animal-brain architecture, but we have no proof-of-concept for thinking that will work. Until then, it seems useful to consider what psychology has learned about preferences, motivations, emotions, moral intuitions, and domain-specific forms of reinforcement learning.
I got linked here while browsing a pretty random blog on deep learning, you’re getting attention! (https://medium.com/intuitionmachine/seven-deadly-sins-and-ai-safety-5601ae6932c3)
Neat, thanks for the find. :)
What is your model of why other people in the AI safety field disagree with you/don’t consider this as important as you?
My main guess is “they mostly come from a math/CS background so haven’t looked at this through a psych/cogsci perspective and seen how it could be useful”.
That said, some of my stuff linked to above has mostly been met with silence, and while I presume it’s a question of inferential silence—a sufficiently long inferential distance that a claim doesn’t provoke even objections, just uncomprehending or indifferent silence—there is also the possibility of me just being so wrong about the usefulness of my ideas that nobody’s even bothering to tell me.
Kaj, I tend to promote your stuff a fair amount to end the inferential silence, and it goes without saying that I agree with all else you said.
Don’t give up on your ideas or approach. I am dispirited that there are so few people thinking like you do out there.
Hi Kaj,
Thanks for writing this. Since you mention some 80,000 Hours content, I thought I’d respond briefly with our perspective.
We had intended the career review and AI safety syllabus to be about what you’d need to do from a technical AI research perspective. I’ve added a note to clarify this.
We agree that there are a lot of approaches you could take to tackle AI risk, but we currently expect that technical AI research will be where a large amount of the effort is required. However, we’ve also advised many people on non-technical routes to impacting AI safety, so we don’t think it’s the only valid path by any means.
We’re planning on releasing other guides and paths for non-technical approaches, such as the AI safety policy career guide, which also recommends studying political science and public policy, law, and ethics, among others.
Hi Peter, thanks for the response!
Your comment seems to suggest that you don’t think the arguments in my post are relevant for technical AI safety research. Do you feel that I didn’t make a persuasive case for psych/cogsci being relevant for value learning/multi-level world-models research, or do you not count these as technical AI safety research? Or am I misunderstanding you somehow?
I agree that the “understanding psychology may help persuade more people to work on/care about AI safety” and “analyzing human intelligences may suggest things about takeoff scenarios” points aren’t related to technical safety research, but value learning and multi-level world-models are very much technical problems to me.
We agree these are technical problems, but for most people, all else being equal, it seems more useful to learn ML rather than cog sci/psych. Caveats:
Personal fit could dominate this equation though, so I’d be excited about people tackling AI safety from a variety of fields.
It’s an equilibrium. The more people already attacking a problem using one toolkit, the more we should be sending people to learn other toolkits to attack it.
Got it. To clarify: if the question is framed as “should AI safety researchers learn ML, or should they learn cogsci/psych”, then I agree that it seems better to learn ML.
I see quite a bunch of relevant cognitive science work these days, e.g. this: http://saxelab.mit.edu/resources/papers/Kleiman-Weiner.etal.2017.pdf
That’s super-neat! Thanks.
Defining human values, at least in the prescriptive sense, is not a psychological issue at all. It’s a philosophical issue. Certain philosophers have believed that psychology can inform moral philosophy, but it’s a stretch to say that even the work of someone like Joshua Greene in experimental philosophy amounts to a psychology-focused research program, and the whole approach is dubious—see, e.g., The Normative Insignificance of Neuroscience (http://www.pgrim.org/philosophersannual/29articles/berkerthenormative.pdf). Of course, a new wave of pop-philosophers and internet bloggers have made silly claims that moral philosophy can be completely solved by psychology and neuroscience, but this extreme view is ridiculous on its face.
What people believe doesn’t tell us much about what actually is good. The challenge of AI safety is the challenge of making AI that actually does what is right, not AI that does whatever it’s told to do by a corrupt government, a racist constituency, and so on.
It took me a while to respond to this because I wanted to take the time to read “The Normative Insignificance of Neuroscience” first. Having now read it, I’d say that I agree with its claims with regard to criticism of Greene’s approach. I don’t think it disproves the notion of psychology being useful for defining human values, though, for I think there’s an argument for psychology’s usefulness that’s entirely distinct from the specific approach that Greene is taking.
I start from the premise that the goal of moral philosophy is to develop a set of explicit principles that would tell us what is good. Now this is particularly relevant for designing AI, because we also want our AIs to follow those principles. But it’s noteworthy that in their current state, none of the existing ethical theories are up to the task of giving us such a set of principles that, when programmed into an AI, would actually give results that could be considered “good”; see, e.g., Muehlhauser & Helm (2012).
Yet the problem remains: the AI has to be programmed with some definition of what is good.
Now this alone isn’t yet sufficient to show that philosophy wouldn’t be up to the task. But philosophy has been trying to solve ethics for at least the last 2500 years, and it doesn’t look like there has been any major progress towards solving it. The PhilPapers survey didn’t show any of the three major ethical schools (consequentialism, deontology, virtue ethics) being significantly more favored by professional philosophers than the others, nor does anyone—to my knowledge—even know what a decisive theoretical argument in favor of one of them could be.
And at this point, we have pretty good theoretical reasons for believing that the traditional goal of moral philosophy—“developing a set of explicit principles for telling us what is good”—is in fact impossible. Or at least, it’s impossible to develop a set of principles that would be simple and clear enough to write down in human-understandable form and which would give us clear answers to every situation, because morality is too complicated for that.
We’ve already seen this in trying to define concepts: as philosophy noted a long time ago, you can’t come up with a set of explicit rules that would define even a concept as simple as “man” in such a way that nobody could develop a counterexample. “The Normative Insignificance of Neuroscience” also notes that the situation in ethics looks similar to the situation with trying to define many other concepts.
Yet human brains do manage to successfully reason with concepts, despite it being impossible to develop a set of explicit necessary and sufficient criteria. The evidence from both psychology and artificial intelligence (where we’ve managed to train neural nets capable of reasonably good object recognition) is that a big part of how they do it is by building up complicated statistical models of what counts as a “man” or “philosopher” or whatever.
So given that
* we can’t build explicit verbal models of what a concept is,
* but we can build machine-learning algorithms that use complicated statistical analysis to identify instances of a concept,
and
* defining morality looks similar to defining concepts, in that we can’t build explicit verbal models of what morality is,
it would seem reasonable to assume that
* we can build machine-learning algorithms that can learn to define morality, in that they can give answers to moral dilemmas that a vast majority of people would consider acceptable.
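As a toy illustration of the “statistical model instead of explicit rules” point, here is a minimal sketch of a prototype-style classifier. The features, examples, and the prototype method itself are just illustrative assumptions of mine; real concept learning, let alone value learning, is of course vastly more complicated.

```python
# Toy sketch: classifying instances of a concept without any explicit
# necessary-and-sufficient definition. Instead of rules, we store labeled
# examples and build a statistical summary (a "prototype") of each category,
# loosely inspired by prototype theories of concepts in psychology.
# The features and examples are made up purely for illustration.

LABELED_EXAMPLES = {
    # feature vector: (has_fur, lays_eggs, flies, lives_in_water)
    "bird":   [(0, 1, 1, 0), (0, 1, 1, 0), (0, 1, 0, 0)],   # includes a penguin-ish case
    "mammal": [(1, 0, 0, 0), (1, 0, 0, 0), (1, 0, 1, 0),    # includes a bat-ish case
               (1, 1, 0, 1)],                               # includes a platypus-ish case
}

def prototype(examples):
    """Average the feature vectors; no single feature is treated as necessary or sufficient."""
    n = len(examples)
    return [sum(example[i] for example in examples) / n for i in range(len(examples[0]))]

PROTOTYPES = {label: prototype(examples) for label, examples in LABELED_EXAMPLES.items()}

def classify(features):
    """Assign a new case to the category whose prototype it is closest to."""
    def distance(proto):
        return sum((f - p) ** 2 for f, p in zip(features, proto))
    return min(PROTOTYPES, key=lambda label: distance(PROTOTYPES[label]))

# A borderline case (no fur, lays eggs, doesn't fly, lives in water) that would
# trip up most rule-based definitions still gets an answer by graded similarity.
print(classify((0, 1, 0, 1)))
```

No explicit definition of “bird” or “mammal” appears anywhere in the code, yet borderline cases still get a graded answer; the hope would be that something analogous, though far more sophisticated, could work for moral categories.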
But here it looks likely that we need information from psychology to narrow down what those models should be. What humans consider to be good has likely been influenced by a number of evolutionary idiosyncrasies, so if we want to come up with a model of morality that most humans would agree with, then our AI’s reasoning process should take into account those considerations. And we’ve already established that defining those considerations on a verbal level looks insufficient—they have to be established on a deeper level, of “what are the actual computational processes that are involved when the brain computes morality”.
Yes, I am here assuming “what is good” to equate to “what do human brains consider good”, in a way that may be seen as reducing to “what would human brains accept as a persuasive argument for what is good”. You could argue that this is flawed, because it’s getting dangerously close to defining “good” by social consensus. But then again, the way the field of ethics itself proceeds is basically the same: a philosopher presents an argument for what is good, another attacks it, if the argument survives attacks and is compelling then it is eventually accepted. For empirical facts we can come up with objective tests, but for moral truths it looks to me unavoidable—due to the is-ought gap—that some degree of “truth by social consensus” is the only way of figuring out what the truth is, even in principle.
Hi Kaj,
Even if we found the most agreeable available set of moral principles, the number of people who found it agreeable might turn out not to constitute the vast majority. It may not even reach a majority at all. It is possible that there simply is no moral theory that is acceptable to most people. People may just have irreconcilable values. You state that:
“For empirical facts we can come up with objective tests, but for moral truths it looks to me unavoidable—due to the is-ought gap—that some degree of “truth by social consensus” is the only way of figuring out what the truth is, even in principle.”
Suppose this is the best we can do. It doesn’t follow that the outputs of this exercise are “true.” I am not sure in what sense this would constitute a true set of moral principles.
More importantly, it is unclear whether or not I have any rational or moral obligation to care about the outputs of this exercise. I do not want to implement the moral system that most people find agreeable. On the contrary, I want everyone to share my moral views, because this is what, fundamentally, I care about. The notion that we should care about what others care about, and implement whatever the consensus is, seems to presume a very strong and highly contestable metaethical position that I do not accept and do not think others should accept.
It’s certainly possible that this is the case, but looking for the kind of solution that would satisfy as many people as possible seems like the thing we should try first, and only give it up if it seems impossible, no?
Well, the ideal case would be that the AI would show you a solution which it had found, and upon inspecting it and thinking it through you’d be convinced that this solution really does satisfy all the things you care about—and all the things that most other people care about, too.
From a more pragmatic perspective, you could try to insist on an AI which implemented your values specifically—but then everyone else would also have a reason to fight to get an AI which fulfilled their values specifically, and if it was you versus everyone else in the world, it seems pretty likely that somebody else would win. Which means that your values would have a much higher chance of getting shafted than if everyone had agreed to go for a solution which tried to take everyone’s preferences into account.
And of course, in the context of AI, everyone insisting on their own values and their values only means that we’ll get arms races, meaning a higher probability of a worse outcome for everyone.
See also Gains from Trade Through Compromise.
Sure. That isn’t my primary objection though. My main objection is that even if we pursue this project, it does not achieve the heavy metaethical lifting you were alluding to earlier. It doesn’t demonstrate nor provide any particularly good reason to regard the outputs of this process as moral truth.
I want to convert all matter in the universe to utilitronium. Do you think it is likely that an AI that factored in the values of all humans would yield this as its solution? I do not. Since I think the expected utility of most other likely solutions, given what I suspect about other people’s values, is far less than this, I would view almost any scenario other than imposing my values on everyone else to be a cosmic disaster.
Well, what alternative would you propose? I don’t see how it would even be possible to get any stronger evidence for the moral truth of a theory, than the failure of everyone to come up with convincing objections to it even after extended investigation. Nor a strategy for testing the truth which wouldn’t at some point reduce to “test X gives us reason to disagree with the theory”.
I would understand your disagreement if you were a moral antirealist, but your comments seem to imply that you do believe that a moral truth exists and that it’s possible to get information about it, and that it’s possible to do “heavy metaethical lifting”. But how?
I think anything as specific as this sounds worryingly close to wanting an AI to implement favoritepoliticalsystem.
Whoops. I can see how my responses didn’t make my own position clear.
I am an anti-realist, and I think the prospects for identifying anything like moral truth are very low. I favor abandoning attempts to frame discussions of AI or pretty much anything else in terms of converging on or identifying moral truth.
I consider it a likely futile effort to integrate important and substantive discussions into contemporary moral philosophy. If engaging with moral philosophy introduces unproductive digressions/confusions/misplaced priorities into the discussion it may do more harm than good.
I’m puzzled by this remark:
I view utilitronium as an end, not a means. It is a logical consequence of wanting to maximize aggregate utility and is more or less a logical entailment of my moral views. I favor the production of whatever physical state of affairs yields the highest aggregate utility. This is, by definition, “utilitronium.” If I’m using the term in an unusual way I’m happy to propose a new label that conveys what I have in mind.
I totally sympathize with your sentiment and feel the same way about incorporating other people’s values in a superintelligent AI. If I just went with my own wish list for what the future should look like, I would not care about most other people’s wishes. I feel as though many other people are not even trying to be altruistic in the relevant sense that I want to be altruistic, and I don’t experience a lot of moral motivation to help accomplish people’s weird notions of altruistic goals, let alone any goals that are clearly non-altruistically motivated. In the same way (though admittedly even less so), I’d feel no strong motivation to help make the dreams of baby-eating aliens come true.
Having said that, I am confident that it would screw things up for everyone if I followed a decision policy that does not give weight to other people’s strongly held moral beliefs. It is already hard enough to not mess up AI alignment in a way that makes things worse for everyone, and it would become much harder still if we had half a dozen or more competing teams who each wanted to get their idiosyncratic view of the future installed.
BTW, note that value differences are not the only thing that can get you into trouble. If you hold an important empirical belief that others do not share, and you cannot convince them of it, then it may appear to you as though you’re justified in doing something radical about it, but that’s even more likely to be a bad idea, because the reasons for taking peer disagreement seriously are stronger in empirical domains of dispute than in normative ones.
There is a sea of considerations from Kantianism, contractualism, norms for stable/civil societies and advanced decision theory that, while each line of argument seems tentative on its own and open to skepticism, all taken together point very strongly into the same direction, namely that things will be horrible if we fail to cooperate with each other and that cooperating is often the truly rational thing to do. You’re probably already familiar with a lot of this, but for general reference, see also this recent paper that makes a particularly interesting case for particularly strong cooperation, as well as other work on the topic, e.g. here and here.
This is why I believe that people interested in any particular version of utilitronium should not override AI alignment procedures last minute just to get an extra large share of cosmic stakes for their own value system, and why I believe that people like me, who care primarily about reducing suffering, should not increase existential risk. Of course, all of this means that people who want to benefit human values in general should take particular caution to make sure that idiosyncratic value systems that may diverge from them also receive consideration and gains from trade.
This piece I wrote recently is relevant to cooperation and the question of whether values are subjective or not, and how much convergence we should expect and to what extent value extrapolation procedures bake in certain (potentially unilateral) assumptions.
Ah, okay. Well, in that case you can just read my original comment as an argument for why one would want to use psychology to design an AI that was capable of correctly figuring out just a single person’s values and implementing them, as that’s obviously a prerequisite for figuring out everybody’s values. The stuff that I had about social consensus was just an argument aimed at moral realists, if you’re not one then it’s probably not relevant for you.
(my values would still say that we should try to take everyone’s values into account, but that disagreement is distinct from the whole “is psychology useful for value learning” question)
Sorry, my mistake—I confused utilitronium with hedonium.
Great!
Restricting analysis to the Western tradition, 2500 years ago we barely had any conception of virtue ethics. Our contemporary conceptions of virtue ethics are much better than the ones the Greeks had. Meanwhile, deontological and consequentialist ethics did not even exist back then. Even over recent decades there has been progress in these positions. And plenty of philosophers know what a decisive theoretical argument could be: either they purport to have identified such arguments, or they think it would be an argument that showed the theory to be well supported by intuitions, reason, or some other evidence, not generally different from what an argument for a non-moral philosophical theory would look like.
It would (arguably) give results that people wouldn’t like, but assuming that the moral theory is correct and the machine understands it, almost necessarily it would do morally correct things. If you object to its actions then you are already begging the question by asserting that we ought to be focused on building a machine that will do things that we like regardless of whether they are moral. Moreover, you could tell a similar story for any values that people have. Whether you source them from real philosophy or from layman ethics wouldn’t change the problems of optimization and systematization.
But that’s an even stronger claim than the one that moral philosophy hasn’t progressed towards such a goal. What reasons are there?
That’s contentious, but some philosophers believe that, and there are philosophies which adhere to that. The problem of figuring out how to make a machine behave morally according to those premises is still a philosophical one, just one based on other ideas in moral philosophy besides explicit rule-based ones.
Except the field of ethics does it with actual arguments among experts in the field. You could make the same story for any field: truths about physics can be determined by social consensus, since that’s just what the field of physics is, a physicist presents an experiment or hypothesis, another attacks it, if the hypothesis survives the attacks and is compelling then it is eventually accepted! And so on for all non-moral fields of inquiry as well. I don’t see why you think ethics would be special; basically everything can be modeled like this. But that’s ridiculous. We don’t look at social consensus for all forms of inquiry, because there is a difference between what ordinary people believe and what people believe when they are trained professionals in the subject.
Then why don’t you believe in morality by social consensus? (Or do you? It seems like you’re probably not, given that you’re an effective altruist. What do you think about animal rights, or Sharia law?)
(We seem to be talking past each other in some weird way; I’m not even sure what exactly it is that we’re disagreeing over.)
Well sure, if we proceed from the assumption that the moral theory really was correct, but the point was that none of those proposed theories has been generally accepted by moral philosophers.
I gave one in the comment? That philosophy has accepted that you can’t give a human-comprehensible set of necessary and sufficient criteria for concepts, and if you want a system for classifying concepts you have to use psychology and machine learning; and it looks like morality is similar.
I’m not sure what exactly you’re disagreeing with? It seems obvious to me that physics does indeed proceed by social consensus in the manner you describe. Someone does an experiment, then others replicate the experiment until there is consensus that this experiment really does produce these results; somebody proposes a hypothesis to explain the experimental results, others point out holes in that hypothesis, there’s an extended back-and-forth conversation and further experiments until there is a consensus that the modified hypothesis really does explain the results and that it can be accepted as an established scientific law. And the same for all other scientific and philosophical disciplines. I don’t think that ethics is special in that sense.
Sure, there is a difference between what ordinary people believe and what people believe when they’re trained professionals: that’s why you look for a social consensus among the people who are trained professionals and have considered the topic in detail, not among the general public.
I do believe in morality by social consensus, in the same manner as I believe in physics by social consensus: if I’m told that the physics community has accepted it as an established fact that E=mc^2 and that there’s no dispute or uncertainty about this, then I’ll accept it as something that’s probably true. If I thought that it was particularly important for me to make sure that this was correct, then I might look up the exact reasoning and experiments used to determine this and try to replicate some of them, until I found myself to also be in consensus with the physics community.
Similarly, if someone came to me with a theory of what was moral and it turned out that the entire community of moral philosophers had considered this theory and accepted it after extended examination, and I could also not find any objections to that and found the justifications compelling, then I would probably also accept the moral theory.
But to my knowledge, nobody has presented a conclusive moral theory that would satisfy both me and nearly all moral philosophers and which would say that it was wrong to be an effective altruist—quite the opposite. So I don’t see a problem in being an EA.
Your point was that “none of the existing ethical theories are up to the task of giving us such a set of principles that, when programmed into an AI, would actually give results that could be considered “good”.” But this claim is simply begging the question by assuming that all the existing theories are false. And to claim that a theory would have bad moral results is different from claiming that it’s not generally accepted by moral philosophers. It’s plausible that a theory would have good moral results, in virtue of it being correct, while not being accepted by many moral philosophers. Since there is no dominant moral theory, this is necessarily the case as long as some moral theory is correct.
If you’re referring to ethics, no, philosophy has not accepted that you cannot give such an account. You believe this, on the basis of your observation that philosophers give different accounts of ethics. But that doesn’t mean that moral philosophers believe it. They just don’t think that the fact of disagreement implies that no such account can be given.
So you haven’t pointed out any particular features of ethics, you’ve merely described a feature of inquiry in general. This shows that your claim proves too much—it would be ridiculous to conduct physics by studying psychology.
But that’s not a matter of psychological inquiry, that’s a matter of looking at what is being published in philosophy, becoming familiar with how philosophical arguments are formed, and staying in touch with current developments in the field. So you are basically describing studying philosophy. Studying or researching psychology will not tell you anything about this.
Also, I find pretty compelling the argument that the classical goal of moral philosophy, trying to define “the good”, is both impossible and not even a particularly good target to aim at, and that trying to find generally-agreeable moral solutions is something much more useful; and if we accept this argument, then moral psychology is relevant, because it can help us figure out generally-agreeable solutions.
Martela (2017) makes this argument in more detail.
Your comment reads strangely to me because your thoughts seem to fall into a completely different groove from mine. The problem statement is perhaps: write a program that does what-I-want, indefinitely. Of course, this could involve a great deal of extrapolation.
The fact that I am even aspiring to write such a program means that I am assuming that what-I-want can be computed. Presumably, at least some portion of the relevant computation, the one that I am currently denoting ‘what-I-want’, takes place in my brain. If I want to perform this computation in an AI, then it would probably help to at least be able to reproduce whatever portion of it takes place in my brain. People who study the mind and brain happen to call themselves psychologists and cognitive scientists. It’s weird to me that you’re arguing about how to classify Joshua Greene’s research; I don’t see why it matters whether we call it philosophy or psychology. I generally find it suspicious when anyone makes a claim of the form: “Only the academic discipline that I hold in high esteem has tools that will work in this domain.” But I won’t squabble over words if you think you’re drawing important boundaries; what do you mean when you write ‘philosophical’? Maybe you’re saying that Greene, despite his efforts to inquire with psychological tools, elides into ‘philosophy’ anyway, so like, what’s the point of pretending it’s ‘moral philosophy’ via psychology? If that’s your objection, that he ‘just ends up doing philosophy anyway’, then what exactly is he eliding into, without using the words ‘philosophy’ or ‘philosophical’?
More generally, why is it that we should discard the approach because it hasn’t made itself obsolete yet? Should the philosophers give up because they haven’t made their approach obsolete yet either? If there’s any reason that we should have more confidence in the ability of philosophers than cognitive scientists to contribute towards a formal specification of what-I-want, that reason is certainly not track record.
I don’t think anyone who has read, or who likely will read, your comment conflates testimony or social consensus with what-is-good.
It’s my impression that AI safety researchers are far more concerned about unaligned AGIs killing everyone than they are about AGIs that are successfully designed by bad actors to do a specific, unimaginative thing without killing themselves and everyone else in the process.
Bleck, please don’t ever give me a justification to link a Wikipedia article literally named pooh-pooh.
No, the problem statement is: write a program that does what is right.
Then you missed the point of what I said, since I wasn’t talking about what to call it, I was talking about the tools and methods it uses. The question is what people ought to be studying and learning.
If you want to solve a philosophical problem then you’re going to have to do philosophy. Psychology is for solving psychological problems. It’s pretty straightforward.
I mean the kind of work that is done in philosophy departments, and which would be studied by someone who was told “go learn about moral philosophy”.
Yes, that’s true by his own admission (he affirms in his reply to Berker that the specific cognitive model he uses is peripheral to the main normative argument) and is apparent if you look at his work.
He’s eliding into normative arguments about morality, rather than merely describing psychological or cognitive processes.
I don’t know what you are talking about, since I said nothing about obsolescence.
Great! Then they’ll acknowledge that studying testimony and social consensus is not studying what is good.
Rather than bad actors needing to be restrained by good actors, which is neither a psychological nor a philosophical problem, the problem is that the very best actors are flawed and will produce flawed machines if they don’t do things correctly.
Would you like me to explicitly explain why the new wave of pop-philosophers and internet bloggers who think that moral philosophy can be completely solved by psychology and neuroscience don’t know what they’re talking about? It’s not taken seriously; I didn’t go into detail because I was unsure if anyone around here took it seriously.
I agree that defining human values is a philosophical issue, but I would not describe it as “not a psychological issue at all.” It is in part a psychological issue insofar as understanding how people conceive of values is itself an empirical question. Questions about individual and intergroup differences in how people conceive of values, distinguish moral from nonmoral norms, etc. cannot be resolved by philosophy alone.
I am sympathetic to some of the criticisms of Greene’s work, but I do not think Berker’s critique is completely correct, though explaining why I think Greene and others are correct in thinking that psychology can inform moral philosophy in detail would call for a rather titanic post.
The tl;dr point I’d make is that yes, you can draw philosophical conclusions from empirical premises, provided your argument is presented as a conditional one in which you propose that certain philosophical positions are dependent on certain factual claims. If anyone else accepts those premises, then empirical findings that confirm or disconfirm those factual claims can compel specific philosophical conclusions. A toy version of this would be the following:
P1: If the sky is blue, then utilitarianism is true.
P2: The sky is blue.
C: Therefore, utilitarianism is true.
If someone accepts P1, and if P2 is an empirical claim, then empirical evidence for/against P2 bears on the conclusion.
This is the kind of move Greene wants to make.
The slightly longer version of what I’d say to a lot of Greene’s critics is that they misconstrue Greene’s arguments if they think he is attempting to move straight from descriptive claims to normative claims. In arguing for the primacy of utilitarian over deontological moral norms, Greene appeals to the presumptive shared premise between himself and his interlocutors that, on reflection, they will reject beliefs that are the result of epistemically dubious processes but retain those that are the result of epistemically justified processes.
If they share his views about what processes would in principle be justified/not justified, and if he can demonstrate that utilitarian judgments are reliably the result of justified processes but deontological judgments are not, then he has successfully appealed to empirical findings to draw a philosophical conclusion: that utilitarian judgments are justified and deontological ones are not. One could simply reject his premises about what constitutes justified/unjustified grounds for belief, and in that case his argument would not be convincing. I don’t endorse his conclusions because I think his empirical findings are not compelling; not because I think he’s made any illicit philosophical moves.
You can do that if you want, but (1) it’s still a narrow case within a much larger philosophical framework and (2) such cases are usually pretty simple and don’t require sophisticated knowledge of psychology.
To the contrary, Berker criticizes Greene precisely because his neuroscientific work is hardly relevant to the moral argument he’s making. You don’t need a complex account of neuroscience or psychology to know that people’s intuitions in the trolley problem change merely because of an apparently non-significant change in the situation. Philosophers knew that a century ago.
But nobody believes that judgements are correct or incorrect merely because of the process that produces them. That just provides grounds for skepticism that the judgements are reliable—and it is skepticism of a sort that was already known without any reference to psychology, for instance through Plantinga’s evolutionary argument against naturalism or evolutionary debunking arguments.
Also it’s worth clarifying that Greene only deals with a particular instance of a deontological judgement rather than deontological judgements in general.
It’s only a question of moral epistemology, so you could simply disagree on how he talks about intuitions or abandon the idea altogether (https://global.oup.com/academic/product/philosophy-without-intuitions-9780199644865?cc=us&lang=en&).
Again, it’s worth stressing that this is a fairly narrow and methodologically controversial area of moral philosophy. There is a difference between giving an opinion on a novel approach to a subject, and telling a group of people what subject they need to study in order to be well-informed. Even if you do take the work of x-philers for granted, it’s not the sort of thing that can be done merely with education in psychology and neuroscience, because people who understand that side of the story but not the actual philosophy are going to be unable to evaluate or make the substantive moral arguments which are necessary for empirically informed work.
Thanks for the excellent reply.
Greene would probably not dispute that philosophers have generally agreed that the difference between the lever and footbridge cases is due to “apparently non-significant changes in the situation”.
However, what philosophers have typically done is either bite the bullet and say one ought to push, or deny that one ought to push in the footbridge case but then feel the need to defend commonsense intuitions by offering a principled justification for the distinction between the two. The trolley literature is rife with attempts to vindicate an unwillingness to push, because these philosophers are starting from the assumption that commonsense moral intuitions track deep moral truths and that we must explicate the underlying, implicit justification our moral competence is picking up on.
What Greene is doing by appealing to neuroscientific/psychological evidence is to offer a selective debunking explanation of some of those intuitions but not the others. If the evidence demonstrates that one set of outputs (deontological judgments) are the result of an unreliable cognitive process, and another set of outputs (utilitarian judgments) are the result of reliable cognitive processes, then he can show that we have reason to doubt one set of intuitions but not the other, provided we agree with his criteria about what constitutes a reliable vs. an unreliable process. A selective debunking argument of this kind, relying as it does on the reliability of distinct psychological systems or processes, does in fact turn on the empirical evidence (in this case, on his dual process model of moral cognition).
[But nobody believes that judgements are correct or wrong merely because of the process that produces them.]
Sure, but Greene does not need to argue that deontological/utilitarian conclusions are correct or incorrect, only that we have reason to doubt one but not the other. If we can offer reasons to doubt the very psychological processes that give rise to deontological intuitions, this skepticism may be sufficient to warrant skepticism about the larger project of assuming that these intuitions are underwritten by implicit, non-obvious justifications that the philosopher’s job is to extract and explicate.
You mention evolutionary debunking arguments as an alternative that is known “without any reference to psychology.” I think this is mistaken. Evolutionary debunking arguments are entirely predicated on specific empirical claims about the evolution of human psychology, and are thus a perfect example of the relevance of empirical findings to moral philosophy.
[Also it’s worth clarifying that Greene only deals with a particular instance of a deontological judgement rather than deontological judgements in general.]
Yes, I completely agree and I think this is a major weakness with Greene’s account.
I think there are two other major problems: the fMRI evidence he has is not very convincing, and trolley problems offer a distorted psychological picture of the distinction between utilitarian and non-utilitarian moral judgment. Recent work by Kahane shows that people who push in footbridge scenarios tend not to be utilitarians, just people with low empathy. The same people that push tend to also be more egoistic, less charitable, less impartial, less concerned about maximizing welfare, etc.
Regarding your last two points: I agree that one move is to simply reject how he talks about intuitions (or one could raise other epistemic challenges presumably). I also agree that training in psychology/neuroscience but not philosophy impairs one’s ability to evaluate arguments that presumably depend on competence in both. I am not sure why you bring this up though, so if there was an inference I should draw from this, help me out!