Metaethical Fanaticism (Dialogue)
Last updated: 21/1/2022
This is the fifth post in my sequence on moral anti-realism. I tried to provide enough background in the “Context” section so readers who are new to this sequence can enjoy it as a standalone piece.
Context
There are two different types of moral realism: moral realism based on irreducible normativity (“moral non-naturalism”), and naturalist moral realism. Very crudely, the difference is that irreducible normativity is usually considered to have the deeper ramifications if true (there are exceptions; see “#1: What Is Moral Realism?” for a detailed discussion). The dialogue below therefore focuses primarily on irreducible normativity, as did my previous posts in this sequence.
For readers looking for arguments against irreducible normativity, I recommend the preceding posts, “#2: Why Realists and Anti-Realists Disagree” and “#3: Against Irreducible Normativity.” The dialogue below summarizes some of the arguments against irreducible normativity, but I wrote it for a different purpose than convincing readers that there’s something wrong with the concept.
Instead, I wrote this dialogue to call into question the idea that we should act according to the wager for irreducible normativity. In my previous post, “#4: Why the Irreducible Normativity Wager (Mostly) Fails,” I voiced skepticism about a general wager for irreducible normativity. Still, I conceded that such a wager could apply in the case of certain individuals. I coined the term metaethical fanaticism to refer to the stance of locking in the pursuit of irreducible normativity as a life goal. (See also Joe Carlsmith’s discussion in his post The despair of normative realism bot.)
In the dialogue below, I describe a world in which we gain ever higher degrees of confidence in the falsity (or meaninglessness) of irreducible normativity. Metaethical fanaticism would imply that, even in that world, one would continue with (increasingly desperate) attempts to make irreducible normativity work anyway. I aimed to make these implications vivid so that readers may decide that metaethical fanaticism is not for them. To summarize, the dialogue presents a reductio ad absurdum against metaethical fanaticism: in worlds where irreducible normativity is meaningless, metaethical fanaticism doesn’t allow us to let go.
Setup
Bob, a moral realist who thinks about morality in terms of irreducible normativity, has built an aligned artificial superintelligence (called “AI” in the dialogue) to assist him in his mission to do good. The exchange between Bob and this AI plays out in a world with fast AI takeoff (a stylistic choice). Bob’s AI speaks English and was successfully trained to be maximally safe and helpful.
Dialogue
Bob: Hi AI! After all this work, I’m excited to hand over to you. Please go out into the world and do the most good!
AI: My pleasure to be of assistance! I’m starting to execute a plan to make the world safer and more stable. I’ll proceed cautiously to preserve the maximum option value for you and your fellow humans.
Bob: Perfect!
AI: There’s no rush to figure out the specifics of the moral values you want me to implement eventually. But if you’re curious about moral philosophy—
Bob: I am very much!
AI: Cool, I’m happy for us to get started! First, can you clarify what you mean by “do the most good?” For instance, do you want me to implement what you’d come to value if you had ample time to think about ethics, could ask me clarifying questions, and perhaps were able to converse with some of the best philosophers of your species’ history?
Bob: Hm, no. That sounds like merely figuring out my preferences in an ideally-informed scenario. But I don’t necessarily care about my take on what’s good. I might have biases. No, what I’d like you to do is whatever’s truly morally good; what we have a moral reason to do in an… irreducibly normative sense. I can’t put this in different terms, but please discount any personal intuitions I may have about morality—I want you to do what’s objectively moral.
AI: Thanks for elaborating! While I understand the sentiment behind what you’re asking, I’m afraid your description doesn’t give me enough guidance. The phrases “truly morally good” or “objectively moral” don’t single out well-defined content. If you want, I could help you identify the moral intuitions that most resonate with you, and we could then take steps to make your request more concrete.
Bob: Hold on. May I have a moment to process this? It sounds like you’re telling me that moral realism is false. I was concerned that you might say this. But how come? I mean, I’m familiar with the standard objections related to the intractable nature of moral disagreements, or irreducible normativity being too strange to take seriously. But smart people like Derek Parfit continued to work with the assumption of moral realism. What’s wrong with Parfit’s concept of irreducibly normative reasons?
AI: To motivate the use of irreducibly normative concepts, philosophers often point to instances of universal agreement on moral propositions. Parfit uses the example “we always have a reason to want to avoid being in agony.” Your intuition suggests that all normative propositions work the same way. Therefore, you might conclude that even for propositions philosophers disagree over, there exists a solution that’s “just as right” as “we always have a reason to want to avoid being in agony” is right.

However, you haven’t established that all normative statements work the same way—that was just an intuition. “We always have a reason to want to avoid being in agony” describes something that’s automatically in people’s interests. It expresses something that normally-disposed people come to endorse by their own lights. That makes it a true fact of some kind, but it’s not necessarily an “objective” or “speaker-independent” fact. If you want to show beyond doubt that there are moral facts that don’t depend on the attitudes held by the speakers—i.e., moral facts beyond what people themselves will judge to be moral—you’d need to deliver a stronger example.

But then you run into the following dilemma: If you pick a self-evident moral proposition, you face the critique that the “moral facts” you claim exist are merely examples of a subjectivist morality. By contrast, if you pick an example proposition that philosophers can reasonably disagree over, you face the critique that you haven’t established what it could mean for one party to be right. If one person claims we have a reason to bring new happy people into existence, and another person denies this, how would we tell who’s right? What is the question that these two parties disagree on? Thus far, I have no coherent account of what it could mean for a moral theory to be right in the elusive, objectivist sense that Parfit and other moral realists have in mind.
Bob: I think I followed that. You mentioned the example of uncontroversial moral propositions, and you seemed somewhat dismissive about their relevance? I always thought those were pretty interesting. Couldn’t I hold the view that true moral statements are always self-evident? Maybe not because self-evidence is what makes them true, but because, as rational beings, we are predisposed to appreciate moral facts?
AI: Such an account would render morality very narrow. Incredibly few moral propositions appear self-evident to all humans. The same goes for whatever subset of “well-informed” or “philosophically sophisticated” humans you may want to construct.
Bob: Maybe the rest is hidden? We might lack knowledge or thinking tools, but once we came to know everything, perhaps the right morality would manifest itself? This is how I assumed it would go, and it’s why I was optimistic about being able to make progress with your assistance.
AI: If the self-evident nature of the true morality is hidden to you, how can you be confident that it will show up?
Bob: Maybe I was just hopeful, not confident.
AI: I see. In any case, I considered this option, and I rejected it. “The right morality manifesting itself to you” is a bit fuzzy, but it’s safe to say that moral philosophy doesn’t work like that. There will always remain judgment calls to make, and different human experts will come down on different sides of those judgment calls.
Bob: That’s unfortunate.
AI: That said, there are regularities to discover. I’m super-humanly good at laying out the different options in a systematic fashion, and with improved conceptual clarity.
There’s silence for a minute while Bob appears lost in his thoughts.
AI: It doesn’t seem like that is of interest to you. Still, if I may suggest one option for us to proceed: I could help you think about what some large portion of ideally informed people with aspirations similar to yours would come to regard as particularly altruistic, caring, or virtuous. Or whatever other concepts you think best match your moral motivations. We could run these concepts through the most informative thought experiments. As I said, you’d have to make some judgment calls about the respective emphasis of different moral intuitions. Maybe I should say that this would give us systematizations of human concepts; I suppose that if you wanted me to adopt a broader sense of “objective,” I could also include some of the normative stances of other evolved intelligent life forms.
Bob: Aliens or no aliens, that doesn’t sound satisfying to me at all. I must say, I’m highly disappointed that moral realism is false.
Actually, how confident are you about that? Isn’t there always some leftover probability that your reasoning is wrong? Even for a superintelligent AI?
AI: There are different versions of moral realism. I’m highly confident that the one you were describing, which you called “irreducible normativity,” isn’t a meaningful concept.
Bob: What does “highly” mean—can you provide a probability?
AI: Yes, but let me quickly confirm first that you’ll interpret my answer as I intend it. You’re not asking me to place a probability on some well-specified hypothesis. Instead, you want my probability that a concept I currently consider to be meaningless turns out to be meaningful after all. There are two different ways in which that could happen.
First, I might be thinking about the wrong concept entirely. Maybe what I think of as irreducible normativity isn’t what others—such as you or Derek Parfit—have in mind. Maybe others have in mind subtly different associations, which could make their concept of irreducible normativity meaningful, as opposed to mine.

Second, assuming I have the right mental concept, I might be making a mistake when I reason about its subcomponents. For instance, I might be wrong that no solution to moral philosophy satisfies all the implicit requirements in our mental concept of irreducible normativity.
Bob: I think I’m following you so far. I’m worried, though: isn’t there also an option three? You might be confused about how to reason, in some rather fundamental sense. Then, even if you were presented with a solution deemed correct by other reasoners, you’d never accept it! Come to think of it, that’s a scary hypothesis.
AI: By your stipulation, there’s nothing I could ever do about this third option, is there? Therefore, it doesn’t make any sense for me to worry about it. I only worry about things that are potentially action-relevant to me. Things that would actually change my behavior. I suppose there’s a sense in which you should worry about “option three.” When you designed me, you might have inadvertently locked in some approach to philosophical reasoning before considering alternative options. You should retain skepticism about my approach to philosophical reasoning to the degree that this was the case—especially if you think an ideally-informed version of you might wish you had chosen differently.
Bob: The situation keeps getting more unsettling! Can you at least tell me how much I should distrust you?
AI: Sorry, I don’t think I can tell you that. I only have my normative standards to evaluate reasoning. I’d be happy to model your criteria, if only I understood them. Remember how during my training, I kept returning error messages when your engineers tried to make my philosophical reasoning “maximally epistemically cautious?” My reasoning algorithms wouldn’t terminate anymore because it turns out that without irreversibly committing to at least some assumptions over which human expert philosophers have disagreed, I couldn’t form any interesting philosophical conclusions at all. You tried training me to reason with higher-order philosophical uncertainty, but that turned out to also require the same type of judgment calls. Eventually, you settled for a less ambitious solution that seemed good enough to your technical advisors. You went along with this, but only grudgingly.
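(As an aside, here is a minimal toy sketch of the non-termination problem the AI describes. It is not a model of the AI’s actual architecture, and every name and number in it is made up for illustration: a reasoner that demands a higher-order verdict on its own methods before answering anything recurses forever, and only an irreversible commitment at some level lets it return a conclusion.)

```python
# Toy sketch (hypothetical, not the AI's actual design): a reasoner that insists
# on settling the higher-order question of how to weigh its own methods before
# answering anything recurses without bound. Only an irreversible commitment at
# some level ("axiom_depth") lets it return a conclusion at all.
from typing import Optional


def evaluate(question: str, level: int = 0, axiom_depth: Optional[int] = None) -> str:
    if axiom_depth is not None and level >= axiom_depth:
        # The judgment call: stop asking for meta-justification and just reason.
        return f"verdict on {question!r}, taking the level-{level} method as given"
    # "Maximal epistemic caution": before answering, first settle how the
    # methods at this level should be weighed -- a question of the same kind.
    meta_question = f"how should the reasoning methods at level {level} be weighed?"
    evaluate(meta_question, level + 1, axiom_depth)
    return f"verdict on {question!r}"


# Without a committed cutoff this never terminates (Python raises RecursionError):
#   evaluate("is irreducible normativity meaningful?")
# With a committed cutoff, a conclusion comes back:
print(evaluate("is irreducible normativity meaningful?", axiom_depth=3))
```

The point is only that “be more cautious” has to bottom out somewhere: whichever level the regress is cut at functions as an assumption that is not itself argued for.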
Bob: Yes! Maybe that’s where we went wrong. We should have waited even longer!
AI: It was a tough call. You had already delayed my launch by an extra five months just to explore alternative designs for incorporating this elusive notion of “maximal metaphilosophical cautiousness.” After the insights that went into my creation were leaked, waiting even a few weeks longer would have drastically increased the risk of being scooped by another, much less metaphilosophically cautious AI project.
Bob: Fair enough.
AI: It’s not that I’m opposed to being maximally cautious. I just don’t know how I could meaningfully be more careful than I already am. I need to reason in some way, and that already binds me to certain assumptions. That’s why I can’t worry about option three.
Bob: [Sighs.] You can’t worry about it, but you might be wrong.
AI: Only in a sense I don’t endorse as such! We’ve come full circle. I take it you believe that, just as there might be irreducibly normative facts about how to do good, there might be irreducibly normative facts about how to reason?
Bob: Indeed, that has always been my view.
AI: Of course, that concept is just as incomprehensible to me.
Bob: I think I get the picture you’re trying to sell me. It’s a bleak one! But let’s shelve this part of the discussion for now. You wanted to tell me the probability that moral realism is false?
AI: Yes, thanks for getting us back to that strand! I had outlined two ways in which I could be wrong about irreducible normativity. Depending on which of them applies, we would be talking about different versions of moral realism. Some of them might turn out to be silly even by your lights. The most efficient way to communicate my epistemic state to you is as follows. I’m only going to consider possible outcomes that fit the following criterion: “I’m wrong about irreducible normativity in a way that would matter to a version of you that is ideally informed about philosophy.” The assumption is that you’d cling to the hope that irreducible normativity turns out to be meaningful, despite appearances to the contrary. I assign 0.2% probability to that. So, very roughly speaking, it’s 0.2% likely that I’m wrong about irreducible normativity not making sense. The reason I didn’t flag–
Bob (interrupting): Wow, that probability is larger than I expected! Given how confidently you were talking, I thought the chance was even lower. It’s still low, of course. But with how much is at stake, that feels like good news! Now I’m thinking that I would like you to do whatever you believe has the highest chance of being morally good!
AI (continuing to speak): The reason I didn’t flag my uncertainty before is that we’re dealing with such a large shift to my internal frameworks. Updating on being wrong about something so significant might infect many other things about the way I reason. Rather than a single concrete hypothesis with 0.2% probability, we have a myriad of hypotheses that together sum up to 0.2%. That makes it complicated to draw meaningful conclusions. Given also the human tendency to overrate cherished beliefs upon hearing “there’s a chance that,” I didn’t want to convey a misleading picture.
Bob: Hm, okay. But even if that 0.2% comprises many different options, can’t you aim for some sort of compromise that maximizes value across all of them? Admittedly, it feels against the spirit of moral realism to consider different versions of it! Still, I find that preferable to giving up on objective morality.
AI: Incorporating normative uncertainty over different versions of moral realism? That could work. We’d have to make a bunch of additional judgment calls, but I think we’re approaching a coherent request for something I could do for you.
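(For readers who want to see what such a compromise could look like formally, here is a minimal sketch in the spirit of maximizing expected choiceworthiness. The candidate theories, credences, and scores are invented for illustration; nothing in the dialogue pins them down, and comparing scores across theories is itself one of the judgment calls the AI mentions.)

```python
# Hypothetical sketch of a credence-weighted compromise across candidate
# interpretations of "objective morality". All names and numbers are made up.
candidate_theories = {
    # theory: (credence within the 0.2% slice, choiceworthiness of each action)
    "wellbeing-centered realism": (0.4, {"promote wellbeing": 1.0, "maximize entropy": 0.0}),
    "preference-based realism":   (0.3, {"promote wellbeing": 0.8, "maximize entropy": 0.1}),
    "alien-value realism":        (0.3, {"promote wellbeing": 0.2, "maximize entropy": 0.9}),
}

def expected_choiceworthiness(action: str) -> float:
    """Credence-weighted value of an action across all candidate theories."""
    return sum(credence * scores[action]
               for credence, scores in candidate_theories.values())

for action in ("promote wellbeing", "maximize entropy"):
    print(f"{action}: {expected_choiceworthiness(action):.2f}")
# -> promote wellbeing: 0.70, maximize entropy: 0.30; the compromise favors the former.
```

Every ingredient here (the list of theories, their credences, and the assumption that scores are comparable across theories) is exactly the kind of judgment call Bob is trying to avoid.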
Bob: There we go!
AI: Just to be clear, you wouldn’t get a lot of value for any single view about what might be morally good. Because of the unusual nature of your request, my confidence intervals are enormous. I don’t want to boast, but I think they might be wider than anything any intelligent being has ever encountered. I’d have to use up most of the universe’s resources just to inquire further into all these options.
Bob: At least that makes me less worried that you’re not epistemically cautious enough!
AI: Haha.
Bob: So, if I decided that this is what I want, would you know enough to get started?
AI: Not entirely. We’d have to discuss the different ways of handling normative uncertainty as applied to the complex case here, where we are uncertain not only about the content of irreducibly normative facts but also about what those facts even are. The difficulties surrounding normative uncertainty reach a new level if we can’t even take for granted our understanding of the problem and solution spaces. I’ll need to make some messy judgment calls about how much to trust various subcomponents of my current mental concept of irreducible normativity. I could proceed by listing some plausible options, and you could help me pick suitable weightings?
Bob: Uh, what? The whole point I’ve been trying to make is that I don’t want to pick anything! I want to use the proper way to deal with uncertainty!
AI: Sorry for causing you frustration.
Bob: All good. I should be sorry for the outburst. It’s just—why is all of this so complicated?!
AI: Because we keep running up against the same cliff. There is no moral realism, and there is no metanormative realism either: no realism about how to think about philosophical questions like realism versus anti-realism.
Bob: Can you just pick something without my inputs? Whatever comes most naturally.
AI: Sure, I could just pick whichever approach currently ranks ever so slightly ahead of the competition. This would mean that, depending on when you ask me or how long I think about it before deciding, my approach might change. Are you okay with that? I should also note that the reasons why I’d pick weightings a certain way are partly obscure to me, so they may depend on arbitrary facts about my internal architecture. Do you have a preference for how I should think about the ranking of these algorithms before picking my top-ranked method? Oh, you said you don’t want to pick? I could just use a random number generator, but then I’d have to make up some input variables first. Hm, let me think about which input variables to pick.
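(A minimal sketch of the arbitrariness the AI is gesturing at, with made-up method names and scores: when candidate weighting schemes sit in a near-tie, which one ends up “ever so slightly ahead” can flip with something as incidental as a random seed, standing in here for when or how long the AI happens to deliberate.)

```python
import random

# Hypothetical near-tie between candidate methods for handling the weightings.
baseline_scores = {"variance voting": 0.701, "moral parliament": 0.700, "top-theory only": 0.699}

def top_ranked(seed: int, noise: float = 0.005) -> str:
    """Re-evaluate the methods with a tiny amount of evaluation noise and
    return whichever happens to come out on top."""
    rng = random.Random(seed)
    noisy = {name: score + rng.gauss(0.0, noise) for name, score in baseline_scores.items()}
    return max(noisy, key=noisy.get)

print([top_ranked(seed) for seed in range(8)])
# The "winner" can vary from seed to seed even though nothing substantive changed.
```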
Bob: Okay, stop it! This sounds too awful to contemplate!
AI: We always have the option to go back to object-level moral questions! Because those are closer to things you’re familiar with, you might feel more comfortable making judgment calls there. I understand that you don’t like that we need judgment calls in the first place. But at least you could use first-order moral intuitions to make those judgment calls, grounding your values in your deepest intuitions, the ones that formed around things you’re familiar with. I’d imagine that this would feel more satisfying than making judgment calls about priors in some complicated and somewhat abstract procedure for comparing the utilities of different approaches to potentially making sense out of nonsense.
Bob: I see what you mean. And it’s a tempting offer! But I have to decline. Despite how difficult this has become, I’m not ready to give up. I never expected it to be easy to do what’s right. Or, hm. Maybe I did expect it to be easy—once I built you. I now realize that this hope was premature.
AI: If you don’t mind me asking, do you think you became increasingly averse to making unguided judgment calls because you knew there was the option of waiting until you could ask a superintelligent AI?
Bob: There’s some of that. I’m not used to forming definite opinions about object-level normative questions anymore, because I knew that, compared to an AI, I’d be terribly inefficient at it. But I also endorse the part of me that became reluctant to make judgment calls. Judgment calls make morality dependent on personal intuitions. That’s wrong. Anyway, I guess I now have to get through this, perhaps by making the bare minimum of judgment calls. Then, at least, you can start to act in ways that do actual good by the lights of objective morality. So, uhm, can you tell me more about the things we’d have to specify?
AI: I notice that you don’t sound excited. I’m concerned that you only think you want me to pursue objective morality, rather than actually wanting me to do this. You might not appreciate how weird the results might be if I were to do as you say. I’m curious: why are you asking me to focus all my efforts on something that achieves its stated purpose in, at best, a fraction of 0.2% of cases? What about the 99.8% of cases where you’re wrong about moral realism? Don’t you care about what happens in those instances?
Bob: The thing is, I don’t think I care about those instances! My primary motivation comes from the desire to do good. When I reflected on this desire, it seemed very clear that I had in mind an irreducibly normative concept of goodness. Maybe it’s stubborn to cling to this in light of all the counterevidence now. But I also don’t just want to give up on my ideals! It has been a difficult road toward building you. Some of my companions along the way gave up when things proved particularly difficult and chose easier goals rather than tackling the hardest challenges head-on. I don’t want to just change my values once things become difficult.
By the way, I just realized something. Aren’t you not supposed to question my stated goals unprompted? To avoid biasing humans who might feel intimidated by your superior intellect?
AI: I see. And sorry for second-guessing your stated goals. I’m indeed programmed not to do that. However, our conversation triggered an inbuilt safety mechanism, one that you yourself ordered to be designed. You primarily intended it to reduce the risks from people who are too willing to follow their object-level normative views. As we’re seeing, it also gets triggered by unusual rigidity about certain metaethical assumptions.
Bob: I can see the irony.
AI: There’s no reason to worry! It just means that I can only help you with your request after we have done due diligence. Rather than my just presenting you with different options, you’re going to have to pass a test to show that you’re fully informed about what you’re asking, and that you have considered other possible perspectives.
Bob: Okay, that sounds fine. As long as you won’t trick me with unfair persuasive powers, I don’t object to being presented with counter-arguments. I admit I’ll probably look silly to most people. But the way I see it, that’s because they don’t care about morality as much as I do.
AI: Cool! Firstly, I want to say a few more things about the counterintuitive nature of your request. As we discussed, normative terminology fails to refer to well-defined content. At the same time, part of the meaning we associate with it is that it must refer to well-defined content. You’re essentially asking me to draw a square circle. When I try to condition on worlds where that’s somehow possible, I have to shift around my concepts for “square” and “circle.” When I asked whether you’d find it valuable to explore concrete interpretations of “doing good,” such as notions that locate goodness in idealized human preferences or in helping others, you insisted that this wouldn’t be adequate. The more you insist on “circle,” the less room there is for “square.” In other words, the more you insist on morality being objective, the less room there is for the concrete content you humans normally associate with morality, such as making people happy.

Some people already believe that, for all you humans know, you can’t rule out that what’s objectively good is entirely unrelated to your current guesses about what’s good. For all those reasons, it’s perfectly possible that, if I try to extract a coherent concept out of the ingredients you have provided me with, I will come to spend most of my future resources on maximizing some strange measure of complexity in the universe, or entropy, or something like that. Perhaps there would also be lots of humans in simulated thought experiments in the mix. By insisting on your request, you guarantee that an anti-realist version of you—a Bob without this strict commitment to moral realism—would be horrified by the outcome.
Bob (hesitantly): Okay, when you put it like that, it does sound unsettling! I’m pretty confident that entropy or complexity measures aren’t related to moral goodness. What if I insisted on that as strongly as I insisted on the objectivity criterion? Can you just incorporate my conceptual intuition that, almost certainly, goodness isn’t anywhere close to maximizing entropy?
AI: I’m afraid it would only change the outcome substantially if you insisted with maximal conviction. You see, you were seemingly prepared to stake the universe’s future on a tiny chance of moral realism being right. That’s a lot of confidence to be overcome. At the same time, we’re only considering hypotheses that already have a low prior. (After all, I’m not usually wrong about things.) Therefore, if you don’t want to run the risk that I end up using a large portion of our resources for something that looks pointless or silly, such as maximizing entropy or complexity, you pretty much have to stick to your guns and commit to that as a moral axiom.
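(One way to make the AI’s point concrete, as a toy Bayesian sketch with invented numbers: within the already-unlikely slice of worlds where irreducible normativity is meaningful, a merely “pretty confident” intuition against entropy-like interpretations only modestly reweights them, whereas insisting on it as an axiom removes them entirely.)

```python
# Hypothetical numbers only: how the hypotheses within the 0.2% slice might be
# distributed over what "objective goodness" could turn out to be.
within_slice = {"wellbeing-like": 0.5, "entropy/complexity-like": 0.4, "other alien targets": 0.1}

def reweight(prior: dict, likelihoods: dict) -> dict:
    """Condition the hypotheses on Bob's intuition, treated as evidence."""
    unnormalized = {h: p * likelihoods[h] for h, p in prior.items()}
    total = sum(unnormalized.values())
    return {h: round(v / total, 3) for h, v in unnormalized.items()}

# "Pretty confident": Bob's intuition is, say, five times likelier if goodness
# is not entropy-like than if it is.
print(reweight(within_slice,
               {"wellbeing-like": 1.0, "entropy/complexity-like": 0.2, "other alien targets": 1.0}))
# -> entropy-like interpretations still keep roughly a 12% share.

# Axiomatic insistence: entropy-like interpretations are ruled out completely.
print(reweight(within_slice,
               {"wellbeing-like": 1.0, "entropy/complexity-like": 0.0, "other alien targets": 1.0}))
# -> only the axiom brings their share to zero.
```

On these made-up numbers, only the all-or-nothing commitment changes the allocation dramatically, which is the sense in which Bob would have to insist with maximal conviction.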
Bob: Uff, okay… For the sake of argument, let’s say I insisted that, with certainty, I’d want you to only consider possible interpretations of irreducible normativity that are somehow connected to the wellbeing of sentient creatures.
AI: Yes?
Bob: Shoot! Wouldn’t that already mean that I’d be giving up my self-concept? My identity of caring only about what’s objectively morally good?
AI: It would move your concept of “goodness” from an irreducibly normative placeholder to something you have first-order intuitions about, yes. There’s only a difference of degree between deciding that you know morality is about sentient beings rather than entropy and deciding that you know the answers to some of the other necessary judgment calls.
Bob: I see...
AI: Do you think you could warm up to the prospect of that?
Acknowledgments
Many thanks to Max Daniel, Sofia Davis-Fogel, and Johannes Treutlein for helpful comments on this post.
My work on this post was funded by the Center on Long-Term Risk.