Many of those posts in the list seem really relevant to me for the cluster of things you’re pointing at!
On some of the philosophical background assumptions, I would consider adding my ambitiously-titled post The Moral Uncertainty Rabbit Hole, Fully Excavated. (It’s the last post in my metaethics/anti-realism sequence.)
Since the post is long and says that it doesn’t work maximally well as a standalone piece (without two other posts from earlier in my sequence), it didn’t get much engagement when I published it, so I feel like I should do some advertising for it here.
As the title indicates, in that post I’m trying to answer questions that many EAs don’t ask themselves because they think about moral uncertainty or moral reflection in what is, IMO, a somewhat lazy way.
The post starts with a conundrum for the concept of moral uncertainty:
In an earlier post, I argued that moral uncertainty and confident moral realism don’t go together. Accordingly, if we’re morally uncertain, we must either endorse moral anti-realism or at least put significant credence on it.
This insight has implications because we’re now conflating a few different things under the “moral uncertainty” label:
Metaethical uncertainty (i.e., our remaining probability on moral realism) and the strength of possible wagers for acting as though moral realism is true even if our probability in it is low.
Uncertainty over the values we’d choose after long reflection (our “idealized values”, which most people would be motivated to act upon even if moral realism is false).
Related to how we’d get to idealized values, the possibility of having under-defined values, i.e., the possibility that, because moral realism is false, even idealized moral reflection may lead to different endpoints based on very small changes to the procedure, or that a person’s reflection doesn’t “terminate” because their subjective feeling of uncertainty never goes away inside the envisioned reflection procedure.
My post is all about further elaborating on these distinctions and spelling out their implications for effective altruists.
I start out by introducing the notion of a moral reflection procedure to explain what moral reflection in an idealized setting could look like:
To specify the meaning of “perfectly wise and informed,” we can envision a suitable procedure for moral reflection that a person would hypothetically undergo. Such a reflection procedure comprises a reflection environment and a reflection strategy. The reflection environment describes the options at one’s disposal; the reflection strategy describes how a person would use those options.
Here’s one example of a reflection environment:
My favorite thinking environment: Imagine a comfortable environment tailored for creative intellectual pursuits (e.g., a Google campus or a cozy mansion on a scenic lake in the forest). At your disposal, you find a well-intentioned, superintelligent AI advisor fluent in various schools of philosophy and programmed to advise in a value-neutral fashion. (Insofar as that’s possible – since one cannot do philosophy without a specific methodology, the advisor must already endorse certain metaphilosophical commitments.) Besides answering questions, they can help set up experiments in virtual reality, such as ones with emulations of your brain or with modeled copies of your younger self. For instance, you can design experiments for learning what you’d value if you first encountered the EA community in San Francisco rather than in Oxford or started reading Derek Parfit or Peter Singer after the blog Lesswrong, instead of the other way around.[2] You can simulate conversations with select people (e.g., famous historical figures or contemporary philosophers). You can study how other people’s reflection concludes and how their moral views depend on their life circumstances. In the virtual-reality environment, you can augment your copy’s cognition or alter its perceptions to have it experience new types of emotions. You can test yourself for biases by simulating life as someone born with another gender(-orientation), ethnicity, or into a family with a different socioeconomic status. At the end of an experiment, your (near-)copies can produce write-ups of their insights, giving you inputs for your final moral deliberations. You can hand over authority about choosing your values to one of the simulated (near-)copies (if you trust the experimental setup and consider it too difficult to convey particular insights or experiences via text). 
Eventually, the person with the designated authority has to provide to your AI assistant a precise specification of values (the format – e.g., whether it’s a utility function or something else – is up to you to decide on). Those values then serve as your idealized values after moral reflection.
(Two other, more rigorously specified reflection procedures are indirect normativity and HCH.[3] Indirect normativity outputs a utility function whereas HCH attempts to formalize “idealized judgment,” which we could then consult for all kinds of tasks or situations.)[4]
“My favorite thinking environment” leaves you in charge as much as possible while providing flexible assistance. Any other structure is for you to specify: you decide the reflection strategy.[5] This includes what questions to ask the AI assistant, what experiments to do (if any), and when to conclude the reflection.
For reflection strategies (how to behave inside a reflection procedure), I discuss a continuum from “conservative” to “open-minded” reflection strategies.
Someone with a conservative reflection strategy is steadfast in their moral reasoning framework. ((What I mean by “moral-reasoning framework” is similar to what Wei Dai calls “metaphilosophy” – it implies having confidence in a particular metaphilosophical stance and using that stance to form convictions about one’s reasoning methodology or object-level moral views.)) They guard their opinions, which turns these into convictions (“convictions” being opinions that one safeguards against goal drift). At its extreme, someone with a maximally conservative reflection strategy has made up their mind and no longer benefits from any moral reflection. People can have moderately conservative reflection strategies where they have formed convictions on some issues but not others.
By contrast, people with open-minded moral reflection strategies are uncertain about either their moral reasoning framework or (at least) their object-level moral views. As the defining feature, they take a passive (“open-minded”) reflection approach focused on learning as much as possible without privileging specific views[7] and without (yet) entering a mode where they form convictions.
That said, “forming convictions” is not an entirely voluntary process – sometimes, we can’t help but feel confident about something after learning the details of a particular debate. As I’ll elaborate below, it is partly for this reason that I think no reflection strategy is inherently superior.
Comparing these two reflection strategies is a core theme of the post, and one takeaway I get to is that neither end of the spectrum is superior to the other. Instead, I see moral reflection as a bit of an art, and we just have to find our personal point on the spectrum.
Relatedly, there’s also the question of “What’s the benefit of reflection now” vs. “how much do we want to just leave things to future selves or hypothetical future selves in a reflection procedure.” (The point being that it is not obvious by default that moral reflection has to be postponed!)
Reflection procedures are thinking-and-acting sequences we’d undergo if we had unlimited time and resources. While we cannot properly run a moral reflection procedure right now in everyday life, we can still narrow down our uncertainty over the hypothetical reflection outcome. Spending time on that endeavor is worth it if the value of information – gaining clarity on one’s values – outweighs the opportunity cost from acting under one’s current (less certain) state of knowledge.
Gaining clarity on our values is easier for those who would employ a more conservative reflection strategy in their moral reflection procedure. After all, that means their strategy involves guarding some pre-existing convictions, which gives them advance knowledge of the direction of their moral reflection.[9]
By contrast, people who would employ more open-minded reflection strategies may not easily be able to move past specific layers of indecision. Because they may be uncertain how to approach moral reasoning in the first place, they can be “stuck” in their uncertainty. (Their hope is to get unstuck once they are inside the reflection procedure, once it becomes clearer how to proceed.)
[...]
If moral realism were true, the timing of that transition (“the reflection strategy becoming increasingly conservative as the person forms more convictions”) would be obvious. It would happen once the person knows enough to see the correct answers, once they see the correct way of narrowing down their reflection or (eventually) the correct values to adopt at the very end of it.
In the moral realist picture, expressions like “safeguarding opinions” or “forming convictions” (which I use interchangeably) seem out of place. Obviously, the idea is to “form convictions” about only the correct principles!
However, as I’ve argued in previous posts, moral realism is likely false.
This is then followed by a discussion on whether “idealized values” are chosen or discovered.
Under moral anti-realism, there are two empirical possibilities[10] for “When is someone ready to form convictions?” In the first possibility, things work similarly to naturalist moral realism but on a personal/subjectivist basis. We can describe this option as “My idealized values are here for me to discover.” By this, I mean that, at any given moment, there’s a fact of the matter to “What I’d conclude with open-minded moral reflection.” (Specifically, a unique fact – it cannot be that I would conclude vastly different things in different runs of the reflection procedure or that I would find myself indifferent about a whole range of options.)
The second option is that my idealized values aren’t “here for me to discover.” In this view, open-minded reflection is too passive – therefore, we have to create our values actively. Arguments for this view include that (too) open-minded reflection doesn’t reliably terminate; instead, one must bring normative convictions to the table. “Forming convictions,” according to this second option, is about making a particular moral view/outlook a part of one’s identity as a morality-inspired actor. Finding one’s values, then, is not just about intellectual insights.
I will argue that the truth is somewhere in between.
Why do I think this? There’s more in my post, but here are some of the interesting bits, which seem especially relevant to the topic of “long reflection”:
There are two reasons why I think open-minded reflection isn’t automatically best:
We have to make judgment calls about how to structure our reflection strategy. Making those judgment calls already gets us in the business of forming convictions. So, if we are qualified to do that (in “pre-reflection mode,” setting up our reflection procedure), why can’t we also form other convictions similarly early?
Reflection procedures come with an overwhelming array of options, and they can be risky (in the sense of having pitfalls – see later in this section). Arguably, we are closer (in the sense of our intuitions being more accustomed and therefore, arguably, more reliable) to many of the fundamental issues in moral philosophy than to matters like “carefully setting up a sequence of virtual reality thought experiments to aid an open-minded process of moral reflection.” Therefore, it seems reasonable/defensible to think of oneself as better positioned to form convictions about object-level morality (in places where we deem it safe enough).
Reflection strategies require judgment calls
In this section, I’ll elaborate on how specifying reflection strategies requires many judgment calls. The following are some dimensions alongside which judgment calls are required (many of these categories are interrelated/overlapping):
Social distortions: Spending years alone in the reflection environment could induce loneliness and boredom, which may have undesired effects on the reflection outcome. You could add other people to the reflection environment, but who you add is likely to influence your reflection (e.g., because of social signaling or via the added sympathy you may experience for the values of loved ones).
Transformative changes: Faced with questions like whether to augment your reasoning or capacity to experience things, there’s always the question “Would I still trust the judgment of this newly created version of myself?”
Distortions from (lack of) competition: As Wei Dai points out in this Lesswrong comment: “Current human deliberation and discourse are strongly tied up with a kind of resource gathering and competition.” By competition, he means things like “the need to signal intelligence, loyalty, wealth, or other ‘positive’ attributes.” Within some reflection procedures (and possibly depending on your reflection strategy), you may not have much of an incentive to compete. On the one hand, a lack of competition or status considerations could lead to “purer” or more careful reflection. On the other hand, perhaps competition functions as a safeguard, preventing people from adopting values where they cannot summon sufficient motivation under everyday circumstances. Without competition, people’s values could become decoupled from what ordinarily motivates them and more susceptible to idiosyncratic influences, perhaps becoming more extreme.
Lack of morally urgent causes: In the blogpost On Caring, Nate Soares writes: “It’s not enough to think you should change the world — you also need the sort of desperation that comes from realizing that you would dedicate your entire life to solving the world’s 100th biggest problem if you could, but you can’t, because there are 99 bigger problems you have to address first.” In that passage, Soares points out that desperation can strongly motivate why some people develop an identity around effective altruism. Interestingly enough, in some reflection environments (including “My favorite thinking environment”), the outside world is on pause. As a result, the phenomenology of “desperation” that Soares described would be out of place. If you suffered from poverty, illnesses, or abuse, these hardships are no longer an issue. Also, there are no other people to lift out of poverty and no factory farms to shut down. You’re no longer in a race against time to prevent bad things from happening, seeking friends and allies while trying to defend your cause against corrosion from influence seekers. This constitutes a massive change in your “situation in the world.” Without morally urgent causes, you arguably become less likely to go all-out by adopting an identity around solving a class of problems you’d deem urgent in the real world but which don’t appear pressing inside the reflection procedure. Reflection inside the reflection procedure may feel more like writing that novel you’ve always wanted to write – it has less the feel of a “mission” and more of “doing justice to your long-term dream.”[11]
Ordering effects: The order in which you learn new considerations can influence your reflection outcome. (See page 7 in this paper. Consider a model of internal deliberation where your attachment to moral principles strengthens whenever you reach reflective equilibrium given everything you already know/endorse.)
Persuasion and framing effects: Even with an AI assistant designed to give you “value-neutral” advice, there will be free parameters in the AI’s reasoning that affect its guidance and how it words things. Framing effects may also play a role when interacting with other humans (e.g., epistemic peers, expert philosophers, friends, and loved ones).
Pitfalls of reflection procedures
There are also pitfalls to avoid when picking a reflection strategy. The failure modes I list below are avoidable in theory,[12] but they could be difficult to avoid in practice:
Going off the rails: Moral reflection environments could be unintentionally alienating (enormous option space; time spent reflecting could be unusually long). Failure modes related to the strangeness of the moral reflection environment include existential breakdown and impulsively deciding to lock in specific values to be done with it.
Issues with motivation and compliance: When you set up experiments in virtual reality, the people in them (including copies of you) may not always want to play along.
Value attacks: Attackers could simulate people’s reflection environments in the hope of influencing their reflection outcomes.
Addiction traps: Superstimuli in the reflection environment could cause you to lose track of your goals. For instance, imagine you started asking your AI assistant for an experiment in virtual reality to learn about pleasure-pain tradeoffs or different types of pleasures. Then, next thing you know, you’ve spent centuries in pleasure simulations and have forgotten many of your lofty ideals.
Unfairly persuasive arguments: Some arguments may appeal to people because they exploit design features of our minds rather than because they tell us “What humans truly want.” Reflection procedures with argument search (e.g., asking the AI assistant for arguments that are persuasive to lots of people) could run into these unfairly compelling arguments. For illustration, imagine a story like “Atlas Shrugged” but highly persuasive to most people. We can also think of “arguments” as sequences of experiences: Inspired by the Narnia story, perhaps there exists a sensation of eating a piece of candy so delicious that many people become willing to sell out all their other values for eating more of it. Internally, this may feel like becoming convinced of some candy-focused morality, but looking at it from the outside, we’ll feel like there’s something problematic about how the moral update came about.
Subtle pressures exerted by AI assistants: AI assistants trained to be “maximally helpful in a value-neutral fashion” may not be fully neutral, after all. (Complete) value-neutrality may be an illusory notion, and if the AI assistants mistakenly think they know our values better than we do, their advice could lead us astray. (See Wei Dai’s comments in this thread for more discussion and analysis.)
Conclusion: “One has to actively create oneself”
“Moral reflection” sounds straightforward – naively, one might think that the right path of reflection will somehow reveal itself. However, as we think of the complexities of setting up a suitable reflection environment and how we’d proceed inside it, what it would be like and how many judgment calls we’d have to make, we see that things can get tricky.
Joe Carlsmith summarized it as follows in an excellent post (what Carlsmith calls “idealizing subjectivism” corresponds to what I call “deferring to moral reflection”):
>My current overall take is that especially absent certain strong empirical assumptions, idealizing subjectivism is ill-suited to the role some hope it can play: namely, providing a privileged and authoritative (even if subjective) standard of value. Rather, the version of the view I favor mostly reduces to the following (mundane) observations:
>If you already value X, it’s possible to make instrumental mistakes relative to X.
>You can choose to treat the outputs of various processes, and the attitudes of various hypothetical beings, as authoritative to different degrees.
>This isn’t necessarily a problem. To me, though, it speaks against treating your “idealized values” the way a robust meta-ethical realist treats the “true values.” That is, you cannot forever aim to approximate the self you “would become”; you must actively create yourself, often in the here and now. Just as the world can’t tell you what to value, neither can your various hypothetical selves — unless you choose to let them. Ultimately, it’s on you.
In my ((Lukas’s)) words, the difficulty with deferring to moral reflection too much is that the benefits of reflection procedures (having more information and more time to think; having access to augmented selves, etc.) don’t change what it feels like, fundamentally, to contemplate what to value. For all we know, many people would continue to feel apprehensive about doing their moral reasoning “the wrong way” since they’d have to make judgment calls left and right. Plausibly, no “correct answers” would suddenly appear to us. To avoid leaving our views under-defined, we have to – at some point – form convictions by committing to certain principles or ways of reasoning. As Carlsmith describes it, one has to – at some point – “actively create oneself.” (The alternative is to accept the possibility that one’s reflection outcome may be under-defined.)
It is possible to delay the moment of “actively creating oneself” to a time within the reflection procedure. (This would correspond to an open-minded reflection strategy; there are strong arguments to keep one’s reflection strategy at least moderately open-minded.) However, note that, in doing so, one “actively creates oneself” as someone who trusts the reflection procedure more than one’s object-level moral intuitions or reasoning principles. This may be true for some people, but it isn’t true for everyone. Alternatively, it could be true for someone in some domains but not others.[13]
I further discuss the notion of “having under-defined values.” This happens if someone defers to moral reflection with the expectation that it’ll terminate with a specific answer, but they’re pre-disposed to following reflection strategies that are open-ended enough so that the reflection will, in practice, have under-defined outcomes.
Having under-defined values isn’t necessarily a problem – I discuss the pros and cons of it in the post.
Towards the end of the post, there’s a section where I discuss what is IMO the most sophisticated wager for “acting as though moral realism is true” (the wager for naturalist moral realism, rather than the one for non-naturalist/irreducible-normativity-based moral realism, which I discussed earlier in my sequence). In that discussion, I conclude that this naturalist moral realism wager often doesn’t overpower what we’d do anyway under anti-realism. (The reasoning here is that naturalist moral realism feels somewhat watered down compared to non-naturalist moral realism, so that it’s actually “built on the same currency” as how we’d structure our reasoning anyway under moral anti-realism. Consequently, whether naturalist moral realism is true isn’t too different from the question of whether idealized values are chosen or discovered – it’s just that now we’re also asking about the degree of moral convergence between different people’s reflection.)
Anyway, that section is hard to summarize, so I recommend just reading it in full in the post (it has pictures and a fun “mountain analogy”).
Lastly, I end the post with some condensed takeaways in the form of advice for someone’s moral reflection:
Selected takeaways: good vs. bad reasons for deferring to (more) moral reflection
To list a few takeaways from this post, I made a list of good and bad reasons for deferring (more) to moral reflection. (Note, again, that deferring to moral reflection comes on a spectrum.)
In this context, it’s important to note that deferring to moral reflection would be wise if moral realism is true or if idealized values are ((on the far end of the spectrum of)) “here for us to discover.” In this sequence, I argued that neither of those is true – but some (many?) readers may disagree.
Assuming that I’m right about the flavor of moral anti-realism I’ve advocated for in this sequence, below are my “good and bad reasons for deferring to moral reflection.”
(Note that this is not an exhaustive list, and it’s pretty subjective. Moral reflection feels more like an art than a science.)
Bad reasons for deferring strongly to moral reflection:
You haven’t contemplated the possibility that the feeling of “everything feels a bit arbitrary; I hope I’m not somehow doing moral reasoning the wrong way” may never go away unless you get into a habit of forming your own views. Therefore, you never practiced the steps that could lead to you forming convictions. Because you haven’t practiced those steps, you assume you’re far from understanding the option space well enough, which only reinforces your belief that it’s too early for you to form convictions.
You observe that other people’s fundamental intuitions about morality differ from yours. You consider that an argument for trusting your reasoning and your intuitions less than you otherwise would. As a result, you lack enough trust in your reasoning to form convictions early.
You have an unreflected belief that things don’t matter if moral anti-realism is true. You want to defer strongly to moral reflection because there’s a possibility that moral realism is true. However, you haven’t thought about the argument that naturalist moral realism and moral anti-realism use the same currency, i.e., that the moral views you’d adopt if moral anti-realism were true might matter just as much to you.
Good reasons for deferring strongly to moral reflection:
You don’t endorse any of the bad reasons, and you still feel drawn to deferring to moral reflection. For instance, you feel genuinely unsure how to reason about moral views or what to think about a specific debate (despite having tried to form opinions).
You think your present way of visualizing the moral option space is unlikely to be a sound basis for forming convictions. You suspect that it is likely to be highly incomplete or even misguided compared to how you’d frame your options after learning more science and philosophy inside an ideal reflection environment.
Bad reasons for forming some convictions early:
You think moral anti-realism means there’s no for-you-relevant sense in which you can be wrong about your values.
You think of yourself as a rational agent, and you believe rational agents must have well-specified “utility functions.” Hence, ending up with under-defined values (which is a possible side-effect of deferring strongly to moral reflection) seems irrational/unacceptable to you.
Good reasons for forming some convictions early:
You can’t help it, and you think you have a solid grasp of the moral option space (e.g., you’re likely to pass Ideological Turing tests of some prominent reasoners who conceptualize it differently).
You distrust your ability to guard yourself against unwanted opinion drift inside moral reflection procedures ((if you were to follow a more open-minded reflection strategy)), and the views you already hold feel too important to expose to that risk.