The option value argument doesn’t work when it’s most needed

Winston 24 Oct 2023 19:40 UTC

138 points

Long reflection Cause prioritization Existential risk Longtermism Philosophy Quotes Dystopia S-risk Moral uncertainty Option value

If you’re uncertain whether humanity’s future will be net positive, and therefore whether extinction risk reduction is good, you might reason that we should keep civilization going for now so we can learn more and, in the future, make a better-informed decision about whether to keep it going. After all, if we die out, we can never decide to bring humanity back. But if we continue existing, we can always shut everything down later. Call this the option value argument.

I don’t think this argument is very strong. It is exactly in the worlds where things go very badly that the option value argument doesn’t work. The inhabitants of such dystopian worlds are very unlikely to have the ability and/or motivation to carefully reflect and coordinate to stop existing, even if that would be the best thing to do.^[1] If they did, why would these worlds be so dystopian?

That is, continuing to exist will not give us the option to stop existing later in the cases where that option is most needed. In worlds where the most severe s-risks occur, humanity won’t have that much control, nor such impartially altruistic motivations. If the future is going very badly, we probably won’t decide to end civilization to prevent it from getting worse. The worst s-risks don’t happen when things are going so well that we can stop to reflect and decide to steer the future in a better direction. Many things have to go just right for humanity to reach that position. But if we’re on track to create s-risks, things are not going right.

The important point here is that we can’t rely on future agents to avoid s-risks by default. Reducing extinction risk doesn’t entail s-risk reduction (especially when we consider which world gets saved). Some resources should go toward preventing worst-case outcomes in advance. To be clear, the takeaway is not that we should consider increasing extinction risk, but rather that we should devote some effort toward increasing the quality of the future conditional on humanity surviving.

Below, I’ll list some quotes explaining this point in more detail.

The expected value of extinction risk reduction is positive by Jan M. Brauner and Friederike M. Grosse-Holz

Brauner and Grosse-Holz discuss “Why the ‘option value argument’ for reducing extinction risk is weak”:

Finally, we consider if future agents could make a better decision on whether to colonize space (or not) than we can, so that it seems valuable to let them decide (option value).
...

If we can defer the decision about whether to colonize space to future agents with more moral and empirical insight, doing so creates option value (part 1.3). However, most expected future disvalue plausibly comes from futures controlled by indifferent or malicious agents. Such “bad” agents will make worse decisions than we, currently, could. Thus, the option value in reducing the risk of human extinction is small.

The whole of section 1.3, “Future agents could later decide not to colonize space (option value)”, is relevant and worth reading. In particular, see the subsection “Only the relatively good futures contain option value”:

For any future scenario to contain option value, the agents in that future need to surpass us in various ways, as outlined above. This has an implication that further diminishes the relevance of the option value argument. Future agents need to have relatively good values and be relatively non-selfish to decide not to colonize space for moral reasons. But even if these agents colonized space, they would probably do it in a relatively good manner. Most expected future disvalue plausibly comes from futures controlled by indifferent or malicious agents (like misaligned AI). Such “bad” agents will make worse decisions about whether or not to colonize space than we, currently, could, because their preferences are very different from our (reflected) preferences. Potential space colonization by indifferent or malicious agents thus generates large amounts of expected future disvalue, which cannot be alleviated by option value. Option value doesn’t help in the cases where it is most needed (see footnote for an explanatory example).^[45]

Cause prioritization for downside-focused value systems by Lukas Gloor

Some people have argued that even (very) small credences in upside-focused views, such as 1-20% for instance, would in itself already speak in favor of making extinction risk reduction a top priority because making sure there will still be decision-makers in the future provides high option value. I think this gives by far too much weight to the argument from option value. Option value does play a role, but not nearly as strong a role as it is sometimes made out to be. To elaborate, let’s look at the argument in more detail: The naive argument from option value says, roughly, that our descendants will be in a much better position to decide than we are, and if suffering-focused ethics or some other downside-focused view is indeed the outcome of their moral deliberations, they can then decide to not colonize space, or only do so in an extremely careful and controlled way. If this picture is correct, there is almost nothing to lose and a lot to gain from making sure that our descendants get to decide how to proceed.
I think this argument to a large extent misses the point, but seeing that even some well-informed effective altruists seem to believe that it is very strong led me to realize that I should write a post explaining the landscape of cause prioritization for downside-focused value systems. The problem with the naive argument from option value is that the decision algorithm that is implicitly being recommended in the argument, namely focusing on extinction risk reduction and leaving moral philosophy (and s-risk reduction in case the outcome is a downside-focused morality) to future generations, makes sure that people follow the implications of downside-focused morality in precisely the one instance where it is least needed, and never otherwise. If the future is going to be controlled by philosophically sophisticated altruists who are also modest and willing to change course given new insights, then most bad futures will already have been averted in that scenario. An outcome where we get long and careful reflection without downsides is far from the only possible outcome. In fact, it does not even seem to me to be the most likely outcome (although others may disagree). No one is most worried about a scenario where epistemically careful thinkers with their heart in the right place control the future; the discussion is instead about whether the probability that things will accidentally go off the rails warrants extra-careful attention. (And it is not as though it looks like we are particularly on the rails currently either.) Reducing non-AI extinction risk does not preserve much option value for downside-focused value systems because most of the expected future suffering probably comes not from scenarios where people deliberately implement a solution they think is best after years of careful reflection, but instead from cases where things unexpectedly pass a point of no return and compassionate forces do not get to have control over the future.
Downside risks by action likely loom larger than downside risks by omission, and we are plausibly in a better position to reduce the most pressing downside risks now than later. (In part because “later” may be too late.)
This suggests that if one is uncertain between upside- and downside-focused views, as opposed to being uncertain between all kinds of things except downside-focused views, the argument from option value is much weaker than it is often made out to be. Having said that, non-naively, option value still does upshift the importance of reducing extinction risks quite a bit – just not by an overwhelming degree. In particular, arguments for the importance of option value that do carry force are for instance:
- There is still some downside risk to reduce after long reflection
- Our descendants will know more about the world, and crucial considerations in e.g. infinite ethics or anthropics could change the way we think about downside risks (in that we might for instance realize that downside risks by omission loom larger than we thought)
- One’s adoption of (e.g.) upside-focused views after long reflection may correlate favorably with the expected amount of value or disvalue in the future (meaning: conditional on many people eventually adopting upside-focused views, the future is more valuable according to upside-focused views than it appears during an earlier state of uncertainty)
The discussion about the benefits from option value is interesting and important, and a lot more could be said on both sides. I think it is safe to say that the non-naive case for option value is not strong enough to make extinction risk reduction a top priority given only small credences in upside-focused views, but it does start to become a highly relevant consideration once the credences become reasonably large. Having said that, one can also make a case that improving the quality of the future (more happiness/value and less suffering/disvalue) conditional on humanity not going extinct is probably going to be at least as important for upside-focused views and is more robust under population ethical uncertainty – which speaks particularly in favor of highly prioritizing existential risk reduction through AI policy and AI alignment.

Beginner’s guide to reducing s-risks by Anthony DiGiovanni

It has been argued that, under moral uncertainty, the most robustly positive approach to improving the long-term future is to preserve option value for humans and our descendants, and this entails prioritizing reducing risks of human extinction (MacAskill). That is, suppose we refrain from optimizing for the best action under our current moral views (which might be s-risk reduction), in order to increase the chance that humans survive to engage in extensive moral reflection.⁹ The claim is that the downside of temporarily taking this suboptimal action, by the lights of our current best guess, is outweighed by the potential upside of discovering and acting upon other moral priorities that we would otherwise neglect.
One counterargument is that futures with s-risks, not just those where humans go extinct, tend to be futures where typical human values have lost control over the future, so the option value argument does not privilege extinction risk reduction. First, if intelligent beings from Earth initiate space settlement before a sufficiently elaborate process of collective moral reflection, the astronomical distances between the resulting civilizations could severely reduce their capacity to coordinate on s-risk reduction (or any moral priority) (MacAskill 2022, Ch. 4; Gloor 2018). Second, if AI agents permanently disempower humans, they may cause s-risks as well. To the extent that averting s-risks is more tractable than ensuring AIs do not want to disempower humans at all (see next section), or one has a comparative advantage in s-risk reduction, option value does not necessarily favor working on extinction risks from AI.

Acknowledgments

Thanks to David Althaus and Lukas Gloor for comments and discussion.

^[1] I personally don’t expect (post-)humans will carefully reflect and coordinate to do the best thing even in futures that go fairly well, but that’s more open to discussion. And in any case, it’s not a crux for the option value argument.


Magnus Vinding 25 Oct 2023 7:36 UTC
27 points
10 ∶ 1
I think this is an important point. In general terms, it seems worth keeping in mind that option value also entails option disvalue (e.g. the option of losing control and giving rise to a worst-case future).
Regarding long reflection in particular, I notice that the quotes above seem to mostly mention it in a positive light, yet its feasibility and desirability can also be separately criticized, as I’ve tried to do elsewhere:
First, there are reasons to doubt that a condition of long reflection is feasible or even desirable, given that it would seem to require strong limits to voluntary actions that diverge from the ideal of reflection. To think that we can choose to create a condition of long reflection may be an instance of the illusion of control. Human civilization is likely to develop according to its immediate interests, and seems unlikely to ever be steered via a common process of reflection.
Second, even if we were to secure a condition of long reflection, there is no guarantee that humanity would ultimately be able to reach a sufficient level of agreement regarding the right path forward — after all, it is conceivable that a long reflection could go awfully wrong, and that bad values could win out due to poor execution or malevolent agents hijacking the process.
The limited feasibility of a long reflection suggests that there is no substitute for reflecting now. Failing to clarify and act on our values from this point onward carries a serious risk of pursuing a suboptimal path that we may not be able to reverse later. The resources we spend pursuing a long reflection (which seems unlikely to ever occur) are resources not spent on addressing issues that might be more important and more time-sensitive, such as steering away from worst-case outcomes.
Brian_Tomasik 9 Jan 2024 3:36 UTC
22 points
7 ∶ 0
It’s great to have these quotes all in one place. :)

In addition to the main point you made—that the futures containing the most suffering are often the ones that it’s too late to stop—I would also argue that even reflective, human-controlled futures could be pretty terrible because a lot of humans have (by my lights) some horrifying values. For example, human-controlled futures might accept enormous s-risks for the sake of enormous positive value, might endorse strong norms of retribution, might severely punish outgroups or heterodoxy, might value giving agents free will more than preventing harm (cf. the “free will theodicy”), and so on.

The option-value argument works best when I specifically am the one whose options are being kept open (although even in this case there can be concerns about losing my ideals, becoming selfish, being corrupted by other influences, etc). But humanity as a whole is a very different agent from myself, and I don’t trust humanity to make the same choices I would; often the exact opposite.

If paperclip maximizers wait to tile the universe with paperclips because they want to first engage in a Long Reflection to figure out if those paperclips should be green or blue, or whether they should instead be making staples, this isn’t exactly reassuring.
SiebeRozendal 26 Oct 2023 9:16 UTC
11 points
1 ∶ 0
Good to see this point made on the forum! I discuss this as well in my 2019 MA Philosophy thesis (based off similar sources): http://www.sieberozendal.com/wp-content/uploads/2020/01/Rozendal-S.T.-2019-Uncertainty-About-the-Expected-Moral-Value-of-the-Long-Term-Future.-MA-Thesis.pdf

Only when humanity is both able and motivated to significantly change the course of the future do we have option value. However, suppose that our descendants both have the ability and the motivation to affect the future for the good of everyone, such that a future version of humanity is wise enough to recognize when the expected value of the future is negative and coordinated and powerful enough to go extinct or make other significant changes. As other authors have raised (Brauner & Grosse-Holz, 2018), given such a state of affairs it seems unlikely that the future would be bad! After all, humanity would be wise, powerful, and coordinated. Most of the bad futures we are worried about do not follow from such a version of humanity, but from a version that is powerful but unwise and/or uncoordinated.

To be clear, there would be a small amount of option value. There could be some fringe cases in which a wise and powerful future version of humanity would have good reason to expect the future to be better if they went extinct, and be able to do so. Or perhaps it would be possible for a small group of dedicated, altruistic agents to bring humanity to extinction, without risking even worse outcomes. At the same time they would need to be unable to improve humanity’s trajectory significantly in any other way for extinction to be their highest priority. Furthermore, leaving open this option also works the other way around: a small group of ambitious individuals could make humanity go extinct if the future looks overwhelmingly positive.

I never got around to putting that excerpt on the forum.
JackM 25 Oct 2023 0:49 UTC
9 points
3 ∶ 0
I’m sorry, this doesn’t engage with the main point(s) you are trying to make, but I’m not sure why you use the term “existential risk” (which you define as risks of human extinction and undesirable lock-ins that don’t involve s-risk-level suffering) when you could have just used the term “extinction risk”.
You say:
If you’re uncertain whether humanity’s future will be net positive, and therefore whether existential risk^[1] reduction is good, you might reason that we should keep civilization going for now so we can learn more and, in the future, make a better-informed decision about whether to keep it going.
Reducing extinction risk if humanity’s future is net negative is bad. However, reducing risk of “undesirable lock-ins” seems robustly good no matter what the expected value of the future is. So I’m not sure bucketing these two together under the heading of “existential risk” really works.
- Winston 25 Oct 2023 1:51 UTC
  8 points
  2 ∶ 0
  Parent
  Thanks :) Good point.
  
  Minor point: I don’t think it’s strictly true that reducing risks of undesirable lock-ins is robustly good no matter what the expected value of the future is. It could be that a lock-in is not good, but it prevents an even worse outcome from occurring.
  
  I included other existential risks in order to counter the following argument: “As long as we prevent non-s-risk-level undesirable lock-ins in the near-term, future people can coordinate to prevent s-risks.” This is a version of the option value argument that isn’t about extinction risk. I realize this might be a weird argument for someone to make, but I covered it to be comprehensive.
  
  But the way I wrote this, I was pretty much just focused on extinction risk. So I agree it doesn’t make a lot of sense to include other kinds of x-risks. I’ll edit this now.
Wei Dai 18 Nov 2025 0:33 UTC
7 points
0 ∶ 0
I wish you titled the post something like “The option value argument for preventing extinction doesn’t work”. Your current title (“The option value argument doesn’t work when it’s most needed”) has the unfortunate side effects of:
1. People being more likely to misinterpret or misremember your post as claiming that trying to increase option value doesn’t work in general.
2. Reducing extinction risk becomes the most salient example of an idea for increasing option value.
3. People using “the option value argument” to mean the option value argument for preventing extinction, even when this can’t be inferred from context. (See example.)
4. It’s harder to use the phrase “the option value argument” contextually to refer to the option value argument currently or previously discussed, when it’s not about extinction risk, due to it becoming a term of art for “the option value argument for preventing extinction”.
I think it may not be too late to change the title and stop or reverse these effects.
SummaryBot 25 Oct 2023 12:44 UTC
6 points
0 ∶ 0
Executive summary: The option value argument for reducing extinction risk is weak, since it fails in the dystopian futures where it’s most needed. Future agents in such worlds likely won’t have the motivation or coordination to shut down civilization.
Key points:
1. The option value argument says we should reduce extinction risk so future agents can decide whether to continue civilization. But this requires dystopian futures to have the altruism and coordination to shut down, which is unlikely.
2. Bad futures with indifferent or malicious agents won’t make the right decisions about ending civilization. So option value doesn’t help in the cases where it’s most needed.
3. Most expected disvalue comes from uncontrolled futures passing points of no return, not deliberate choices after moral reflection. So option value doesn’t preserve much value for downside-focused views.
4. Reducing extinction risk doesn’t entail reducing s-risks, which could still occur in survived dystopian futures. So it’s not the most robust approach under moral uncertainty.
5. Some option value remains, but it is not strong enough alone to make extinction risk an overwhelming priority compared to improving future quality.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.