Which World Gets Saved
It is common to argue for the importance of x-risk reduction by emphasizing the immense value that may exist over the course of the future, if the future comes. Famously, for instance, Nick Bostrom calculates that, even under relatively sober estimates of the number and size of future human generations, “the expected value of reducing existential risk by a mere one millionth of one percentage point is at least a hundred times the value of a million human lives”.
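The arithmetic behind a claim of this shape is easy to check. The sketch below assumes a figure of 10^16 expected future lives; that number is my assumption for illustration, not something stated in the quotation, and the variable names are hypothetical.

```python
from fractions import Fraction

# ASSUMPTION (not stated in the quoted passage): expected number of
# future human lives that an existential catastrophe would forfeit.
FUTURE_LIVES = 10**16

# "One millionth of one percentage point", expressed as a probability:
# one percentage point is 1/100, and one millionth of that is 1e-8.
risk_reduction = Fraction(1, 10**6) * Fraction(1, 100)

# Expected lives saved by that reduction in existential risk.
expected_lives_saved = FUTURE_LIVES * risk_reduction

# The same quantity expressed in multiples of a million lives.
ratio_to_million_lives = expected_lives_saved / 10**6

print(expected_lives_saved, ratio_to_million_lives)
```

Under that assumed figure, the expected gain works out to a hundred times a million lives, which matches the shape of the quoted claim. Note that the calculation multiplies a probability by a single unconditional value of the future.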
[Note: People sometimes use the term “x-risk” to refer to slightly different things. Here, I use the term to refer to an event that would bring the value of future human civilization to roughly zero—an extinction event, a war in which we bomb ourselves back to the Stone Age and get stuck there forever, or something along those lines.]
Among those who take such arguments for x-risk reduction seriously, there seem to be two counterarguments commonly (e.g. here) raised in response. First, the future may contain more pain than pleasure. If we think that this is likely, then, at least from the utilitarian perspective, x-risk reduction stops looking so great. Second, we may have opportunities to improve the trajectory of the future, such as by improving the quality of global institutions or by speeding economic growth, and such efforts may have even higher expected value than (immediate) x-risk reduction. “Mundane” institution-building efforts may also have the benefit of reducing future catastrophic risks, should they arise.
It seems to me that there is another important consideration that complicates the case for x-risk reduction efforts, and that is currently neglected. The consideration is that, even if we think the value of the future is positive and large, the value of the future conditional on our having marginally averted a given x-risk may be neither positive nor large. And in any event, this conditional value is bound to depend on the x-risk in question.
For example: There are things we currently do not know about human psychology, some of which bear on how inclined we are toward peace and cooperation. Perhaps Steven Pinker is right, and violence will continue its steady decline, until one evening sees the world’s last bar fight and humanity is at peace forever after. Or perhaps he’s wrong—perhaps a certain measure of impulsiveness and anger will always remain, however favorable the environment, and these impulses are bound to crop up periodically in fights and mass tortures and world wars. In the extreme case, if we think that the expected value of the future (if it comes) is large and positive under the former hypothesis but large and negative under the latter, then the possibility that human rage may end the world is a source of consolation, not worry. It means that the existential risk posed by world war is serving as a sort of “fuse”, turning off the lights rather than letting the family burn.
As an application: if we think the peaceful-psychology hypothesis is more likely than the violent-psychology hypothesis, we might think that the future has high expected value. We might thus consider it important to avert extinction events like asteroid impacts, which would knock out worlds “on average”. But we might oppose efforts like the Nuclear Threat Initiative, which disproportionately save violent-psychology worlds. Or we might think that the sign of the value of the future is positive in either scenario, but judge that one x-risk is worth devoting more effort to than another, all else equal.
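To make the comparison concrete, here is a minimal sketch of the conditional expected value at issue, using Bayes' rule over the two psychology hypotheses. Every number in it (the prior, the two values of the future, the per-hypothesis risk levels) is invented purely for illustration, and the function name is hypothetical.

```python
def value_of_saved_world(p_peaceful, v_peaceful, v_violent,
                         risk_peaceful, risk_violent):
    """Expected value of the future, conditional on this being a world
    that the given catastrophe would have destroyed had it not been averted.

    risk_peaceful / risk_violent are the probabilities that the
    catastrophe strikes under each psychology hypothesis.
    """
    p_violent = 1.0 - p_peaceful
    # Joint probability of (hypothesis, catastrophe would have struck).
    joint_peaceful = p_peaceful * risk_peaceful
    joint_violent = p_violent * risk_violent
    total = joint_peaceful + joint_violent
    if total == 0:
        raise ValueError("the catastrophe never occurs under either hypothesis")
    # Posterior probability that a saved world is a peaceful-psychology world.
    post_peaceful = joint_peaceful / total
    return post_peaceful * v_peaceful + (1.0 - post_peaceful) * v_violent


# Illustrative assumptions only: peaceful psychology is three times as
# likely as violent, with future values of +100 and -100 respectively.
PRIOR, V_PEACEFUL, V_VIOLENT = 0.75, 100.0, -100.0

# Unconditional expected value of the future.
unconditional = PRIOR * V_PEACEFUL + (1.0 - PRIOR) * V_VIOLENT

# An asteroid strikes regardless of psychology, so both risks are equal.
asteroid = value_of_saved_world(PRIOR, V_PEACEFUL, V_VIOLENT, 0.01, 0.01)

# Nuclear war is assumed far likelier in violent-psychology worlds.
nuclear = value_of_saved_world(PRIOR, V_PEACEFUL, V_VIOLENT, 0.01, 0.20)

print(unconditional, asteroid, nuclear)
```

With these made-up inputs, the world saved from the asteroid has the same expected value as the future unconditionally (50), while the world saved from nuclear war has a negative expected value (roughly -74), because most of the worlds that such a war would have ended are violent-psychology worlds. Different inputs would of course give different answers; the point is only that the two conditional values can come apart, even in sign.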
Once we start thinking along these lines, we open various cans of worms. If our x-risk reduction effort starts far “upstream”, e.g. with an effort to make people more cooperative and peace-loving in general, to what extent should we take the success of the intermediate steps (which must succeed for the x-risk reduction effort to succeed) as evidence that the saved world would go on to a great future? Should we incorporate the fact of our own choice to pursue x-risk reduction itself into our estimate of the expected value of the future, as recommended by evidential decision theory, or should we exclude it, as recommended by causal decision theory? How should we generate all these conditional expected values, anyway?
Some of these questions may be worth the time to answer carefully, and some may not. My goal here is just to raise the broad conditional-value consideration which, though obvious once stated, so far seems to have received too little attention. (For reference: on discussing this consideration with Will MacAskill and Toby Ord, both said that they had not thought of it, and thought that it was a good point.) In short, “The utilitarian imperative ‘Maximize expected aggregate utility!’” might not really, as Bostrom (2002) puts it, “be simplified to the maxim ‘Minimize existential risk’”. It may not be enough to do our best to save the world. We may also have to consider which world gets saved.