The expected value of extinction risk reduction is positive

By Jan M. Brauner and Fried­er­ike M. Grosse-Holz

Work on this ar­ti­cle has been funded by the Cen­tre for Effec­tive Altru­ism, but the ar­ti­cle rep­re­sents the per­sonal views of the au­thors.

Short summary

There are good reasons to care about sentient beings living in the millions of years to come. Caring about the future of sentience is sometimes taken to imply that reducing the risk of human extinction is a moral priority. However, this implication is not obvious so long as one is uncertain whether a future with humanity would be better or worse than one without it.

In this ar­ti­cle, we try to give an all-things-con­sid­ered an­swer to the ques­tion: “Is the ex­pected value of efforts to re­duce the risk of hu­man ex­tinc­tion pos­i­tive or nega­tive?”. Among oth­ers, we cover the fol­low­ing points:

  • What hap­pens if we sim­ply tally up the welfare of cur­rent sen­tient be­ings on earth and ex­trap­o­late into the fu­ture; and why that isn’t a good idea

  • Think­ing about the pos­si­ble val­ues and prefer­ences of fu­ture gen­er­a­tions, how these might al­ign with ours, and what that implies

  • Why the “op­tion value ar­gu­ment” for re­duc­ing ex­tinc­tion risk is weak

  • How the po­ten­tial of a non-hu­man an­i­mal civil­i­sa­tion or an ex­tra-ter­res­trial civil­i­sa­tion tak­ing over af­ter hu­man ex­tinc­tion in­creases the ex­pected value of ex­tinc­tion risk reduction

  • Why, if we had more em­piri­cal in­sight or moral re­flec­tion, we might have moral con­cern for things out­side of earth, and how that in­creases the value of ex­tinc­tion risk reduction

  • How avoid­ing a global catas­tro­phe that would not lead to ex­tinc­tion can have very long-term effects

Long summary

If most ex­pected value or dis­value lies in the billions of years to come, al­tru­ists should plau­si­bly fo­cus their efforts on im­prov­ing the long-term fu­ture. It is not clear whether re­duc­ing the risk of hu­man ex­tinc­tion would, in ex­pec­ta­tion, im­prove the long-term fu­ture, be­cause a fu­ture with hu­man­ity may be bet­ter or worse than one with­out it.

From a con­se­quen­tial­ist, welfarist view, most ex­pected value (EV) or dis­value of the fu­ture comes from sce­nar­ios in which (post-)hu­man­ity colonizes space, be­cause these sce­nar­ios con­tain most ex­pected be­ings. Sim­ply ex­trap­o­lat­ing the cur­rent welfare (part 1.1) of hu­mans and farmed and wild an­i­mals, it is un­clear whether we should sup­port spread­ing sen­tient be­ings to other planets.

From a more gen­eral per­spec­tive (part 1.2), fu­ture agents will likely care morally about the same things we find valuable or about any of the things we are neu­tral to­wards. It seems very un­likely that they would see value ex­actly where we see dis­value. If fu­ture agents are pow­er­ful enough to shape the world ac­cord­ing to their prefer­ences, this asym­me­try im­plies the EV of fu­ture agents coloniz­ing space is pos­i­tive from many welfarist per­spec­tives.

If we can defer the de­ci­sion about whether to colonize space to fu­ture agents with more moral and em­piri­cal in­sight, do­ing so cre­ates op­tion value (part 1.3). How­ever, most ex­pected fu­ture dis­value plau­si­bly comes from fu­tures con­trol­led by in­differ­ent or mal­i­cious agents. Such “bad” agents will make worse de­ci­sions than we, cur­rently, could. Thus, the op­tion value in re­duc­ing the risk of hu­man ex­tinc­tion is small.

The uni­verse may not stay empty, even if hu­man­ity goes ex­tinct (part 2.1). A non-hu­man an­i­mal civ­i­liza­tion, ex­trater­res­tri­als or un­con­trol­led ar­tifi­cial in­tel­li­gence that was cre­ated by hu­man­ity might colonize space. Th­ese sce­nar­ios may be worse than (post-)hu­man space coloniza­tion in ex­pec­ta­tion. Ad­di­tion­ally, with more moral or em­piri­cal in­sight, we might re­al­ize that the uni­verse is already filled with be­ings or things we care about (part 2.2). If the uni­verse is already filled with dis­value that fu­ture agents could alle­vi­ate, this gives fur­ther rea­son to re­duce ex­tinc­tion risk.

In prac­tice, many efforts to re­duce the risk of hu­man ex­tinc­tion also have other effects of long-term sig­nifi­cance. Such efforts might of­ten re­duce the risk of global catas­tro­phes (part 3.1) from which hu­man­ity would re­cover, but which might set tech­nolog­i­cal and so­cial progress on a worse track than they are on now. Fur­ther­more, such efforts of­ten pro­mote global co­or­di­na­tion, peace and sta­bil­ity (part 3.2), which is cru­cial for safe de­vel­op­ment of pivotal tech­nolo­gies and to avoid nega­tive tra­jec­tory changes in gen­eral.

Aggregating these considerations, efforts to reduce extinction risk seem positive in expectation from most consequentialist views, ranging from neutral on some views to extremely positive on others. As efforts to reduce extinction risk also seem highly leveraged and time-sensitive, they should probably hold a prominent place in the long-termist EA portfolio.

In­tro­duc­tion and background

The fu­ture of Earth-origi­nat­ing life might be vast, last­ing mil­lions of years and con­tain­ing many times more be­ings than cur­rently al­ive (Bostrom, 2003). If fu­ture be­ings mat­ter morally, it should plau­si­bly be a ma­jor moral con­cern that the fu­ture plays out well. So how should we, to­day, pri­ori­tise our efforts aimed at im­prov­ing the fu­ture?

We could try to reduce the risk of human extinction. A future with humanity would be drastically different from one without it. Few other factors seem as pivotal for what the world will look like in the millions of years to come as whether or not humanity survives the next few centuries and millennia. Effective efforts to reduce the risk of human extinction could thus have immense long-term impact. If we were sure that this impact was positive, extinction risk reduction would plausibly be one of the most effective ways to improve the future.

How­ever, it is not at first glance clear that re­duc­ing ex­tinc­tion risk is pos­i­tive from an im­par­tial al­tru­is­tic per­spec­tive. For ex­am­ple, fu­ture hu­mans might have ter­rible lives that they can’t es­cape from, or hu­mane val­ues might ex­ert lit­tle con­trol over the fu­ture, re­sult­ing in fu­ture agents caus­ing great harm to other be­ings. If in­deed it turned out that we weren’t sure if ex­tinc­tion risk re­duc­tion was pos­i­tive, we would pri­ori­tize other ways to im­prove the fu­ture with­out mak­ing ex­tinc­tion risk re­duc­tion a pri­mary goal.

To in­form this pri­ori­ti­sa­tion, in this ar­ti­cle we es­ti­mate the ex­pected value of efforts to re­duce the risk of hu­man ex­tinc­tion.

Mo­ral assumptions

Through­out this ar­ti­cle, we base our con­sid­er­a­tions on two as­sump­tions:

  1. That it morally mat­ters what hap­pens in the billions of years to come. From this very long-term view, mak­ing sure the fu­ture plays out well is a pri­mary moral con­cern.

  2. That we should aim to satisfy our re­flected moral prefer­ences. Most peo­ple would want to act ac­cord­ing to the prefer­ences they would have upon ideal­ized re­flec­tion, rather than ac­cord­ing to their cur­rent prefer­ences. The pro­cess of ideal­ized re­flec­tion will differ be­tween peo­ple. Some peo­ple might want to re­vise their prefer­ences af­ter they be­came much smarter, more ra­tio­nal and had spent mil­lions of years in philo­soph­i­cal dis­cus­sion. Others might want to largely keep their cur­rent moral in­tu­itions, but learn em­piri­cal facts about the world (e.g. about the na­ture of con­scious­ness).

Most ar­gu­ments fur­ther as­sume that the state the world is brought into by one’s ac­tions is what mat­ters morally (as op­posed to e.g. the ac­tions fol­low­ing a spe­cific rule). We thus take a con­se­quen­tial­ist view, judg­ing po­ten­tial ac­tions by their con­se­quences.

Parts 1.1 and 1.2 further take a welfarist perspective, assuming that what matters morally in states of the world is the welfare of sentient beings. In a way, that means assuming our reflected preferences are welfarist. Welfare will be broadly defined as including pleasure and pain, but also complex values or the satisfaction of preferences. From this perspective, a state of the world is good if it is good for the individuals in this world. Across several beings, welfare will be aggregated additively[1], no matter how far in the future an expected being lives. Additional beings with positive (negative) welfare coming into existence will count as morally good (bad). In short, parts 1.1 and 1.2 take the view of welfarist consequentialism with a total view on population ethics (see e.g. Greaves, 2017), but the arguments also hold for other similar views.

If we make the as­sump­tions out­lined above, nearly all ex­pected value or dis­value in a fu­ture with hu­man­ity arises from sce­nar­ios in which (post-)hu­mans colonize space. The coloniz­able uni­verse seems very large, so sce­nar­ios with space coloniza­tion likely con­tain a lot more be­ings than sce­nar­ios with earth­bound life only (Bostrom, 2003). Con­di­tional on hu­man sur­vival, space coloniza­tion also does not seem too un­likely, thus nearly all ex­pected fu­ture be­ings live in sce­nar­ios with space coloniza­tion[2]. We thus take “a fu­ture with hu­man­ity” to mean “(post-)hu­man space coloniza­tion” for the main text and briefly dis­cuss what a fu­ture with only earth­bound hu­man­ity might look like in Ap­pendix 1.
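The claim that nearly all expected future beings live in space-colonization scenarios is ordinary expected-value arithmetic. The following is a minimal sketch in Python. Every probability and population figure in it is a hypothetical placeholder chosen for illustration, not an estimate from this article or from Bostrom (2003), and the names `scenarios`, `expected_beings` and `share_of_expectation` are made up for the example.

```python
# Hypothetical scenario table: name -> (probability conditional on human
# survival, number of future beings). All figures are illustrative
# placeholders, not estimates from the article.
scenarios = {
    "earthbound only": (0.9, 1e16),
    "space colonization": (0.1, 1e30),
}


def expected_beings(table):
    """Return the expected number of beings contributed by each scenario."""
    return {name: p * n for name, (p, n) in table.items()}


def share_of_expectation(table, name):
    """Fraction of all expected beings that live in the named scenario."""
    contributions = expected_beings(table)
    return contributions[name] / sum(contributions.values())


colonization_share = share_of_expectation(scenarios, "space colonization")
print(f"Share of expected beings in colonization scenarios: {colonization_share:.6f}")
```

With these placeholders, even a scenario assigned only a 10% probability accounts for almost all expected beings, because its population is many orders of magnitude larger. The conclusion depends on the size gap between scenarios, not on the exact probabilities.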

Out­line of the article

Ul­ti­mately, we want to know “What is the ex­pected value (EV) of efforts to re­duce the risk of hu­man ex­tinc­tion?”. We will ad­dress this ques­tion in three parts:

  • In part 1, we ask “What is the EV of (post-)hu­man space coloniza­tion[3]?”. We first at­tempt to ex­trap­o­late the EV from the amounts of value and dis­value in to­day’s world and how they would likely de­velop with space coloniza­tion. We then turn to­ward a more gen­eral ex­am­i­na­tion of what fu­ture agents’ tools and prefer­ences might look like and how they will, in ex­pec­ta­tion, shape the fu­ture. Fi­nally, we con­sider if fu­ture agents could make a bet­ter de­ci­sion on whether to colonize space (or not) than we can, so that it seems valuable to let them de­cide (op­tion value).

  • In part 1 we tac­itly as­sumed the uni­verse with­out hu­man­ity is and stays empty. In part 2, we drop that as­sump­tion. We eval­u­ate how the pos­si­bil­ity of space coloniza­tion by al­ter­na­tive agents and the pos­si­bil­ity of ex­ist­ing but tractable dis­value in the uni­verse change the EV of keep­ing hu­mans around.

  • In part 3, we ask “Be­sides re­duc­ing ex­tinc­tion risk, what will be the con­se­quences of our efforts?”. We look at how differ­ent efforts to re­duce ex­tinc­tion risk might in­fluence the long-term fu­ture by re­duc­ing global catas­trophic risk and by pro­mot­ing global co­or­di­na­tion and sta­bil­ity.

We stress that the con­clu­sions of the differ­ent parts should not be sep­a­rated from the con­text. Since we are rea­son­ing about a topic as com­plex and un­cer­tain as the long-term fu­ture, we take sev­eral views, aiming to ul­ti­mately reach a ver­dict by ag­gre­gat­ing across them.

A note on dis­value-focus

The moral view on which this article is based is very broad and can include enormously different value systems, in particular different degrees of ‘disvalue-focus’. We consider a moral view disvalue-focused if it holds that the prevention/reduction of disvalue is (vastly) more important than the creation of value. Examples are views that hold the prevention or reduction of suffering as an especially high moral priority.

The de­gree of dis­value fo­cus one takes chiefly in­fluences the EV of re­duc­ing ex­tinc­tion risk.

From very disvalue-focused views, (post-)human space colonization may not seem desirable even if the future contains a much better ratio of value to disvalue than today. There is little to gain from space colonization if the creation of value (e.g. happy beings) morally matters little. On the other hand, space colonization would multiply the number of sentient beings and thereby multiply the absolute amount of disvalue.

At first glance it thus seems that reducing the risk of human extinction is not a good idea from a strongly disvalue-focused perspective. However, the value of extinction risk reduction for disvalue-focused views gets shifted upwards considerably by the arguments in parts 2 and 3 of this article.

Part 1: What is the EV of (post-)hu­man space coloniza­tion?[4]

1.1: Ex­trap­o­lat­ing from to­day’s world

Space coloniza­tion is hard. By the time our tech­nol­ogy is ad­vanced enough, hu­man civ­i­liza­tion will pos­si­bly have changed con­sid­er­ably in many ways. How­ever, to get a first grasp of the ex­pected value of the long-term fu­ture, we can model it as a rough ex­trap­o­la­tion of the pre­sent. What if hu­man­ity as we know it colonized space? There would be vastly more sen­tient be­ings, in­clud­ing hu­mans, farmed an­i­mals and wild an­i­mals[5]. To es­ti­mate the ex­pected value of this fu­ture, we will con­sider three ques­tions:

  1. How many hu­mans, farmed an­i­mals and wild an­i­mals will ex­ist?

  2. How should we weigh the welfare of differ­ent be­ings?

  3. For each of hu­mans, farmed an­i­mals and wild an­i­mals:

    1. Is the cur­rent av­er­age welfare net pos­i­tive/​av­er­age life worth liv­ing?

    2. How will welfare de­velop in the fu­ture?

We will then at­tempt to draw a con­clu­sion. Note that through­out this con­sid­er­a­tion, we take an in­di­vi­d­u­al­is­tic welfarist per­spec­tive on wild an­i­mals. This per­spec­tive stands in con­trast to e.g. valu­ing func­tional ecosys­tems and might seem un­usual, but is in­creas­ingly pop­u­lar.

There will likely be more farmed and wild an­i­mals than hu­mans, but the ra­tio will de­crease com­pared to the present

In today’s world, both farmed and wild animals outnumber humans by far. There are about 3-4 times more farmed land animals and about 13 times more farmed fish[6] than humans alive. Wild animals prevail over farmed animals, with about 10 times more wild birds than farmed birds and 100 times more wild mammals than farmed mammals alive at any point. Moving on to smaller wild animals, the numbers increase again, with 10 000 times more vertebrates than humans, and between 100 000 000 and 10 000 000 000 times more insects and spiders than humans[7].

In the fu­ture, the rel­a­tive num­ber of an­i­mals com­pared to hu­mans will likely de­crease con­sid­er­ably.

Farmed animals will not be alive if animal farming substantially decreases or stops, which seems more likely than not for both moral and economic reasons. Humanity’s moral circle seems to have been expanding throughout history (Singer, 2011) and further expansion to animals may well lead us to stop farming animals.[8] Economically, too, plant-based meat alternatives or lab-grown meat will likely become more efficient than raising animals (Tuomisto and Teixeira de Mattos, 2011). However, none of these developments seems unequivocally destined to end factory farming[9], and the historical track record shows that meat consumption per head has been growing for more than 50 years[10]. Overall, it seems likely but not absolutely clear that the number of farmed animals relative to humans will be smaller in the future. For wild animals, we can extrapolate from a historical trend of decreasing wild animal populations. Even if wild animals were spread to other planets for terraforming, the animal/human ratio would likely be lower than today.

Welfare of differ­ent be­ings can be weighted by (ex­pected) consciousness

To de­ter­mine the EV of the fu­ture, we need to ag­gre­gate welfare across differ­ent be­ings. It seems like we should weigh the ex­pe­rience of a hu­man, a cow and a bee­tle differ­ently when adding up, but by how much? This is a hard ques­tion with no clear an­swer, but we out­line some ap­proaches here. The de­gree to which an an­i­mal is con­scious (“the lights are on”, the be­ing is aware of its ex­pe­riences, emo­tions and thoughts), or the con­fi­dence we have in an an­i­mal be­ing con­scious, can serve as a pa­ram­e­ter by which to weight welfare. To ar­rive at a num­ber for this pa­ram­e­ter, we can use prox­ies such as brain mass, neu­ron count and men­tal abil­ities di­rectly. Alter­na­tively, we may ag­gre­gate these prox­ies with other con­sid­er­a­tions into an es­ti­mate of con­fi­dence that a be­ing is con­scious. For in­stance, the Open Philan­thropy Pro­ject es­ti­mates the prob­a­bil­ity that cows are con­scious at 80%.
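The weighting idea above can be written down as a small aggregation routine. The sketch below, in Python, weights each group's welfare by the confidence that its members are conscious. Only the 0.8 figure for cows comes from the text above. The group sizes, the welfare values and the weight of 1.0 for humans are placeholders assumed for the example, and the names `Group` and `weighted_welfare` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    count: float        # number of individuals (arbitrary units)
    avg_welfare: float  # average welfare per individual; the sign matters
    p_conscious: float  # confidence that members are conscious, in [0, 1]


def weighted_welfare(groups):
    """Sum count * average welfare * consciousness weight over all groups."""
    return sum(g.count * g.avg_welfare * g.p_conscious for g in groups)


# Placeholder inputs: only the 0.8 for cows is taken from the text;
# every other number is assumed purely for illustration.
groups = [
    Group("humans", count=1.0, avg_welfare=1.0, p_conscious=1.0),
    Group("cows", count=1.0, avg_welfare=-1.0, p_conscious=0.8),
]

total = weighted_welfare(groups)
print(f"Consciousness-weighted total welfare: {total:.2f}")
```

The same routine works with any other weight, such as a normalized neuron count, by changing what is stored in `p_conscious`. The choice of weight is a moral and empirical judgment that the code does not settle.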

The EV of (post-)hu­man lives is likely positive

Currently, the average human life seems to be perceived as being worth living. Survey data and experience sampling suggest that most humans are quite content with their lives and experience more positive than negative emotions on a day-to-day basis[11]. If they find it not worth living, humans can take their life, but relatively few people commit suicide (suicide accounts for 1.7% of all deaths in the US).[12] We could conclude that human welfare is positive.

We should, however, note two caveats to this conclusion. First, a life can be perceived as worth living even if it is negative from a welfarist perspective.[13] Second, the average life might not be worth living if the suffering of the worst off was sufficiently more intense than the happiness of the majority of people.

Over­all, it seems that from a large ma­jor­ity of con­se­quen­tial­ist views, the cur­rent ag­gre­gated hu­man welfare is pos­i­tive.

In the fu­ture, we will prob­a­bly make progress that will im­prove the av­er­age hu­man life. His­toric trends have been pos­i­tive across many in­di­ca­tors of hu­man well-be­ing, knowl­edge, in­tel­li­gence and ca­pa­bil­ity. On a global scale, vi­o­lence is de­clin­ing, co­op­er­a­tion in­creas­ing (Pinker, 2011). Yet, the trend does not in­clude all in­di­ca­tors: sub­jec­tive welfare has (in re­cent times) re­mained sta­ble or im­proved very lit­tle, and men­tal health prob­lems are more preva­lent. Th­ese de­vel­op­ments have sparked re­search into pos­i­tive psy­chol­ogy and men­tal health treat­ment, which is slowly bear­ing fruit. As more fun­da­men­tal is­sues are grad­u­ally im­proved, hu­man­ity will likely shift more re­sources to­wards ac­tively im­prov­ing welfare and men­tal health. Pow­er­ful tools like ge­netic de­sign and vir­tual re­al­ity could be used to fur­ther im­prove the lives of the broad ma­jor­ity as well as the worst-off. While there are good rea­sons to as­sume that hu­man welfare in the fu­ture will be more pos­i­tive than now, we still face un­cer­tain­ties (e.g. from low prob­a­bil­ity events like mal­i­cious, but very pow­er­ful au­to­cratic regimes and un­known un­knowns).

The EV of farmed animals’ lives is probably negative

Cur­rently, 93% of farmed an­i­mals live on fac­tory farms in con­di­tions that likely make their lives not worth liv­ing. Although there are pos­i­tive sides to an­i­mal life on farms com­pared to life in the wild[14], these are likely out­weighed by nega­tive ex­pe­riences[15]. Most farmed an­i­mals also lack op­por­tu­ni­ties to ex­hibit nat­u­rally de­sired be­havi­ours like groom­ing. While there is clearly room for im­prove­ment in fac­tory farm­ing con­di­tions, the ques­tion “is the av­er­age life worth liv­ing?” must be an­swered sep­a­rately for each situ­a­tion and re­mains con­tro­ver­sial[16]. On av­er­age, a fac­tory farm an­i­mal life to­day prob­a­bly has nega­tive welfare.

In the future, factory farming is likely to be abolished or modified to improve animal welfare as our moral circle expands to animals (see above). We can thus be moderately optimistic that farm animal welfare will improve and/or fewer farm animals will be alive.

The EV of wild an­i­mals’ lives is very un­clear, but po­ten­tially negative

Currently, we know too little about the lives and perception of wild animals to judge whether their average welfare is positive or negative. We see evidence of both positive[17] and negative[18] experiences. Meanwhile, our perspective on wild animals might be skewed towards charismatic big mammals living relatively good lives. We thus overlook the vast majority of wild animals, based both on biomass and neuron count. Most smaller wild animal species (invertebrates, insects etc.) are r-selected, with most individuals living very short lives before dying painfully. While vast numbers of those lives seem negative from a welfarist perspective, we may choose to weight them less based on the considerations outlined above. In summary, most welfarist views would probably judge the aggregated welfare of wild animals as negative. The more one thinks that smaller, r-selected animals matter morally, the more negative average wild animal welfare becomes.

In the future, we may reduce the suffering of wild animals, but it is unclear whether their welfare would be positive. Future humans may be driven by the expansion of the moral circle and empowered by technological progress (e.g. biotechnology) to improve wild animal lives. However, if average wild animal welfare remains negative, it would still be bad to increase wild animal numbers by space colonization.


It re­mains un­clear whether the EV of a fu­ture in which a hu­man civ­i­liza­tion similar to the one we know colonized space is pos­i­tive or nega­tive.

To quan­tify the above con­sid­er­a­tions from a welfarist per­spec­tive, we cre­ated a math­e­mat­i­cal model. This model yields a pos­i­tive EV for a fu­ture with space coloniza­tion if differ­ent be­ings are weighted by neu­ron count and a nega­tive EV if they are weighted by sqrt(neu­ron count). In the first case, av­er­age welfare is pos­i­tive, driven by the spread­ing of happy (post-)hu­mans. In the sec­ond case, av­er­age welfare is nega­tive as suffer­ing wild an­i­mals are spread. The model is also based on a se­ries of low-con­fi­dence as­sump­tions[19], al­ter­a­tion of which could flip the sign of the out­come again.
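To show how the choice of weighting function alone can flip the sign of such an estimate, here is a toy calculation in Python. It is not the authors' model. The population ratio, the neuron counts and the welfare values are hypothetical round numbers picked so that the flip is visible, and `total_welfare` is a name invented for this sketch.

```python
import math

# name -> (individuals per post-human, neurons per individual, average welfare).
# All values are hypothetical placeholders, NOT the inputs of the article's model.
populations = {
    "post-humans": (1.0, 1e11, 1.0),
    "small wild animals": (1e5, 1e5, -0.1),
}


def total_welfare(pops, weight):
    """Aggregate welfare, weighting each individual by weight(neuron count)."""
    return sum(
        count * weight(neurons) * welfare
        for count, neurons, welfare in pops.values()
    )


ev_linear = total_welfare(populations, lambda neurons: neurons)  # weight = neuron count
ev_sqrt = total_welfare(populations, math.sqrt)                  # weight = sqrt(neuron count)

print(f"Weighted by neuron count:       {ev_linear:+.3e}")
print(f"Weighted by sqrt(neuron count): {ev_sqrt:+.3e}")
```

Linear weighting lets the few beings with very large brains dominate the sum, whereas the square root compresses that advantage and lets the far more numerous small animals dominate. With other placeholder inputs the signs could come out differently, which matches the sensitivity to low-confidence assumptions noted above.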

More qual­i­ta­tively, the EV of an ex­trap­o­lated fu­ture heav­ily de­pends on one’s moral views. The de­gree to which one is fo­cused on avoid­ing dis­value seems es­pe­cially im­por­tant. Con­sider that ev­ery day, hu­mans and an­i­mals are be­ing tor­tured, mur­dered, or in psy­cholog­i­cal de­spair. Those who would walk away from Ome­las might also walk away from cur­rent and ex­trap­o­lated fu­ture wor­lds.

Fi­nally, we should note how lit­tle we know about the world and how this im­pacts our con­fi­dence in con­sid­er­a­tions about an ex­trap­o­lated fu­ture. To illus­trate the ex­tent of our em­piri­cal un­cer­tainty, con­sider that we are ex­trap­o­lat­ing from 100 000 years of hu­man ex­is­tence, 10 000 years of civ­i­liza­tional his­tory and 200 years of in­dus­trial his­tory to po­ten­tially 500 mil­lion years on earth (and much longer in the rest of the uni­verse). If peo­ple in the past had guessed about the EV of the fu­ture in a similar man­ner, they would most likely have got­ten it wrong (e.g. they might not have con­sid­ered moral rele­vance of an­i­mals, or not have known that there is a uni­verse to po­ten­tially colonize). We might be miss­ing cru­cial con­sid­er­a­tions now in analo­gous ways.

1.2: Fu­ture agents’ tools and preferences

While part 1.1 ex­trap­o­lates di­rectly from to­day’s world, part 1.2 takes a more ab­stract ap­proach. To es­ti­mate the EV of (post-)hu­man space-coloniza­tion in more broadly ap­pli­ca­ble terms, we con­sider three ques­tions:

  1. Will fu­ture agents have the tools to shape the world ac­cord­ing to their prefer­ences?

  2. Will fu­ture agents’ prefer­ences re­sem­ble our ‘re­flected prefer­ences’ (see ‘Mo­ral as­sump­tions’ sec­tion)?

  3. Can we ex­pect the net welfare of fu­ture agents and pow­er­less be­ings to be pos­i­tive or nega­tive?

We then at­tempt to es­ti­mate the EV of fu­ture agents coloniz­ing space from a welfarist con­se­quen­tial­ist view.

Fu­ture agents will have pow­er­ful tools to shape the world ac­cord­ing to their preferences

Since climbing down from the trees, humanity has changed the world a great deal. We have done this by developing increasingly powerful tools to satisfy our preferences (e.g. preferences to eat, to stay healthy and warm, and to communicate with friends, even if they are far away). As far as humans have altruistic preferences, powerful tools have made acting on them less costly. For instance, if you see someone is badly hurt and want to help, you don’t have to carry them home and care for them yourself anymore; you can just call an ambulance. However, powerful tools have also made it easier to cause harm, either by satisfying harmful preferences (e.g. weapons of mass destruction) or as a side-effect of our actions that we are indifferent to. Technologies that enable factory farming do enormous harm to animals, although they were developed to satisfy a preference for eating meat, not for harming animals[20].

It seems likely that future agents will have much more powerful tools than we do today. These tools could be used to make the future better or worse. For instance, biotechnology and genetic engineering could help us cure diseases and live longer, but they could also reinforce inequality if treatments are too expensive for most people. Advanced AI could make all kinds of services much cheaper but could also be misused. For more potent and complex tools, the stakes are even higher. Consider the example of technologies that facilitate space colonization. These tools could be used to cause the existence of many times more happy lives than would be possible on Earth, but also to spread suffering.

In summary, future agents will have the tools to create enormous value or disvalue.[21] It is thus important to consider the values/preferences that future agents might have.

We can ex­pect fu­ture agents to have other-re­gard­ing prefer­ences that we would, af­ter re­flec­tion, find some­what positive

When refer­ring to fu­ture agents’ prefer­ences, we dis­t­in­guish be­tween ‘self-re­gard­ing prefer­ences’, i.e. prefer­ences about states of af­fairs that di­rectly af­fect an agent, and ‘other-re­gard­ing prefer­ences’, i.e. prefer­ences about the world that re­main even if an agent is not di­rectly af­fected (see foot­note[22] for a pre­cise defi­ni­tion). Fu­ture agents’ other-re­gard­ing prefer­ences will be cru­cial for the value of the fu­ture. For ex­am­ple, if the fu­ture con­tains pow­er­less be­ings in ad­di­tion to pow­er­ful agents, the welfare of the former will de­pend to a large de­gree on the other-re­gard­ing prefer­ences of the lat­ter (much more about that later).

We can ex­pect a con­sid­er­able frac­tion of fu­ture agents’ prefer­ences to be other-regarding

Most peo­ple al­ive to­day clearly have (pos­i­tive and nega­tive) other-re­gard­ing prefer­ences, but will this be the case for fu­ture agents? It has been ar­gued that over time, other-re­gard­ing prefer­ences could be stripped away by Dar­wi­nian se­lec­tion. We ex­plore this ar­gu­ment and sev­eral coun­ter­ar­gu­ments in ap­pendix 2. We con­clude that fu­ture agents will, in ex­pec­ta­tion, have a con­sid­er­able frac­tion of other-re­gard­ing prefer­ences.

Fu­ture agents’ prefer­ences will in ex­pec­ta­tion be par­allel rather than anti-par­allel to our re­flected preferences

We want to es­ti­mate the EV of a fu­ture shaped by pow­er­ful tools ac­cord­ing to fu­ture agents’ other-re­gard­ing prefer­ences. In this ar­ti­cle we as­sume that we should ul­ti­mately aim to satisfy our re­flected moral prefer­ences, the prefer­ences we would have af­ter an ideal­ized re­flec­tion pro­cess (as dis­cussed in the “Mo­ral as­sump­tions” sec­tion above). Thus, we must es­tab­lish how fu­ture agents’ other-re­gard­ing prefer­ences (FAP) com­pare to our re­flected other-re­gard­ing prefer­ences (RP). Briefly put, we need to ask: “would we want the same things as these fu­ture agents who will shape the world?”

FAP can be somewhere on a spectrum from parallel to orthogonal to anti-parallel to RP. If FAP and RP are parallel, future agents agree exactly with our reflected preferences. If they are anti-parallel, future agents see value exactly where we see disvalue. And if they are orthogonal, future agents value what we regard as neutral, and vice versa. We now examine how FAP will be distributed on this spectrum.

As­sume that fu­ture agents care about moral re­flec­tion. They will then have bet­ter con­di­tions for an ideal­ized re­flec­tion pro­cess than we have, for sev­eral rea­sons:

  • Fu­ture agents will prob­a­bly be more in­tel­li­gent and ra­tio­nal[23]

  • Em­piri­cal ad­vances will help in­form moral in­tu­itions (e.g. ex­pe­rience ma­chines might al­low agents to get a bet­ter idea of other be­ings’ ex­pe­riences)

  • Philos­o­phy will ad­vance further

  • Fu­ture agents will have more time and re­sources to deliberate

Given these pre­req­ui­sites, it seems that fu­ture agents’ moral re­flec­tion would in ex­pec­ta­tion lead to FAP that are par­allel rather than anti-par­allel to RP. How much over­lap be­tween FAP and RP to ex­pect re­mains difficult to es­ti­mate.[24]

How­ever, sce­nar­ios in which fu­ture agents do not care about moral re­flec­tion might sub­stan­tially in­fluence the EV of the fu­ture. For ex­am­ple, it might be likely that hu­man­ity loses con­trol and the agents shap­ing the fu­ture bear no re­sem­blance to hu­mans. This could be the case if de­vel­op­ing con­trol­led ar­tifi­cial gen­eral in­tel­li­gence (AGI) is very hard, and the prob­a­bil­ity that mis­al­igned AGI will be de­vel­oped is high (in this case, the fu­ture agent is a mis­al­igned AI).[25]

Even if (post-)humans remain in control, human moral intuitions might turn out to be contingent on the starting conditions of the reflection process and not very convergent across the species. Thus, FAP may not develop in any clear direction, but rather drift randomly[26]. Very strong and fast goal drift might be possible if future agents include digital (human) minds because such minds would not be restrained by the cultural universals rooted in the physical brain architecture.

If it turns out that FAP develop differently from RP, FAP will in expectation be orthogonal to RP rather than anti-parallel. The space of possible preferences is vast, so it seems much more likely that FAP will be completely different from RP, rather than exactly opposite[27] (see footnote[28] for an example). In summary, FAP parallel or orthogonal to RP both seem likely, but a large fraction of FAP being anti-parallel to RP seems fairly unlikely. This main claim seems true for most “idealized reflection processes” that people would choose.
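One way to build intuition for why “completely different” is so much more likely than “exactly opposite” is to take the geometric language literally. The Python sketch below assumes, purely for illustration, that preference profiles can be modeled as vectors and that randomly drifting preferences point in random directions. This modeling choice is an assumption of the example, not a claim made in this article. The sketch then measures how close pairs of random vectors are to parallel or anti-parallel as the number of dimensions grows.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity: +1 is parallel, 0 orthogonal, -1 anti-parallel."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_abs_cosine(dim, trials=200, seed=0):
    """Average |cosine| between pairs of random Gaussian vectors of size dim."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(trials):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += abs(cosine(u, v))
    return total / trials


low_dim = mean_abs_cosine(dim=2)      # few things one could care about
high_dim = mean_abs_cosine(dim=1000)  # a vast space of possible preferences

print(f"Mean |cosine| in 2 dimensions:    {low_dim:.3f}")
print(f"Mean |cosine| in 1000 dimensions: {high_dim:.3f}")
```

In low dimensions, random directions are often roughly aligned or roughly opposed. In high dimensions they cluster near orthogonal, so strong anti-alignment arising by chance becomes very rare. The sketch says nothing about preferences that are deliberately shaped to oppose ours, which is a separate question.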

How­ever, FAP be­ing be­tween par­allel and or­thog­o­nal to RP in ex­pec­ta­tion does not nec­es­sar­ily im­ply the fu­ture will be good. Ac­tions driven by (or­thog­o­nal) FAP could have very harm­ful side-effects, as judged by our re­flected prefer­ences. Harm­ful side-effects could be dev­as­tat­ing es­pe­cially if fu­ture agents are in­differ­ent to­wards be­ings we (would on re­flec­tion) care about morally. Such nega­tive side-effects might out­weigh pos­i­tive in­tended effects, as has hap­pened in the past[29]. In­deed, some of the most dis­cussed “risks of as­tro­nom­i­cal fu­ture suffer­ing” are ex­am­ples of nega­tive side-effects.[30]

Fu­ture agents’ tools and prefer­ences will in ex­pec­ta­tion shape a world with prob­a­bly net pos­i­tive welfare

Above we ar­gued that we can ex­pect some over­lap be­tween fu­ture agents’ other-re­gard­ing prefer­ences (FAP) and our re­flected other-re­gard­ing prefer­ences (RP). We can thus be some­what op­ti­mistic about the fu­ture in a very gen­eral way, in­de­pen­dent of our first-or­der moral views, if we ul­ti­mately aim to satisfy our re­flected prefer­ences. In the fol­low­ing sec­tion, we will drop some of that gen­er­al­ity. We will ex­am­ine what fu­ture agents’ prefer­ences will im­ply for the welfare of fu­ture be­ings. In do­ing so, we as­sume that we would on re­flec­tion hold an ag­grega­tive, welfarist al­tru­is­tic view (as ex­plained in the back­ground-sec­tion).

If we assume these specific RP, can we still expect FAP to overlap with them? After all, other-regarding preferences anti-parallel to welfarist altruism (such as sadistic, hateful, or revengeful preferences) clearly exist within present-day humanity. If current human values transferred broadly into the future, should we then expect a large fraction of FAP to be anti-parallel to welfarist altruism? Probably not. We argue in appendix 3 that, although this is hard to quantify, the large majority of human other-regarding preferences seem positive.

As­sum­ing some­what welfarist FAP, we ex­plore what the fu­ture might be like for two types of be­ings: Fu­ture agents (post-hu­mans) who have pow­er­ful tools to shape the world, and pow­er­less fu­ture be­ings. To ag­gre­gate welfare for moral eval­u­a­tion, we need to es­ti­mate how many be­ings of each type will ex­ist. Pow­er­ful agents will likely be able to cre­ate pow­er­less be­ings as “tools” if this seems use­ful for them. Sen­tient “tools” could in­clude an­i­mals, farmed for meat pro­duc­tion or spread to other planets for ter­raform­ing (e.g. in­sects), but also digi­tal sen­tient minds, like sen­tient robots for task perfor­mance or simu­lated minds cre­ated for sci­en­tific ex­per­i­men­ta­tion or en­ter­tain­ment. The last ex­am­ple seems es­pe­cially rele­vant, as digi­tal minds could be cre­ated in vast amounts if digi­tal sen­tience is pos­si­ble at all, which does not seem un­likely. If we find we morally care about these “tools” upon re­flec­tion, the fu­ture would con­tain many times more pow­er­less be­ings than pow­er­ful agents.

The EV of the fu­ture thus de­pends on the welfare of both pow­er­ful agents and pow­er­less be­ings, with the lat­ter po­ten­tially much more rele­vant than the former. We now con­sider each in turn, ask­ing:

  • How will their ex­pected welfare be af­fected by in­tended effects and side-effects of fu­ture agents’ ac­tions?

  • How to eval­u­ate this morally?

The ag­gre­gated welfare of pow­er­ful fu­ture agents is in ex­pec­ta­tion positive

Fu­ture agents will have pow­er­ful tools to satisfy their self-re­gard­ing prefer­ences and be some­what benev­olent to­wards each other. Thus, we can ex­pect fu­ture agents’ welfare to be in­creased through in­tended effects of their ac­tions.

Side-effects of future agents’ actions that are negative for other agents’ welfare would mainly arise if their civilization is not well coordinated. However, compromise and cooperation seem to usually benefit all involved parties, indicating that we can expect future agents to develop good tools for coordination and use them a lot.[31] Coordination also seems essential to avert many extinction risks. Thus, a civilization that avoided extinction so successfully that it colonizes space is expected to be quite coordinated.

Taken together, vastly more resources will likely be used in ways that improve the welfare of powerful agents than in ways that diminish their welfare. From the large majority of welfarist views, future agents’ aggregated welfare is thus expected to be positive. This conclusion is also supported by human history, as improved tools, cooperation and altruism have increased the welfare of most humans and average human lives are seen as worth living by many (see part 1.1).

The ag­gre­gated welfare of pow­er­less fu­ture be­ings may in ex­pec­ta­tion be positive

As­sum­ing that fu­ture agents are mostly in­differ­ent to­wards the welfare of their “tools”, their ac­tions would af­fect pow­er­less be­ings only via (in ex­pec­ta­tion ran­dom) side-effects. It is thus rele­vant to know the “de­fault” level of welfare of pow­er­less be­ings. If the af­fected pow­er­less be­ings were an­i­mals shaped by evolu­tion, their de­fault welfare might be net nega­tive. This is be­cause evolu­tion­ary pres­sure might re­sult in a pain-plea­sure asym­me­try with suffer­ing be­ing much more in­tense than plea­sure (see foot­note for fur­ther ex­pla­na­tion[32]). Such evolu­tion­ary pres­sure would not ap­ply for de­signed digi­tal sen­tience. Given that our ex­pe­rience with welfare is re­stricted to an­i­mals (incl. hu­mans) shaped by evolu­tion, it is un­clear what the de­fault welfare of digi­tal sen­tients would be. If there is at least some moral con­cern for digi­tal sen­tience, it seems fairly likely that the cre­at­ing agents would pre­fer to give their sen­tient tools net pos­i­tive welfare[33].

If future agents intend to affect the welfare of powerless beings, they might, besides treating their sentient “tools” accordingly, create (dis-)value optimized sentience: minds that are optimized for extreme positive or negative welfare. For example, future agents could simulate many minds in bliss, or many minds in agony. The motivation for creating (dis-)value optimized sentience could be altruism, sadism or strategic reasons[34]. Creating (dis-)value optimized sentience would likely produce much more positive or negative welfare per unit of invested resources than the side-effects on sentient tools mentioned above, as sentient tools are optimized for task performance, not production of (dis-)value[35]. (Dis-)value optimized sentience would then be the main determinant of the expected value of post-human space colonization, and not side-effects on sentient tools.

FAP may be or­thog­o­nal to welfarist al­tru­ism, in which case lit­tle (dis-)value op­ti­mized sen­tience will be pro­duced. How­ever, we ex­pect a much larger frac­tion of FAP to be par­allel to welfarist al­tru­ism than anti-par­allel to it, and thus ex­pect that fu­ture agents will use many more re­sources to cre­ate value-op­ti­mized sen­tience than dis­value-op­ti­mized sen­tience. The pos­si­bil­ity of (dis-)value op­ti­mized sen­tience should in­crease the net ex­pected welfare of pow­er­less fu­ture be­ings. How­ever, there is con­sid­er­able un­cer­tainty about the moral im­pli­ca­tions of one re­source-unit spent op­ti­mized for value or dis­value (see e.g. here and here). On the one hand, (dis)value op­ti­mized sen­tience cre­ated with­out evolu­tion­ary pres­sure might be equally effi­cient in pro­duc­ing moral (dis)value, but used a lot more to pro­duce value. On the other hand, dis­value op­ti­mized sen­tience might lead to es­pe­cially in­tense suffer­ing. Many peo­ple in­tu­itively give more moral im­por­tance to the pre­ven­tion of suffer­ing the worse it gets (e.g. pri­ori­tar­i­anism).

In sum­mary, it seems plau­si­ble that a lit­tle con­cern for the welfare of sen­tient tools could go a long way. Even if most fu­ture agents were com­pletely in­differ­ent to­wards sen­tient tools (=ma­jor­ity of FAP or­thog­o­nal to RP), pos­i­tive in­tended effects – cre­ation of value-op­ti­mized sen­tience – could plau­si­bly weigh heav­ier than side-effects.
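The weighing of intended effects against side-effects can be made concrete with a toy calculation. This is only a sketch under stated assumptions: the function `net_welfare`, its parameters and all numbers are hypothetical illustration values chosen by us, not estimates from this article. It assumes that optimized sentience yields far more welfare (or suffering) per resource unit than sentient tools do, and lets a `disvalue_weight` above 1 stand in for disvalue-focused views.

```python
def net_welfare(frac_parallel, frac_antiparallel, yield_optimized,
                side_effect_welfare, disvalue_weight=1.0):
    """Net welfare of powerless beings per unit of total resources.

    All parameters are hypothetical illustration values:
    frac_parallel        share of resources spent on value-optimized sentience
    frac_antiparallel    share spent on disvalue-optimized sentience
    yield_optimized      welfare magnitude per resource unit of optimized sentience
    side_effect_welfare  welfare per resource unit of sentient tools (may be negative)
    disvalue_weight      extra moral weight on suffering (1.0 = symmetric view)
    """
    # Everything not spent on optimized sentience is spent on sentient tools.
    frac_tools = 1.0 - frac_parallel - frac_antiparallel
    value = frac_parallel * yield_optimized
    disvalue = frac_antiparallel * yield_optimized
    if side_effect_welfare >= 0:
        value += frac_tools * side_effect_welfare
    else:
        disvalue += frac_tools * -side_effect_welfare
    return value - disvalue_weight * disvalue


# 5% of resources go to value-optimized minds, 0.5% to disvalue-optimized
# minds, and the remaining tools have mildly negative default welfare.
symmetric = net_welfare(0.05, 0.005, 100.0, -1.0)
suffering_focused = net_welfare(0.05, 0.005, 100.0, -1.0, disvalue_weight=5.0)
print("symmetric view:", round(symmetric, 3))
print("disvalue weighted 5x:", round(suffering_focused, 3))
```

With these made-up inputs, a small share of resources spent on value-optimized sentience outweighs mildly negative side-effects on a symmetric view, while a sufficiently strong weight on disvalue flips the sign. The result is only as good as the inputs, which are exactly the open questions listed below.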


Morally evaluating the future scenarios sketched in part 1.2 is hard because we are uncertain: both empirically uncertain about what the future will be like and morally uncertain about what our intuitions will be like. The key unanswered questions are:

  • How much can we ex­pect the prefer­ences that shape the fu­ture to over­lap with our re­flected prefer­ences?

  • In ab­sence of con­cern for the welfare of sen­tient tools, how good or bad is their de­fault welfare?

  • How will the scales of in­tended effects and side-effects com­pare?

Taken to­gether, we be­lieve that the ar­gu­ments in this sec­tion in­di­cate that the EV of (post)-hu­man space coloniza­tion would only be nega­tive from rel­a­tively strongly dis­value-fo­cused views. From the ma­jor­ity, but not over­whelming ma­jor­ity, of welfarist views the EV of (post)-hu­man space coloniza­tion seems pos­i­tive.[36][37]

In parts 1.1 and 1.2, we di­rectly es­ti­mated the EV of (post-)hu­man space coloniza­tion and found it to be very un­cer­tain. In the re­main­ing parts, we will im­prove our es­ti­mate via other ap­proaches that are less de­pen­dent on spe­cific pre­dic­tions about how (post-)hu­mans will shape the fu­ture.

1.3: Fu­ture agents could later de­cide not to colonize space (op­tion value)

We are of­ten un­cer­tain about what the right thing to do is. If we can defer the de­ci­sion to some­one wiser than our­selves, this is gen­er­ally a good call. We can also defer across time: we can keep our op­tions open for now, and hope our de­scen­dants will be able to make bet­ter de­ci­sions. This op­tion value may give us a rea­son to pre­fer to keep our op­tions open.

For instance, our descendants may be in a better position to judge whether space colonization would be good or bad. If they can see that space colonization would be negative, they can refrain from (further) colonizing space: they have the option to limit the harm. In contrast, if humanity goes extinct, the option of (post-)human space colonization is forever lost. So avoiding extinction creates ‘option value’ (e.g. MacAskill).[38] This specific type of ‘option value’ (from future agents choosing not to colonize space), and not the more general value of keeping options open, is what we will be referring to throughout this section.[39] This type of option value exists for nearly all moral views, and is very unlikely to be negative.[40] However, as we will discuss in this chapter, this value is rather small compared to other considerations.

A con­sid­er­able frac­tion of fu­tures con­tains op­tion value

Re­duc­ing the risk of hu­man ex­tinc­tion only cre­ates op­tion value if fu­ture agents will make a bet­ter de­ci­sion, by our (re­flected) lights, about whether to colonize space than we could. If they will make worse de­ci­sions than us, we would rather de­cide our­selves.

In or­der for fu­ture agents to make bet­ter de­ci­sions than us and ac­tu­ally act on them, they need to sur­pass us in at least one of the fol­low­ing as­pects:

  • Bet­ter values

  • Better judgement of what space colonization will be like (based on increased empirical understanding and rationality)

  • Greater will­ing­ness and abil­ity to make de­ci­sions based on moral val­ues (non-self­ish­ness and co­or­di­na­tion)

Values


Hu­man val­ues change. We are dis­gusted by many of our an­ces­tors’ moral views, and they would find ours equally re­pug­nant. We can even look back on our own moral views and dis­agree. There is no rea­son for these trends to stop ex­actly now: hu­man moral­ity will likely con­tinue to change.

Yet at each stage in the change, we are likely to view our val­ues as ob­vi­ously cor­rect. This en­courages a greater de­gree of moral un­cer­tainty than feels nat­u­ral. We should ex­pect that our moral views would change af­ter ideal­ized re­flec­tion (al­though this also de­pends on which meta-eth­i­cal the­ory is cor­rect and how ideal­ized re­flec­tion works).

We argued in part 1.2 that future agents’ preferences will in expectation have some overlap with our reflected preferences. Even if that overlap is not very high, a high degree of moral uncertainty would indicate that we would often prefer future agents’ preferences over our current, unreflected preferences. In a sizeable fraction of future scenarios, future agents, with more time and better tools to reflect, can be expected to make better decisions than one could today.

Em­piri­cal un­der­stand­ing and rationality

We now un­der­stand the world bet­ter than our an­ces­tors, and are able to think more clearly. If those trends con­tinue, fu­ture agents may un­der­stand bet­ter what space coloniza­tion will be like, and so bet­ter un­der­stand how good it will be on a given set of val­ues.

For ex­am­ple, fu­ture agents’ es­ti­mate of the EV of space coloniza­tion will benefit from

  • Bet­ter em­piri­cal un­der­stand­ing of the uni­verse (for in­stance about ques­tions dis­cussed in part 2.2)[41] and bet­ter pre­dic­tions, fuel­led by more sci­en­tific knowl­edge and bet­ter fore­cast­ing techniques

  • In­creased in­tel­li­gence and ra­tio­nal­ity[42], al­low­ing them to more ac­cu­rately de­ter­mine what the best ac­tion is based on their val­ues.

As long as there is some overlap between their preferences and one’s reflected preferences, this gives an additional reason to defer to future agents’ decisions (see footnote for an example).[43]

Non-self­ish­ness and coordination

We of­ten know what’s right, but don’t fol­low through on it any­way. What is true for diets also ap­plies here:

  • Fu­ture agents would need to ac­tu­ally make the de­ci­sion about space coloniza­tion based on moral rea­son­ing[44]. This might im­ply act­ing against strong eco­nomic in­cen­tives push­ing to­wards space coloniza­tion.

  • Fu­ture agents need to be co­or­di­nated well enough to avoid space coloniza­tion. That might be a challenge in non-sin­gle­ton fu­tures since fu­ture civ­i­liza­tion would need ways to en­sure that not a sin­gle agent starts space coloniza­tion.

It seems likely that future agents would surpass our current level of empirical understanding, rationality, and coordination, and in a considerable fraction of possible futures they might also do better on values and non-selfishness. However, we should note that to actually not colonize space, they would have to surpass a certain threshold in all of these fields, which may be quite high. Thus, a little bit of progress doesn’t help: option value is only created in deferring the decision to future agents if they surpass this threshold.

Only the relatively good futures contain option value

For any future scenario to contain option value, the agents in that future need to surpass us in various ways, as outlined above. This has an implication that further diminishes the relevance of the option value argument. Future agents need to have relatively good values and be relatively non-selfish to decide not to colonize space for moral reasons. But even if these agents colonized space, they would probably do it in a relatively good manner. Most expected future disvalue plausibly comes from futures controlled by indifferent or malicious agents (like misaligned AI). Such “bad” agents will make worse decisions about whether or not to colonize space than we, currently, could, because their preferences are very different from our (reflected) preferences. Potential space colonization by indifferent or malicious agents thus generates large amounts of expected future disvalue, which cannot be alleviated by option value. Option value doesn’t help in the cases where it is most needed (see footnote for an explanatory example).[45]


If future agents are good enough, there is option value in deferring the decision whether to colonize space to them. In some not-too-small fraction of possible futures, agents will fulfill the criteria and thus option value adds positively to the EV of reducing extinction risk. However, the futures accounting for most expected future disvalue are likely controlled by indifferent or malicious agents. Such “bad” agents would likely make worse decisions than we could. A large amount of expected future disvalue is thus not amenable to alleviation through option value. Overall, we think the option value in reducing the risk of human extinction is probably fairly moderate, but there is a lot of uncertainty and contingency on one’s specific moral and empirical views[46]. Modelling the considerations of this section showed that if the 90% confidence interval of the value of the future was from −0.9 to 0.9 (arbitrary value units), option value was 0.07.
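To show how an estimate of this kind can be structured, here is a minimal sketch. It is not the model referred to above and is not meant to reproduce its result. It assumes, as our own simplification, that the value V of the future is normally distributed, that a fraction `p_exercise` of futures contains agents who are wise and coordinated enough to refrain from colonizing exactly when V would be negative, and that refraining yields a value of zero. All names and numbers other than the −0.9 to 0.9 interval are hypothetical.

```python
from statistics import NormalDist


def expected_harm_averted(mean, sd):
    """E[max(0, -V)] for V ~ Normal(mean, sd): the expected disvalue avoided
    if agents refrain from colonizing exactly when V would be negative."""
    std_normal = NormalDist()
    z = mean / sd
    return sd * std_normal.pdf(z) - mean * std_normal.cdf(-z)


def option_value(mean, sd, p_exercise):
    """Option value when only a fraction p_exercise of futures has agents
    able and willing to use the option correctly (hypothetical parameter)."""
    return p_exercise * expected_harm_averted(mean, sd)


# Convert a symmetric 90% interval of (-0.9, 0.9) into a standard deviation.
sd = 0.9 / NormalDist().inv_cdf(0.95)

for p in (0.1, 0.3, 1.0):
    print("p_exercise =", p, "option value =", round(option_value(0.0, sd, p), 3))

# Futures run by good agents plausibly have a higher mean value, so there is
# less harm for the option to avert precisely where it can be exercised.
print("good-agent futures:", round(expected_harm_averted(0.5, sd), 3))
```

The last line illustrates the point of the previous subsection: if the futures in which the option can be exercised are also the futures that are better in expectation, the harm the option can avert there is smaller than in the average future.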

Part 2: Ab­sence of (post-)hu­man space coloniza­tion does not im­ply a uni­verse de­void of value or disvalue

Up to now, we have tac­itly as­sumed that the sign of EV of (post)-hu­man space coloniza­tion de­ter­mines whether ex­tinc­tion risk re­duc­tion is worth­while. This only holds if with­out hu­man­ity, the EV of the fu­ture is roughly zero, be­cause the (coloniz­able) uni­verse is and will stay de­void of value or dis­value. We now con­sider two classes of sce­nar­ios in which this is not the case, with im­por­tant im­pli­ca­tions es­pe­cially for peo­ple who think that EV of (post-)hu­man space coloniza­tion is likely nega­tive.

2.1 Whether (post-)hu­mans coloniz­ing space is good or bad, space coloniza­tion by other agents seems worse

If humanity goes extinct without colonizing space, some kind of other beings would likely survive on earth[47]. These beings might evolve into a non-human technological civilization in the hundreds of millions of years left on earth and eventually colonize space. Similarly, extraterrestrials (that might already exist or come into existence in the future) might colonize (more of) our corner of the universe, if humanity does not.

In these cases, we must ask whether we pre­fer (post-)hu­man space coloniza­tion over the al­ter­na­tives. Whether al­ter­na­tive civ­i­liza­tions would be more or less com­pas­sion­ate or co­op­er­a­tive than hu­mans, we can only guess. We may how­ever as­sume that our re­flected prefer­ences de­pend on some as­pects of be­ing hu­man, such as hu­man cul­ture or the biolog­i­cal struc­ture of the hu­man brain[48]. Thus, our re­flected prefer­ences likely over­lap more with a (post-)hu­man civ­i­liza­tion than al­ter­na­tive civ­i­liza­tions. As fu­ture agents will have pow­er­ful tools to shape the world ac­cord­ing to their prefer­ences, we should pre­fer (post-)hu­man space coloniza­tion over space coloniza­tion by an al­ter­na­tive civ­i­liza­tion.

To understand how we can factor this consideration into the overall EV of a future with (post-)human space colonization, consider the following example of Ana and Chris. Ana thinks the EV of (post-)human space colonization is negative. For her, the EV of potential alternative space colonization is thus even more negative. This should cause people who, like Ana, are pessimistic about the EV of (post-)human space colonization (and thus the value of reducing the risk of human extinction) to update towards reducing the risk of human extinction because the alternative is even worse (technical caveat in footnote[49]).

Chris thinks that the EV of (post-)hu­man space coloniza­tion is pos­i­tive. For him, the EV of po­ten­tial al­ter­na­tive space coloniza­tion could be pos­i­tive or nega­tive. For peo­ple like Chris, who are op­ti­mistic about the EV of (post-)hu­man space coloniza­tion (and thus the value of re­duc­ing the risk of hu­man ex­tinc­tion), the di­rec­tion of up­date is thus less clear. They should up­date to­wards re­duc­ing the risk of hu­man ex­tinc­tion if the po­ten­tial al­ter­na­tive civ­i­liza­tion is bad, or away from it if the po­ten­tial al­ter­na­tive civ­i­liza­tion is merely less good. Taken to­gether, this con­sid­er­a­tion im­plies a stronger up­date for fu­ture pes­simists like Ana than for fu­ture op­ti­mists like Chris. This be­comes clearer in the math­e­mat­i­cal deriva­tion[50] or when con­sid­er­ing an ex­am­ple[51].

It remains to estimate how big the update should be. Based on our best guesses about the relevant parameters (see the Fermi estimate here), it seems like future pessimists should considerably shift their judgement of the EV of human extinction risk reduction in the less negative direction. Future optimists should moderately shift their judgement downwards. Therefore, if one was previously uncertain with roughly equal credence in future pessimism and future optimism, one’s estimate for the EV of human extinction risk reduction should increase.
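The direction of these updates can be sketched with a toy calculation for Ana and Chris. This is not the derivation or the Fermi estimate referred to above. It assumes, for illustration only, that an alternative civilization colonizes space with probability `p_alternative` if humanity goes extinct, that its colonization is worse than (post-)human colonization by a fixed `gap`, and that the universe otherwise stays at a value of zero. All names and numbers are hypothetical.

```python
def ev_of_risk_reduction(ev_human, p_alternative, gap):
    """EV of preventing extinction, relative to what happens otherwise.

    ev_human       EV of (post-)human space colonization
    p_alternative  probability that another civilization colonizes instead
    gap            how much worse the alternative is than the human case (> 0)
    """
    ev_alternative = ev_human - gap
    # Without humanity, the future is worth 0 unless an alternative colonizes.
    ev_without_humans = p_alternative * ev_alternative
    return ev_human - ev_without_humans


P_ALT, GAP = 0.3, 0.5  # hypothetical illustration values

for name, ev_human in (("Ana", -1.0), ("Chris", 1.0)):
    updated = ev_of_risk_reduction(ev_human, P_ALT, GAP)
    print(name, "before:", ev_human, "after:", round(updated, 2),
          "update:", round(updated - ev_human, 2))
```

With these inputs Ana’s estimate moves upwards by more than Chris’s moves downwards, so someone with equal credence in both views ends up with a higher estimate. If the gap were large enough to make the alternative bad even by Chris’s lights, his estimate would move upwards as well.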

We should note that this is a very broad con­sid­er­a­tion, with de­tails con­tin­gent on the ac­tual moral views peo­ple hold and spe­cific em­piri­cal con­sid­er­a­tions[52].

A spe­cific case of al­ter­na­tive space coloniza­tion could arise if hu­man­ity gets ex­tin­guished by mis­al­igned AGI. It seems likely that mis­al­igned AI would colonize space. Space coloniza­tion by an AI might in­clude (among other things of value/​dis­value to us) the cre­ation of many digi­tal minds for in­stru­men­tal pur­poses. If the AI is only driven by val­ues or­thog­o­nal to ours, it would likely not care about the welfare of those digi­tal minds. Whether we should ex­pect space coloniza­tion by a hu­man-made, mis­al­igned AI to be morally worse than space coloniza­tion by fu­ture agents with (post-)hu­man val­ues has been dis­cussed ex­ten­sively el­se­where. Briefly, nearly all moral views would most likely rather have hu­man value-in­spired space coloniza­tion than space coloniza­tion by AI with ar­bi­trary val­ues, giv­ing ex­tra rea­son to work on AI al­ign­ment es­pe­cially for fu­ture pes­simists.

2.2 Ex­ist­ing dis­value could be alle­vi­ated by coloniz­ing space

With more em­piri­cal knowl­edge and philo­soph­i­cal re­flec­tion, we may find that the uni­verse is already filled with be­ings/​things that we morally care about. In­stead of just in­creas­ing the num­ber of morally rele­vant things (i.e. earth origi­nat­ing sen­tient be­ings), fu­ture agents might then in­fluence the states of morally rele­vant be­ings/​things already ex­ist­ing in the uni­verse[53]. This topic is highly spec­u­la­tive and we should stress that most of the EV prob­a­bly comes from “un­known un­knowns”, which hu­man­ity might dis­cover dur­ing ideal­ized re­flec­tion. Sim­ply put, we might find some way in which fu­ture agents can make the ex­ist­ing world (a lot) bet­ter if they stick around. To illus­trate this gen­eral con­cept, con­sider the fol­low­ing ideas.

We might find that we morally care about things other than sentient beings, which could be vastly abundant in the universe. For example, we may develop moral concern for fundamental physics, e.g. in the form of panpsychism. Another possibility could arise if the solution to the simulation argument (Bostrom, 2003) is indeed that we live in a simulation, with most things of moral relevance positioned outside of our simulation but modifiable by us in yet unknown ways. It might also turn out that we can interact with other agents in the (potentially infinite) universe or multiverse by acausal trade or multiverse-wide cooperation, thereby influencing existing things of moral relevance (to us) in their part of the universe/multiverse. These specific ideas may look weird. However, given humanity’s history of realizing that we care about more/other things than previously thought[54], it should in principle seem likely that our reflected preferences include some yet unknown unknowns.

We argued in part 1.2 that future agents’ preferences will in expectation be parallel rather than anti-parallel to our reflected preferences. If the universe is already filled with things/beings of moral concern, we can thus assume that future agents will in expectation improve the state of these things[55]. This creates an additional reason to reduce the risk of human extinction: there might be a moral responsibility for humanity to stick around and “improve the universe”. This perspective is especially relevant for disvalue-focused views. From a (strongly) disvalue-focused view, increasing the number of conscious beings by space colonization is negative because it generates suffering and disvalue. It might seem that there is little to gain if space colonization goes well, but much to lose if it goes wrong. If, however, future agents could alleviate existing disvalue, then humanity’s survival (potentially including space colonization) has upsides that may well be larger than the expected downsides (see footnote[56] for a Fermi estimate).[57]

Part 3: Efforts to re­duce ex­tinc­tion risk may also im­prove the future

If we had a but­ton that re­duces hu­man ex­tinc­tion risk, and has no other effect, we would only need the con­sid­er­a­tions in parts 1 and 2 to know whether we should press it. In prac­tice, efforts to re­duce ex­tinc­tion risk of­ten have other morally rele­vant con­se­quences, which we ex­am­ine be­low.

3.1: Efforts to re­duce non-AI ex­tinc­tion risk re­duce global catas­trophic risk[58]

Global catas­tro­phe here refers to a sce­nario of hun­dreds of mil­lions of hu­man deaths and re­sult­ing so­cietal col­lapse. Many po­ten­tial causes of hu­man ex­tinc­tion, like a large scale epi­demic, nu­clear war, or run­away cli­mate change, are far more likely to lead to a global catas­tro­phe than to com­plete ex­tinc­tion. Thus, many efforts to re­duce the risk of hu­man ex­tinc­tion also re­duce global catas­trophic risk. In the fol­low­ing, we ar­gue that this effect adds sub­stan­tially to the EV of efforts to re­duce ex­tinc­tion risk, even from the very-long term per­spec­tive of this ar­ti­cle. This doesn’t hold for efforts to re­duce risks that, like risks from mis­al­igned AGI, are more likely to lead to com­plete ex­tinc­tion than to a global catas­tro­phe.

Apart from being a dramatic event of immense magnitude for current generations, a global catastrophe could severely curb humanity’s long-term potential by destabilizing technological progress and derailing social progress[59].

Tech­nolog­i­cal progress might be un­co­or­di­nated and in­cau­tious in a world that is poli­ti­cally desta­bi­lized by global catas­tro­phe. For pivotal tech­nolo­gies such as AGI, de­vel­op­ment in an arms race sce­nario (e.g. driven by post-catas­tro­phe re­source scarcity or war) could lead to ad­verse out­comes we can­not cor­rect af­ter­wards.

Social progress might likewise be diverted towards values that oppose open society and general utilitarian-type values. Can we expect the “new” value system emerging after a global catastrophe to be robustly worse than our current value system? While this issue is debated[60], Nick Beckstead gives a strand of arguments suggesting the “new” values would in expectation be worse. Compared to the rest of human history, we currently seem to be on an unusually promising trajectory of social progress. What exactly would happen if this period was interrupted by a global catastrophe is a difficult question, and any answer will involve many judgement calls about the contingency and convergence of human values. However, as we hardly understand the driving factors behind the current period of social progress, we cannot be confident it would recommence if interrupted by a global catastrophe. Thus, if one sees the current trajectory as broadly positive, one should expect this value to be partially lost if a global catastrophe occurs.

Taken together, reducing global catastrophic risk seems to be a valuable effect of efforts to reduce extinction risk. This aspect is fairly relevant even from a very long-term perspective because catastrophes are much more likely than extinction. A Fermi estimate suggests the long-term impact from the prevention of global catastrophes is about 50% of the impact from avoiding extinction events. The potential long-term consequences from a global catastrophe include worse values and an increase in the likelihood of misaligned AI scenarios. These consequences seem bad from most moral perspectives, including strongly disvalue-focused ones. Considering the effects on global catastrophic risk should suggest a significant update in the evaluation of the EV of efforts to reduce extinction risk towards more positive (or less negative) values.
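The structure of such a comparison can be sketched in a few lines. This is not the Fermi estimate cited above: the function name, its parameters and the input values are hypothetical and were chosen by us only to show how a ratio of this order could arise. The sketch assumes that extinction forfeits all long-term value, while a global catastrophe forfeits a fraction of it.

```python
def relative_long_term_impact(p_catastrophe, p_extinction, value_lost_fraction):
    """Long-term impact of preventing catastrophes, as a fraction of the
    long-term impact of preventing extinction (assumed to lose all value)."""
    return (p_catastrophe / p_extinction) * value_lost_fraction


# Hypothetical inputs: catastrophe is ten times as likely as extinction and
# permanently costs 5% of the long-term value of the future.
ratio = relative_long_term_impact(p_catastrophe=0.1, p_extinction=0.01,
                                  value_lost_fraction=0.05)
print("catastrophe prevention relative to extinction prevention:", round(ratio, 2))
```

The point of the sketch is only that a modest loss per catastrophe can still matter at very long time scales once it is multiplied by the much higher probability of catastrophe.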

3.2: Efforts to re­duce ex­tinc­tion risk of­ten pro­mote co­or­di­na­tion, peace and sta­bil­ity, which is broadly good

The shared fu­ture of hu­man­ity is a (trans­gen­er­a­tional) global pub­lic good (Bostrom, 2013), thus so­ciety needs to co­or­di­nate to pre­serve it, e.g. by pro­vid­ing fund­ing and other in­cen­tives. Most ex­tinc­tion risk also arises from tech­nolo­gies that al­low for one agent (in­ten­tion­ally or by mis­take) to start a po­ten­tial ex­tinc­tion event (e.g. re­lease a harm­ful virus or start a nu­clear war). Co­or­di­nated ac­tion and care­ful de­ci­sions are thus needed and in­deed, the broad­est efforts to re­duce ex­tinc­tion risk di­rectly pro­mote global co­or­di­na­tion, peace and sta­bil­ity. More fo­cused efforts of­ten pro­mote “nar­row co­op­er­a­tion” within a spe­cific field (e.g. nu­clear non-pro­lifer­a­tion) or set up pro­cesses (e.g. pathogenic surveillance) that in­crease global sta­bil­ity by re­duc­ing per­ceived lev­els of threat from non-ex­tinc­tion events (e.g. bioter­ror­ist at­tacks).

Taken to­gether, efforts to re­duce ex­tinc­tion risk also pro­mote a more co­or­di­nated, peace­ful and sta­ble global so­ciety. Fu­ture agents in such a so­ciety will prob­a­bly make wiser and more care­ful de­ci­sions, re­duc­ing the risk of un­ex­pected nega­tive tra­jec­tory changes in gen­eral. Safe de­vel­op­ment of AI will speci­fi­cally de­pend on these fac­tors. There­fore, efforts to re­duce ex­tinc­tion risk may also steer the world away from some of the worst non-ex­tinc­tion out­comes, which likely in­volve war, vi­o­lence and arms races.

Note that there may be a trade-off, as more targeted efforts seem more neglected and are therefore promising levers for extinction risk reduction. However, their effects on global coordination, peace and stability are less certain and likely smaller than the effects of broad efforts aimed directly at increasing these factors. Broad efforts to promote global coordination, peace and stability might be among the most promising approaches to robustly improve the future and reduce the risk of dystopian outcomes conditional on human survival.


The ex­pected value of efforts to re­duce the risk of hu­man ex­tinc­tion (from non-AI causes) seems ro­bustly positive

So all things con­sid­ered, what is the ex­pected value of efforts to re­duce the risk of hu­man ex­tinc­tion? In the first part, we con­sid­ered what might hap­pen if hu­man ex­tinc­tion is pre­vented for long enough that fu­ture agents, maybe our biolog­i­cal de­scen­dants, digi­tal hu­mans, or (mis­al­igned) AGI cre­ated by hu­mans, colonize space. The EV of (post-)hu­man space coloniza­tion is prob­a­bly pos­i­tive from many welfarist per­spec­tives, but very un­cer­tain. We also ex­am­ined the ‘op­tion value ar­gu­ment’, ac­cord­ing to which we should try to avoid ex­tinc­tion and defer the de­ci­sion to colonize space (or not) to wiser fu­ture agents. We con­cluded that op­tion value, while mostly pos­i­tive, is small and the op­tion value ar­gu­ment hardly con­clu­sive.

In part 2, we ex­plored what the fu­ture uni­verse might look like if hu­mans do go ex­tinct. Vast amounts of value or dis­value might (come to) ex­ist in those sce­nar­ios as well. Some of this (dis-)value could be in­fluenced by fu­ture agents if they sur­vive. This in­sight has lit­tle im­pact for peo­ple who were op­ti­mistic about the fu­ture any­way, but shifts the EV of re­duc­ing ex­tinc­tion risk up­wards for peo­ple who were pre­vi­ously pes­simistic about the fu­ture. In part 3, we ex­tended our con­sid­er­a­tions to ad­di­tional effects of many efforts to re­duce ex­tinc­tion risk, namely re­duc­ing the risk of “mere” global catas­tro­phes and in­creas­ing global co­op­er­a­tion and sta­bil­ity. Th­ese effects gen­er­ate con­sid­er­able ad­di­tional pos­i­tive long-term im­pact. This is be­cause global catas­tro­phes would likely change the di­rec­tion of tech­nolog­i­cal and so­cial progress in a bad way, while global co­op­er­a­tion and sta­bil­ity are pre­req­ui­sites for a pos­i­tive long-term tra­jec­tory.

Some aspects of moral views make the EV of reducing extinction risk look less positive than suggested above. We will consider three such aspects:

  • From a strongly disvalue-focused view, increasing the total number of sentient beings seems negative regardless of the empirical circumstances. The EV of (post-)human space colonization (part 1.1 and 1.2) is thus negative, at least if the universe is currently devoid of value.

  • From a very sta­ble moral view (with low moral un­cer­tainty, thus very lit­tle ex­pected change in prefer­ences upon ideal­ized re­flec­tion), there are no moral in­sights for fu­ture agents to dis­cover and act upon. Fu­ture agents could then only make bet­ter de­ci­sions than us about whether to colonize space through em­piri­cal in­sights. Like­wise, fu­ture agents could only dis­cover op­por­tu­ni­ties to alle­vi­ate as­tro­nom­i­cal dis­value that we cur­rently do not see through em­piri­cal in­sights. Op­tion value (part 1.3) and the effects from po­ten­tially ex­ist­ing dis­value (part 2.2) are re­duced.

  • From a very un­usual moral view (with some of one’s re­flected other-re­gard­ing prefer­ences ex­pected to be anti-par­allel to most of hu­man­ity’s re­flected other-re­gard­ing prefer­ences), fu­ture agents will some­times do the op­po­site of what one would have wanted[61]. This would be true even if fu­ture agents are re­flected and act al­tru­is­ti­cally (ac­cord­ing to a differ­ent con­cep­tion of ‘al­tru­ism’). From that view the fu­ture looks gen­er­ally worse. There is less op­tion value (part 1.3), and if the uni­verse is already filled with be­ings/​things that we morally care about (part 2.2), some­times fu­ture agents might do the wrong thing upon this dis­cov­ery.

To gen­er­ate the (hy­po­thet­i­cal) moral view that is most scep­ti­cal about re­duc­ing ex­tinc­tion risk, we unite all of the three as­pects above. We as­sume a strongly dis­value-fo­cused, very sta­ble and un­usual moral view. Even from this per­spec­tive (in rough or­der of de­scend­ing rele­vance):

  • Efforts to re­duce ex­tinc­tion risk may im­prove the long-term fu­ture by re­duc­ing the risk of global catas­tro­phes and in­creas­ing global co­op­er­a­tion and sta­bil­ity (part 3).

  • There may be some opportunity for future agents to alleviate existing disvalue (as long as the moral view in question isn’t completely ‘unusual’ in all aspects) (part 2.2).

  • (Post-)human space colonization might be preferable to space colonization by non-human animals or extraterrestrials (part 2.1).

  • Small amounts of op­tion value might arise from em­piri­cal in­sights im­prov­ing de­ci­sions (part 1.3).

From this max­i­mally scep­ti­cal view, tar­geted ap­proaches to re­duce the risk of hu­man ex­tinc­tion likely seem some­what un­ex­cit­ing or neu­tral, with high un­cer­tainty (see foot­note[62] for how ad­vo­cates of strongly dis­value-fo­cused views see the EV of efforts to re­duce ex­tinc­tion risk). Re­duc­ing the risk of ex­tinc­tion by mis­al­igned AI prob­a­bly seems pos­i­tive be­cause mis­al­igned AI would also colonize space (see part 2.1).

From views that value the cre­ation of happy be­ings or cre­ation of value more broadly, have con­sid­er­able moral un­cer­tainty, and be­lieve fu­ture re­flected and al­tru­is­tic agents could make good de­ci­sions, the EV of efforts to re­duce ex­tinc­tion risk is likely pos­i­tive and ex­tremely high.

In ag­gre­ga­tion, efforts to re­duce the risk of hu­man ex­tinc­tion seem in ex­pec­ta­tion ro­bustly pos­i­tive from many con­se­quen­tial­ist per­spec­tives.

Efforts to re­duce ex­tinc­tion risk should be a key part of the EA long-ter­mist portfolio

Effec­tive al­tru­ists whose pri­mary moral con­cern is mak­ing sure the fu­ture plays out well will, in prac­tice, need to al­lo­cate their re­sources be­tween differ­ent pos­si­ble efforts. Some of these efforts are op­ti­mized to re­duce ex­tinc­tion risk (e.g. pro­mot­ing biose­cu­rity), oth­ers are op­ti­mized to im­prove the fu­ture con­di­tional on hu­man sur­vival while also re­duc­ing ex­tinc­tion risk (e.g. pro­mot­ing global co­or­di­na­tion or oth­er­wise pre­vent­ing nega­tive tra­jec­tory changes) and some are op­ti­mized to im­prove the fu­ture with­out mak­ing ex­tinc­tion risk re­duc­tion a pri­mary goal (e.g. pro­mot­ing moral cir­cle ex­pan­sion or “worst-case” AI safety re­search).

We have ar­gued above that the EV of efforts to re­duce ex­tinc­tion risk is pos­i­tive, but is it large enough to war­rant in­vest­ment of marginal re­sources? A thor­ough an­swer to this ques­tion re­quires de­tailed ex­am­i­na­tion of the spe­cific efforts in ques­tion and goes be­yond the scope of this ar­ti­cle. We are thus in no po­si­tion to provide a defini­tive an­swer for the com­mu­nity. We will, how­ever, pre­sent two ar­gu­ments that fa­vor in­clud­ing efforts to re­duce ex­tinc­tion risk as a key part in the long-ter­mist EA port­fo­lio. Efforts to re­duce the risks of hu­man ex­tinc­tion are time-sen­si­tive and seem very lev­er­aged. We know of spe­cific risks this cen­tury, we have rea­son­ably good ideas for ways to re­duce them, and if we ac­tu­ally avert an ex­tinc­tion event, this has ro­bust im­pact for mil­lions of years (at least in ex­pec­ta­tion) to come. As a very broad gen­er­al­iza­tion, many efforts op­ti­mized to oth­er­wise im­prove the fu­ture—such as im­prov­ing to­day’s val­ues in the hope that they will prop­a­gate to fu­ture gen­er­a­tions—are less time-sen­si­tive or lev­er­aged. In short, it seems eas­ier to pre­vent an event from hap­pen­ing in this cen­tury than to oth­er­wise ro­bustly in­fluence the fu­ture mil­lions of years down the line.

Key caveats to this argument include that it is not clear how big the differences in time-sensitivity and leverage are[63] and that we may still discover highly leveraged ways to “otherwise improve the future”. Therefore, it seems that the EA long-termist portfolio should contain all of the efforts described above, allowing each member of the community to contribute according to their comparative advantage. For those holding very disvalue-focused moral views, the more attractive efforts would plausibly be those optimized to improve the future without making extinction risk reduction a primary goal.


We are grate­ful to Brian To­masik, Max Dal­ton, Lukas Gloor, Gre­gory Lewis, Tyler John, Thomas Sit­tler, Alex Nor­man, William MacAskill and Fa­bi­enne Sand­küh­ler for helpful com­ments on the manuscript. Ad­di­tion­ally, we thank Max Daniel, Sören Min­der­mann, Carl Shul­man and Se­bas­tian Sud­er­gaard Sch­midt for dis­cus­sions that helped in­form our views on the mat­ter.

Author con­tri­bu­tions:

Jan con­ceived the ar­ti­cle and the ar­gu­ments pre­sented in it. Fried­er­ike and Jan con­tributed to struc­tur­ing the con­tent and writ­ing.

Ap­pendix 1: What if hu­man­ity stayed earth­bound?

In this appendix, we apply the approach of part 1.1 to a situation in which humanity stays Earth-bound. We recommend reading part 1.1 before reading this appendix.

We think that sce­nar­ios in which hu­man­ity stays Earth-bound are of very limited rele­vance for the EV of the fu­ture for two rea­sons:

  • Even if humanity staying Earth-bound were the most likely outcome, probably only a small fraction of expected beings live in these scenarios, so they only constitute a small fraction of expected value or disvalue (as argued in the introduction).

  • Humanity staying Earth-bound may not actually be a very likely scenario because reaching post-humanity and realizing astronomical value might be a default path, conditional on humanity not going extinct (Bostrom, 2009).

If we as­sume hu­man­ity will stay Earth-bound, it seems that most welfarist views would prob­a­bly favour re­duc­ing ex­tinc­tion risk. If one thinks hu­mans are much more im­por­tant than an­i­mals, it is ob­vi­ous (un­less one com­bined that view with suffer­ing-fo­cused ethics, such as anti­na­tal­ism). If one also cares about an­i­mals, then very plau­si­bly hu­man­ity’s im­pact on wild an­i­mals is more rele­vant than hu­man­ity’s im­pact on farmed an­i­mals, be­cause of the enor­mous num­bers of the former (and es­pe­cially since it seems plau­si­ble that fac­tory farm­ing will not con­tinue in­definitely). So far, hu­man­ity’s main effect on wild an­i­mals has been a per­ma­nent de­crease of pop­u­la­tion size (through habitat de­struc­tion), which is ex­pected to con­tinue as hu­man pop­u­la­tion size grows. Com­pared to that, di­rect in­fluence on wild an­i­mal well-be­ing cur­rently is un­clear and prob­a­bly small (though it is less clear for aquatic life):

  • We kill significant numbers of wild animals, but we don’t know how painful human-caused death is compared to non-human-caused death

  • Wild an­i­mal gen­er­a­tion times are very short, so the num­ber of an­i­mals af­fected by “never com­ing into ex­is­tence” is prob­a­bly much larger

If one thinks that wild animals are on net suffering, future population size reduction seems beneficial. If one thinks that wild animal welfare is net positive, then habitat reduction would be bad. However, there is still undeniably a lot of suffering in nature. Humanity might eventually improve wild animals’ lives (as we already do with e.g. vaccinations) if we have much more knowledge and better tools that allow us to do so at limited cost to ourselves, so the prospect of that might offset some of the negative value of current habitat reduction. Obviously, habitat destruction is negative from a conservationist/environmentalist perspective.

Ap­pendix 2: Fu­ture agents will in ex­pec­ta­tion have a con­sid­er­able frac­tion of other-re­gard­ing preferences

Altru­ism in hu­mans likely evolved as a “short­cut” solu­tion to co­or­di­na­tion prob­lems. It was of­ten im­pos­si­ble to fore­cast how much an al­tru­is­tic act would help spread your own genes, but it of­ten would (es­pe­cially in small tribes, where all mem­bers were closely re­lated). Thus, hu­mans for whom al­tru­ism just felt good had a se­lec­tive ad­van­tage.

As agents become more rational and better at long-term planning, a tendency to help for purely selfless reasons seems less adaptive. Agents can deliberately cooperate for strategic reasons whenever necessary, and to exactly the degree that optimizes their own reproductive fitness. One might fear that in the long run, only preferences for increasing one’s own power and influence (and that of one’s descendants) might remain under Darwinian selection.

But this is not nec­es­sar­ily the case, for two rea­sons:

Dar­wi­nian pro­cesses will se­lect for pa­tience, not “self­ish­ness” (Paul Chris­ti­ano)

The more agents reason from a long-term perspective, and the better the tools to preserve values and influence into the future, the smaller the need for altruistic preferences becomes, but the selection pressure for selfishness is also strongly reduced. In contrast to short-term planning (overly) altruistic agents, long-term planning agents that want to create value would realize that amassing power is an instrumental goal for that, and will try to survive, acquire resources for instrumental reasons, and coordinate with others against unchecked expansion of selfish agents. Thus, future evolution might select not for selfishness, but for patience, that is, for how strongly an agent cares about the long term. Such long-term preferences should be expected to be more altruistic.

Carl Shul­man ad­di­tion­ally makes the point that in a space coloniza­tion sce­nario, agents that want to cre­ate value would only be very slightly dis­ad­van­taged in di­rect com­pe­ti­tion with agents that only care about ex­pand­ing.

Brian Tomasik thinks Christiano’s argument is valid and altruism might not be driven to zero in the future, but is doubtful that very-long-term altruists will have strategic advantages over medium-term corporations and governments, and cautions against putting too much weight on theoretical arguments: “Human(e) values have only a mild degree of control in the present. So it would be surprising if such values had significantly more control in the far future.”

Prefer­ences might not even be sub­ject to Dar­wi­nian pro­cesses indefinitely

If the losses from evolu­tion­ary pres­sure in­deed loom large, it seems quite likely that fu­ture gen­er­a­tions would co­or­di­nate against it, e.g. by form­ing a sin­gle­ton (Bostrom, 2006) (which broadly en­com­passes many forms of global co­or­di­na­tion or value/​goal-preser­va­tion). (Of course, there are also fu­ture sce­nar­ios that would strip away all other-re­gard­ing prefer­ences, e.g. in Malthu­sian sce­nar­ios.)

In con­clu­sion, we will end up some­where be­tween no other-re­gard­ing prefer­ences and even more than to­day, with a con­sid­er­able prob­a­bil­ity of fu­ture agents hav­ing a con­sid­er­able frac­tion of other-re­gard­ing prefer­ences.

Ap­pendix 3: What if cur­rent hu­man val­ues trans­ferred broadly into the fu­ture?

Most humans (past and present) intend to do what we now consider good (be loving, friendly, altruistic) more than they intend to harm (be sadistic, hateful, seek revenge). Positive[64] other-regarding preferences might be more universal: most people would, all else equal, prefer all humans or animals to be happy, while fewer people would have such a general preference for suffering. This relative overhang of positive preferences in human society is evident from rules that ban hurting (some) others, while no comparable rules ban helping others. These rules will (if they persist) also shape the future, as they increase the costs of doing harm.[65]

Through­out hu­man his­tory, there has been a trend away from cru­elty and vi­o­lence.[66] Although hu­mans cause a lot of suffer­ing in the world to­day, this is mostly be­cause peo­ple are in­differ­ent or “lazy”, rather than evil. All in all, it seems fair to say that the sig­nifi­cant ma­jor­ity of hu­man other-re­gard­ing prefer­ences is pos­i­tive, and that most peo­ple would, all else equal, pre­fer more hap­piness and less suffer­ing. How­ever, we ad­mit this is hard to quan­tify.[67]

Refer­ences (only those pub­lished in peer-re­viewed jour­nals, and books):

    Bjørn­skov, C., Boet­tke, P.J., Booth, P., Coyne, C.J., De Vos, M., Ormerod, P., Sacks, D.W., Schwartz, P., Shack­le­ton, J.R., Snow­don, C., 2012. … and the Pur­suit of Hap­piness-Wel­lbe­ing and the Role of Govern­ment.

    Bostrom, N., 2013. Ex­is­ten­tial risk pre­ven­tion as global pri­or­ity. Global Policy 4, 15–31.

    Bostrom, N., 2011. INFINITE ETHICS. Anal­y­sis and Me­ta­physics 9–59.

    Bostrom, N., 2009. The Future of Humanity, in: New Waves in Philosophy of Technology, New Waves in Philosophy. Palgrave Macmillan, London, pp. 186–215. doi:10.1057/9780230227279_10

    Bostrom, N., 2006. What is a sin­gle­ton. Lin­guis­tic and Philo­soph­i­cal In­ves­ti­ga­tions 5, 48–54.

    Bostrom, N., 2004. The fu­ture of hu­man evolu­tion. Death and anti-death: Two hun­dred years af­ter Kant, fifty years af­ter Tur­ing 339–371.

    Bostrom, N., 2003a. Astro­nom­i­cal waste: The op­por­tu­nity cost of de­layed tech­nolog­i­cal de­vel­op­ment. Utili­tas 15, 308–314.

    Bostrom, N., 2003b. Are We Living in a Computer Simulation? The Philosophical Quarterly 53, 243–255. doi:10.1111/1467-9213.00309

    Greaves, H., 2017. Pop­u­la­tion ax­iol­ogy. Philos­o­phy Com­pass 12, e12442.

    Killingsworth, M.A., Gilbert, D.T., 2010. A wandering mind is an unhappy mind. Science 330, 932. doi:10.1126/science.1192439

    Pinker, S., 2011. The Bet­ter An­gels of our Na­ture. New York, NY: Vik­ing.

    Sagoff, M., 1984. Animal Liberation and Environmental Ethics: Bad Marriage, Quick Divorce. Philosophy & Public Policy Quarterly 4, 6. doi:10.13021/G8PPPQ.41984.1177

    Singer, P., 2011. The ex­pand­ing cir­cle: Ethics, evolu­tion, and moral progress. Prince­ton Univer­sity Press.

    Tuomisto, H.L., Teixeira de Mattos, M.J., 2011. Environmental Impacts of Cultured Meat Production. Environ. Sci. Technol. 45, 6117–6123. doi:10.1021/es200130u


  1. Sim­ply put: two be­ings ex­pe­rienc­ing pos­i­tive (or nega­tive) welfare are morally twice as good (or bad) as one be­ing ex­pe­rienc­ing the same welfare ↩︎

  2. Some con­sid­er­a­tions that might re­duce our cer­tainty that, even given the moral per­spec­tive of this ar­ti­cle, most ex­pected value or dis­value comes from space coloniza­tion:

  3. In this ar­ti­cle, the term ‘(post-)hu­man space coloniza­tion’ is meant to in­clude any form of space coloniza­tion that origi­nates from a hu­man civ­i­liza­tion, in­clud­ing cases in which (biolog­i­cal) hu­mans or hu­man val­ues don’t play a role (e.g. be­cause hu­man­ity lost con­trol over ar­tifi­cial su­per­in­tel­li­gence, which then colonizes space). ↩︎

  4. … as­sum­ing that with­out (post-)hu­man space coloniza­tion, the uni­verse is and stays de­void of value or dis­value, as ex­plained in “Out­line of the ar­ti­cle” ↩︎

  5. We here as­sume that hu­man­ity does not change sub­stan­tially, ex­clud­ing e.g. digi­tal sen­tience from our con­sid­er­a­tions. This may be overly sim­plis­tic, as in­ter­stel­lar travel seems so difficult that a space-far­ing civ­i­liza­tion will likely be ex­tremely differ­ent from us to­day. ↩︎

  6. Around 80 billion farmed fish, which live around one year, are raised and kil­led per year. ↩︎

  7. All es­ti­mates from Brian To­masik ↩︎

  8. There are con­vinc­ing anec­dotes and ex­am­ples for an ex­pand­ing moral cir­cle from fam­ily to na­tion to all hu­mans: The abol­ish­ment of slav­ery; hu­man rights; re­duc­tion in dis­crim­i­na­tion based on gen­der, sex­ual ori­en­ta­tion, race. How­ever, there doesn’t seem to be a lot of hard ev­i­dence. Gw­ern lists a few ex­am­ples of a nar­row­ing moral cir­cle (such as in­fan­ti­cide, tor­ture, other ex­am­ples be­ing less con­vinc­ing). ↩︎

  9. For ex­am­ple:

    • lab-grown meat is very challeng­ing with few peo­ple work­ing on it, lit­tle fund­ing, …

    • Con­sumer adop­tion is far from inevitable

    • Some people will certainly not want to eat in-vitro meat, so it is unlikely that factory farming will be abolished completely in the medium term if the circle of empathy doesn’t expand or governments don’t regulate.

  10. There are also con­trary trends. E.g. in Ger­many, meat con­sump­tion per head has been de­creas­ing since 2011, from 62.8 kg in 2011 to 59.2 kg in 2015. In the US, it has been stag­nant for 10 years. ↩︎

  11. For ex­am­ple:

    • Many more peo­ple re­mem­ber feel­ing en­joy­ment or love than pain or de­pres­sion across many coun­tries (Figure 13, here)

    • In nearly ev­ery coun­try, (much) more than 50% of peo­ple re­port feel­ing very happy or rather happy (sec­tion “Eco­nomic growth and hap­piness”, here)

    • Average happiness in experience sampling in the US: 65/100 (Killingsworth and Gilbert, 2010)

  12. One could claim that this just shows that people are afraid of dying or don’t commit suicide for other reasons, but people who suffer from depression have lifetime suicide rates of 2-15%, 10-25 times higher than the general population. This at least indicates that suicide rates increase if quality of life decreases. ↩︎

  13. Re­ported well-be­ing: Peo­ple on av­er­age seem to re­port be­ing con­tent with their lives. This is only mod­er­ate ev­i­dence for their lives be­ing pos­i­tive from a welfarist view be­cause peo­ple don’t gen­er­ally think in welfarist terms when eval­u­at­ing their lives and there might be op­ti­mism bias in re­port­ing. Suicide rates: There are many rea­sons why peo­ple with lives not worth liv­ing might re­frain from suicide, for ex­am­ple:

    • pos­si­bil­ity of failing and then be­ing in­sti­tu­tion­al­ized and/​or liv­ing with se­ri­ous disability

    • obli­ga­tions to par­ents, chil­dren, friends

    • fear of hell

  14. For ex­am­ple:

    • always enough food and wa­ter (with some ex­cep­tions)

    • Do­mes­ti­cated an­i­mals have been bred for a long time and now in gen­eral have lower basal stress lev­els and stress re­ac­tions than wild an­i­mals (be­cause they don’t need them)

  15. For ex­am­ple:

    • harmful breeding (e.g. broiler chickens are potentially in pain during the last 2 weeks of their life, because their joints cannot sustain their weight)

    • There is no incentive to satisfy the emotional and social needs of farmed animals. It is quite likely that e.g. pigs can’t exhibit their natural behavior (e.g. in gestation crates). Pigs, hens, veal cattle are often kept in ways that they can’t move (or only very little) for weeks.

    • stress (intense confinement, chickens and pigs show self-mutilating behavior)

    • ex­treme suffer­ing (some per­centage of farmed an­i­mals suffer­ing to death or ex­pe­rienc­ing in­tense pain dur­ing slaugh­ter)

  16. The book Com­pas­sion by the pound, for ex­am­ple, rates the welfare of caged lay­ing hens and pigs as nega­tive, but beef cat­tle, dairy cows, free range lay­ing hens and broiler chick­ens (mar­ket an­i­mals) as pos­i­tive. Other ex­perts dis­agree, es­pe­cially on broiler chick­ens hav­ing lives worth liv­ing. ↩︎

  17. Abil­ity to ex­press nat­u­ral be­havi­our, such as sex, eat­ing, so­cial be­hav­ior, etc. ↩︎

  18. Often painful deaths, disease, parasitism, predation, starvation, etc. In general, there is a danger of anthropomorphism. Of course I would be cold in the Arctic, but a polar bear wouldn’t. ↩︎

  19. Speci­fi­cally: moral weight for in­sects, prob­a­bil­ity that hu­man­ity will even­tu­ally im­prove wild an­i­mal welfare, fu­ture pop­u­la­tion size mul­ti­plier (in­sect rel­a­tive to hu­mans) and hu­man and in­sect welfare. ↩︎

  20. If any­thing, at­ti­tudes to­wards an­i­mals have ar­guably be­come more em­pa­thetic. The ma­jor­ity of peo­ple around the globe ex­press con­cern for farm an­i­mal well-be­ing. (How­ever, there is limited data, sev­eral con­founders, and re­sults from in­di­rect ques­tion­ing in­di­cate that the ac­tual con­cern for farmed an­i­mals might be much lower). See e.g.: http://​​­​​comm­frontoffice/​​pub­li­copinion/​​archives/​​ebs/​​ebs_270_en.pdf https://​​www.hori­zon­​​at­tach­ments/​​docs/​​hori­zon-re­search-fac­tory-farm­ing-sur­vey-re­port.pdf http://​​www.tand­fon­​​doi/​​abs/​​10.2752/​​175303713X13636846944367 https://​​​​pmc/​​ar­ti­cles/​​PMC4196765/​​ But also: https://​​​​ar­ti­cle/​​10.1007/​​s11205-009-9492-z ↩︎

  21. Fu­ture tech­nol­ogy, in com­bi­na­tion with unchecked evolu­tion­ary pres­sure, might also lead to fu­tures that con­tain very lit­tle of what we would value upon re­flec­tion (Bostrom, 2004). ↩︎

  22. Self-re­gard­ing prefer­ences are prefer­ences that de­pend on the ex­pected effect of the preferred state of af­fairs on the agent. Th­ese are not syn­ony­mous with purely “self­ish prefer­ences”. Act­ing ac­cord­ing to self-re­gard­ing prefer­ences can lead to acts that benefit oth­ers, such as in trade.

    Other-re­gard­ing prefer­ences are prefer­ences that don’t de­pend on the ex­pected effect of the preferred state of af­fairs on the agent. Other-re­gard­ing prefer­ences can lead to acts that also benefit the ac­tor. E.g. par­ents are happy if they know their chil­dren are happy. How­ever, the par­ents would also want their chil­dren to be happy if they wouldn’t come to know about it. As defined here, other-re­gard­ing prefer­ences are not nec­es­sar­ily pos­i­tive for oth­ers. They can be nega­tive (e.g. sadis­tic/​hate­ful prefer­ences) or neu­tral (e.g. aes­thetic prefer­ences).

    Ex­am­ple of two par­ties at war:

    • Self-re­gard­ing prefer­ence: Mem­bers of the one party want mem­bers of the other party to die, so they can win the war and con­quer the other party’s re­sources.

    • Other-re­gard­ing prefer­ence: Mem­bers of the one party want mem­bers of the other party to die, be­cause they de­vel­oped in­tense hate against them. Even if they don’t get any ad­van­tage from it, they would still want the en­emy to suffer.

  23. In­di­vi­d­ual hu­mans as well as hu­man so­ciety have be­come more in­tel­li­gent over time. See: his­tory of ed­u­ca­tion, sci­en­tific rev­olu­tion, Flynn effect, in­for­ma­tion tech­nol­ogy. Ge­netic en­g­ineer­ing or ar­tifi­cial in­tel­li­gence may fur­ther in­crease our in­di­vi­d­ual and col­lec­tive cog­ni­tion. ↩︎

  24. Even if FAP and RP don’t have a lot of over­lap, there might be ad­di­tional rea­sons to defer to the val­ues of fu­ture gen­er­a­tions. Paul Chris­ti­ano ad­vo­cates one should sym­pa­thize with fu­ture agents’ val­ues, if they are re­flected, for strate­gic co­op­er­a­tive rea­sons, and for a will­ing­ness to dis­card idiosyn­cratic judge­ments. ↩︎

  25. Even if earth-origi­nat­ing AI is ini­tially con­trol­led, this might not guaran­tee con­trol over the fu­ture: Goal preser­va­tion might be costly, if there are trade-offs be­tween learn­ing and goal preser­va­tion dur­ing self-im­prove­ment, es­pe­cially in mul­ti­po­lar sce­nar­ios. ↩︎

  26. How mean­ingful moral re­flec­tion is, and whether we should ex­pect hu­man val­ues to con­verge upon re­flec­tion, also de­pends on un­solved ques­tions in meta-ethics. ↩︎

  27. Of course, orthogonal other-regarding preferences can sometimes still lead to anti-parallel actions. Take as an example the debate of conservationism vs. wild animal suffering. Both parties have other-regarding preferences over wild animals. Conservationists don’t have a preference for wild animal suffering, just for conserving eco-systems. Wild animal suffering advocates don’t have a preference against conserving eco-systems (per se), just against wild animal suffering. In practice, these orthogonal views likely recommend different actions regarding habitat destruction. However, if there will be future agents with preferences on both sides, then there is far more room for gains through trade and compromise (such as the implementation of David Pearce’s Hedonistic imperative) in cases like this than if other-regarding preferences were actually anti-parallel. Still, as I also remark in the conclusion, people who think their reflected preferences will be sufficiently unusual to have only a small overlap with other-regarding preferences of other humans, even if they are reflected, will find the whole part 1.2 less compelling for that reason. ↩︎

  28. Maybe we would, after idealized reflection, include a certain class of beings into our other-regarding preferences, and we would want them to be able to experience, say, freedom. It seems quite likely that future agents won’t care about these beings at all. However, it seems very unlikely that they would have a particular other-regarding preference for such beings to be un-free.

    Or consider the paperclip-maximiser, a canonical example of misaligned AI and thus an example of FAP certainly not being parallel to RP. Still, a paperclip-maximiser does not have a particular aversion against flourishing life, just as we don’t have a particular aversion against paperclips. ↩︎

  29. Ex­am­ples of nega­tive “side-effects” as defined here:

    • The nega­tive “side-effects” of war­fare on the los­ing party are big­ger than the pos­i­tive effects for the win­ning party (as­sum­ing that the mo­ti­va­tion for the war was not “harm­ing the en­emy”, but e.g. ac­quiring the en­emy’s re­sources)

      • This is an ex­am­ple of side effects of pow­er­ful agents’ self-re­gard­ing prefer­ences on other pow­er­ful agents.

    • The nega­tive “side-effects” of fac­tory farm­ing (an­i­mal suffer­ing) are big­ger than the pos­i­tive effects for hu­man­ity (abil­ity to eat meat). Many peo­ple do care about an­i­mals, so this is also an ex­am­ple of self-re­gard­ing prefer­ences con­flict­ing with other-re­gard­ing prefer­ences.

    • The nega­tive “side-effects” of slave-la­bor on the slave are big­ger than the pos­i­tive effects for the slave owner (gain in wealth)

      • Th­ese are both ex­am­ples of side effects of pow­er­ful agents’ self-re­gard­ing prefer­ences on pow­er­less be­ings.

    Of course there are also pos­i­tive side-effects, co­op­er­a­tive and ac­ci­den­tal: E.g.

    • pos­i­tive “side-effects” of pow­er­ful agents act­ing ac­cord­ing to their prefer­ences on other pow­er­ful agents: All gains from trade and cooperation

    • pos­i­tive “side-effects” of pow­er­ful agents act­ing ac­cord­ing to their prefer­ences on pow­er­less be­ings: Ra­bies vac­ci­na­tion for wild an­i­mals. Ar­guably, wild an­i­mal pop­u­la­tion size re­duc­tion.

  30. Additionally, one might object that FAP may not be the driving force shaping the future. Today, it seems that major decisions are mediated by a complex system of economic and political structures that often leads to outcomes that don’t align with the preferences of individual humans and that overweights the interests of the economically and politically powerful. On that view, we might expect the influence of human(e) values over the world to remain small. We think that future agents will probably have better tools to actually shape the world according to their preferences, which includes better tools for mediating disagreement and reaching meaningful compromise. But insofar as the argument in this footnote applies, it gives an additional reason to expect orthogonal actions, even if FAP aren’t orthogonal. ↩︎

  31. Note that co­op­er­a­tion does not re­quire car­ing about the part­ner one co­op­er­ates with. Even two agents that don’t care about each other at all may co­op­er­ate in­stead of wag­ing war for the re­sources the other party holds, if they have good tools/​in­sti­tu­tions to ar­range com­pro­mise, be­cause the cost of war­fare is high. ↩︎

  32. Evolu­tion­ary rea­sons for the asym­me­try be­tween biolog­i­cal pain and plea­sure that would not nec­es­sar­ily re­main in de­signed digi­tal sen­tience (ideas owed to Carl Shul­man):

    • Animals try to minimize the duration of pain (e.g. by moving away from the source of pain), and try to maximize the duration of pleasurable events (e.g. by continuing to eat). Thus, painful events are on average shorter than pleasurable events, and so need to be more intense to induce the same learning experience.

    • Losses in re­pro­duc­tive fit­ness from one sin­gle nega­tive event (e.g. a deadly in­jury) can be much greater than the gains of re­pro­duc­tive fit­ness from any sin­gle pos­i­tive event, so an­i­mals evolved to want to avoid these events at all cost.

    • Boredom/satiation can be seen as evolved protection against reward channel hacking. Animals for which one pleasant stimulus stayed pleasant indefinitely (e.g. an animal that just continued eating) had less reproductive success. Pain channels need less protection against hacking, because pain channel hacking:

      • only works if there is sus­tained pain in the first place, and

      • is much harder to learn than plea­sure chan­nel hack­ing (the former: af­ter get­ting hurt, an an­i­mal would need to find and eat a pain-re­liev­ing plant; the lat­ter: an an­i­mal just needs to con­tinue eat­ing de­spite not hav­ing any use for ad­di­tional calories)

    This might be part of the rea­son why pain seems much eas­ier to in­stan­ti­ate on de­mand than hap­piness. ↩︎

  33. Even if fu­ture pow­er­ful agents have some con­cern for the welfare of sen­tient tools, sen­tient tools’ welfare might still be net nega­tive, if there are rea­sons that make pos­i­tive-welfare tools much more ex­pen­sive than nega­tive welfare tools (e.g. if suffer­ing is very im­por­tant for task perfor­mance). But even if max­i­mal effi­ciency and welfare of tools are not com­pletely cor­re­lated, we think that most suffer­ing can be avoided while still keep­ing most pro­duc­tivity, so that a lit­tle con­cern for sen­tient tools could thus go a long way. ↩︎

  34. Strate­gic acts in sce­nar­ios with lit­tle co­op­er­a­tion could mo­ti­vate the cre­ation of dis­value-op­ti­mized sen­tience, es­pe­cially in mul­ti­po­lar sce­nar­ios that con­tain both al­tru­is­tic and in­differ­ent agents (black­mailing). How­ever, be­cause un­co­op­er­a­tive acts are bad for ev­ery­one, these sce­nar­ios in ex­pec­ta­tion seem to in­volve lit­tle re­sources. On the pos­i­tive side, there can also be gains from trade be­tween al­tru­is­tic and in­differ­ent agents. ↩︎

  35. Sen­tient tools are op­ti­mized for perfor­mance in the task they are cre­ated for. Per re­source-unit, fu­ture agents would cre­ate: a num­ber of minds as is most effi­cient, with he­do­nic ex­pe­rience as is most effi­cient, op­ti­mized for task.

    (Dis)value-op­ti­mized sen­tience might be di­rectly op­ti­mized for ex­tent of con­scious­ness or in­ten­sity of ex­pe­rience (if that is ac­tu­ally what fu­ture gen­er­a­tions value al­tru­is­ti­cally). Per re­source-unit, fu­ture agents would cre­ate: as many minds as is op­ti­mal for (dis)value, with as pos­i­tive/​nega­tive as pos­si­ble he­do­nic ex­pe­rience, op­ti­mized for con­scious states.

    Such sen­tience might be or­ders of mag­ni­tude more effi­cient in cre­at­ing con­scious ex­pe­rience than sen­tience not op­ti­mized for it. E.g. in hu­mans, only a tiny frac­tion of en­ergy is used for peak con­scious ex­pe­rience: about 20% of en­ergy is used for the brain, only a frac­tion of that is used for con­scious ex­pe­rience, only a frac­tion of which are “peak” ex­pe­riences. ↩︎

  36. The driv­ing force be­hind this judge­ment is not nec­es­sar­ily the be­lief that most fu­tures will be good. Rather, it is the be­lief that the ‘rather good’ fu­tures will con­tain more net value than the ‘rather bad’ fu­tures will con­tain net dis­value.

    • The ‘rather good’ fu­tures con­tain agents with other-re­gard­ing prefer­ences highly par­allel to our re­flected prefer­ences. Many re­sources will be spent in a way that op­ti­mizes for value (by our lights).

    • In the ‘rather bad’ futures, agents are largely selfish, or have other-regarding preferences completely orthogonal to our reflected other-regarding preferences. In these futures, most resources will be spent for goals that we do not care about, but very few resources will be spent to produce things we would disvalue in an optimized way. On whichever side of ‘zero’ these scenarios fall, they seem much closer to parity than the ‘rather good’ futures (from most moral views).

  37. As also noted in the dis­cus­sion at the end of the ar­ti­cle, part 1 is less rele­vant for peo­ple who have other-re­gard­ing prefer­ences very differ­ent from other peo­ple, and who be­lieve their RP to be very differ­ent from the RP of the rest of hu­man­ity. ↩︎

  38. Op­tion value is not a sep­a­rate kind of value, and it would be already in­te­grated in the perfect EV calcu­la­tion. How­ever, it is quite easy to over­look, and some­what im­por­tant in this con­text, so it is dis­cussed sep­a­rately here. ↩︎

  39. In a gen­eral sense, ‘op­tion value’ in­cludes the value of any change of strat­egy, for the bet­ter or worse, that fu­ture agents might take upon learn­ing more. How­ever, the gen­eral fact that agents can learn more and adapt their strat­egy is not sur­pris­ing and was already fac­tored into con­sid­er­a­tions 1, 2 and 4. ↩︎

  40. In the more general definition, option value is not always positive. In general, giving future agents the option to choose between different strategies can be bad, if the values of future agents are bad or their epistemics are worse. In this section, ‘option value’ only refers to the option of future agents not to colonize space, if they find colonizing space would be bad from an altruistic perspective. It seems very unlikely that, if future agents refrain from space colonization for altruistic reasons at all, they would do so exactly in those cases in which we (the current generation) would have judged space colonization as positive (according to our reflected preferences). So this kind of option value is very unlikely to be negative. ↩︎

  41. Although em­piri­cal in­sights about the uni­verse play a role in both op­tion value and part 2.2, these two con­sid­er­a­tions are differ­ent:

    • Part 2.2: Fur­ther in­sight about the uni­verse might show that there already is a lot of dis­value out there. A benev­olent civ­i­liza­tion might re­duce this dis­value.

    • Op­tion value: Fur­ther in­sight about the uni­verse might show that there already is a lot of value or dis­value out there. That means that we should be un­cer­tain about the EV of (post-)hu­man space coloniza­tion. Our de­scen­dants will be less un­cer­tain, and can then, if they know there is NOT already a lot of dis­value out there, still de­cide to not spread to the stars.

  42. In­di­vi­d­ual hu­mans as well as hu­man so­ciety have be­come more in­tel­li­gent over time. See: his­tory of ed­u­ca­tion, sci­en­tific rev­olu­tion, Flynn effect, in­for­ma­tion tech­nol­ogy. Ge­netic en­g­ineer­ing or ar­tifi­cial in­tel­li­gence may fur­ther in­crease our in­di­vi­d­ual and col­lec­tive cog­ni­tion. ↩︎

  43. For ex­am­ple, if we care only about max­i­miz­ing X, but fu­ture agents will care about max­i­miz­ing X, Y and Z to equal parts, let­ting them de­cide whether or not to colonize space might still lead to more X than if we de­cided, be­cause they have vastly more knowl­edge about the uni­verse and are gen­er­ally much more ca­pa­ble of mak­ing ra­tio­nal de­ci­sions. ↩︎

  44. Even if fu­ture agents can make bet­ter de­ci­sions re­gard­ing our other-re­gard­ing prefer­ences than we (cur­rently) could, fu­ture agents also need to be non-self­ish enough to act ac­cord­ingly—their other-re­gard­ing prefer­ences need to con­sti­tute a suffi­ciently large frac­tion of their over­all prefer­ences. ↩︎

  45. Say we are un­cer­tain about the value in the fu­ture in two ways:

    • 50% cre­dence that dis­value-fo­cused view would be my preferred moral view af­ter ideal­ized re­flec­tion, 50% cre­dence in a ‘bal­anced view’ that also val­ues the cre­ation of value.

    • 50% cre­dence that the fu­ture will be con­trol­led by in­differ­ent ac­tors, with prefer­ences com­pletely or­thog­o­nal to our re­flected prefer­ences, 50% cre­dence that it will be con­trol­led by good ac­tors who have ex­actly the prefer­ences we would have af­ter ideal­ized re­flec­tion.

    The fol­low­ing table shows ex­pected net value of space coloniza­tion with­out con­sid­er­ing op­tion value (again: made-up num­bers):

                             Indifferent actors   Good actors
    Disvalue-focused view    −100                 −10
    ‘Balanced view’          −5                   100

    Now with op­tion value, only the good ac­tors would limit the harm if the dis­value-fo­cused view was in­deed our (and thus, their) preferred moral view af­ter ideal­ized re­flec­tion:

                             Indifferent actors   Good actors
    Disvalue-focused view    −100                 0
    ‘Balanced view’          −5                   100
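
As a rough illustration of how the two tables combine into a single number, the sketch below weights each cell by the stated 50% credences. It assumes the two uncertainties are independent, so that each cell has a joint credence of 25%; the footnote does not state this, and the names used here are purely illustrative. The payoffs are the same made-up numbers as in the tables.

```python
# Made-up payoffs from the two tables: keys are (moral view, type of actors).
CREDENCE_VIEW = {"disvalue_focused": 0.5, "balanced": 0.5}
CREDENCE_ACTORS = {"indifferent": 0.5, "good": 0.5}

WITHOUT_OPTION_VALUE = {
    ("disvalue_focused", "indifferent"): -100,
    ("disvalue_focused", "good"): -10,
    ("balanced", "indifferent"): -5,
    ("balanced", "good"): 100,
}

# With option value, good actors refrain from colonizing space if the
# disvalue-focused view turns out to be correct, so that cell becomes 0.
WITH_OPTION_VALUE = dict(WITHOUT_OPTION_VALUE)
WITH_OPTION_VALUE[("disvalue_focused", "good")] = 0


def expected_value(table):
    """Credence-weighted sum over all cells (assumes independent uncertainties)."""
    return sum(
        CREDENCE_VIEW[view] * CREDENCE_ACTORS[actors] * payoff
        for (view, actors), payoff in table.items()
    )


ev_without = expected_value(WITHOUT_OPTION_VALUE)
ev_with = expected_value(WITH_OPTION_VALUE)
option_value = ev_with - ev_without

print(ev_without, ev_with, option_value)
```

Under these assumptions only one cell changes, so the option value equals the credence of that cell times the harm avoided in it (0.25 * 10). The expected value stays negative in both cases with these particular numbers, which is an artifact of the made-up payoffs rather than a conclusion.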
  46. There is more op­tion value, if:

    • One currently has high moral uncertainty (one expects one’s views to change considerably upon idealized reflection). With high moral uncertainty, it is more likely that future agents will have significantly more accurate moral values. The same holds if one expects future agents to have a significantly better empirical understanding.

    • One’s uncertainty about the EV of the future comes mainly from moral, and not empirical, uncertainty. For example, say you are uncertain about the expected value of the future because you are unsure whether you would, in your reflected preferences, endorse a strongly disvalue-focused view. If you are generally optimistic about future agents, you can assume future generations to be better informed about which moral view to take. Thus, there is a lot of option value in reducing the risk of human extinction. If, on the other hand, you are uncertain about the EV of the future because you think there is a high chance that future agents just won’t be altruistic, there is no option value in deferring the decision about space colonization to them.

  47. It seems likely that some life-forms would sur­vive, ex­cept if hu­man ex­tinc­tion is caused by some cos­mic catas­tro­phes (not a fo­cus area for effec­tive al­tru­ists, be­cause un­likely and in­tractable) or by spe­cific forms of nano-tech­nol­ogy or by mis­al­igned AI. ↩︎

  48. The ex­tent to which it is true de­pends on the re­flec­tion pro­cess one chooses. Sev­eral peo­ple who read an early draft of this ar­ti­cle com­mented that they would imag­ine their re­flected prefer­ences to be in­de­pen­dent of hu­man-spe­cific fac­tors. ↩︎

  49. The ar­gu­ment in the main text as­sumed that the al­ter­na­tive space coloniza­tion con­tains a com­pa­rable amount of things that we find morally rele­vant as the (post-)hu­man coloniza­tion. But in many cases, the EV of an al­ter­na­tive space coloniza­tion would ac­tu­ally be (near) neu­tral, be­cause the al­ter­na­tive civ­i­liza­tion’s prefer­ences would be or­thog­o­nal to ours. Our val­ues would just be so differ­ent from the AI’s or ex­trater­res­trial val­ues that space coloniza­tion by these agents might of­ten look neu­tral to us. The ar­gu­ment in the main text still ap­plies, but only for those al­ter­na­tive space coloniza­tions that con­tain com­pa­rable ab­solute amounts of value and dis­value.

    However, a very similar argument applies even for alternative colonizations that contain a smaller absolute amount of things we morally care about. The value of alternative space colonization would be shifted more towards zero, but future pessimists would in expectation always find alternative space colonization a worse outcome than no space colonization. From the future pessimistic perspective, human extinction leads to a bad outcome (alternative colonization), and not a neutral one (no space colonization). Future pessimists should thus update towards extinction risk reduction being less negative. Future optimists might find the alternative space colonization better or worse than no colonization.

    The math­e­mat­i­cal deriva­tion in the next foot­note takes this caveat into ac­count. ↩︎

  50. As­sump­tion: This deriva­tion makes the as­sump­tion that peo­ple who think the EV of hu­man space coloniza­tion is nega­tive and those who think it is pos­i­tive would still rank a set of po­ten­tial fu­ture sce­nar­ios in the same or­der when eval­u­at­ing them nor­ma­tively. This seems plau­si­ble, but may not be the case. Let’s sim­plify the value of hu­man ex­tinc­tion risk re­duc­tion to:

    EV(re­duc­tion of hu­man ex­tinc­tion risk) = EV(hu­man space coloniza­tion) - EV(hu­man ex­tinc­tion)

    (This sim­plifi­ca­tion is very un­char­i­ta­ble to­wards ex­tinc­tion risk re­duc­tion, even if only con­sid­er­ing the long-term effects, see parts 2 and 3 of this ar­ti­cle). As­sum­ing that no non-hu­man an­i­mal or ex­trater­res­trial civ­i­liza­tion would emerge in case of hu­man ex­tinc­tion, then EV(hu­man ex­tinc­tion)=0, and so fu­ture pes­simists judge:

    EV(re­duc­tion of hu­man ex­tinc­tion risk) = EV(hu­man space coloniza­tion) - EV(hu­man ex­tinc­tion)= EV(hu­man space coloniza­tion) < 0

    And fu­ture op­ti­mists be­lieve:

    EV(re­duc­tion of hu­man ex­tinc­tion risk) = EV(hu­man space coloniza­tion) - EV(hu­man ex­tinc­tion) = EV(hu­man space coloniza­tion) > 0

    Let’s say, if hu­man­ity goes ex­tinct, there will be non-hu­man space coloniza­tion even­tu­ally with the prob­a­bil­ity p. (p can be down-weighted in a way to ac­count for the fact that later space coloniza­tion prob­a­bly means less fi­nal area colonized). This means that:

    EV(hu­man ex­tinc­tion) = p * EV(non-hu­man space coloniza­tion)

    Let’s define the amount of value and disvalue created by human space colonization as Vₕ and Dₕ, and the amount of value and disvalue created by the non-human civilization as Vₙₕ and Dₙₕ.

    We can ex­pect two re­la­tions:

    1. On av­er­age, a non-hu­man civ­i­liza­tion will care less about cre­at­ing value and care less about re­duc­ing dis­value than a hu­man civ­i­liza­tion. We can ex­pect the ra­tio of value to dis­value to be worse in the case of a non-hu­man civ­i­liza­tion:

    (i) Vₙₕ/Dₙₕ = (Vₕ/Dₕ) * r, with 0 ≤ r ≤ 1

    2. On average, non-human animals and extraterrestrial values will be alien to us, their preferences will be orthogonal to ours. It seems likely that on average these futures will contain less value or disvalue than a future with human space-colonization.

    (ii) (Vₙₕ + Dₙₕ) = (Vₕ + Dₕ) * a, with 0 ≤ a ≤ 1

    Fi­nally, the ex­pected value of non-hu­man space coloniza­tion can be ex­pressed as (by defi­ni­tion):

    (iii) EV(non-hu­man space coloniza­tion) = Vₙₕ - Dₙₕ

    Us­ing (i), (ii), and (iii) we get:

    EV(hu­man ex­tinc­tion) = EV(non-hu­man space coloniza­tion) * Prob­a­bil­ity(non-hu­man space coloniza­tion) = (Vₙₕ - Dₙₕ) * p = [a * (Vₕ + Dₕ) /​ ((Vₕ/​ Dₕ) * r + 1)] * (r * Vₕ/​ Dₕ − 1) * p

    The first term [in square brackets] is always positive. The sign of the second term, (r * Vₕ/Dₕ − 1), can change depending on whether we were previously optimistic or pessimistic about the future.

    If we were pre­vi­ously pes­simistic about the fu­ture, we thought:

    Vₕ - Dₕ < 0 → Vₕ/​ Dₕ < 1

    Because 0 ≤ r ≤ 1, it follows that r * Vₕ/Dₕ < 1, so the second term is negative and the EV of human extinction is negative. Compared to the “naive” pessimistic view (assuming EV(human extinction) = 0), pessimists should update their view into the direction of EV(reducing human extinction risk) being less negative.

    If we were pre­vi­ously op­ti­mistic about the fu­ture, we thought:

    Vₕ - Dₕ > 0 → Vₕ/​ Dₕ > 1

    Now the sec­ond term can be nega­tive, neu­tral, or pos­i­tive. Com­pared to the naive view, fu­ture op­ti­mists should some­times be more en­thu­si­as­tic (if Vₙₕ/​ Dₙₕ= r * Vₕ/​ Dₕ < 1) and some­times be less en­thu­si­as­tic (if Vₙₕ/​ Dₙₕ= r * Vₕ/​ Dₕ > 1) about ex­tinc­tion risk re­duc­tion than they pre­vi­ously were. ↩︎
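
The derivation above can be checked numerically. The sketch below is a minimal illustration, not part of the original argument: it solves relations (i) and (ii) for Vₙₕ and Dₙₕ, compares the result with the closed form, and evaluates the update for one pessimistic and two optimistic inputs. The function names and all input numbers are made up for illustration, and the code assumes Dₕ > 0 so that the ratio Vₕ/Dₕ is defined.

```python
def ev_human_extinction(v_h, d_h, r, a, p):
    """EV(human extinction) = p * (V_nh - D_nh), with V_nh and D_nh from (i) and (ii)."""
    if d_h <= 0:
        raise ValueError("d_h must be positive for the ratio V_h / D_h to be defined")
    ratio_nh = r * v_h / d_h                 # relation (i): V_nh / D_nh
    d_nh = a * (v_h + d_h) / (ratio_nh + 1)  # relation (ii), solved for D_nh
    v_nh = ratio_nh * d_nh
    return p * (v_nh - d_nh)


def ev_human_extinction_closed_form(v_h, d_h, r, a, p):
    """The closed form from the footnote: [first term] * (second term) * p."""
    first_term = a * (v_h + d_h) / ((v_h / d_h) * r + 1)
    second_term = r * v_h / d_h - 1
    return first_term * second_term * p


def ev_risk_reduction(v_h, d_h, r, a, p):
    """EV(reduction of human extinction risk) = EV(human colonization) - EV(human extinction)."""
    return (v_h - d_h) - ev_human_extinction(v_h, d_h, r, a, p)


# Illustrative inputs only: (V_h, D_h, r, a, p)
pessimist = (1.0, 2.0, 0.5, 0.5, 0.5)        # V_h/D_h < 1
optimist_high_r = (4.0, 1.0, 0.5, 0.5, 0.5)  # r * V_h/D_h > 1
optimist_low_r = (4.0, 1.0, 0.1, 0.5, 0.5)   # r * V_h/D_h < 1

for label, args in [("pessimist", pessimist),
                    ("optimist, r*Vh/Dh > 1", optimist_high_r),
                    ("optimist, r*Vh/Dh < 1", optimist_low_r)]:
    naive = args[0] - args[1]  # naive view: EV(human extinction) = 0
    print(label, "naive:", naive, "updated:", ev_risk_reduction(*args))
```

With these inputs the pessimist’s estimate moves from −1 towards zero, the first optimist’s estimate falls below the naive value of 3, and the second optimist’s estimate rises above it, matching the three cases described above.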

  51. Let’s define fu­ture pes­simists as peo­ple who judge the ex­pected value of (post-)hu­man space coloniza­tion as nega­tive; fu­ture op­ti­mists analo­gously. Now con­sider the ex­am­ple of a non-hu­man civ­i­liza­tion sig­nifi­cantly worse than hu­man civ­i­liza­tion (by our lights), such that fu­ture op­ti­mists would find it nor­ma­tively neu­tral, and fu­ture pes­simists find it sig­nifi­cantly more nega­tive than hu­man civ­i­liza­tion. Then fu­ture op­ti­mists would not up­date their judge­ment (com­pared to be­fore con­sid­er­ing the pos­si­bil­ity of a non-hu­man an­i­mal space­far­ing civ­i­liza­tion), but pes­simists would up­date sig­nifi­cantly into the di­rec­tion of hu­man ex­tinc­tion risk re­duc­tion be­ing pos­i­tive. ↩︎

  52. E.g. one might think that humanity might be comparatively bad at coordination (compared to e.g. intelligent ants), and so relatively likely to create uncontrolled AI, which might be an exceptionally bad outcome, maybe even worse than an intelligent ant civilization. However, considerations like this seem to require highly specific judgements and are likely not very robust. ↩︎

  53. Sec­tion 4.2 is not de­pen­dent on a welfarist or even con­se­quen­tial­ist view. More gen­er­ally, it ap­plies to any kind of em­piri­cal or moral in­sight that we might have, which would make us re­al­ize that other things than we pre­vi­ously thought are of great moral value or dis­value. ↩︎

  54. For ex­am­ple:

    • The his­tory of an “ex­pand­ing moral cir­cle” (Singer, 2011), from tribes to na­tions to all hu­mans…

    • The rel­a­tively new no­tion of environmentalism

    • The new no­tion of wild an­i­mal suffering

    • The new no­tion of fu­ture be­ings be­ing (as­tro­nom­i­cally) im­por­tant (Bostrom, 2003)

  55. As­sum­ing that the side-effects of re­sources spent for self-re­gard­ing prefer­ences of fu­ture agents are neu­tral/​sym­met­ric with re­gards to the be­ings/​things out there (which seems to be a rea­son­able as­sump­tion). ↩︎

  56. Fermi-es­ti­mate (wild guesses, again):

    1. As­sume a 20% prob­a­bil­ity that, with more moral and em­piri­cal in­sight, we would con­clude that the uni­verse is already filled with be­ings/​things that we morally care about

    2. As­sume that the al­tru­is­tic im­pact fu­ture agents could have is always pro­por­tional to the amount of re­sources spent for al­tru­is­tic pur­poses. If the uni­verse is de­void of value or dis­value, then al­tru­is­tic re­sources will be spent on cre­at­ing new value (e.g. happy be­ings). If the uni­verse is already filled with be­ings/​things that we morally care about, it will likely con­tain some dis­value. As­sume that in these cases, 25% of al­tru­is­tic re­sources will be used to re­duce this dis­value (and only 75% to cre­ate new value). Also as­sume that re­sources can be used at the same effi­ciency e to cre­ate new dis­value, or to re­duce ex­ist­ing dis­value.

    3. As­sume that re­sources spent for self-re­gard­ing prefer­ences of fu­ture agents would on av­er­age not im­prove or worsen the situ­a­tion for the things of (dis)value already out there.

    4. Assume that in expectation, future agents will spend 40 times as many resources on pursuing other-regarding preferences parallel to our reflected preferences (“altruistic”) as on pursuing other-regarding preferences anti-parallel to our reflected preferences (“anti-altruistic”). Note that this is compatible with future agents, in expectation, spending most of their resources on other-regarding preferences completely orthogonal to our reflected preferences.

    5. From a dis­value-fo­cused per­spec­tive, cre­ation of new value does not mat­ter, only cre­ation of new dis­value, or re­duc­tion of already ex­ist­ing dis­value. From such a per­spec­tive: (R: to­tal amount of re­sources spent on par­allel or anti-par­allel other-re­gard­ing prefer­ences).

    • Ex­pected cre­ation of new dis­value = (1/​40) * R * e = 2.5% * R * e

    • Expected reduction of already existing disvalue = 20% * 25% * (1-(1/40)) * R * e ≈ 5% * R * e

    Thus, the expected reduction of disvalue through (post-)humanity is about twice as large as the expected creation of disvalue. This is, however, an upper bound. The calculation assumed that the universe contains enough disvalue that future agents could actually spend 25% of altruistic resources on alleviating it, before having alleviated it all. In some cases, the universe might not contain that much disvalue, so some resources would go into the creation of value again. ↩︎
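
The arithmetic of this Fermi estimate can be reproduced in a few lines. The sketch below is only an illustration using the wild-guess inputs from the list above; the function and parameter names are hypothetical, and results are expressed in units of R * e. The default anti-altruistic share of 1/40 follows the footnote’s own approximation; a strict reading of “40 times as many resources” would give a share of 1/41 of R, which is included as a second call.

```python
def disvalue_balance(p_filled=0.20, share_reducing=0.25, anti_share=1 / 40):
    """Return (created, reduced, ratio) of expected disvalue, in units of R * e.

    p_filled:       probability that the universe already contains things we care about
    share_reducing: share of altruistic resources spent on reducing existing disvalue
    anti_share:     share of R spent on anti-altruistic preferences
    """
    created = anti_share
    reduced = p_filled * share_reducing * (1 - anti_share)
    return created, reduced, reduced / created


# Footnote's approximation: anti-altruistic share of 1/40
created, reduced, ratio = disvalue_balance()
print(f"created: {created:.4f}, reduced: {reduced:.4f}, ratio: {ratio:.2f}")

# Strict 40:1 split of R between altruistic and anti-altruistic spending
created_41, reduced_41, ratio_41 = disvalue_balance(anti_share=1 / 41)
print(f"created: {created_41:.4f}, reduced: {reduced_41:.4f}, ratio: {ratio_41:.2f}")
```

Either reading gives a ratio of roughly two, so the conclusion does not hinge on which of the two shares is used. It is far more sensitive to the 20% and 25% guesses, which enter the ratio multiplicatively.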

  57. Analogous to part 1.2, this part 2.2 is less relevant for people who believe that some of their reflected other-regarding preferences will be so unusual that they will be anti-parallel to most of humanity’s reflected other-regarding preferences. Such a view is e.g. defended by Brian Tomasik in the context of suffering in fundamental physics. Tomasik argues that, even if he (after idealized reflection) and future generations both came around to care for sentience in fundamental physics, and even if future generations were to influence fundamental physics for altruistic reasons, they would still be more likely to do it in a way that increases the vivacity of physics, which Tomasik (after idealized reflection) would oppose. ↩︎

  58. This sec­tion draws heav­ily on Nick Beck­stead’s thoughts. ↩︎

  59. Global catas­tro­phes that do not di­rectly cause hu­man ex­tinc­tion may ini­ti­ate de­vel­op­ments that lead to ex­tinc­tion later on. For the pur­poses of this ar­ti­cle, these cases are not differ­ent from di­rect ex­tinc­tion, and are omit­ted here. ↩︎

  60. E.g. Paul Chris­ti­ano: “So if mod­ern civ­i­liza­tion is de­stroyed and even­tu­ally suc­cess­fully re­built, I think we should treat that as re­cov­er­ing most of Earth’s al­tru­is­tic po­ten­tial (though I would cer­tainly hate for it to hap­pen).” In his ar­ti­cle, Chris­ti­ano out­lines sev­eral em­piri­cal and moral judge­ment calls that lead him to his con­clu­sion, such as:

    • As long as a moral reflection and sophistication process is ongoing, which seems likely, civilizations will reach very good values (by his lights).

    • He is will­ing to dis­card his idiosyn­cratic judge­ments.

    • He di­rectly cares about oth­ers’ (re­flected) val­ues.

  61. It is of course a ques­tion whether one should stick with one’s own prefer­ences, if the ma­jor­ity of re­flected and al­tru­is­tic agents have op­po­site prefer­ences. Ac­cord­ing to some em­piri­cal and meta-eth­i­cal as­sump­tions, one should. ↩︎

  62. Differ­ent ad­vo­cates of strong suffer­ing-fo­cused views come to differ­ent judge­ments on the topic. They all seem to agree that, from a purely suffer­ing-fo­cused per­spec­tive, it is not clear whether efforts to re­duce the risk of hu­man ex­tinc­tion are pos­i­tive or nega­tive:

    Lukas Gloor: “it ten­ta­tively seems to me that the effect of mak­ing cos­mic stakes (and there­fore down­side risks) more likely is not suffi­ciently bal­anced by pos­i­tive effects on sta­bil­ity, arms race pre­ven­tion and civ­i­liza­tional val­ues (fac­tors which would make down­side risks less likely). How­ever, this is hard to as­sess and may change de­pend­ing on novel in­sights.” … “We have seen that efforts to re­duce ex­tinc­tion risk (ex­cep­tion: AI al­ign­ment) are un­promis­ing in­ter­ven­tions for down­side-fo­cused value sys­tems, and some of the in­ter­ven­tions available in that space (es­pe­cially if they do not si­mul­ta­neously also im­prove the qual­ity of the fu­ture) may even be nega­tive when eval­u­ated purely from this per­spec­tive.”

    David Pearce: “Should ex­is­ten­tial risk re­duc­tion be the pri­mary goal of: a) nega­tive util­i­tar­i­ans? b) clas­si­cal he­do­nis­tic util­i­tar­i­ans? c) prefer­ence util­i­tar­i­ans? All, or none, of the above? The an­swer is far from ob­vi­ous. For ex­am­ple, one might naively sup­pose that a nega­tive util­i­tar­ian would wel­come hu­man ex­tinc­tion. But only (trans)hu­mans—or our po­ten­tial su­per­in­tel­li­gent suc­ces­sors—are tech­ni­cally ca­pa­ble of phas­ing out the cru­elties of the rest of the liv­ing world on Earth. And only (trans)hu­mans—or rather our po­ten­tial su­per­in­tel­li­gent suc­ces­sors—are tech­ni­cally ca­pa­ble of as­sum­ing stew­ard­ship of our en­tire Hub­ble vol­ume.” … “In prac­tice, I don’t think it’s eth­i­cally fruit­ful to con­tem­plate de­stroy­ing hu­man civil­i­sa­tion, whether by ther­monu­clear Dooms­day de­vices or util­itro­n­ium shock­waves. Un­til we un­der­stand the up­per bounds of in­tel­li­gent agency, the ul­ti­mate sphere of re­spon­si­bil­ity of posthu­man su­per­in­tel­li­gence is un­known. Quite pos­si­bly, this ul­ti­mate sphere of re­spon­si­bil­ity will en­tail stew­ard­ship of our en­tire Hub­ble vol­ume across mul­ti­ple quasi-clas­si­cal Everett branches, maybe ex­tend­ing even into what we naively call the past [...]. In short, we need to cre­ate full-spec­trum su­per­in­tel­li­gence.”

    Brian To­masik: “I’m now less hope­ful that catas­trophic-risk re­duc­tion is plau­si­bly good for pure nega­tive util­i­tar­i­ans. The main rea­son is that some catas­trophic risk, such as from mal­i­cious biotech, do seem to pose non­triv­ial risk of caus­ing com­plete ex­tinc­tion rel­a­tive to their prob­a­bil­ity of merely caus­ing may­hem and con­flict. So I now don’t sup­port efforts to re­duce non-AGI “ex­is­ten­tial risks”. [...] Re­gard­less, nega­tive util­i­tar­i­ans should just fo­cus their sights on more clearly benefi­cial suffer­ing-re­duc­tion pro­jects” ↩︎

  63. For ex­am­ple, in­ter­ven­tions that aim at im­prov­ing hu­man­ity’s val­ues/​in­creas­ing the cir­cle of em­pa­thy might be highly lev­er­aged and time-sen­si­tive, if hu­man­ity achieves goal con­ser­va­tion soon, or val­ues are oth­er­wise sticky. ↩︎

  64. “Pos­i­tive”/​”nega­tive” as defined from a welfarist per­spec­tive. ↩︎

  65. Societies may increase the costs, and thereby reduce the frequency, of acts following from negative other-regarding preferences, as long as negative other-regarding preferences are a minority. E.g. if 5% of a society have an other-regarding preference for inflicting suffering on a certain group (of powerless beings), but 95% have a preference against it, in many societal forms less than 5% of people will actually inflict suffering on this group of powerless beings, because there will be laws against it, … ↩︎

  66. This fact could be in­ter­preted ei­ther as hu­man na­ture that we will re­vert to, or as a trend of moral progress. The lat­ter seems more likely to us. ↩︎

  67. Another pos­si­ble op­er­a­tional­iza­tion of the ra­tio be­tween pos­i­tive and nega­tive other-re­gard­ing prefer­ences: How much money is spent on pur­su­ing pos­i­tive and nega­tive other-re­gard­ing prefer­ences?

    • Some state bud­gets are clearly pur­suant to pos­i­tive other-re­gard­ing preferences

    • It is less clear whether there are bud­gets that are clearly pur­suant to nega­tive other-re­gard­ing prefer­ences, al­though at least a part of mil­i­tary spend­ing is.