Cause prioritization for downside-focused value systems

Last ed­ited: Au­gust 27th 2019.

This post out­lines my think­ing on cause pri­ori­ti­za­tion from the per­spec­tive of value sys­tems whose pri­mary con­cern is re­duc­ing dis­value. I’m mainly think­ing of suffer­ing-fo­cused ethics (SFE), but I also want to in­clude moral views that at­tribute sub­stan­tial dis­value to things other than suffer­ing, such as in­equal­ity or prefer­ence vi­o­la­tion. I will limit the dis­cus­sion to in­ter­ven­tions tar­geted at im­prov­ing the long-term fu­ture (see the rea­sons in sec­tion II). I hope my post will also be in­for­ma­tive for peo­ple who do not share a down­side-fo­cused out­look, as think­ing about cause pri­ori­ti­za­tion from differ­ent per­spec­tives, with em­pha­sis on con­sid­er­a­tions other than those one is used to, can be illu­mi­nat­ing. More­over, un­der­stand­ing the strate­gic con­sid­er­a­tions for plau­si­ble moral views is es­sen­tial for act­ing un­der moral un­cer­tainty and co­op­er­at­ing with peo­ple with other val­ues.

I will talk about the fol­low­ing top­ics:

  • Which views qual­ify as down­side-fo­cused (given our em­piri­cal situ­a­tion)

  • Why down­side-fo­cused views pri­ori­tize s-risk re­duc­tion over utopia creation

  • Why ex­tinc­tion risk re­duc­tion is un­likely to be a promis­ing in­ter­ven­tion ac­cord­ing to down­side-fo­cused views

  • Why AI al­ign­ment is prob­a­bly pos­i­tive for down­side-fo­cused views, and es­pe­cially pos­i­tive if done with cer­tain precautions

  • What to in­clude in an EA port­fo­lio that in­cor­po­rates pop­u­la­tion eth­i­cal un­cer­tainty and co­op­er­a­tion be­tween value systems

Which views qual­ify as down­side-fo­cused?

I’m us­ing the term down­side-fo­cused to re­fer to value sys­tems that in prac­tice (given what we know about the world) pri­mar­ily recom­mend work­ing on in­ter­ven­tions that make bad things less likely.[1] For ex­am­ple, if one holds that what is most im­por­tant is how things turn out for in­di­vi­d­u­als (welfarist con­se­quen­tial­ism), and that it is com­par­a­tively unim­por­tant to add well-off be­ings to the world, then one should likely fo­cus on pre­vent­ing suffer­ing.[2] That would be a down­side-fo­cused eth­i­cal view.

By con­trast, other moral views place great im­por­tance on the po­ten­tial up­sides of very good fu­tures, in par­tic­u­lar with re­spect to bring­ing about a utopia where vast num­bers of well-off in­di­vi­d­u­als will ex­ist. Pro­po­nents of such views may also be­lieve it to be a top pri­or­ity that a large, flour­ish­ing civ­i­liza­tion ex­ists for an ex­tremely long time. I will call these views up­side-fo­cused.

Up­side-fo­cused views do not have to im­ply that bring­ing about good things is nor­ma­tively more im­por­tant than pre­vent­ing bad things; in­stead, a view also counts as up­side-fo­cused if one has rea­son to be­lieve that bring­ing about good things is eas­ier in prac­tice (and thus more over­all value can be achieved that way) than pre­vent­ing bad things.

A key point of disagreement between the two perspectives is that upside-focused people might say that suffering and happiness are in a relevant sense symmetrical, and that downside-focused people are too willing to give up good things in the future, such as the coming into existence of many happy beings, just to prevent suffering. On the other side, downside-focused people feel that the other party is too willing to accept that the extreme suffering of many people goes unaddressed, or is in some sense tolerated in order to achieve some purportedly greater good.

Whether a nor­ma­tive view qual­ifies as down­side-fo­cused or up­side-fo­cused is not always easy to de­ter­mine, as the an­swer can de­pend on difficult em­piri­cal ques­tions such as how much dis­value we can ex­pect to be able to re­duce ver­sus how much value we can ex­pect to be able to cre­ate. I feel con­fi­dent how­ever that views ac­cord­ing to which it is not in it­self (par­tic­u­larly) valuable to bring be­ings in op­ti­mal con­di­tions into ex­is­tence come out as largely down­side-fo­cused. The fol­low­ing com­mit­ments may lead to a down­side-fo­cused pri­ori­ti­za­tion:

For those who are un­sure about where their be­liefs may fall on the spec­trum be­tween down­side- and up­side-fo­cused views, and how this af­fects their cause pri­ori­ti­za­tion with re­gard to the long-term fu­ture, I recom­mend be­ing on the look­out for in­ter­ven­tions that are pos­i­tive and im­pact­ful ac­cord­ing to both per­spec­tives. Alter­na­tively, one could en­gage more with pop­u­la­tion ethics to per­haps cash in on the value of in­for­ma­tion from nar­row­ing down one’s un­cer­tainty.

Most ex­pected dis­value hap­pens in the long-term future

In this post, I will only dis­cuss in­ter­ven­tions cho­sen with the in­tent of af­fect­ing the long-term fu­ture – which not ev­ery­one agrees is the best strat­egy for do­ing good. I want to note that choos­ing in­ter­ven­tions that re­li­ably re­duce suffer­ing or pro­mote well-be­ing in the short run also has many ar­gu­ments in its fa­vor.

Hav­ing said that, I be­lieve that most of the ex­pected value comes from the effects our ac­tions have on the long-term fu­ture, and that our think­ing about cause pri­ori­ti­za­tion should ex­plic­itly re­flect this. The fu­ture may come to hold as­tro­nom­i­cal quan­tities of the things that peo­ple value (Bostrom, 2003). Cor­re­spond­ingly, for moral views that place a lot of weight on bring­ing about as­tro­nom­i­cal quan­tities of pos­i­tive value (e.g., hap­piness or hu­man flour­ish­ing), Nick Beck­stead pre­sented a strong case for fo­cus­ing on the long-term fu­ture. For down­side-fo­cused views, that case rests on similar premises. A sim­plified ver­sion of that ar­gu­ment is based on the two fol­low­ing ideas:

  1. Some fu­tures con­tain as­tro­nom­i­cally more dis­value than oth­ers, such as un­con­trol­led space coloniza­tion ver­sus a fu­ture where com­pas­sion­ate and wise ac­tors are in con­trol.

  2. It is suffi­ciently likely that our cur­rent ac­tions can help shape the fu­ture so that we avoid worse out­comes and end up with bet­ter ones. For ex­am­ple, such an ac­tion could be to try to figure out which in­ter­ven­tions best im­prove the long-term fu­ture.

This does not mean that one should nec­es­sar­ily pick in­ter­ven­tions one thinks will af­fect the long-term fu­ture through some spe­cific, nar­row path­way. Rather, I am say­ing (fol­low­ing Beck­stead) that we should pick our ac­tions based pri­mar­ily on what we es­ti­mate their net effects to be on the long-term fu­ture.[6] This in­cludes not only nar­rowly tar­geted in­ter­ven­tions such as tech­ni­cal work in AI al­ign­ment, but also pro­jects that im­prove the val­ues and de­ci­sion-mak­ing ca­pac­i­ties in so­ciety at large to help fu­ture gen­er­a­tions cope bet­ter with ex­pected challenges.

Down­side-fo­cused views pri­ori­tize s-risk re­duc­tion over utopia creation

The ob­serv­able uni­verse has very lit­tle suffer­ing (or in­equal­ity, prefer­ence frus­tra­tion, etc.) com­pared to what could be the case; for all we know suffer­ing at the mo­ment may only ex­ist on one small planet in a com­pu­ta­tion­ally in­effi­cient form of or­ganic life.[7] Ac­cord­ing to down­side-fo­cused views, this is for­tu­nate, but it also means that things can be­come much worse. Suffer­ing risks (or “s-risks”) are defined as events that would bring about suffer­ing on an as­tro­nom­i­cal scale, vastly ex­ceed­ing all suffer­ing that has ex­isted on earth so far. Analo­gously and more gen­er­ally, we can define down­side risks as events that would bring about dis­value (in­clud­ing things other than suffer­ing) at vastly un­prece­dented scales.

Why might this definition be practically relevant? Imagine the hypothetical future scenario “Business as usual (BAU),” where things continue onwards indefinitely exactly as they are today, with all bad things being confined to earth only. Hypothetically, let’s say that we expect 10% of futures to be BAU, and we imagine there was an intervention – let’s call it paradise creation – that changed all BAU futures into futures where a suffering-free paradise is created. Let us further assume that another 10% of futures will be futures where earth-originating intelligence colonizes space and things go very wrong, such that this colonization, through some pathway or another, creates vastly more suffering than has ever existed (and would ever exist) on earth, and little to no happiness or good things. We will call this second scenario “Astronomical Suffering (AS).”

If we limit our at­ten­tion to only the two sce­nar­ios AS and BAU (of course there are many other con­ceiv­able sce­nar­ios, in­clud­ing sce­nar­ios where hu­mans go ex­tinct or where space coloniza­tion re­sults in a fu­ture filled with pre­dom­i­nantly hap­piness and flour­ish­ing), then we see that the to­tal suffer­ing in the AS fu­tures vastly ex­ceeds all the suffer­ing in the BAU fu­tures. Suc­cess­ful par­adise cre­ation there­fore would have a much smaller im­pact in terms of re­duc­ing suffer­ing than an al­ter­na­tive in­ter­ven­tion that averts the 10% s-risk from the AS sce­nario, chang­ing it to a BAU sce­nario for in­stance. Even re­duc­ing the s-risk from AS in our ex­am­ple by a sin­gle per­centage point would be vastly more im­pact­ful than pre­vent­ing the suffer­ing from all the BAU fu­tures.
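The comparison above can be made concrete with a small expected-value sketch. The 10% probabilities come from the hypothetical in the text; the suffering magnitudes (1 unit for BAU, one million units for AS) are illustrative assumptions only, since the text says nothing more specific than that AS vastly exceeds BAU.

```python
from fractions import Fraction

# Probabilities taken from the hypothetical in the text.
P_BAU = Fraction(10, 100)
P_AS = Fraction(10, 100)

# ASSUMPTION: arbitrary units chosen for illustration only.
SUFFERING = {"BAU": 1, "AS": 10**6, "paradise": 0}


def expected_suffering(probabilities):
    """Probability-weighted suffering over the listed scenarios."""
    return sum(p * SUFFERING[name] for name, p in probabilities.items())


baseline = expected_suffering({"BAU": P_BAU, "AS": P_AS})

# Paradise creation: every BAU future becomes a suffering-free paradise.
after_paradise = expected_suffering({"paradise": P_BAU, "AS": P_AS})

# S-risk reduction: one percentage point of AS futures becomes BAU instead.
one_point = Fraction(1, 100)
after_s_risk_work = expected_suffering(
    {"BAU": P_BAU + one_point, "AS": P_AS - one_point}
)

paradise_gain = baseline - after_paradise
s_risk_gain = baseline - after_s_risk_work

print("paradise creation averts:", float(paradise_gain))
print("one-point s-risk reduction averts:", float(s_risk_gain))
```

In this toy model the one-point reduction comes out ahead whenever AS contains more than 11 times the suffering of BAU, so the conclusion does not hinge on the exact magnitude assumed here.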

This con­sid­er­a­tion high­lights why suffer­ing-fo­cused al­tru­ists should prob­a­bly in­vest their re­sources not into mak­ing very good out­comes more likely, but rather into mak­ing dystopian out­comes (or dystopian el­e­ments in oth­er­wise good out­comes) less likely. Utopian out­comes where all suffer­ing is abol­ished through tech­nol­ogy and all sen­tient be­ings get to en­joy lives filled with un­prece­dented heights of hap­piness are cer­tainly some­thing we should hope will hap­pen. But from a down­side-fo­cused per­spec­tive, our own efforts to do good are, on the mar­gin, bet­ter di­rected to­wards mak­ing it less likely that we get par­tic­u­larly bad fu­tures.

While the AS sce­nario above was stipu­lated to con­tain lit­tle to no hap­piness, it is im­por­tant to note that s-risks can also af­fect fu­tures that con­tain more happy in­di­vi­d­u­als than suffer­ing ones. For in­stance, the suffer­ing in a fu­ture with an as­tro­nom­i­cally large pop­u­la­tion count where 99% of in­di­vi­d­u­als are very well off and 1% of in­di­vi­d­u­als suffer greatly con­sti­tutes an s-risk even though up­side-fo­cused views may eval­u­ate this fu­ture as very good and worth bring­ing about. Espe­cially when it comes to the pre­ven­tion of s-risks af­fect­ing fu­tures that oth­er­wise con­tain a lot of hap­piness, it mat­ters a great deal how the risk in ques­tion is be­ing pre­vented. For in­stance, if we en­vi­sion a fu­ture that is utopian in many re­spects ex­cept for a small por­tion of the pop­u­la­tion suffer­ing be­cause of prob­lem X, it is in the in­ter­est of vir­tu­ally all value sys­tems to solve prob­lem X in highly tar­geted ways that move prob­a­bil­ity mass to­wards even bet­ter fu­tures. By con­trast, only few value sys­tems (ones that are strongly or ex­clu­sively about re­duc­ing suffer­ing/​bad things) would con­sider it over­all good if prob­lem X was “solved” in a way that not only pre­vented the suffer­ing due to prob­lem X, but also pre­vented all the hap­piness from the fu­ture sce­nario this suffer­ing was em­bed­ded in. As I will ar­gue in the last sec­tion, moral un­cer­tainty and moral co­op­er­a­tion are strong rea­sons to solve such prob­lems in ways that most value sys­tems ap­prove of.

All of this is based on the assumption that bad futures, i.e., futures with severe s-risks or downside risks, are reasonably likely to happen (and can tractably be addressed). This seems to be the case, unfortunately: We find ourselves on a civilizational trajectory with rapidly growing technological capabilities, and the ceilings from physical limits still far away. It looks as though large-scale space colonization might become possible someday, either for humans directly, for some successor species, or for intelligent machines that we might create. Life generally tends to spread and use up resources, and intelligent life or intelligence generally does so even more deliberately. As space colonization would so vastly increase the stakes at which we are playing, a failure to improve sufficiently along all the necessary dimensions – both morally and with regard to overcoming coordination problems or lack of foresight – could result in futures that, even though they may in many cases (unlike the AS scenario above!) also contain astronomically many happy individuals, would contain vast quantities of suffering. We can envision numerous conceptual pathways that lead to astronomical amounts of suffering (Sotala & Gloor, 2017), and while each single pathway may seem unlikely to be instantiated – as with most specific predictions about the long-term future – s-risks are disjunctive, and people tend to underestimate the probability of disjunctive events (Dawes & Hastie, 2001). In particular, our historical track record contains all kinds of factors that directly cause or contribute to suffering on large scales:

  • Dar­wi­nian com­pe­ti­tion (ex­em­plified by the suffer­ing of wild an­i­mals)

  • Co­or­di­na­tion prob­lems and eco­nomic com­pe­ti­tion (ex­em­plified by suffer­ing caused or ex­ac­er­bated by in­come in­equal­ity, both lo­cally and globally, and the difficulty of reach­ing a long-term sta­ble state where Malthu­sian com­pe­ti­tion is avoided)

  • Ha­tred of out­groups (ex­em­plified by the Holo­caust)

  • In­differ­ence (ex­em­plified by the suffer­ing of an­i­mals in fac­tory farms)

  • Con­flict (ex­em­plified by the suffer­ing of a pop­u­la­tion dur­ing/​af­ter con­quest or siege; some­times cou­pled with the promise that sur­ren­der will spare the tor­ture of civili­ans)

  • Sadism (ex­em­plified by cases of an­i­mal abuse, which may not make up a large source of cur­rent-day suffer­ing but could be­come a big­ger is­sue with cru­elty-en­abling tech­nolo­gies of the fu­ture)

And while one can make a case that there has been a trend for things to be­come bet­ter (see Pinker, 2011), this does not hold in all do­mains (e.g. not with re­gard to the num­ber of an­i­mals di­rectly harmed in the food in­dus­try) and we may, be­cause of filter bub­bles for in­stance, un­der­es­ti­mate how bad things still are even in com­par­a­tively well-off coun­tries such as the US. Fur­ther­more, it is easy to over­es­ti­mate the trend for things to have got­ten bet­ter given that the un­der­ly­ing mechanisms re­spon­si­ble for catas­trophic events such as con­flict or nat­u­ral dis­asters may fol­low a power law dis­tri­bu­tion where the vast ma­jor­ity of vi­o­lent deaths, dis­eases or famines re­sult from a com­par­a­tively small num­ber of par­tic­u­larly dev­as­tat­ing in­ci­dents. Power law dis­tri­bu­tions con­sti­tute a plau­si­ble (though ten­ta­tive) can­di­date for mod­el­ling the like­li­hood and sever­ity of suffer­ing risks. If this model is cor­rect, then ob­ser­va­tions such as that the world did not erupt in the vi­o­lence of a third world war, or that no dystopian world gov­ern­ment has been formed as of late, can­not count as very re­as­sur­ing, be­cause power law dis­tri­bu­tions be­come hard to as­sess pre­cisely to­wards the tail-end of the spec­trum where the stakes be­come al­to­gether high­est (New­man, 2006).
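A minimal numerical sketch may help show why a quiet track record is weak evidence under a power law. It assumes a Pareto distribution as a stand-in for "a power law", independent events, and a tail exponent of 1.16 (the value that roughly reproduces an 80/20 pattern); none of these parameters come from the sources cited above.

```python
# ASSUMPTION: a Pareto distribution with minimum severity 1 and tail
# exponent ALPHA stands in for "a power law"; the value is illustrative only.
ALPHA = 1.16


def top_share(fraction, alpha=ALPHA):
    """Share of total severity contributed by the worst `fraction` of events.

    Follows from the Pareto Lorenz curve; needs alpha > 1 for a finite mean.
    """
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for a finite mean")
    return fraction ** (1 - 1 / alpha)


def prob_none_above(threshold, n_events, alpha=ALPHA):
    """Chance that none of `n_events` independent events exceeds `threshold`."""
    tail = threshold ** (-alpha)  # P(severity > threshold), for threshold >= 1
    return (1 - tail) ** n_events


print(f"worst 20% of events carry {top_share(0.2):.0%} of total severity")
print(f"P(no event above 1000 among 100 events) = {prob_none_above(1000, 100):.2f}")
```

Under these assumed parameters, a record of a hundred incidents would most likely contain no event a thousand times the minimum severity, even though events of that size carry a disproportionate share of the expected total.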

To illustrate the difference between downside- and upside-focused views, I drew two graphs. To keep things simple, I will limit the example scenarios to cases that either uncontroversially contain more suffering than happiness, or only contain happiness. The BAU scenario from above will serve as a reference point. I’m describing it again as a reminder below, alongside the other scenarios I will use in the illustration (note that all scenarios are stipulated to last for equally long):

Busi­ness as usual (BAU)

Earth re­mains the only planet in the ob­serv­able uni­verse (as far as we know) where there is suffer­ing, and things con­tinue as they are. Some peo­ple re­main in ex­treme poverty, many peo­ple suffer from dis­ease or men­tal ill­ness, and our psy­cholog­i­cal makeup limits the amount of time we can re­main con­tent with things even if our lives are com­par­a­tively for­tu­nate. Fac­tory farms stay open, and most wild an­i­mals die be­fore they reach their re­pro­duc­tive age.

Astro­nom­i­cal suffer­ing (AS)

A scenario where space colonization results in an outcome where astronomically many sentient minds exist in conditions that are evaluated as bad by all plausible means of evaluation. To make the scenario more concrete, let us stipulate that 90% of beings in this vast population have lives filled with medium-intensity suffering, and 10% of the population suffers at strong or unbearable intensity. There is little or no happiness in this scenario.

Par­adise (small or as­tro­nom­i­cal; SP/​AP)

Things go as well as pos­si­ble, suffer­ing is abol­ished, all sen­tient be­ings are always happy and even ex­pe­rience heights of well-be­ing that are un­achiev­able with pre­sent-day tech­nol­ogy. We fur­ther dis­t­in­guish small par­adise (SP) from as­tro­nom­i­cal par­adise (AP): while the former stays earth-bound, the lat­ter spans across (max­i­mally) many galax­ies, op­ti­mized to turn available re­sources into flour­ish­ing lives and all the things peo­ple value.

Here is how I en­vi­sion typ­i­cal suffer­ing-fo­cused and up­side-fo­cused views rank­ing the above sce­nar­ios from “com­par­a­tively bad” on the left to “com­par­a­tively good” on the right:

The two graphs rep­re­sent the rel­a­tive value we can ex­pect from the classes of fu­ture sce­nar­ios I de­scribed above. The left­most point of a graph rep­re­sents not the worst pos­si­ble out­come, but the worst out­come amongst the fu­ture sce­nar­ios we are con­sid­er­ing. The im­por­tant thing is not whether a given sce­nario is more to­wards the right or left side of the graph, but rather how large the dis­tance is be­tween sce­nar­ios. The yel­low stretch sig­nifies the high­est im­por­tance or scope, and in­ter­ven­tions that move prob­a­bil­ity mass across that stretch are ei­ther ex­cep­tion­ally good or ex­cep­tion­ally bad, de­pend­ing on the di­rec­tion of the move­ment. (Of course, in prac­tice in­ter­ven­tions can also have com­plex effects that af­fect mul­ti­ple vari­ables at once.)

Note that the BAU sce­nario was cho­sen mostly for illus­tra­tion, as it seems pretty un­likely that hu­mans would con­tinue to ex­ist in the cur­rent state for ex­tremely long times­pans. Similarly, I should qual­ify that the SP sce­nario may be un­likely to ever hap­pen in prac­tice be­cause it seems rather un­sta­ble: Keep­ing value drift and Dar­wi­nian dy­nam­ics un­der con­trol and pre­serv­ing a small utopia for mil­lions of years or be­yond may re­quire tech­nol­ogy that is so ad­vanced that one may as well make the utopia as­tro­nom­i­cally large – un­less there are over­rid­ing rea­sons for fa­vor­ing the smaller utopia. From any strongly or ex­clu­sively down­side-fo­cused per­spec­tive, the smaller utopia may in­deed – fac­tor­ing out con­cerns about co­op­er­a­tion – be prefer­able, be­cause go­ing from SP to AP comes with some risks.[8] How­ever, for the pur­poses of the first graph above, I was stipu­lat­ing that AP is com­pletely flawless and risk-free.

A “near paradise” or “flawed paradise” mostly filled with happy lives but also with, say, 1% of lives in constant misery, would for upside-focused views rank somewhere close to AP on the far right end of the first graph. By contrast, for downside-focused views on the second graph, “flawed paradise” would stand more or less in the same position as BAU if the view in question is weakly downside-focused, and decidedly further towards AS on the left if the view in question is strongly or exclusively downside-focused. Weakly downside-focused views would also have a relatively large gap between SP and AP, reflecting that creating additional happy beings is regarded as morally quite important but not sufficiently important to become the top priority. A view would still count as suffering-focused (at least within the restricted context of our visualization where all scenarios are artificially treated as having the same probability of occurrence) as long as the gap between BAU and AS remains larger than the gap between BAU/SP and AP.
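The gap criterion in the preceding paragraph can be written down as a short sketch. All numeric values are made up to mimic the rough shape of the graphs, the function name is hypothetical, and "BAU/SP" is read (as an assumption) as whichever of the two scenarios lies further from AP. As in the visualization, all scenarios are treated as equally probable.

```python
def classify(values):
    """Label a view by comparing the two gaps described in the text.

    `values` maps scenario names (AS, BAU, SP, AP) to relative value.
    ASSUMPTION: "BAU/SP" means whichever of the two lies further from AP.
    """
    downside_gap = values["BAU"] - values["AS"]
    upside_gap = values["AP"] - min(values["BAU"], values["SP"])
    return "downside-focused" if downside_gap > upside_gap else "upside-focused"


# ASSUMPTION: illustrative numbers only, not estimates from the post.
views = {
    "upside": {"AS": -50, "BAU": -1, "SP": 1, "AP": 100},
    "weakly downside": {"AS": -100, "BAU": -1, "SP": 0, "AP": 40},
    "strongly downside": {"AS": -100, "BAU": -1, "SP": 0, "AP": 0},
}

for name, values in views.items():
    print(f"{name}: {classify(values)}")
```

The weakly downside-focused example keeps a sizeable gap between SP and AP, yet it still classifies as downside-focused because the distance from BAU to AS is larger.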

In practice, we are well-advised to hold very large uncertainty over what the right way is to conceptualize the likelihood and plausibility of such future scenarios. Given this uncertainty, there can be cases where a normative view falls somewhere in between upside- and downside-focused in our subjective classification. All these things are very hard to predict and other people may be substantially more or substantially less optimistic with regard to the quality of the future. My own estimate is that a more realistic version of AP, one that is allowed to contain some suffering but is characterized by containing near-maximal quantities of happiness or things of positive value, is ~40 times less likely to happen[9] than the vast range of scenarios (of which AS is just one particularly bad example) where space colonization leads to outcomes with a lot less happiness. I think scenarios as bad as AS or worse are also very rare, as most scenarios that involve a lot of suffering may also contain some islands of happiness (or even have a sea of happiness and some islands of suffering). See also these posts on why the future is likely to be net good in expectation according to views where creating happiness is about as important as reducing suffering.

In­ter­est­ingly, var­i­ous up­side-fo­cused views may differ nor­ma­tively with re­spect to how frag­ile (or not) their con­cept of pos­i­tive value is. If utopia is very frag­ile, but dystopia comes in vastly many forms (re­lated: the Anna Karen­ina prin­ci­ple), this would im­ply greater pes­simism re­gard­ing the value of the av­er­age sce­nario with space coloniza­tion, which could push such views closer to be­com­ing down­side-fo­cused. On the other hand, some (idiosyn­cratic) up­side-fo­cused views may sim­ply place an over­rid­ing weight on the on­go­ing ex­is­tence of con­scious life, largely in­de­pen­dent of how things will go in terms of he­do­nist welfare.[10] Similarly, nor­ma­tively up­side-fo­cused views that count cre­at­ing hap­piness as more im­por­tant than re­duc­ing suffer­ing (though pre­sum­ably very few peo­ple would hold such views) would always come out as up­side-fo­cused in prac­tice, too, even if we had rea­son to be highly pes­simistic about the fu­ture.

To sum­ma­rize, what the graphs above try to con­vey is that for the ex­am­ple sce­nar­ios listed, down­side-fo­cused views are char­ac­ter­ized by hav­ing the largest gap in rel­a­tive im­por­tance be­tween AS and the other sce­nar­ios. By con­trast, up­side-fo­cused views place by far the most weight on mak­ing sure AP hap­pens, and SP would (for many up­side-fo­cused views at least) not even count as all that good, com­par­a­tively.[11]

Ex­tinc­tion risk re­duc­tion: Un­likely to be pos­i­tive ac­cord­ing to down­side-fo­cused views

Some fu­tures, such as ones where most peo­ple’s qual­ity of life is hellish, are worse than ex­tinc­tion. Many peo­ple with up­side-fo­cused views would agree. So the differ­ence be­tween up­side- and down­side-fo­cused views is not about whether there can be net nega­tive fu­tures, but about how read­ily a fu­ture sce­nario is ranked as worth bring­ing about in the face of the suffer­ing it con­tains or the down­side risks that lie on the way from here to there.

If hu­mans went ex­tinct, this would greatly re­duce the prob­a­bil­ity of space coloniza­tion and any as­so­ci­ated risks (as well as benefits). Without space coloniza­tion, there are no s-risks “by ac­tion,” no risks from the cre­ation of as­tro­nom­i­cal suffer­ing where hu­man ac­tivity makes things worse than they would oth­er­wise be.[12] Per­haps there would re­main some s-risks “by omis­sion,” i.e. risks cor­re­spond­ing to a failure to pre­vent as­tro­nom­i­cal dis­value. But such risks ap­pear un­likely given the ap­par­ent empti­ness of the ob­serv­able uni­verse.[13] Be­cause s-risks by ac­tion over­all ap­pear to be more plau­si­ble than s-risks by omis­sion, and be­cause the lat­ter can only be tack­led in an (ar­guably un­likely) sce­nario where hu­man­ity ac­com­plishes the feat of in­stal­ling com­pas­sion­ate val­ues to ro­bustly con­trol the fu­ture, it ap­pears as though down­side-fo­cused al­tru­ists have more to lose from space coloniza­tion than they have to gain.

It is however not obvious whether this implies that efforts to reduce the probability of human extinction indirectly increase suffering risks or downside risks more generally. It very much depends on the way this is done and what the other effects are. For instance, there is a large and often underappreciated difference between existential risks from bio- or nuclear technology, and existential risks related to smarter-than-human artificial intelligence (superintelligence; see the next section). While the former set back technological progress, possibly permanently so, the latter drive it all the way up, likely – though maybe not always – culminating in space colonization with the purpose of benefiting whatever goal(s) the superintelligent AI systems are equipped with (Omohundro, 2008; Armstrong & Sandberg, 2013). Because space colonization may come with some associated suffering in this way, this means that a failure to reduce existential risks from AI is often also a failure to prevent s-risks from AI misalignment. Therefore, the next section will argue that reducing such AI-related risks is valuable from both upside- and downside-focused perspectives. By contrast, the situation is much less obvious for other existential risks, ones that are not about artificial superintelligence.

Sometimes efforts to reduce these other existential risks also benefit s-risk reduction. For instance, efforts to reduce non-AI-related extinction risks may increase global stability and make particularly bad futures less likely in those circumstances where humanity nevertheless goes on to colonize space. Efforts to reduce extinction risks from e.g. biotechnology or nuclear war in practice also reduce the risk of global catastrophes where a small number of humans survive and where civilization is likely to eventually recover technologically, but perhaps at the cost of a worse geopolitical situation or with worse values, which could then lead to increases in s-risks going into the long-term future. This mitigating effect on s-risks through a more stable future is substantial and positive according to downside-focused value systems, and it has to be weighed against the effect of making s-risks from space colonization more likely.

In­ter­est­ingly, if we care about the to­tal num­ber of sen­tient minds (and their qual­ity of life) that can at some point be cre­ated, then be­cause of some known facts about cos­mol­ogy,[14] any effects that near-ex­tinc­tion catas­tro­phes have on de­lay­ing space coloniza­tion are largely neg­ligible in the long run when com­pared to af­fect­ing the qual­ity of a fu­ture with space coloniza­tion – at least un­less the de­lay be­comes very long in­deed (e.g. mil­lions of years or longer).

What this means is that in or­der to de­ter­mine how re­duc­ing the prob­a­bil­ity of ex­tinc­tion from things other than su­per­in­tel­li­gent AI in ex­pec­ta­tion af­fects down­side risks, we can ap­prox­i­mate the an­swer by weigh­ing the fol­low­ing two con­sid­er­a­tions against each other:

  1. How likely is it that the averted catas­tro­phes merely de­lay space coloniza­tion rather than pre­vent­ing it com­pletely?

  2. How much better or worse would a second version of a technologically mature civilization (after a global catastrophe thwarted the first attempt) fare with respect to downside risks?[15]

The second question involves judging where our current trajectory falls, quality-wise, when compared to the distribution of post-rebuilding scenarios – how much better or worse is our trajectory than a randomly reset one? It also requires estimating the effects of post-catastrophe conditions on AI development – e.g., would a longer time until technological maturity (perhaps due to a lack of fossil fuels) cause a more uniform distribution of power, and what does that imply about the probability of arms races? It seems difficult to account for all of these considerations properly. It strikes me as more likely than not that things would be worse after recovery, but because there are so many things to consider,[16] I do not feel very confident about this assessment.

This leaves us with the question of how likely a global catastrophe is to merely delay space colonization rather than preventing it. I have not thought about this in much detail, but after having talked to some people (especially at FHI) who have investigated it, I updated towards thinking that rebuilding after a catastrophe is quite likely. And while a civilizational collapse would set a precedent and give reason to worry the second time around when civilization reaches technological maturity again, it would take an unlikely constellation of collapse factors to get stuck in a loop of recurrent collapse, rather than at some point escaping the setbacks and reaching a stable plateau (Bostrom, 2009), e.g. through space colonization. I would therefore say that large-scale catastrophes related to biorisk or nuclear war are quite likely (~80–90%) to merely delay space colonization in expectation.[17] (With more uncertainty being not on the likelihood of recovery, but on whether some outlier-type catastrophes might directly lead to extinction.)

This would still mean that the successful prevention of all biorisk and risks from nuclear war makes space colonization 10–20% more likely in those cases where such a catastrophe would otherwise have occurred. Comparing this estimate to the previous, uncertain estimate about the s-risk profile of a civilization after recovery, it tentatively seems to me that the effect of making cosmic stakes (and therefore downside risks) more likely is not sufficiently balanced by positive effects[18] on stability, arms race prevention and civilizational values (factors which would make downside risks less likely). However, this is hard to assess and may change depending on novel insights.
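The arithmetic behind the 10–20% figure can be sketched as follows. The 80–90% recovery estimates are from the text; the function names, the 20% catastrophe probability, and the assumption that an intact civilization and a recovered one are equally likely to colonize space are all hypothetical simplifications.

```python
from fractions import Fraction


def colonization_probability(p_catastrophe, p_recover, p_colonize=Fraction(1)):
    """P(space colonization) via two paths: no catastrophe, or catastrophe then recovery.

    ASSUMPTION: an intact civilization and a recovered one colonize space
    with the same probability `p_colonize`.
    """
    no_catastrophe = (1 - p_catastrophe) * p_colonize
    catastrophe_then_recovery = p_catastrophe * p_recover * p_colonize
    return no_catastrophe + catastrophe_then_recovery


def gain_from_prevention(p_catastrophe, p_recover):
    """Increase in P(colonization) if the catastrophe is prevented entirely."""
    prevented = colonization_probability(0, p_recover)
    not_prevented = colonization_probability(p_catastrophe, p_recover)
    return prevented - not_prevented


# Recovery estimates from the text; catastrophe probabilities are hypothetical.
for p_recover in (Fraction(8, 10), Fraction(9, 10)):
    for p_catastrophe in (Fraction(1), Fraction(1, 5)):
        gain = gain_from_prevention(p_catastrophe, p_recover)
        print(
            f"recovery {float(p_recover):.0%}, "
            f"catastrophe {float(p_catastrophe):.0%}: "
            f"gain {float(gain):.0%}"
        )
```

The gain equals the catastrophe probability times the chance of no recovery, so the 10–20% range is the upper end, reached only when the catastrophe would otherwise be certain.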

What looks slightly clearer to me is that making rebuilding after a civilizational collapse more likely comes with increased downside risks. If this were the sole effect of an intervention, I would estimate it as overall negative for downside-focused views (factoring out considerations of moral uncertainty or cooperation with other value systems) – because not only would it make it more likely that space will eventually be colonized, but it would also do so in a situation where s-risks might be higher than in the current trajectory we are on.[19]

How­ever, in prac­tice it seems as though any in­ter­ven­tion that makes re­cov­ery af­ter a col­lapse more likely would also have many other effects, some of which might more plau­si­bly be pos­i­tive ac­cord­ing to down­side-fo­cused ethics. For in­stance, an in­ter­ven­tion such as de­vel­op­ing al­ter­nate foods might merely speed up re­build­ing af­ter civ­i­liza­tional col­lapse rather than mak­ing it al­to­gether more likely, and so would merely af­fect whether re­build­ing hap­pens from a low base or a high base. One could ar­gue that re­build­ing from a higher base is less risky also from a down­side-fo­cused per­spec­tive, which makes things more com­pli­cated to as­sess. In any case, what seems clear is that none of these in­ter­ven­tions look promis­ing for the pre­ven­tion of down­side risks.

We have seen that efforts to re­duce ex­tinc­tion risks (ex­cep­tion: AI al­ign­ment) are un­promis­ing in­ter­ven­tions for down­side-fo­cused value sys­tems, and some of the in­ter­ven­tions available in that space (es­pe­cially if they do not si­mul­ta­neously also im­prove the qual­ity of the fu­ture) may even be nega­tive when eval­u­ated purely from this per­spec­tive. This is a coun­ter­in­tu­itive con­clu­sion, maybe so much so that many peo­ple would rather choose to adopt moral po­si­tions where it does not fol­low. In this con­text, it is im­por­tant to point out that valu­ing hu­man­ity not go­ing ex­tinct is definitely com­pat­i­ble with a high de­gree of pri­or­ity for re­duc­ing suffer­ing or dis­value. I view moral­ity as in­clud­ing both con­sid­er­a­tions about du­ties to­wards other peo­ple (in­spired by so­cial con­tract the­o­ries or game the­o­retic re­ciproc­ity) as well as con­sid­er­a­tions of (un­con­di­tional) care or al­tru­ism. If both types of moral con­sid­er­a­tions are to be weighted similarly, then while the “care” di­men­sion could e.g. be down­side-fo­cused, the other di­men­sion, which is con­cerned with re­spect­ing and co­op­er­at­ing with other peo­ple’s life goals, would not be – at least not un­der the as­sump­tion that the fu­ture will be good enough that peo­ple want it to go on – and would cer­tainly not wel­come ex­tinc­tion.

Another way to bring to­gether both down­side-fo­cused con­cerns and a con­cern for hu­man­ity not go­ing ex­tinct would be through a moral­ity that eval­u­ates states of af­fairs holis­ti­cally, as op­posed to us­ing an ad­di­tive com­bi­na­tion for in­di­vi­d­ual welfare and a global eval­u­a­tion of ex­tinc­tion ver­sus no ex­tinc­tion. Un­der such a model, one would have a bounded value func­tion for the state of the world as a whole, so that a long his­tory with great heights of dis­cov­ery or con­ti­nu­ity could im­prove the eval­u­a­tion of the whole his­tory, as would prop­er­ties like highly fa­vor­able den­si­ties of good things ver­sus bad things.

Al­to­gether, be­cause more peo­ple seem to come to hold up­side-fo­cused or at least strongly ex­tinc­tion-averse val­ues af­ter grap­pling with the ar­gu­ments in pop­u­la­tion ethics, re­duc­ing ex­tinc­tion risk can be part of a fair com­pro­mise even though it is an un­promis­ing and pos­si­bly nega­tive in­ter­ven­tion from a down­side-fo­cused per­spec­tive. After all, the re­duc­tion of ex­tinc­tion risks is par­tic­u­larly im­por­tant from both an up­side-fo­cused per­spec­tive and from the per­spec­tive of (many) peo­ple’s self-, fam­ily- or com­mu­nity-ori­ented moral in­tu­itions – be­cause of the short-term death risks it in­volves.[20] Be­cause it is difficult to iden­tify in­ter­ven­tions that are ro­bustly pos­i­tive and highly im­pact­ful ac­cord­ing to down­side-fo­cused value sys­tems (as the length of this post and the un­cer­tain con­clu­sions in­di­cate), it is how­ever not a triv­ial is­sue that many com­monly recom­mended in­ter­ven­tions are un­likely to be pos­i­tive ac­cord­ing to these value sys­tems. To the ex­tent that down­side-fo­cused value sys­tems are re­garded as a plau­si­ble and fre­quently ar­rived at class of views, con­sid­er­a­tions from moral un­cer­tainty and moral co­op­er­a­tion (see the last sec­tion) recom­mend some de­gree of offset­ting ex­pected harms through tar­geted efforts to re­duce s-risks, e.g. in the space of AI risk (next sec­tion). Analo­gously, down­side-fo­cused al­tru­ists should not in­crease ex­tinc­tion risks and in­stead fo­cus on more co­op­er­a­tive ways to re­duce fu­ture dis­value.

AI al­ign­ment: (Prob­a­bly) pos­i­tive for down­side-fo­cused views; high variance

Smarter-than-hu­man ar­tifi­cial in­tel­li­gence will likely be par­tic­u­larly im­por­tant for how the long-term fu­ture plays out. There is a good chance that the goals of su­per­in­tel­li­gent AI would be much more sta­ble than the val­ues of in­di­vi­d­ual hu­mans or those en­shrined in any con­sti­tu­tion or char­ter, and su­per­in­tel­li­gent AIs would – at least with con­sid­er­able like­li­hood – re­main in con­trol of the fu­ture not only for cen­turies, but for mil­lions or even billions of years to come. In this sec­tion, I will sketch some cru­cial con­sid­er­a­tions for how work in AI al­ign­ment is to be eval­u­ated from a down­side-fo­cused per­spec­tive.

First, let’s con­sider a sce­nario with un­al­igned su­per­in­tel­li­gent AI sys­tems, where the fu­ture is shaped ac­cord­ing to goals that have noth­ing to do with what hu­mans value. Be­cause re­source ac­cu­mu­la­tion is in­stru­men­tally use­ful to most con­se­quen­tial­ist goals, it is likely to be pur­sued by a su­per­in­tel­li­gent AI no mat­ter its pre­cise goals. Taken to its con­clu­sion, the ac­qui­si­tion of ever more re­sources cul­mi­nates in space coloniza­tion where ac­cessible raw ma­te­rial is used to power and con­struct su­per­com­put­ers and other struc­tures that could help in the pur­suit of a con­se­quen­tial­ist goal. Even though ran­dom or “ac­ci­den­tal” goals are un­likely to in­trin­si­cally value the cre­ation of sen­tient minds, they may lead to the in­stan­ti­a­tion of sen­tient minds for in­stru­men­tal rea­sons. In the ab­sence of ex­plicit con­cern for suffer­ing re­flected in the goals of a su­per­in­tel­li­gent AI sys­tem, that sys­tem would in­stan­ti­ate suffer­ing minds for even the slight­est benefit to its ob­jec­tives. Suffer­ing may be re­lated to pow­er­ful ways of learn­ing (Daswani & Leike, 2015), and an AI in­differ­ent to suffer­ing might build vast quan­tities of sen­tient sub­rou­tines, such as robot over­seers, robot sci­en­tists or sub­agents in­side larger AI con­trol struc­tures. Another dan­ger is that, ei­ther dur­ing the strug­gle over con­trol over the fu­ture in a mul­ti­po­lar AI take­off sce­nario, or per­haps in the dis­tant fu­ture should su­per­in­tel­li­gent AIs ever en­counter other civ­i­liza­tions, con­flict or ex­tor­tion could re­sult in large amounts of dis­value. Fi­nally, su­per­in­tel­li­gent AI sys­tems might cre­ate vastly many sen­tient minds, in­clud­ing very many suffer­ing ones, by run­ning simu­la­tions of evolu­tion­ary his­tory for re­search pur­poses (“mind­crime;” Bostrom, 2014, pp. 125-26). 
(Or for other pur­poses; if hu­mans had the power to run al­ter­na­tive his­to­ries in large and fine-grained simu­la­tions, prob­a­bly we could think of all kinds of rea­sons for do­ing it.) Whether such his­tory simu­la­tions would be fine-grained enough to con­tain sen­tient minds, or whether simu­la­tions on a digi­tal medium can even qual­ify as sen­tient, are difficult and con­tro­ver­sial ques­tions. It should be noted how­ever that the stakes are high enough such that even com­par­a­tively small cre­dences such as 5% or lower would already go a long way in terms of the im­plied ex­pected value for the over­all sever­ity of s-risks from ar­tifi­cial sen­tience (see also foot­note 7).

While the earliest discussions about the risks from artificial superintelligence have focused primarily on scenarios where a single goal and control structure decides the future (singleton), we should also remain open to scenarios that do not fit this conceptualization completely. Perhaps what happens instead could be several goals either competing or acting in concert with each other, like an alien economy that drifted further and further away from originally having served the goals of its human creators.[21] Alternatively, perhaps goal preservation becomes more difficult the more capable AI systems become, in which case the future might be controlled by unstable goal functions taking turns at the steering wheel (see “daemons all the way down”). These scenarios where no proper singleton emerges may perhaps be especially likely to contain large numbers of sentient subroutines. This is because navigating a landscape with other highly intelligent agents requires the ability to continuously model other actors and to react to changing circumstances under time pressure, all of which are plausibly relevant for the development of sentience.

In any case, we can­not ex­pect with con­fi­dence that a fu­ture con­trol­led by non-com­pas­sion­ate goals will be a fu­ture that nei­ther con­tains hap­piness nor suffer­ing. In ex­pec­ta­tion, such fu­tures are in­stead likely to con­tain vast amounts of both hap­piness and suffer­ing, sim­ply be­cause these fu­tures would con­tain as­tro­nom­i­cal amounts of goal-di­rected ac­tivity in gen­eral.

Suc­cess­ful AI al­ign­ment could pre­vent most of the suffer­ing that would hap­pen in an AI-con­trol­led fu­ture, as a su­per­in­tel­li­gence with com­pas­sion­ate goals would be will­ing to make trade­offs that sub­stan­tially re­duce the amount of suffer­ing con­tained in any of its in­stru­men­tally use­ful com­pu­ta­tions. While a “com­pas­sion­ate” AI (com­pas­sion­ate in the sense that its goal in­cludes con­cern for suffer­ing, though not nec­es­sar­ily in the sense of ex­pe­rienc­ing emo­tions we as­so­ci­ate with com­pas­sion) might still pur­sue his­tory simu­la­tions or make use of po­ten­tially sen­tient sub­rou­tines, it would be much more con­ser­va­tive when it comes to risks of cre­at­ing suffer­ing on large scales. This means that it would e.g. con­tem­plate us­ing fewer or slightly less fine-grained simu­la­tions, slightly less effi­cient robot ar­chi­tec­tures (and ones that are par­tic­u­larly happy most of the time), and so on. This line of rea­son­ing sug­gests that AI al­ign­ment might be highly pos­i­tive ac­cord­ing to down­side-fo­cused value sys­tems be­cause it averts s-risks re­lated to in­stru­men­tally use­ful com­pu­ta­tions.

How­ever, work in AI al­ign­ment not only makes it more likely that fully al­igned AI is cre­ated and ev­ery­thing goes perfectly well, but it also af­fects the dis­tri­bu­tion of al­ign­ment failure modes. In par­tic­u­lar, progress in AI al­ign­ment could make it more likely that failure modes shift from “very far away from perfect in con­cep­tual space” to “close but slightly off the tar­get.” There are some rea­sons why such near misses might some­times end par­tic­u­larly badly.

What could loosely be clas­sified as a near miss is that cer­tain work in AI al­ign­ment makes it more likely that AIs would share whichever val­ues their cre­ators want to in­stall, but the cre­ators could be un­eth­i­cal or (meta-)philo­soph­i­cally and strate­gi­cally in­com­pe­tent.

For in­stance, if those in power of the fu­ture came to fol­low some kind of ide­ol­ogy that is un­com­pas­sion­ate or even hate­ful of cer­tain out-groups, or fa­vor a dis­torted ver­sion of liber­tar­i­anism where ev­ery per­son, in­clud­ing a few sadists, would be granted an as­tro­nom­i­cal quan­tity of fu­ture re­sources to use it at their dis­posal, the re­sult­ing fu­ture could be a bad one ac­cord­ing to down­side-fo­cused ethics.

A re­lated and per­haps more plau­si­ble dan­ger is that we might pre­ma­turely lock in a defi­ni­tion of suffer­ing and hap­piness into an AI’s goals that ne­glects sources of suffer­ing we would come to care about af­ter deeper re­flec­tion, such as not car­ing about the mind states of in­sect-like digi­tal minds (which may or may not be rea­son­able). A su­per­in­tel­li­gence with a ran­dom goal would also be in­differ­ent with re­gard to these sources of suffer­ing, but be­cause hu­mans value the cre­ation of sen­tience, or at least value pro­cesses re­lated to agency (which tend to cor­re­late with sen­tience), the like­li­hood is greater that a su­per­in­tel­li­gence with al­igned val­ues would cre­ate un­no­ticed or un­cared for sources of suffer­ing. Pos­si­ble such sources in­clude the suffer­ing of non-hu­man an­i­mals in na­ture simu­la­tions performed for aes­thetic rea­sons, or char­ac­ters in so­phis­ti­cated vir­tual re­al­ity games.

A further danger is that, if our strategic or technical understanding is too poor, we might fail to specify a recipe for getting human values right and end up with perverse instantiation (Bostrom, 2014) or a failure mode where the reward function ends up flawed. This could happen e.g. in cases where an AI system starts to act in unpredictable but optimized ways due to conducting searches far outside its training distribution.[22] Probably most mistakes at that stage would result in about as much suffering as in the typical scenario where AI is unaligned and has (for all practical purposes) random goals. However, one possibility is that alignment failures surrounding utopia-directed goals have a higher chance of leading to dystopia than alignment failures around random goals. For instance, a failure to fully understand the goal ‘make maximally many happy minds’ could lead to a dystopia where maximally many minds are created in conditions that do not reliably produce happiness, and may even lead to suffering in some of the instances, or some of the time. This is an area for future research.

A final possible outcome in the theme of “almost getting everything right” is one where we are able to successfully install human values into an AI, only to have the resulting AI compete with other, unaligned AIs for control of the future and be threatened with things that are bad according to human values, in the expectation that the human-aligned AI would then forfeit its resources and give up in the competition over controlling the future.

Try­ing to sum­ma­rize the above con­sid­er­a­tions, I drew a (sketchy) map with some ma­jor cat­e­gories of s-risks re­lated to space coloniza­tion. It high­lights that ar­tifi­cial in­tel­li­gence can be re­garded as a cause or cure for s-risks (So­tala & Gloor, 2017). That is, if su­per­in­tel­li­gent AI is suc­cess­fully al­igned, s-risks stem­ming from in­differ­ence to suffer­ing are pre­vented and a max­i­mally valuable fu­ture is in­stan­ti­ated (green). How­ever, the dan­ger of near misses (red) makes it non-ob­vi­ous whether efforts in AI al­ign­ment re­duce down­side risks over­all, as the worst near misses may e.g. con­tain more suffer­ing than the av­er­age s-risk sce­nario.

Note that no one should quote the above map out of con­text and call it “The likely fu­ture” or some­thing like that, be­cause some of the sce­nar­ios I listed may be highly im­prob­a­ble and be­cause the whole map is drawn with a fo­cus on things that could go wrong. If we wanted a map that also tracked out­comes with as­tro­nom­i­cal amounts of hap­piness, there would in ad­di­tion be many nodes for things like “happy sub­rou­tines,” “mind­crime-op­po­site,” “su­per­hap­piness-en­abling tech­nolo­gies,” or “un­al­igned AI trades with al­igned AI and does good things af­ter all.” There can be fu­tures in which sev­eral s-risk sce­nar­ios come to pass at the same time, as well as fu­tures that con­tain s-risk sce­nar­ios but also a lot of hap­piness (this seems pretty likely).

To elab­o­rate more on the cat­e­gories in the map above: Pre-AGI civ­i­liza­tion (blue) is the stage we are at now. Grey boxes re­fer to var­i­ous steps or con­di­tions that could be met, from which s-risks (or­ange and red), ex­tinc­tion (yel­low) or utopia (green) may fol­low. The map is crude and not ex­haus­tive. For in­stance “No AI Sin­gle­ton” is a some­what un­nat­u­ral cat­e­gory into which I threw both sce­nar­ios where AI sys­tems play a cru­cial role and sce­nar­ios where they do not. That is, the cat­e­gory con­tains fu­tures where space coloniza­tion is or­ches­trated by hu­mans or some biolog­i­cal suc­ces­sor species with­out AI sys­tems that are smarter than hu­mans, fu­tures where AI sys­tems are used as tools or or­a­cles for as­sis­tance, and fu­tures where hu­mans are out of the loop but no proper sin­gle­ton emerges in the com­pe­ti­tion be­tween differ­ent AI sys­tems.

Red boxes are s-risks that may be intertwined with efforts in AI alignment (though not by logical necessity): If one is careless, work in AI alignment may exacerbate these s-risks rather than alleviate them. While dystopia from extortion would never be the result of the activities of an aligned AI, it takes an AI with aligned values, e.g. alongside the unaligned AI in a multipolar scenario or alien AI encountered during space colonization, to even provoke such a threat (hence the dotted line linking this outcome to “value loading success”). I coined the term “aligned-ish AI” to refer to the class of outcomes that efforts in AI alignment shift probability mass to. This class includes both very good outcomes (intentional) and neutral or very bad outcomes (accidental). Flawed realization (which stands for futures where flaws in alignment prevent most of the value or even create disvalue) is split into two subcategories in order to highlight that the vast majority of such outcomes likely contains no more suffering than the typical outcome with unaligned AI, but that things going wrong in a particularly unfortunate way could result in exceptionally bad futures. For views that care about as strongly about achieving utopia as about preventing very bad futures, this tradeoff seems most likely net positive, whereas from a downside-focused perspective, this consideration makes it less clear whether efforts in AI alignment are overall worth the risks.

Fortunately, not all work in AI alignment faces the same tradeoffs. Many approaches may be directed specifically at avoiding certain failure modes, which is extremely positive and impactful for downside-focused perspectives. Worst-case AI safety is the idea that downside-focused value systems recommend differentially pushing the approaches that appear safest with respect to particularly bad failure modes. Given that many approaches towards AI alignment are still at a very early stage, it may be hard to tell which components of AI alignment are likely to benefit downside-focused perspectives the most. Nevertheless, I think we can already make some informed guesses, and our understanding will improve with time.

For instance, approaches that make AI systems corrigible (see here and here) would extend the window of time during which we can spot flaws and prevent outcomes with flawed realization. Similarly, approval-directed approaches to AI alignment, where alignment is achieved by simulating what a human overseer would decide if they were to think about the situation for a very long time, would go further towards avoiding bad decisions than approaches with immediate, unamplified feedback from human overseers. And rather than trying to solve AI alignment in one fell swoop, a promising and particularly “s-risk-proof” strategy might be to first build low-impact AI systems that increase global stability and prevent arms races without actually representing fully specified human values. This would give everyone more time to think about how to proceed and avoid failure modes where human values are (partially) inverted.

In gen­eral, es­pe­cially from a down­side-fo­cused per­spec­tive, it strikes me as very im­por­tant that early and pos­si­bly flawed or in­com­plete AI de­signs should not yet at­tempt to fully spec­ify hu­man val­ues. Eliezer Yud­kowsky re­cently ex­pressed the same point in this Ar­bital post on the worst failure modes in AI al­ign­ment.

Fi­nally, what could also be highly effec­tive for re­duc­ing down­side risks, as well as be­ing im­por­tant for many other rea­sons, is some of the foun­da­tional work in bar­gain­ing and de­ci­sion the­ory for AI sys­tems, done at e.g. the Ma­chine In­tel­li­gence Re­search In­sti­tute, which could help us un­der­stand how to build AI sys­tems that re­li­ably steer things to­wards out­comes that are always pos­i­tive-sum.

I have a general intuition that, at least as long as the AI safety community does not face a strong pressure from (perceived) short timelines where the differences between downside-focused and upside-focused views may become more pronounced, there is likely to be a lot of overlap between the most promising approaches focused on achieving the highest probability of success (utopia creation) and approaches that are particularly robust against failing in the most regrettable ways (dystopia prevention). Heuristics like ‘Make AI systems corrigible,’ ‘Buy more time to think,’ or ‘If there is time, figure out some foundational issues to spot unanticipated failure modes’ all seem likely to be useful from both perspectives, especially when all good guidelines are followed without exception. I also expect that reasonably many people working in AI alignment will gravitate towards approaches that are robust in all these respects, because making your approach multi-layered and foolproof simply is a smart strategy when the problem in question is unfamiliar and highly complex. Furthermore, I anticipate that more people will come to think more explicitly about the tradeoffs between the downside risks from near misses and utopian futures, and some of them might put deliberate efforts into finding AI alignment methods or alignment components that fail gracefully and thereby make downside risks less likely (worst-case AI safety), either because of intrinsic concern or for reasons of cooperation with downside-focused altruists.[23] All of these things make me optimistic about AI alignment as a cause area being roughly neutral or slightly positive when done with little focus on downside-focused considerations, and strongly positive when pursued with strong concern for avoiding particularly bad outcomes.

I also want to mention that the entire field of AI policy and strategy strikes me as particularly positive for downside-focused value systems. Making sure that AI development is done carefully and cooperatively, without the threat of arms races leading to ill-considered, rushed approaches, seems like it would be exceptionally positive from all perspectives, and so I recommend that people who fulfill the requirements for such work should prioritize it very highly.

Mo­ral un­cer­tainty and cooperation

Pop­u­la­tion ethics, which is the area in philos­o­phy most rele­vant for de­cid­ing be­tween up­side- and down­side-fo­cused po­si­tions, is a no­to­ri­ously con­tested topic. Many peo­ple who have thought about it a great deal be­lieve that the ap­pro­pri­ate epistemic state with re­gard to a solu­tion to pop­u­la­tion ethics is one of sub­stan­tial moral un­cer­tainty or of valu­ing fur­ther re­flec­tion on the topic. Let us sup­pose there­fore that, rather than be­ing con­vinced that some form of suffer­ing-fo­cused ethics or down­side-fo­cused moral­ity is the stance we want to take, we con­sider it a plau­si­ble stance we very well might want to take, alongside other po­si­tions that re­main in con­tention.

Analo­gous to situ­a­tions with high em­piri­cal un­cer­tainty, there are two steps to con­sider for de­cid­ing un­der moral un­cer­tainty:

  • (1) Es­ti­mate the value of in­for­ma­tion from at­tempts to re­duce un­cer­tainty, and the time costs of such attempts

and com­pare that with

  • (2) the un­cer­tainty-ad­justed value from pur­su­ing those in­ter­ven­tions that are best from our cur­rent epistemic perspective

With re­gard to (1), we can re­duce our moral un­cer­tainty on two fronts. The ob­vi­ous one is pop­u­la­tion ethics: We can learn more about the ar­gu­ments for differ­ent po­si­tions, come up with new ar­gu­ments and po­si­tions, and as­sess them crit­i­cally. The sec­ond front con­cerns meta-level ques­tions about the na­ture of ethics it­self, what our un­cer­tainty is ex­actly about, and in which ways more re­flec­tion or a so­phis­ti­cated re­flec­tion pro­ce­dure with the help of fu­ture tech­nol­ogy would change our think­ing. While some peo­ple be­lieve that it is fu­tile to even try reach­ing con­fi­dent con­clu­sions in the epistemic po­si­tion we are in cur­rently, one could also ar­rive at a view where we sim­ply have to get started at some point, or else we risk get­ting stuck in a state of un­der­de­ter­mi­na­tion and judg­ment calls all the way down.[24]

If we conclude that the value of information is insufficiently high to justify more reflection, then we can turn towards getting value from working on direct interventions (2) informed by those moral perspectives we have substantial credence in. For instance, a portfolio for effective altruists in the light of total uncertainty over downside- vs. upside-focused views (which may not be an accurate representation of the EA landscape currently, where upside-focused views appear to be in the majority) would include many interventions that are valuable from both perspectives, and few interventions where there is a large mismatch such that one side is harmed without the other side attaining a much greater benefit. Candidate interventions where the overlap between downside-focused and upside-focused views is high include AI strategy and AI safety (perhaps with a careful focus on the avoidance of particularly bad failure modes), as well as growing healthy communities around these interventions. Many other things might be positive from both perspectives too, such as (to name only a few) efforts to increase international cooperation, raising awareness and concern for the suffering of non-human sentient minds, or improving institutional decision-making.

It is sometimes acceptable or even rationally mandated to do something that is negative according to some plausible moral views, provided that the benefits accorded to other views are sufficiently large. Ideally, one would weigh all of these considerations and integrate the available information appropriately with some decision procedure for acting under moral uncertainty, such as one that includes variance voting (MacAskill, 2014, chpt. 3) and an imagined moral parliament.[25] For instance, if someone leaned more towards upside-focused views, or had reasons to believe that the low-hanging fruit in the field of non-AI extinction risk reduction are exceptionally important from the perspective of these views (and unlikely to damage downside-focused views more than they can be benefited elsewhere), or gave a lot of weight to the argument from option value (see the next paragraph), then these interventions should be added at high priority to the portfolio as well.
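To make the idea of variance voting a little more concrete, here is a minimal sketch of the normalization step as I understand it: each moral view's scores over the available options are rescaled to zero mean and unit variance, and the rescaled scores are then weighted by one's credence in each view. The option names, raw scores and credences below are made up purely for illustration and are not estimates from this post or from MacAskill (2014).

```python
from statistics import mean, pstdev


def variance_normalize(scores):
    """Rescale one view's scores across options to zero mean and unit variance."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A view that is indifferent between all options gets no say.
        return {option: 0.0 for option in scores}
    return {option: (value - mu) / sigma for option, value in scores.items()}


def expected_choiceworthiness(credences, raw_scores):
    """Credence-weighted sum of variance-normalized scores for each option."""
    normalized = {view: variance_normalize(s) for view, s in raw_scores.items()}
    options = list(next(iter(raw_scores.values())))
    return {
        option: sum(credences[view] * normalized[view][option] for view in credences)
        for option in options
    }


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    credences = {"upside_focused": 0.5, "downside_focused": 0.5}
    raw_scores = {
        "upside_focused": {"option_a": 10, "option_b": 8, "option_c": 3},
        "downside_focused": {"option_a": 2, "option_b": -1, "option_c": 5},
    }
    ranking = expected_choiceworthiness(credences, raw_scores)
    for option, score in sorted(ranking.items(), key=lambda kv: -kv[1]):
        print(f"{option}: {score:+.2f}")
```

The point of the normalization is that no view can dominate the outcome merely by stating its evaluations on a larger numerical scale. An option that is mildly good on both views can then beat one that is excellent on one view and clearly bad on the other.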

Some peo­ple have ar­gued that even (very) small cre­dences in up­side-fo­cused views, such as 1-20% for in­stance, would in it­self already speak in fa­vor of mak­ing ex­tinc­tion risk re­duc­tion a top pri­or­ity be­cause mak­ing sure there will still be de­ci­sion-mak­ers in the fu­ture pro­vides high op­tion value. I think this gives by far too much weight to the ar­gu­ment from op­tion value. Op­tion value does play a role, but not nearly as strong a role as it is some­times made out to be. To elab­o­rate, let’s look at the ar­gu­ment in more de­tail: The naive ar­gu­ment from op­tion value says, roughly, that our de­scen­dants will be in a much bet­ter po­si­tion to de­cide than we are, and if suffer­ing-fo­cused ethics or some other down­side-fo­cused view is in­deed the out­come of their moral de­liber­a­tions, they can then de­cide to not colonize space, or only do so in an ex­tremely care­ful and con­trol­led way. If this pic­ture is cor­rect, there is al­most noth­ing to lose and a lot to gain from mak­ing sure that our de­scen­dants get to de­cide how to pro­ceed.

I think this argument to a large extent misses the point, but seeing that even some well-informed effective altruists seem to believe that it is very strong led me to realize that I should write a post explaining the landscape of cause prioritization for downside-focused value systems. The problem with the naive argument from option value is that the decision algorithm that is implicitly being recommended in the argument, namely focusing on extinction risk reduction and leaving moral philosophy (and s-risk reduction in case the outcome is a downside-focused morality) to future generations, makes sure that people follow the implications of downside-focused morality in precisely the one instance where it is least needed, and never otherwise. If the future is going to be controlled by philosophically sophisticated altruists who are also modest and willing to change course given new insights, then most bad futures will already have been averted in that scenario. An outcome where we get long and careful reflection without downsides is far from the only possible outcome. In fact, it does not even seem to me to be the most likely outcome (although others may disagree). No one is most worried about a scenario where epistemically careful thinkers with their heart in the right place control the future; the discussion is instead about whether the probability that things will accidentally go off the rails warrants extra-careful attention. (And it is not as though it looks like we are particularly on the rails currently either.)
Re­duc­ing non-AI ex­tinc­tion risk does not pre­serve much op­tion value for down­side-fo­cused value sys­tems be­cause most of the ex­pected fu­ture suffer­ing prob­a­bly comes not from sce­nar­ios where peo­ple de­liber­ately im­ple­ment a solu­tion they think is best af­ter years of care­ful re­flec­tion, but in­stead from cases where things un­ex­pect­edly pass a point of no re­turn and com­pas­sion­ate forces do not get to have con­trol over the fu­ture. Down­side risks by ac­tion likely loom larger than down­side risks by omis­sion, and we are plau­si­bly in a bet­ter po­si­tion to re­duce the most press­ing down­side risks now than later. (In part be­cause “later” may be too late.)
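A toy calculation may help to show the structure of this argument. It splits futures into just two hypothetical classes: "reflective" futures, where careful decision-makers stay in control and can still act on downside-focused conclusions, and "off the rails" futures, where they cannot. All numbers are placeholders chosen for illustration, not estimates.

```python
def share_addressable_by_reflection(p_reflective, suffering_if_reflective,
                                    suffering_if_off_rails):
    """Fraction of expected future suffering located in reflective futures.

    Only this fraction can be reduced by leaving the decision to careful
    descendants, which is what the naive option value argument relies on.
    All three inputs are hypothetical placeholders.
    """
    reflective = p_reflective * suffering_if_reflective
    off_rails = (1 - p_reflective) * suffering_if_off_rails
    total = reflective + off_rails
    if total == 0:
        return 0.0
    return reflective / total


if __name__ == "__main__":
    # Placeholder assumption: off-the-rails futures contain 100x the suffering.
    for p_reflective in (0.5, 0.9):
        share = share_addressable_by_reflection(p_reflective, 1.0, 100.0)
        print(f"P(reflective) = {p_reflective:.0%}: addressable share = {share:.1%}")
```

Under these placeholder inputs, even a high probability of ending up in a reflective future leaves most of the expected suffering in scenarios where option value does no work. The conclusion is of course only as good as the assumption that off-the-rails futures are much worse.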

This suggests that if one is uncertain between upside- and downside-focused views, as opposed to being uncertain between all kinds of things except downside-focused views, the argument from option value is much weaker than it is often made out to be. Having said that, non-naively, option value still does upshift the importance of reducing extinction risks quite a bit, just not by an overwhelming degree. In particular, arguments for the importance of option value that do carry force are for instance:

  • There is still some down­side risk to re­duce af­ter long reflection

  • Our de­scen­dants will know more about the world, and cru­cial con­sid­er­a­tions in e.g. in­finite ethics or an­throp­ics could change the way we think about down­side risks (in that we might for in­stance re­al­ize that down­side risks by omis­sion loom larger than we thought)

  • One’s adop­tion of (e.g.) up­side-fo­cused views af­ter long re­flec­tion may cor­re­late fa­vor­ably with the ex­pected amount of value or dis­value in the fu­ture (mean­ing: con­di­tional on many peo­ple even­tu­ally adopt­ing up­side-fo­cused views, the fu­ture is more valuable ac­cord­ing to up­side-fo­cused views than it ap­pears dur­ing an ear­lier state of un­cer­tainty)

The dis­cus­sion about the benefits from op­tion value is in­ter­est­ing and im­por­tant, and a lot more could be said on both sides. I think it is safe to say that the non-naive case for op­tion value is not strong enough to make ex­tinc­tion risk re­duc­tion a top pri­or­ity given only small cre­dences in up­side-fo­cused views, but it does start to be­come a highly rele­vant con­sid­er­a­tion once the cre­dences be­come rea­son­ably large. Hav­ing said that, one can also make a case that im­prov­ing the qual­ity of the fu­ture (more hap­piness/​value and less suffer­ing/​dis­value) con­di­tional on hu­man­ity not go­ing ex­tinct is prob­a­bly go­ing to be at least as im­por­tant for up­side-fo­cused views and is more ro­bust un­der pop­u­la­tion eth­i­cal un­cer­tainty – which speaks par­tic­u­larly in fa­vor of highly pri­ori­tiz­ing ex­is­ten­tial risk re­duc­tion through AI policy and AI al­ign­ment.

We saw that integrating population-ethical uncertainty means that one should often act to benefit both upside- and downside-focused value systems – at least in case such uncertainty applies in one’s own case and epistemic situation. Moral cooperation presents an even stronger and more universally applicable reason to pursue a portfolio of interventions that is altogether positive according to both perspectives. The case for moral cooperation is very broad and convincing, as it ranges from commonsensical heuristics to theory-backed principles found in Kantian morality or throughout Parfit’s work, as well as in the literature on decision theory.[26] It implies that one should give extra weight to interventions that are positive for value systems different from one’s own, and subtract some weight from interventions that are negative according to other value systems – all to the extent to which the value systems in question are endorsed prominently or endorsed by potential allies.[27]

Con­sid­er­a­tions from moral co­op­er­a­tion may even make moral re­flec­tion ob­so­lete on the level of in­di­vi­d­u­als: Sup­pose we knew that peo­ple tended to grav­i­tate to­wards a small num­ber of at­trac­tor states in pop­u­la­tion ethics, and that once a per­son ten­ta­tively set­tles on a po­si­tion, they are very un­likely to change their mind. Rather than ev­ery­one go­ing through this pro­cess in­di­vi­d­u­ally, peo­ple could col­lec­tively adopt a de­ci­sion rule where they value the out­come of a hy­po­thet­i­cal pro­cess of moral re­flec­tion. They would then work on in­ter­ven­tions that are benefi­cial for all the com­monly en­dorsed po­si­tions, weighted by the prob­a­bil­ity that peo­ple would adopt them if they were to go through long-winded moral re­flec­tion. Such a de­ci­sion rule would save ev­ery­one time that could be spent on di­rect work rather than philoso­phiz­ing, but per­haps more im­por­tantly, it would also make it much eas­ier for peo­ple to benefit differ­ent value sys­tems co­op­er­a­tively. After all, when one is gen­uinely un­cer­tain about val­ues, there is no in­cen­tive to at­tain un­co­op­er­a­tive benefits for one’s own value sys­tem.
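The mechanics of such a decision rule can be illustrated with a small sketch. Everything in it is hypothetical: the two attractor positions, the adoption probabilities, the interventions and their scores are made-up placeholders rather than estimates from this post, and it assumes that the value of an intervention can be put on a common scale across positions, which is itself contestable.

```python
# Hypothetical illustration: all names and numbers are placeholders.

# Assumed probability that a person would settle on each attractor
# position after long moral reflection (values sum to 1).
adoption_probability = {
    "upside_focused": 0.5,
    "downside_focused": 0.5,
}

# How good each intervention looks from each position, on an assumed
# common scale; negative numbers mean the position regards it as harmful.
intervention_value = {
    "intervention_a": {"upside_focused": 10.0, "downside_focused": -2.0},
    "intervention_b": {"upside_focused": 6.0, "downside_focused": 6.0},
    "intervention_c": {"upside_focused": 0.0, "downside_focused": 9.0},
}


def reflection_weighted_score(values, probabilities):
    """Value under each position, weighted by the probability of
    adopting that position after reflection, summed over positions."""
    return sum(probabilities[view] * values[view] for view in probabilities)


def rank_interventions(interventions, probabilities):
    """Return intervention names sorted from highest to lowest score."""
    return sorted(
        interventions,
        key=lambda name: reflection_weighted_score(
            interventions[name], probabilities
        ),
        reverse=True,
    )


scores = {
    name: reflection_weighted_score(values, adoption_probability)
    for name, values in intervention_value.items()
}
ranking = rank_interventions(intervention_value, adoption_probability)
```

With these placeholder numbers, the intervention that both positions regard as moderately good ranks above the two that are excellent for one position and neutral or harmful for the other, which is the cooperative pattern the paragraph above describes.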

So while I think the po­si­tion that valu­ing re­flec­tion is always the epistem­i­cally pru­dent thing to do rests on du­bi­ous as­sump­tions (be­cause of the ar­gu­ment from op­tion value be­ing weak, as well as the rea­sons al­luded to in foot­note 24), I think there is an in­trigu­ing ar­gu­ment that a norm for valu­ing re­flec­tion is ac­tu­ally best from a moral co­op­er­a­tion per­spec­tive – pro­vided that ev­ery­one is aware of what the differ­ent views on pop­u­la­tion ethics im­ply for cause pri­ori­ti­za­tion, and that we have a roughly ac­cu­rate sense of which at­trac­tor states peo­ple’s moral re­flec­tion would seek out.

Even if everyone went on to primarily focus on interventions that are favored by their own value system or their best guess morality, small steps in the direction of cooperatively taking other perspectives into account can already create a lot of additional value for all parties. To this end, everyone benefits from trying to better understand and account for the cause prioritization implied by different value systems.


[1] Speak­ing of “bad things” or “good things” has the po­ten­tial to be mis­lead­ing, as one might think of cases such as “It is good to pre­vent bad things” or “It is bad if good things are pre­vented.” To be clear, what I mean by “bad things” are states of af­fairs that are in them­selves dis­valuable, as op­posed to states of af­fairs that are dis­valuable only in terms of wasted op­por­tu­nity costs. Analo­gously, by “good things” I mean states of af­fairs that are in them­selves worth bring­ing about (po­ten­tially at a cost) rather than just states of af­fairs that fail to be bad.

[2] It is a bit more com­pli­cated: If one thought that for ex­ist­ing peo­ple, there is a vast amount of value to be achieved by en­sur­ing that peo­ple live very long and very happy lives, then even views that fo­cus pri­mar­ily on the well-be­ing of cur­rently ex­ist­ing peo­ple or peo­ple who will ex­ist re­gard­less of one’s ac­tions may come out as up­side-fo­cused. For this to be the case, one would need to have rea­son to be­lieve that these vast up­sides are re­al­is­ti­cally within reach, that they are nor­ma­tively suffi­ciently im­por­tant when com­pared with down­side risks for the peo­ple in ques­tion (such as peo­ple end­ing up with long, un­happy lives or ex­tremely un­happy lives), and that these down­side risks are suffi­ciently un­likely (or in­tractable) em­piri­cally to count as the more press­ing pri­or­ity when com­pared to the up­side op­por­tu­ni­ties at hand.

It is worth not­ing that a lot of peo­ple, when asked e.g. about their goals in life, do not state it as a pri­or­ity or go to great lengths to live for ex­tremely long, or to ex­pe­rience lev­els of hap­piness that are not sus­tain­ably pos­si­ble with cur­rent tech­nol­ogy. This may serve as a coun­ter­ar­gu­ment against this sort of up­side-fo­cused po­si­tion. On the other hand, ad­vo­cates of that po­si­tion could ar­gue that peo­ple may sim­ply not be­lieve that such sce­nar­ios are re­al­is­tic, and that e.g. some peo­ple’s long­ing to go to heaven does ac­tu­ally speak in fa­vor of there be­ing a strong de­sire for en­sur­ing an ex­cep­tion­ally good, ex­cep­tion­ally long per­sonal fu­ture.

[3] I es­ti­mate that a view is suffi­ciently “nega­tive-lean­ing” to qual­ify as down­side-fo­cused if it says that re­duc­ing ex­treme suffer­ing is much more im­por­tant (say 100 times or maybe 1,000 times more im­por­tant) than cre­at­ing op­ti­mized hap­piness. For nor­ma­tive views with lower ex­change rates be­tween ex­treme suffer­ing and op­ti­mal hap­piness (or gen­er­ally pos­i­tive value), pri­ori­ti­za­tion be­comes less clear and will of­ten de­pend on ad­di­tional speci­fics of the view in ques­tion. There does not ap­pear to be a uniquely cor­rect way to mea­sure hap­piness and suffer­ing, so talk about it be­ing x times more im­por­tant to pre­vent suffer­ing than to cre­ate hap­piness always has to be ac­com­panied with in­struc­tions for what ex­actly is be­ing com­pared. Be­cause of the pos­si­bil­ity that ad­vanced fu­ture civ­i­liza­tions may be able to effi­ciently in­stan­ti­ate mind states that are much more (dis)valuable than the highs and lows of our biol­ogy, what seems par­tic­u­larly rele­vant for es­ti­mat­ing the value of the long-term fu­ture is the way one com­pares max­i­mally good hap­piness and max­i­mally bad suffer­ing. Be­cause pre­sum­ably we have ex­pe­rienced nei­ther one ex­treme nor the other, it may be difficult to form a strong opinion on this ques­tion, which ar­guably gives us grounds for sub­stan­tial moral un­cer­tainty. One thing that seems to be the case is that a lot of peo­ple – though not ev­ery­one – would in­tro­spec­tively agree that suffer­ing is a lot stronger than hap­piness at least within the limits of our biol­ogy. How­ever, one can draw differ­ent in­ter­pre­ta­tions as to why this is the case. Some peo­ple be­lieve that this differ­ence would be­come a lot less pro­nounced if all pos­si­ble states of mind could be ac­cessed effi­ciently with the help of ad­vanced tech­nol­ogy. 
Evolutionary arguments lend some support to this position: It was plausibly more important for natural selection to prevent organisms from artificially maxing out their feelings of positive reward (wireheading) than to prevent organisms from being able to max out their negative rewards (self-torture). This line of reasoning suggests that us observing that it is rare and very difficult in practice – given our biology – to experience extremely positive states of mind (especially for prolonged periods of time) should not give us a lot of reason to think that such states are elusive in theory. On the other hand, another plausible interpretation of the asymmetry we perceive between suffering and happiness (as we can experience and envision them) would be one that points to an underlying difference in the nature of the two, one that won’t go away even with the help of advanced technology. Proponents of this second interpretation believe that no matter how optimized states of happiness may become in the future, creating happiness will always lack the kind of moral urgency that comes with avoiding extreme suffering.

[4] Note that prioritarianism or egalitarianism are not upside-focused views even though they may share the similarity with classical hedonistic utilitarianism that they accept versions of the repugnant conclusion. There can be downside-focused views which accept the repugnant conclusion. As soon as lives of negative welfare are at stake, prioritarianism and welfare-based egalitarianism accord especially high moral importance to preventing these lives, which most likely makes the views downside-focused. (The bent towards giving priority to those worse off would have to be rather mild in order to not come out as downside-focused in practice, especially since one of the reasons people are drawn to these views in the first place might be that they incorporate downside-focused moral intuitions.)

[5] Person-affecting restrictions are generally considered unpromising as a solution to population ethics. For instance, versions of person-affecting views that evaluate it as neutral to add well-off beings to the world, yet bad to add beings whose welfare is below zero, suffer from what Hilary Greaves (2017) has called a “remarkabl[e] difficult[y] to formulate any remotely acceptable axiology that captures this idea of ‘neutrality.’” Versions of person-affecting views that evaluate it as negative to add lives to the world with even just a little suffering (e.g. Benatar’s anti-natalism) do not have this problem, but they appear counterintuitive to most people because of how strongly they count such suffering in otherwise well-off lives.

[6] Within the set of in­ter­ven­tions that plau­si­bly have a large pos­i­tive im­pact on the long-term fu­ture, it is also im­por­tant to con­sider one’s com­par­a­tive ad­van­tages. Ta­lent, ex­per­tise and mo­ti­va­tion for a par­tic­u­lar type of work can have a vast effect on the qual­ity of one’s out­put and can make up for effec­tive­ness differ­ences of one or two or­ders of mag­ni­tude.

[7] Com­pu­ta­tion­ally in­effi­cient in the sense that, with ad­vanced com­puter tech­nol­ogy, one could speed up what func­tion­ally goes on in biolog­i­cal brains by a vast fac­tor and cre­ate large num­bers of copies of such brain em­u­la­tions on a com­puter sub­strate – see for in­stance this re­port or Robin Han­son’s The Age of Em. One premise that strongly bears on the like­li­hood of sce­nar­ios where the fu­ture con­tains as­tro­nom­i­cal quan­tities of suffer­ing is whether ar­tifi­cial sen­tience, sen­tience im­ple­mented on com­puter sub­strates, is pos­si­ble. Be­cause most philoso­phers of mind be­lieve that digi­tal sen­tience is pos­si­ble, only hav­ing very small cre­dences (say 5% or smaller) in this propo­si­tion is un­likely to be epistem­i­cally war­ranted. More­over, even if digi­tal sen­tience was im­pos­si­ble, an­other route to as­tro­nom­i­cal fu­ture suffer­ing is that what­ever sub­strates can pro­duce con­scious­ness would be used/​re­cruited for in­stru­men­tal pur­poses. I have also writ­ten about this is­sue here.

[8] And some peo­ple might think that SP, un­der some spec­u­la­tive as­sump­tions on the philos­o­phy of mind and how one morally val­ues differ­ent com­pu­ta­tions, may con­tain suffer­ing that is tied up with the par­adise re­quire­ments and there­fore hard to avoid.

[9] My in­side view says they are two or­ders of mag­ni­tude less likely, but I am try­ing to ac­count for some peo­ple be­ing more op­ti­mistic. My pes­simism de­rives from there be­ing so many more macro­scop­i­cally dis­tinct fu­tures that qual­ify as AS rather than AP, and while I do think AP rep­re­sents a strong at­trac­tor, my in­tu­ition is that Molochian forces are difficult to over­come. See also Brian To­masik’s dis­cus­sion of utopia here. Ev­i­dence that might con­tribute to­wards mak­ing me re­vise my es­ti­mate (up to e.g. 10% like­li­hood of some kind of utopia given space coloniza­tion) would be things such as:

  • A coun­try im­ple­ment­ing a scheme to re­dis­tribute wealth in a way that elimi­nates poverty or oth­er­wise gen­er­ates large welfare benefits with­out re­ally bad side-effects (in­clud­ing side-effects in other coun­tries).

  • Tech­nolog­i­cally lead­ing coun­tries achiev­ing a vastly con­se­quen­tial agree­ment re­gard­ing cli­mate change or in­ter­na­tional co­or­di­na­tion with re­spect to AI de­vel­op­ment.

  • De­mo­graphic trends sug­gest­ing a sta­ble de­crease in fer­til­ity rates for to­tal pop­u­la­tion prog­noses down to sus­tain­abil­ity lev­els or be­low (es­pe­cially if sta­ble with­out highly nega­tive side-effects on a long-term per­spec­tive of cen­turies and be­yond).

  • Many coun­tries com­pletely out­law­ing fac­tory farm­ing in the next cou­ple of decades, es­pe­cially if non-eco­nomic rea­sons play a ma­jor role.

  • Any ma­jor, re­search-rele­vant ju­ris­dic­tion out­law­ing or very tightly reg­u­lat­ing the use of cer­tain AI al­gorithms for fear of need­lessly cre­at­ing sen­tient minds.

  • Break­throughs in AI al­ign­ment re­search that make most AI safety re­searchers (in­clud­ing groups with a track record of be­ing on the more pes­simistic side) sub­stan­tially more op­ti­mistic about the fu­ture than they cur­rently are.

[10] Some people would say that as long as a being prefers existence over non-existence (and if they would not, there is usually the option of suicide), their existence cannot be net negative even if it contained a lot more suffering than happiness. I would counter that, while I see a strong case from cooperation and respecting someone’s goals for not terminating the existence of a being who wants to go on existing, this is not the same as saying that the being’s existence, assuming it contains only suffering, adds value independently of the being’s drives or goals. Goals may not be about only one’s own welfare – they can also include all kinds of objectives, such as living a meaningful life or caring about personal achievements or the state of the world. After all, one would think that Darwinian forces would be unlikely to select for the kind of goals or evaluation mechanisms that easily evaluate one’s life as all things considered not worth living. So rather than e.g. only taking moment-by-moment indicators of one’s experienced well-being, people’s life satisfaction judgment may include many additional, generally life-supporting components that derive from our relation to others and to the world. Correspondingly, not committing suicide is insufficient evidence that, from a purely well-being-oriented perspective, someone’s life is being – even just subjectively – evaluated as net valuable. (If alongside suicide, there was also the option to take a magic pill that turned people into p-zombies, how many people would take it?)
We can also think of it this way: If a prefer­ence for con­tinued life or no-longer-ex­is­tence was enough to es­tab­lish that start­ing a life is neu­tral or pos­i­tive, then some­one could en­g­ineer ar­tifi­cial be­ings that always pre­fer con­scious­ness over non-con­scious­ness, even if they ex­pe­rienced noth­ing but agony for ev­ery sec­ond of their ex­is­tence. It per­son­ally strikes me as un­ac­cept­able to re­gard this situ­a­tion as any­thing but very bad, but in­tu­itions may differ.

[11] Tech­ni­cally, one could hold an up­side-fo­cused eth­i­cal view where SP is similarly good as AP. But this raises the ques­tion whether one would have the same in­tu­ition for a small dystopia ver­sus a large dystopia. If the large dystopia is or­ders of mag­ni­tude worse than the small dystopia, yet the scope of a po­ten­tial par­adise runs into diminish­ing re­turns, then the view in ques­tion is ei­ther fully down­side-fo­cused if the large dystopia is stipu­lated to be much worse than SP is good, or im­plau­si­bly in­differ­ent to­wards down­side risks in case SP is la­bel­led as a lot bet­ter than a much larger dystopia. Per­haps one could con­struct a holis­tic view that is ex­actly in-be­tween down­side- and up­side-fo­cused, based on the stipu­la­tion that half of one’s moral car­ing ca­pac­ity goes into utopia cre­ation and the other half into suffer­ing pre­ven­tion. This could in prac­tice be equiv­a­lent to us­ing a moral par­li­a­ment ap­proach to moral un­cer­tainty while plac­ing 50% cre­dence on each view.

[12] One ex­cep­tion is that in the­ory, one could have a civ­i­liza­tion that re­mains on earth yet uses up as much en­ergy and re­sources as phys­i­cally available to in­stan­ti­ate digi­tal sen­tience at “as­tro­nom­i­cal” scales on su­per­com­put­ers built on earth. Of course, the stakes in this sce­nario are still dwar­fed by a sce­nario where the same thing hap­pens around ev­ery star in the ac­cessible uni­verse, but it would still con­sti­tute an s-risk in the sense that, if some of these su­per­com­put­ers run a lot of sen­tient minds and some of them would suffer, the to­tal amount of suffer­ing could quickly be­come larger than all the sen­tience that had ex­isted in life’s his­tory up to that point.

[13] Though one plau­si­ble ex­am­ple might be if a value-al­igned su­per­in­tel­li­gent AI could trade with AIs in other parts of the uni­verse and bar­gain for suffer­ing re­duc­tion.

[14] From Bostrom’s Astro­nom­i­cal Waste (2003) pa­per:

“Be­cause the lifes­pan of galax­ies is mea­sured in billions of years, whereas the time-scale of any de­lays that we could re­al­is­ti­cally af­fect would rather be mea­sured in years or decades, the con­sid­er­a­tion of risk trumps the con­sid­er­a­tion of op­por­tu­nity cost. For ex­am­ple, a sin­gle per­centage point of re­duc­tion of ex­is­ten­tial risks would be worth (from a util­i­tar­ian ex­pected util­ity point-of-view) a de­lay of over 10 mil­lion years.”
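The arithmetic behind the quoted figure can be reconstructed roughly as follows. This is a simplified sketch and not Bostrom's own derivation: it assumes that value accrues at a constant rate over a usable horizon, and the symbols T (length of that horizon in years), d (delay in years) and Δp (reduction in existential risk) are introduced here purely for illustration.

```latex
% Illustrative assumptions: value accrues uniformly over a horizon of T years,
% so a delay of d years forfeits roughly the fraction d/T of the total value,
% while reducing existential risk by \Delta p raises expected value by
% roughly the fraction \Delta p. The break-even delay then satisfies
\[
  \frac{d}{T} = \Delta p
  \quad\Longrightarrow\quad
  d = \Delta p \cdot T .
\]
% With \Delta p = 0.01 (one percentage point) and T = 10^{9} years:
\[
  d = 0.01 \times 10^{9}\ \text{years} = 10^{7}\ \text{years}.
\]
```

Under these assumptions the break-even delay is 10 million years for a horizon of one billion years, and any longer horizon makes it larger, which is consistent with the “over 10 million years” in the quote.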

[15] For the pur­poses of this post, I am pri­mar­ily eval­u­at­ing the ques­tion from the per­spec­tive of min­i­miz­ing s-risks. Of course, when one eval­u­ates the same ques­tion ac­cord­ing to other desider­ata, it turns out that most plau­si­ble moral views very strongly fa­vor the cur­rent tra­jec­tory. Any view that places over­rid­ing im­por­tance on avoid­ing in­vol­un­tary deaths for cur­rently-ex­ist­ing peo­ple au­to­mat­i­cally eval­u­ates the cur­rent tra­jec­tory as bet­ter, be­cause the col­lapse of civ­i­liza­tion would bring an end to any hopes of de­lay­ing peo­ple’s deaths with the help of ad­vanced tech­nol­ogy. Gen­er­ally, a cen­tral con­sid­er­a­tion here is also that the more one val­ues parochial fea­tures of our civ­i­liza­tion, such as the sur­vival of pop­u­lar taste in art, peo­ple re­mem­ber­ing and rever­ing Shake­speare, or the con­tinued ex­is­tence of pizza as a food item, etc., the more ob­vi­ous it be­comes that the cur­rent tra­jec­tory would be bet­ter than a civ­i­liza­tional re­set sce­nario.

[16] Further things to consider include that a civilization recovering with depleted fossil fuels would plausibly tend to be poorer for any given level of technology compared to our historical record. What would this mean for values? By the time they caught up with us in energy production, might they also have been civilized for longer? (This could add sophistication in some ways but also increase societal conformity, which could be either good or bad.)

[17] This es­ti­mate may ap­pear sur­pris­ingly high, but that might be be­cause when we think about our own life, our close ones, and all the things we see in the news and know about other coun­tries and cul­tures, it makes lit­tle differ­ence whether all of civ­i­liza­tion ends and a few hu­mans sur­vive, or whether all of civ­i­liza­tion ends and no one sur­vives. So when ex­perts talk about e.g. the dan­gers of nu­clear war, they may fo­cus on the “this could very well end civ­i­liza­tion” as­pects, with­out em­pha­siz­ing the part that some few hu­mans at least are prob­a­bly go­ing to sur­vive. And af­ter the worst part is sur­vived, pop­u­la­tion growth could kick in again and, even though the ab­sence of fos­sil fuels in such a sce­nario would likely mean that tech­nolog­i­cal progress takes longer, this “dis­ad­van­tage” could quite plau­si­bly be over­come given longer timescales and larger pop­u­la­tions per level of tech­nol­ogy (closer to the Malthu­sian limits).

[18] One path­way for pos­i­tive spillover effects would be if in­creased global co­or­di­na­tion with re­spect to non-AI tech­nolo­gies also im­proves im­por­tant co­or­di­na­tion efforts for the de­vel­op­ment of smarter-than-hu­man AI sys­tems. And maybe lower risks of catas­tro­phes gen­er­ally lead to a more sta­ble geopoli­ti­cal cli­mate where arms races are less pro­nounced, which could be highly benefi­cial for s-risk re­duc­tion. See also the points made here.

[19] Fur­ther­more, be­cause a sce­nario where civ­i­liza­tion col­lapsed would already be ir­re­versibly very bad for all the peo­ple who cur­rently ex­ist or would ex­ist at that time, mak­ing an even­tual re­cov­ery more likely (af­ter sev­eral gen­er­a­tions of post-Apoca­lyp­tic hard­ship) would sud­denly be­come some­thing that is only fa­vored by some peo­ple’s al­tru­is­tic con­cerns for the long-term fu­ture, and not any­more by self-ori­ented or fam­ily- or com­mu­nity-ori­ented con­cerns.

[20] I think the situation is such that extremely few positions are comfortable with the thought of extinction, but also few perspectives come down to regarding the avoidance of extinction as top priority. In this podcast, Toby Ord makes the claim that non-consequentialist value systems should place particular emphasis on making sure humanity does not go extinct. I agree with this only to the small extent that it is true that such views largely would not welcome extinction. However, overall I disagree and think it is hasty to pocket these views as having extinction-risk reduction as their primary priority. Non-consequentialist value systems admittedly are often somewhat underdetermined, but insofar as they do say anything about the long-term future being important, it strikes me as a more natural extension to count them as extinction-averse but downside-focused rather than extinction-averse and upside-focused. I think this would be most pronounced when it comes to downside risks by action, i.e. moral catastrophes humanity could be responsible for. A central feature distinguishing consequentialism from other moral theories is that the latter often treat harmful actions differently from harmful omissions. For instance, a rights-based theory might say that it is particularly important to prevent large-scale rights violations in the future (‘do no harm’), though not particularly important to make large-scale positive scenarios actual (‘create maximal good’). The intuitions that cause some people to walk away from Omelas, in this fictional dilemma that is structurally similar to some of the moral dilemmas we might face regarding the promises and dangers from advanced technology and space colonization, are largely non-consequentialist intuitions.
Rather than fight­ing for Ome­las, I think many non-con­se­quen­tial­ist value sys­tems would in­stead recom­mend a fo­cus on helping those be­ings who are (cur­rently) worst-off, and per­haps (if one ex­tends the scope of such moral­ities to also in­clude the long-term fu­ture) also a fo­cus on helping to pre­vent fu­ture suffer­ing and down­side risks.

[21] For a poin­ter on how to roughly en­vi­sion such a sce­nario, I recom­mend this alle­gor­i­cal de­scrip­tion of a fu­ture where hu­mans lose their in­fluence over an au­to­mated econ­omy.

[22] See the de­scrip­tion un­der Prob­lem #2 here.

[23] Of course, every decent person has intrinsic concern for avoiding the worst failure modes in AI alignment, but the difference is that, from a downside-focused perspective, bringing down the probabilities from very small to extremely small is as important as it gets, whereas for upside-focused views, other aspects of AI alignment may look more promising to work on (on the margin) as soon as the obvious worst failure modes are patched.

[24] Valuing reflection means to value not one’s current best guess about one’s (moral) goals, but what one would want these goals to be after a long period of philosophical reflection under idealized conditions. For instance, I might envision placing copies of my brain state now into different virtual realities, where these brain emulations get to converse with the world’s best philosophers, have any well-formed questions answered by superintelligent oracles, and, with advanced technology, explore what it might be like to go through life with different moral intuitions, or explore a range of different experiences. Of course, for the reflection procedure to give a well-defined answer, I would have to also specify how I will get a moral view out of whatever the different copies end up concluding, especially if there won’t be a consensus or if the consensus strongly depends on ordering effects of the thought experiments being presented. A suitable reflection procedure would allow the right amount of flexibility in order to provide us with a chance of actually changing our current best guess morality, but at the same time, that flexibility should not be dialed up too high, as this might then fail to preserve intuitions or principles we are – right now – certain we would want to preserve. The art lies in setting just the right amount of guidance, or selecting some (sub)questions where reflection is welcome, and others (such as maybe whether we want to make altruism a major part of our identity or instead go solely with rational egoism, which after all also has sophisticated philosophical adherents) that are not open for change. Personally, I find myself having much stronger and more introspectively transparent or reasoning-backed intuitions about population ethics than e.g. the question of which entities I morally care about, or to what extent I care about them. Correspondingly, I am more open for changing my mind on the latter. I would be more reluctant to trust my own thoughts and intuitions on population ethics if I thought it likely that there was a single correct solution. However, based on my model of how people form consequentialist goals – which are a new and weird thing for animal minds to get hijacked by – I feel confident that at best, we will find several consistent solutions that serve as attractors for humans undergoing sophisticated reflection procedures, but no voluntarily-consulted reflection procedure can persuade someone to adopt a view that they rule out as something they would ex ante not want the selection procedure to select for them.

[25] An al­ter­na­tive to the par­li­a­men­tary ap­proach that is some­times men­tioned, at least for deal­ing with moral un­cer­tainty be­tween con­se­quen­tial­ist views that seem com­pa­rable, would be to choose whichever ac­tion makes the largest pos­i­tive differ­ence in terms of ex­pected util­ity be­tween views, or choose the ac­tion for which “more seems at stake.” While I can see some in­tu­itions (re­lated to up­date­less­ness) for this ap­proach, I find it over­all not very con­vinc­ing be­cause it so strongly and pre­dictably priv­ileges up­side-fo­cused views, so that down­side-fo­cused views would get vir­tu­ally no weight. For a re­lated dis­cus­sion (which may only make sense if peo­ple are fa­mil­iar with the con­text of mul­ti­verse-wide co­op­er­a­tion) on when the case for up­date­less­ness be­comes par­tic­u­larly shaky, see the sec­tion “Up­date­less com­pro­mise” here. Al­to­gether, I would maybe give some weight to ex­pected value con­sid­er­a­tions for deal­ing with moral un­cer­tainty, but more weight to a par­li­a­men­tary ap­proach. Note also that the ex­pected value ap­proach can cre­ate moral un­cer­tainty wa­gers for idiosyn­cratic views, which one may be forced to ac­cept for rea­sons of con­sis­tency. Depend­ing on how one defines how much is at stake ac­cord­ing to a view, an ex­pected value ap­proach in our situ­a­tion gives the strongest boost to nor­ma­tively up­side-fo­cused views such as pos­i­tive util­i­tar­i­anism (“only hap­piness counts”) or to views which af­fect as many in­ter­ests as pos­si­ble (such as count­ing the in­ter­ests of all pos­si­ble peo­ple, which un­der some in­ter­pre­ta­tions should swamp ev­ery other moral con­sid­er­a­tion by the amount of what is at stake).
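The contrast drawn in this footnote can be made concrete with a toy sketch. All names and numbers are hypothetical placeholders, and the proportional budget split is only a crude stand-in for a parliamentary approach, which would normally also involve bargaining between the views. The point is merely to show how a view that claims much larger stakes dominates under expected-value aggregation while receiving only its credence share under a proportional split.

```python
# Hypothetical illustration: credences, stakes and actions are placeholders.

credence = {"view_x": 0.5, "view_y": 0.5}

# Value each view assigns to spending the whole budget on an action.
# view_x is stipulated to claim far larger stakes than view_y.
value = {
    "action_1": {"view_x": 1000.0, "view_y": 0.0},
    "action_2": {"view_x": 0.0, "view_y": 1.0},
}


def expected_value_choice(value, credence):
    """Pick the single action with the highest credence-weighted value."""
    def expected_value(action):
        return sum(credence[view] * value[action][view] for view in credence)
    return max(value, key=expected_value)


def proportional_allocation(value, credence):
    """Give each view a budget share equal to its credence and let it
    spend that share on the action it rates highest."""
    shares = {action: 0.0 for action in value}
    for view, weight in credence.items():
        favorite = max(value, key=lambda action: value[action][view])
        shares[favorite] += weight
    return shares


ev_pick = expected_value_choice(value, credence)
shares = proportional_allocation(value, credence)
```

With equal credences, the expected-value rule puts everything into the action favored by the view with larger claimed stakes, whereas the proportional split gives each view half of the budget regardless of how much it says is at stake.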

[26] Both Kant and Parfit noticed that one’s decisions may change if we regard ourselves as deciding over not just our own policy over actions, but over a collective policy shared by agents similar to us. This principle is taken up more formally by many of the alternatives to causal decision theory that recommend cooperation (under varying circumstances) even in one-shot, prisoner-dilemma-like situations. In On What Matters (2011), Parfit in addition argued that the proper interpretation for both Kantianism and consequentialism suggests that they are describing the same kind of morality, “Kantian” in the sense that naive ends-justify-the-means reasoning often does not apply, and consequentialist in that, ultimately, what matters is things being good.

[27] I am pri­mar­ily talk­ing about peo­ple we can in­ter­act with, though some ap­proaches to think­ing about de­ci­sion the­ory sug­gest that po­ten­tial al­lies may also in­clude peo­ple in other parts of the mul­ti­verse, or fu­ture peo­ple good at Parfit’s Hitch­hiker prob­lem.


This piece benefit­ted from com­ments by To­bias Bau­mann, David Althaus, De­nis Drescher, Jesse Clif­ton, Max Daniel, Cas­par Oester­held, Jo­hannes Treut­lein, Brian To­masik, To­bias Pul­ver, Kaj So­tala (who also al­lowed me to use a map on s-risks he made and adapt it for my pur­poses in the sec­tion on AI) and Jonas Vol­lmer. The sec­tion on ex­tinc­tion risks benefit­ted from in­puts by Owen Cot­ton-Bar­ratt, and I am also thank­ful for valuable com­ments and crit­i­cal in­puts in a sec­ond round of feed­back by Jan Brauner, Gre­gory Lewis and Carl Shul­man.


Armstrong, S. & Sandberg, A. (2013). Eternity in Six Hours: Intergalactic spreading of intelligent life and sharpening the Fermi paradox. Acta Astronautica 89:1-13.

Bostrom, N. (2003). Astro­nom­i­cal Waste: The Op­por­tu­nity Cost of De­layed Tech­nolog­i­cal Devel­op­ment. Utili­tas 15(3):308-314.

Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press.

Daswani, M. & Leike, J. (2015). A Defi­ni­tion of Hap­piness for Re­in­force­ment Learn­ing Agents. arXiv:1505.04497.

Greaves, H. (2017). Population axiology. Philosophy Compass 12:e12442. doi:10.1111/phc3.12442.

Hastie, R., & Dawes, R. (2001). Ra­tional choice in an un­cer­tain world: The psy­chol­ogy of judg­ment and de­ci­sion mak­ing. Thou­sand Oaks: Sage Publi­ca­tions.

MacAskill, W. (2014). Nor­ma­tive Uncer­tainty. PhD diss., St Anne’s Col­lege, Univer­sity of Oxford.

Newman, M. E. J. (2006). Power laws, Pareto distributions and Zipf’s law. arXiv:cond-mat/0412004.

Omohundro, S. M. (2008). The Basic AI Drives. In P. Wang, B. Goertzel, and S. Franklin (eds.), Proceedings of the First AGI Conference, vol. 171 of Frontiers in Artificial Intelligence and Applications. Amsterdam: IOS Press.

Parfit, D. (2011). On What Mat­ters. Oxford: Oxford Univer­sity Press.

Pinker, S. (2011). The Better Angels of Our Nature. New York, NY: Viking.

So­tala, K. & Gloor, L. (2017). Su­per­in­tel­li­gence as a Cause or Cure for Risks of Astro­nom­i­cal Suffer­ing. In­for­mat­ica 41(4):389–400.