Why I prioritize moral circle expansion over artificial intelligence alignment


This blog post is written for a very specific audience: people involved in the effective altruism community who are familiar with cause prioritization and arguments for the overwhelming importance of the far future. It might read as strange and confusing to people without that domain knowledge. Please consider reading the articles linked in the Context section to get your bearings. This post is also very long, but the sections are fairly independent, as each covers a distinct consideration.

Many thanks for helpful feedback to Jo Anderson, Tobias Baumann, Jesse Clifton, Max Daniel, Michael Dickens, Persis Eskander, Daniel Filan, Kieran Greig, Zach Groff, Amy Halpern-Laff, Jamie Harris, Josh Jacobson, Gregory Lewis, Caspar Oesterheld, Carl Shulman, Gina Stuessy, Brian Tomasik, Johannes Treutlein, Magnus Vinding, Ben West, and Kelly Witwicki. I also forwarded Ben Todd and Rob Wiblin a small section of the draft that discusses an 80,000 Hours article.


When people in the effective altruism (EA) community have worked to affect the far future, they’ve typically focused on reducing extinction risk, especially risks associated with superintelligence or general artificial intelligence alignment (AIA). I agree with the arguments for the far future being extremely important in our EA decisions, but I tentatively favor improving the quality of the far future by expanding humanity’s moral circle over increasing the likelihood of the far future, or humanity’s continued existence, by reducing AIA-based extinction risk, for two reasons: (1) the far future seems to not be very good in expectation, and there’s a significant likelihood of it being very bad, and (2) moral circle expansion seems highly neglected both in EA and in society at large. Also, I think considerations of bias are very important here, given that intuitive and subjective judgment calls necessarily make up the bulk of differences in opinion on far future cause prioritization. I find the argument in favor of AIA that technical research might be more tractable than social change to be the most compelling counterargument to my position.


This post largely aggregates existing content on the topic, rather than making original arguments. I offer my views, mostly intuitions, on the various arguments, but of course I remain highly uncertain given the limited amount of empirical evidence we have on far future cause prioritization.

Many in the effective altruism (EA) community think the far future is a very important consideration when working to do the most good. The basic argument is that humanity could continue to exist for a very long time and could expand its civilization to the stars, creating a very large amount of moral value. The main narrative has been that this civilization could be a very good one, and that in the coming decades, we face sizable risks of extinction that could prevent us from obtaining this “cosmic endowment.” The argument goes that these risks also seem like they can be reduced with a fairly small amount of additional resources (e.g. time, money), and therefore extinction risk reduction is one of the most important projects of humanity and the EA community.

(This argument also depends on a moral view that bringing about the existence of sentient beings can be a morally good and important action, comparable to helping sentient beings who currently exist live better lives. This is a contentious view in academic philosophy. See, for example, “‘Making People Happy, Not Making Happy People’: A Defense of the Asymmetry Intuition in Population Ethics.”)

However, one can accept the first part of this argument — that there is a very large amount of expected moral value in the far future and it’s relatively easy to make a difference in that value — without deciding that extinction risk is the most important project. In slightly different terms, one can decide not to work on reducing population risks, risks that could reduce the number of morally relevant individuals in the far future (of course, these are only risks of harm if one believes more individuals is a good thing), and instead work on reducing quality risks, risks that could reduce the quality of morally relevant individuals’ existence. One specific type of quality risk often discussed is a risk of astronomical suffering (s-risk), defined as “events that would bring about suffering on an astronomical scale, vastly exceeding all suffering that has existed on Earth so far.”

This blog post makes the case for focusing on quality risks over population risks. More specifically, though also more tentatively, it makes the case for focusing on reducing quality risk through moral circle expansion (MCE), the strategy of impacting the far future through increasing humanity’s concern for sentient beings who currently receive little consideration (i.e. widening our moral circle so it includes them), over AI alignment (AIA), the strategy of impacting the far future through increasing the likelihood that humanity creates an artificial general intelligence (AGI) that behaves as its designers want it to (known as the alignment problem).[1][2]

The basic case for MCE is very similar to the case for AIA. Humanity could continue to exist for a very long time and could expand its civilization to the stars, creating a very large number of sentient beings. The sort of civilization we create, however, seems highly dependent on our moral values and moral behavior. In particular, it’s uncertain whether many of those sentient beings will receive the moral consideration they deserve based on their sentience, i.e. whether they will be in our “moral circle” or not, like the many sentient beings who have suffered intensely over the course of human history (e.g. from torture, genocide, oppression, war). It seems the moral circle can be expanded with a fairly small amount of additional resources (e.g. time, money), and therefore MCE is one of the most important projects of humanity and the EA community.

Note that MCE is a specific kind of values spreading, the parent category that describes any effort to shift the values and moral behavior of humanity and its descendants (e.g. intelligent machines) in a positive direction to benefit the far future. (Of course, some people attempt to spread values in order to benefit the near future, but in this post we’re only considering far future impact.)

I’m specifically comparing MCE and AIA because AIA is probably the most favored method of reducing extinction risk in the EA community. AIA seems to be the default cause area to favor if one wants to have an impact on the far future, and I’ve been asked several times why I favor MCE instead.

This discussion risks conflating AIA with reducing extinction risk. These are two separate ideas, since an unaligned AGI could still lead to a large number of sentient beings, and an aligned AGI could still potentially cause extinction or population stagnation (e.g. if, according to the designers’ values, even the best civilization the AGI could help build is still worse than nonexistence). However, most EAs focused on AIA seem to believe that the main risk is something quite like extinction, such as the textbook example of an AI that seeks to maximize the number of paperclips in the universe. I’ll note when the distinction between AIA and reducing extinction risk is relevant. Similarly, there are sometimes important prioritization differences between MCE and other types of values spreading, and those will be noted when they matter. (This paragraph is an important qualification for the whole post. The possibility of unaligned AGI that involves a civilization (and, less so because it seems quite unlikely, the possibility of an AGI that causes extinction) is important to consider for far future cause prioritization. Unfortunately, elaborating on this would make this post far more complicated and far less readable, and would not change many of the conclusions. Perhaps I’ll be able to make a second post that adds this discussion at some point.)

It’s also important to note that I’m discussing specifically AIA here, not all AI safety work in general. AI safety, which just means increasing the likelihood of beneficial AI outcomes, could be interpreted as including MCE, since MCE plausibly makes it more likely that an AI would be built with good values. However, MCE doesn’t seem like a very plausible route to increasing the likelihood that AI is simply aligned with the intentions of its designers, so I think MCE and AIA are fairly distinct cause areas.

AI safety can also include work on reducing s-risks, such as specifically reducing the likelihood of an unaligned AI that causes astronomical suffering, rather than reducing the likelihood of all unaligned AI. I think this is an interesting cause area, though I am unsure about its tractability and am not considering it within the scope of this blog post.

The post’s publication was supported by Greg Lewis, who was interested in this topic and donated $1,000 to Sentience Institute, the think tank I co-founded that researches effective strategies to expand humanity’s moral circle, conditional on this post being published to the Effective Altruism Forum. Lewis doesn’t necessarily agree with any of its content. He decided on the conditional donation before the post was written; I asked him to review the post prior to publication, and it was edited based on his feedback.

The expected value of the far future

Whether we prioritize reducing extinction risk partly depends on how good or bad we expect human civilization to be in the far future, given it continues to exist. In my opinion, the assumption that it will be very good is a tragically unexamined assumption in the EA community.

What if it’s close to zero?

If we think the far future is very good, that clearly makes reducing extinction risk more promising. And if we think the far future is very bad, that makes reducing extinction risk not just unpromising, but actively very harmful. But what if it’s near the middle, i.e. close to zero?[3] Addressing the view that reducing extinction risk is not an EA priority on the basis of the expected moral value of the far future, 80,000 Hours wrote:

...even if you’re not sure how good the future will be, or suspect it will be bad, you may want civilisation to survive and keep its options open. People in the future will have much more time to study whether it’s desirable for civilisation to expand, stay the same size, or shrink. If you think there’s a good chance we will be able to act on those moral concerns, that’s a good reason to leave any final decisions to the wisdom of future generations. Overall, we’re highly uncertain about these big-picture questions, but that generally makes us more concerned to avoid making any irreversible commitments...

This reasoning seems mistaken to me because wanting “civilisation to survive and keep its options open” depends on optimism that civilization will do research, make good[4] decisions based on that research, and be capable of implementing those decisions.[5] In other words, while preventing extinction keeps options open for good things to happen, it also keeps options open for bad things to happen, and desiring this option value depends on an optimism that the good things are more likely. The reasoning thus assumes the optimism (thinking the far future is good, or at least that humans will make good decisions and be able to implement them[6]) that is also its conclusion.

Having that optimism makes sense in many decisions, which is why keeping options open is often a good heuristic. In EA, for example, people tend to do good things with their careers, which means career option value is a useful thing. This doesn’t readily translate to decisions where it’s not clear whether the actors involved will have a positive or negative impact. (Note 80,000 Hours isn’t making this comparison. I’m just making it to explain my own view here.)

There’s also a sense in which preventing extinction risk decreases option value because if humanity progresses past certain civilizational milestones that make extinction more unlikely — say, the rise of AGI or expansion beyond our own solar system — it might become harder or even impossible to press the “off switch” (ending civilization). However, I think most would agree that there’s more overall option value in a civilization that has gotten past these milestones because there’s a much wider variety of non-extinct civilizations than extinct civilizations.[7]

If you think that the expected moral value of the far future is close to zero, even if you think it’s slightly positive, then reducing extinction risk is a less promising EA strategy than if you think it’s very positive.

Key considerations

I think the considerations on this topic are best represented as questions where people’s beliefs (mostly just intuitions) vary along a long spectrum. I’ll list these in order of where I would guess I have the strongest disagreement with people who believe the far future is highly positive in expected value (shortened as HPEV-EAs), and I’ll note where I don’t think I would disagree or might even have a more positive-leaning belief than the average such person.

  1. I think there’s a significant[8] chance that the moral circle will fail to expand to reach all sentient beings, such as artificial/small/weird minds (e.g. a sophisticated computer program used to mine asteroids, but one that doesn’t have the normal features of sentient minds like facial expressions). In other words, I think there’s a significant chance that powerful beings in the far future will have low willingness to pay for the welfare of many of the small/weird minds in the future.[9]

  2. I think it’s likely that the powerful beings in the far future (analogous to humans as the powerful beings on Earth in 2018) will use large numbers of less powerful sentient beings, such as for recreation (e.g. safaris, war games), a labor force (e.g. colonists to distant parts of the galaxy, construction workers), scientific experiments, threats (e.g. threatening to create and torture beings that a rival cares about), revenge, justice, religion, or even pure sadism.[10] I believe this because there have been less powerful sentient beings for all of humanity’s existence and well before (e.g. predation), many of whom are exploited and harmed by humans and other animals, and there seems to be little reason to think such power dynamics won’t continue to exist.

    Alternative uses of resources include simply working to increase one’s own happiness directly (e.g. changing one’s neurophysiology to be extremely happy all the time) and constructing large non-sentient projects like a work of art, though each of these types of projects could still include sentient beings, such as for experimentation or a labor force.

    With the exception of threats and sadism, the less powerful minds seem like they could suffer intensely because their intense suffering could be instrumentally useful. For example, if the recreation is nostalgic, or human psychology persists in some form, we could see powerful beings causing intense suffering in order to see good triumph over evil or in order to satisfy curiosity about situations that involve intense suffering (of course, the powerful beings might not acknowledge the suffering as suffering, instead conceiving of it as simulated but not actually experienced by the simulated entities). For another example, with a sentient labor force, punishment could be a stronger motivator than reward, as indicated by the history of evolution on Earth.[11][12]

  3. I place significant moral value on artificial/small/weird minds.

  4. I think it’s quite unlikely that human descendants will find the correct morality (in the sense of moral realism, finding mind-independent moral facts), and I don’t think I would care much about that correct morality even if it existed. For example, I don’t think I would be compelled to create suffering if the correct morality said this is what I should do. Of course, such moral facts are very difficult to imagine, so I’m quite uncertain about what my reaction to them would be.[13]

  5. I’m skeptical of the view that technology and efficiency will remove the need for powerless, high-suffering, instrumental moral patients. An example of this predicted trend is that factory farmed animals seem unlikely to be necessary in the far future because of their inefficiency at producing animal products. Therefore, I’m not particularly concerned about the factory farming of biological animals continuing into the far future. I am, however, concerned about similar but more efficient systems.

    An example of how technology might not render sentient labor forces and other instrumental sentient beings obsolete is that humans seem motivated to have power and control over the world, and in particular seem more satisfied by having power over other sentient beings than by having power over non-sentient things like barren landscapes.

    I do still believe there’s a strong tendency towards efficiency and that this has the potential to render much suffering obsolete; I just have more skepticism about it than I think is often assumed by HPEV-EAs.[14]

  6. I’m skeptical about the view that human descendants will optimize their resources for happiness (i.e. create hedonium) relative to optimizing for suffering (i.e. create dolorium).[15] Humans currently seem more deliberately driven to create hedonium, but creating dolorium might be more instrumentally useful (e.g. as a threat to rivals[16]).

    On this topic, I similarly do still believe there’s a higher likelihood of creating hedonium; I just have more skepticism about it than I think is often assumed by EAs.

  7. I’m largely in agreement with the average HPEV-EA in my moral exchange rate between happiness and suffering. However, I think those EAs tend to greatly underestimate how much the empirical tendency towards suffering over happiness (e.g. wild animals seem to endure much more suffering than happiness) is evidence of a future empirical asymmetry.

    My view here is partly informed by the capacities for happiness and suffering that have evolved in humans and other animals, the capacities that seem to be driven by cultural forces (e.g. corporations seem to care more about downsides than upsides, perhaps because it’s easier in general to destroy and harm things than to create and grow them), and speculation about what could be done in more advanced civilizations, such as my best guess on what a planet optimized for happiness and a planet optimized for suffering would look like. For example, I think a given amount of dolorium/dystopia (say, the amount that can be created with 100 joules of energy) is far larger in absolute moral expected value than hedonium/utopia made with the same resources.

  8. I’m unsure of how much I would disagree with HPEV-EAs about the argument that we should be highly uncertain about the likelihood of different far future scenarios because of how highly speculative our evidence is, which pushes my estimate of the expected value of the far future towards the middle of the possible range, i.e. towards zero.

  9. I’m unsure of how much I would disagree with HPEV-EAs about the persistence of evolutionary forces into the future (i.e. how much future beings will be determined by fitness, rather than characteristics we might hope for like altruism and happiness).[17]

  10. From the historical perspective, it worries me that many historical humans seem like they would be quite unhappy with the way human morality changed after them, such as the way Western countries are less concerned about previously-considered-immoral behavior like homosexuality and gluttony than their ancestors were in 500 CE. (Of course, one might think historical humans would agree with modern humans upon reflection, or think that much of humanity’s moral changes have been due to improved empirical understanding of the world.)[18]

  11. I’m largely in agreement with HPEV-EAs that humanity’s moral circle has a track record of expansion and seems likely to continue expanding. For example, I think it’s quite likely that powerful beings in the far future will care a lot about charismatic biological animals like elephants or chimpanzees, or whatever beings have a similar relationship to those powerful beings as humanity has to elephants and chimpanzees. (As mentioned above, my pessimism about the continued expansion is largely due to concern about the magnitude of bad-but-unlikely outcomes and the harms that could occur due to MCE stagnation.)

Unfortunately, we don’t have much empirical data or solid theoretical arguments on these topics, so the disagreements I’ve had with HPEV-EAs have mostly just come down to differences in intuition. This is a common theme for prioritization among far future efforts. We can outline the relevant factors and a little empirical data, but the crucial factors seem to be left to speculation and intuition.

Most of these considerations are about how society will develop and utilize new technologies, which suggests we can develop relevant intuitions and speculative capacity by studying social and technological change. So even though these judgments are intuitive, we could potentially improve them with more study of big-picture social and technological change, such as Sentience Institute’s MCE research or Robin Hanson’s book The Age of Em, which analyzes what a future of brain emulations would look like. (This sort of empirical research is what I see as the most promising future research avenue for far future cause prioritization. I worry EAs overemphasize armchair research (like most of this post, actually) for various reasons.[19])

I’d personally be quite interested in a survey of people with expertise in the relevant fields of social, technological, and philosophical research, in which they’re asked about each of the considerations above, though it might be hard to get a decent sample size, and I think it would be quite difficult to debias the respondents (see the Bias section of this post).

I’m also interested in quantitative analyses of these considerations — calculations including all of these potential outcomes and associated likelihoods. As far as I know, this kind of analysis has only been attempted so far by Michael Dickens in “A Complete Quantitative Model for Cause Selection,” in which Dickens notes that, “Values spreading may be better than existential risk reduction.” While this quantification might seem hopelessly speculative, I think it’s highly useful even in such situations. Of course, rigorous debiasing is also very important here.
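To make the shape of such a calculation concrete, here is a toy expected-value sketch in Python. Every probability and value below is a hypothetical placeholder invented for illustration; none of these numbers come from Dickens’s model or any other published estimate:

```python
# Toy expected-value sketch for far future cause prioritization.
# All numbers are hypothetical placeholders, not actual estimates.

# Possible far future scenarios mapped to (probability, moral value).
# Moral value is on an arbitrary scale where 0 = nonexistence.
scenarios = {
    "flourishing, wide moral circle":  (0.30, 100.0),
    "mixed outcome":                   (0.40, 10.0),
    "stagnant, narrow moral circle":   (0.20, -20.0),
    "astronomical suffering (s-risk)": (0.10, -250.0),
}

def expected_value(scenarios):
    """Probability-weighted moral value across all scenarios."""
    return sum(p * v for p, v in scenarios.values())

ev = expected_value(scenarios)
print(f"Expected value of the far future: {ev:.1f}")
# With these placeholder numbers, the expected value comes out small
# relative to the scale of the best and worst outcomes, mirroring the
# post's claim that the far future may be near zero in expectation
# despite enormous variance across scenarios.
```

Note how sensitive the result is to the speculative s-risk row: a small change in that probability or value flips the sign of the answer, which is one reason rigorous debiasing matters so much for this kind of model.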

Overall, I think the far future is close to zero in expected moral value, meaning it’s not nearly as good as is commonly assumed, implicitly or explicitly, in the EA community.


Range of outcomes

It’s difficult to compare the scale of far future impacts since they are all astronomical, and I find the consideration of scale here to be not very useful overall.

Technically, it seems like MCE involves a larger range of potential outcomes than reducing extinction risk through AIA because, at least from a classical consequentialist perspective (giving weight to both negative and positive outcomes), it could make the difference between some of the worst far futures imaginable and the best far futures. Reducing extinction risk through AIA only makes the difference between nonexistence (a far future of zero value) and whatever world comes to exist. If one believes the far future is highly positive, this could still be a very large range, but it would still be less than the potential change from MCE.

How much less depends on one’s views of how bad the worst future is relative to the best future. If the worst future is as bad as the best future is good (equal absolute value), then MCE has a range twice as large as extinction risk reduction.
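To spell out the arithmetic behind this two-fold claim, here is a minimal sketch; the values are arbitrary, chosen only to encode the assumption that the worst future is exactly as bad as the best future is good:

```python
# Minimal sketch of the "range of outcomes" comparison.
# Values are arbitrary; 0 represents extinction (nonexistence).
best_future = 1.0    # best attainable far future
worst_future = -1.0  # worst future, assumed equal in magnitude to the best
extinction = 0.0

# MCE could, in principle, make the difference between the worst
# and the best futures.
mce_range = best_future - worst_future

# Extinction risk reduction makes the difference between nonexistence
# and whatever future would otherwise exist (here, the best case).
extinction_range = best_future - extinction

print(mce_range / extinction_range)  # ratio of the two ranges
# Under the equal-magnitude assumption, the ratio is 2: MCE's range
# is twice that of extinction risk reduction.
```

If the worst future is judged worse than the best is good, the ratio grows beyond 2; if it is judged milder, the ratio shrinks toward 1.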

As mentioned in the Context section above, the change in the far future that AIA could achieve might not exactly be extinction versus non-extinction. While an aligned AI would probably not involve the extinction of all sentient beings, since that would require the values of its creators to prefer extinction over all other options, an unaligned AI might not necessarily involve extinction either. To use the canonical AIA example of a “paperclip maximizer” (used to illustrate how an AI could easily have a harmful goal without any malicious intention), the rogue AI might create sentient beings as a labor force to implement its goal of maximizing the number of paperclips in the universe, or create sentient beings for some other goal.[20]

This means that the range of AIA is the difference between the potential universes with aligned AI and unaligned AI, which could be very good futures contrasted with very bad futures, rather than just very good futures contrasted with nonexistence.

Brian Tomasik has written out a thoughtful (though necessarily speculative and highly uncertain) breakdown of the risks of suffering in both aligned and unaligned AI scenarios, which weakly suggests that an aligned AI would lead to more suffering in expectation.

All things considered, it seems that the range of quality risk reduction (including MCE) is larger than that of extinction risk reduction (including AIA, depending on one’s view of what difference AI alignment makes), but this seems like a fairly weak consideration to me because (i) it’s a difference of roughly two-fold, which is quite small relative to the differences of ten times, a thousand times, etc. that we frequently see in cause prioritization, and (ii) there are numerous fairly arbitrary judgment calls (like considering reducing extinction risk from AI versus AIA versus AI safety) that lead to different results.[21]

Likelihood of different far future scenarios[22][23]

MCE is relevant for many far future scenarios where AI doesn’t undergo the sort of “intelligence explosion” or similar progression that makes AIA important; for example, if AGI is developed by an institution like a foreign country that has little interest in AIA, or if AI is never developed, or if it’s developed slowly in a way that makes safety adjustments quite easy as that development occurs. In each of these scenarios, the way society treats sentient beings, especially those currently outside the moral circle, seems like it could still be affected by MCE. As mentioned earlier, I think there is a significant chance that the moral circle will fail to expand to reach all sentient beings, and I think a small moral circle could very easily lead to suboptimal or dystopian far future outcomes.

On the other hand, some possible far future civilizations might not involve moral circles. For example, there could be an egalitarian society where each individual is able to fully represent their own interests in decision-making, and where this societal structure was reached not through MCE but because these beings are all equally powerful for technological reasons (and no other beings exist, and they have no interest in creating additional beings). Some AI outcomes might also not be affected by MCE, such as an unaligned AI that does something like maximizing the number of paperclips for reasons other than human values (such as a programming error), or one whose designers create its value function without regard for humanity’s current moral views (“coherent extrapolated volition” could be an example of this, though I agree with Brian Tomasik that current moral views will likely be important in this scenario).

Given my current, highly uncertain estimates of the likelihood of various far future scenarios, I would guess that MCE is applicable in somewhat more cases than AIA, suggesting it’s easier to make a difference to the far future through MCE. (This is analogous to saying the risk of MCE-failure seems greater than the risk of AIA-failure, though I’m trying to avoid simplifying these into binary outcomes.)


How much of an impact can we expect our marginal resources to have on the probability of extinction risk, or on the moral circle of the far future?

Social change versus technical research

One may believe changing people’s attitudes and behavior is quite difficult, and direct work on AIA involves a lot less of that. While AIA likely involves influencing some people (e.g. policymakers, researchers, and corporate executives), MCE is almost entirely influencing people’s attitudes and behavior.[24]

However, one could instead believe that technical research is more difficult in general, pointing to potential evidence such as the large amount of money spent on technical research (e.g. by Silicon Valley) with often very little to show for it, while huge social change sometimes seems to be effected by small groups of advocates with relatively little money (e.g. organizers of revolutions in Egypt, Serbia, and Turkey). (I don’t mean this as a very strong or persuasive argument, just as a possibility. There are plenty of examples of technology developed with few resources and social change achieved with many.)

It’s hard to speak so generally, but I would guess that technical research tends to be easier than causing social change. This seems like the strongest argument in favor of working on AIA over working on MCE.

Track record

In terms of EA work explicitly focused on the goals of AIA and MCE, AIA has a much better track record. The past few years have seen significant technical research output from organizations like MIRI and FHI, as documented by user Larks on the EA Forum for 2016 and 2017. I’d refer readers to those posts, but as a brief example, MIRI had an acclaimed paper on “Logical Induction,” which used a financial market process to estimate the likelihood of logical facts (e.g. mathematical propositions like the Riemann hypothesis) that we aren’t yet sure of. This is analogous to how we use probability theory to estimate the likelihood of empirical facts (e.g. a dice roll). In the bigger picture of AIA, this research could help lay the technical foundation for building an aligned AGI. See Larks’ posts for a discussion of more papers like this, as well as non-technical work done by AI-focused organizations, such as the Future of Life Institute’s open letter on AI safety, signed by leading AI researchers and cited by the White House’s “Report on the Future of Artificial Intelligence.”

Using an analogous definition for MCE, EA work explicitly focused on MCE (meaning expanding the moral circle in order to improve the far future) basically only started in 2017 with the founding of Sentience Institute (SI), though there were various blog posts and articles discussing it before then. SI has basically finished four research projects: (1) Foundational Question Summaries that summarize the evidence we have on important effective animal advocacy (EAA) questions, including a survey of EAA researchers, (2) a case study of the British antislavery movement to better understand how it achieved one of the first major moral circle expansions in modern history, (3) a case study of nuclear power to better understand why some countries (e.g. France) enthusiastically adopted this new technology while others (e.g. the US) didn’t, and (4) a nationally representative poll of US attitudes towards animal farming and animal-free food.

With a broader definition of MCE that includes activities that people prioritizing MCE tend to think are quite indirectly effective (see the Neglectedness section for discussion of definitions), we’ve seen EA achieve quite a lot more, such as the work done by The Humane League, Mercy For Animals, Animal Equality, and other organizations on corporate welfare reforms to animal farming practices, and the work done by The Good Food Institute and others on supporting a shift away from animal farming, especially through supporting new technologies like so-called “clean meat.”

Since I favor the narrower definition, I think AIA outperforms MCE on track record, but the difference seems largely explained by the greater resources spent on AIA, which makes it a less important consideration. (Also, when I personally decided to focus on MCE, SI did not yet exist, so the lack of track record was an even stronger consideration in favor of AIA, though MCE was also more neglected at that time.)

To be clear, the track records of all far future projects tend to be weaker than those of near-term projects where we can directly see the results.


Robustness

If one values robustness, meaning a higher certainty that one is having a positive impact, either for instrumental or intrinsic reasons, then AIA might be more promising: once we develop an aligned AI (one that continues to be aligned over time), the work of AIA is done and won’t need to be redone in the future. With MCE, assuming the advent of AI or similar developments won’t fix society’s values in place (known as “value lock-in”), progress could more easily be undone, especially if one believes there’s a social setpoint that humanity drifts back towards when moral progress is made.[25]

I think the assumptions of this argument make it quite weak: I’d guess an “intelligence explosion” has a significant chance of value lock-in,[26][27] and I don’t think there’s a setpoint in the sense that positive moral change increases the risk of negative moral change. I also don’t value robustness intrinsically at all, or instrumentally very much; I think there is so much uncertainty in all of these strategies, and such weak prior beliefs,[28] that differences in certainty of impact matter relatively little.


Work on either cause area runs the risk of backfiring. The main risk for AIA seems to be that the technical research done to better understand how to build an aligned AI will increase AI capabilities generally, making it easier for humanity to produce an unaligned AI. The main risk for MCE seems to be that certain advocacy strategies will end up having the opposite effect of the one intended, such as a confrontational protest for animal rights that ends up putting people off the cause.

It’s unclear which project has better near-term proxies and feedback loops for assessing and increasing long-term impact. AIA has technical problems with solutions that can be mathematically proven, but these might end up having little bearing on final AIA outcomes, such as if an AGI isn’t developed using the method that was advised or if technical solutions aren’t implemented by policymakers. MCE has metrics like public attitudes and practices. My weak intuition here, and the weak intuition of other reasonable people I’ve discussed this with, is that MCE has better near-term proxies.

It’s unclear which project has more historical evidence that EAs can learn from to be more effective. AIA has previous scientific, mathematical, and philosophical research and technological successes and failures, while MCE has previous psychological, social, political, and economic research and advocacy successes and failures.

Finally, I do think that we learn a lot about tractability just by working directly on an issue. Given how little effort has gone into MCE itself (see Neglectedness below), I think we could resolve a significant amount of uncertainty with more work in the field.

Overall, considering only direct tractability (i.e. ignoring information value due to neglectedness, which would help other EAs with their cause prioritization), I’d guess AIA is a little more tractable.


Neglectedness

With neglectedness, we also face the challenge of how broadly to define the cause area. In this case, we have a fairly clear goal for our definition: to best assess how much low-hanging fruit is available. To me, there seem to be two simple definitions that meet this goal: (i) organizations or individuals working explicitly on the cause area, and (ii) organizations or individuals working on the strategies that are seen as top-tier by people focused on the cause area. How much one favors (i) versus (ii) depends on whether one thinks the top-tier strategies are fairly well-established, in which case (ii) makes sense, or whether they will change over time, in which case one should favor (i) because those organizations and individuals will be better able to adjust.[29]

With the explicit focus definitions of AIA and MCE (recall this includes having a far future focus), it seems that MCE is much more neglected and has more low-hanging fruit.[30] For example, there is only one organization that I know of explicitly committed to MCE in the EA community (SI), while numerous organizations (MIRI, CHAI, part of FHI, part of CSER, even parts of AI capabilities organizations like the Montreal Institute for Learning Algorithms, DeepMind, and OpenAI, etc.) are explicitly committed to AIA. Because MCE seems more neglected, we could learn a lot about it through SI’s initial work, such as how easily advocates have achieved MCE throughout history.

If we include those working on the cause area without an explicit focus, then that seems to widen the definition of MCE to include some of the top strategies being used to expand the moral circle in the near term, such as the farmed animal work done by Animal Charity Evaluators and its top-recommended charities, which had a combined budget of around $7.5 million in 2016. The combined budgets of top-tier AIA work are harder to estimate, but the Centre for Effective Altruism estimates all AIA work in 2016 was around $6.6 million. The AIA budgets seem to be increasing more quickly than the MCE budgets, especially given the grant-making of the Open Philanthropy Project. We could also include EA movement-building organizations that place a strong focus on reducing extinction risk, and even on AIA specifically, such as 80,000 Hours. The categorization for MCE seems to have more room to broaden, perhaps all the way to mainstream animal advocacy strategies like the work of People for the Ethical Treatment of Animals (PETA), which might make AIA more neglected. (It could potentially go even further, such as to advocating for human sweatshop laborers, but that seems too far removed, and I don’t know any MCE advocates who think it’s plausibly top-tier.)

I think there’s a difference in aptitude that suggests MCE is more neglected. Moral advocacy, while quite crowded, seems like a field where deliberate, thoughtful people can vastly outperform the average advocate,[31] which can lead to surprisingly large impact (e.g. EAs have already had far more success in publishing their writing, such as books and op-eds, than most writers hope for).[32] Additionally, despite centuries of advocacy, very little quality research has been done to critically examine which advocacy is effective and which isn’t, while the fields of math, computer science, and machine learning involve substantial self-reflection and are largely worked on by academics who seem to use more critical thinking than the average activist (e.g. there’s far more skepticism in these academic communities, and a demand for rigor and experimentation that’s rarely seen among advocates). In general, I think the aptitude of the average social change advocate is much lower than that of the average technology researcher, suggesting MCE is more neglected, though of course other factors also count.

The relative neglectedness of MCE also seems likely to continue, given the greater self-interest humanity has in AIA relative to MCE and, in my opinion, the net biases towards AIA described in the Biases section of this blog post. (This self-interest argument is a particularly important consideration for prioritizing MCE over AIA in my view.[33])

However, while neglectedness is typically thought to make a project more tractable, it seems that existing work in the extinction risk space has made marginal contributions more impactful in some ways. For example, talented AI researchers can find work relatively easily at an organization dedicated to AIA, while the path for talented MCE researchers is far less clear and easy. This alludes to the difference in tractability that might exist between labor resources and funding resources, as it currently seems like MCE is much more funding-constrained[34] while AIA is largely talent-constrained.

As another example, there are already solid inroads between the AIA community and AI decision-makers, and AI decision-makers have already expressed interest in AIA, suggesting that influencing them with research results will be fairly easy once those results are in hand. This means both that our estimate of AIA’s neglectedness should decrease and that our estimate of its tractability apart from neglectedness should increase, in the sense that neglectedness is a part of tractability. (The definitions in this framework vary.)

All things considered, I find MCE to be more compelling from a neglectedness perspective, particularly due to the current EA resource allocation and the self-interest humanity has, and will most likely continue to have, in AIA. When I decided to focus on MCE, there was an even stronger case for neglectedness because no organization committed to that goal existed (SI was founded in 2017), though there was an increased downside to MCE: the even more limited track record.


Cooperation

Values spreading as a far future intervention has been criticized on the following grounds: people have very different values, so trying to promote your values and change other people’s could be seen as uncooperative. Cooperation seems to be useful both directly (e.g. how willing are other people to help us out if we’re fighting them?) and in a broader sense because of superrationality, an argument that one should help others even when there’s no causal mechanism for reciprocation.[35]

I think this is certainly a good consideration against some forms of values spreading. For example, I don’t think it’d be wise for an MCE-focused EA to disrupt the Effective Altruism Global conferences (e.g. yell on stage and try to keep the conference from continuing) if they have an insufficient focus on MCE. This seems highly ineffective because of how uncooperative it is, given that the EA space is supposed to be one for having challenging discussions and solving problems, not merely advocating one’s positions like a political rally.

However, I don’t think it holds much weight against MCE in particular, for two reasons. First, I don’t think MCE is particularly uncooperative. For example, I never bring up MCE with someone and hear, “But I like to keep my moral circle small!” I think this is because there are many different components of our attitudes and worldview that we refer to as values and morals. People have some deeply held values that seem strongly resistant to change, such as their religion or the welfare of their immediate family, but very few people seem to have a small moral circle as a deeply held value. Instead, the small moral circle seems to mostly be a superficial, casual value (though it’s often connected to the deeper values) that people are okay with, or even happy about, changing.[36]

Second, insofar as MCE is uncooperative, I think a large number of other EA interventions, including AIA, are similarly uncooperative. Many people, even in the EA community, are concerned with or even opposed to AIA, for example if they believe an aligned AI would create a worse far future than an unaligned AI, or if they think AIA harmfully distracts from more important issues and gives EA a bad name. This isn’t to say I think AIA is bad because it’s uncooperative; on the contrary, this seems like a level of uncooperativeness that’s often necessary for dedicated EAs. (In a trivial way, basically all action involves uncooperativeness because it’s always about changing the status quo or preventing the status quo from changing.[37] Even inaction can involve uncooperativeness if it means not working to help someone who would like your help.)

I do think it’s more important to be cooperative in some other situations, such as if one has a very different value system than one’s colleagues, as might be the case for the Foundational Research Institute, which advocates strongly for cooperation with other EAs.

Cooperation with future do-gooders

Another argument against values spreading goes something like, “We can worry about values after we’ve safely developed AGI. Our tradeoff isn’t, ‘Should we work on values or AI?’ but instead, ‘Should we work on AI now and values later, or values now and maybe AI later if there’s time?’”

I agree with one interpretation of the first part of this argument: urgency is an important factor, and AIA does seem like a time-sensitive cause area. However, I think MCE is similarly time-sensitive because of risks of value lock-in, where our descendants’ morality becomes much harder to change. This could happen if AI designers choose to fix the values of an AGI, or at least make them independent of other people’s opinions (they could still be amenable to self-reflection by the designer and new empirical data about the universe other than people’s opinions)[38]; if humanity sends out colonization vessels across the universe that are traveling too fast for us to adjust based on our changing moral views; or if society just becomes too wide and disparate to have effective social change mechanisms like we have today on Earth.

I disagree with the stronger interpretation, that we can count on some sort of cooperation with or control over future people. There might be some extent to which we can do this, such as via superrationality, but that seems like a fairly weak effect. Instead, I think we’re largely on our own, deciding what we do in the next few years (or perhaps in our whole career), and just making our best guess about what future people will do. It sounds very difficult to strike a deal with them that will ensure they work on MCE in exchange for us working on AIA.


Biases

I’m always cautious about bringing considerations of bias into an important discussion like this. Such considerations easily turn into messy personal attacks, and accusations of bias can often be met with roughly equal accusations of counter-bias. However, I think we should give them serious consideration in this case. First, I want to be exhaustive in this blog post, and that means throwing every consideration on the table, even messy ones. Second, my own cause prioritization “journey” led me first to AIA and other non-MCE/non-animal-advocacy EA priorities (mainly EA movement-building), and it was considerations of bias that allowed me to look at the object-level arguments with fresh eyes and decide that I had been way off in my previous assessment.

Third and most importantly, people’s views on this topic are inevitably driven mostly by intuitive, subjective judgment calls. One could easily read everything I’ve written in this post and say they lean in the MCE direction on every topic, or the AIA direction, and there would be little object-level criticism one could make against that if they just based their view on a different intuitive synthesis of the considerations. This subjectivity is dangerous, but it is also humbling. It requires us to take an honest look at our own thought processes in order to avoid the subtle, irrational effects that might push us in either direction. It also requires caution when evaluating “expert” judgment, given how much experts could be affected by personal and social biases themselves.

The best way I know of to think about bias in this case is to consider the biases and other factors that favor either cause area and see which case seems more powerful, or which particular biases might be affecting our own views. The following lists are presumably not exhaustive, but they lay out what I think are some common key parts of people’s journeys to AIA or MCE. Of course, these factors are not entirely deterministic and probably not all will apply to you, nor do they necessarily mean that you are wrong in your cause prioritization. Based on the circumstances that apply more to you, consider taking a more skeptical look at the project you favor and your current views on the object-level arguments for it.

One might be biased towards AIA if...

  1. They eat animal products, and thus assign lower moral value and lesser mental faculties to animals.

  2. They haven’t accounted for the bias of speciesism.

  3. They lack personal connections to animals, such as growing up with pets.

  4. They are or have been a fan of science fiction and fantasy literature and media, especially if they dreamed of being the hero.

  5. They have a tendency towards technical research over social projects.

  6. They lack social skills.

  7. They are inclined towards philosophy and mathematics.

  8. They have a negative perception of activists, perhaps seeing them as hippies, irrational, idealistic, “social justice warriors,” or overly emotion-driven.

  9. They are a part of the EA community, and therefore drift towards the status quo of EA leaders and peers. (The views of EA leaders can of course be genuine evidence of the correct cause prioritization, but they can also lead to bias.)

  10. The idea of “saving the world” appeals to them.

  11. They take pride in their intelligence, and would love it if they could save the world just by doing brilliant technical research.

  12. They are competitive, and like the feeling/mindset of doing astronomically more good than the average do-gooder, or even the average EA. (I’ve argued in this post that MCE has this astronomical impact, but it lacks the feeling of literally “saving the world” or otherwise having a clear impact that makes a good hero’s journey climax, and it’s closely tied to lesser, near-term impacts.)

  13. They have little personal experience of extreme suffering, the sort that makes one pessimistic about the far future, especially regarding s-risks. (Personal experience could be one’s own experience or the experiences of close friends and family.)

  14. They have little personal experience of oppression, such as due to their gender, race, disabilities, etc.

  15. They are generally a happy person.

  16. They are generally optimistic, or at least averse to thinking about bad outcomes like how humanity could cause astronomical suffering. (Though some pessimism is required for AIA in the sense that they don’t count on AI capabilities researchers ending up with an aligned AI without their help.)

One might be biased towards MCE if...

  1. They are vegan, especially if they went vegan for non-animal or non-far-future reasons, such as for better personal health.

  2. Their gut reaction when they hear about extinction risk or AI risk is to judge it nonsensical.

  3. They have personal connections to animals, such as growing up with pets.

  4. They are or have been a fan of social movement/activism literature and media, especially if they dreamed of being a movement leader.

  5. They have a tendency towards social projects over technical research.

  6. They have benefitted from above-average social skills.

  7. They are inclined towards social science.

  8. They have a positive perception of activists, perhaps seeing them as the true leaders of history.

  9. They have social ties to vegans and animal advocates. (The views of these people can of course be genuine evidence of the correct cause prioritization, but they can also lead to bias.)

  10. The idea of “helping the worst off” appeals to them.

  11. They take pride in their social skills, and would love it if they could help the worst off just by being socially savvy.

  12. They are not competitive, and like the thought of being a part of a friendly social movement.

  13. They have a lot of personal experience of extreme suffering, the sort that makes one pessimistic about the far future, especially regarding s-risks. (Personal experience could be one’s own experience or the experiences of close friends and family.)

  14. They have a lot of personal experience of oppression, such as due to their gender, race, disabilities, etc.

  15. They are generally an unhappy person.

  16. They are generally pessimistic, or at least don’t like thinking about good outcomes. (Though some optimism is required for MCE in the sense that they believe work on MCE can make a large positive difference in social attitudes and behavior.)

  17. They care a lot about directly seeing the impact of their work, even if the bulk of their impact is hard to see. (E.g. seeing improvements in the conditions of farmed animals, which can be seen as a proxy for helping farmed-animal-like beings in the far future.)


I personally found myself far more compelled towards AIA in my early involvement with EA, before I had thought in detail about the issues discussed in this post. I think the list items in the AIA section apply to me much more strongly than those in the MCE list. When I considered these biases, in particular speciesism and my desire to follow the status quo of my EA friends, a fresh look at the object-level arguments changed my mind.

From my reading and conversations in EA, I think the biases in favor of AIA are also quite a bit stronger in the community, though of course some EAs, mainly those already working on animal issues for near-term reasons, probably feel a stronger pull in the other direction.

How you think about these bias considerations also depends on how biased you think the average EA is. If you, for example, think EAs tend to be quite biased in another way, like “measurement bias” or “quantifiability bias” (a tendency to focus too much on easily quantifiable, low-risk interventions), then considerations of biases on this topic should probably be more compelling to you than they will be to people who think EAs are less biased.


Footnotes

[1] This post attempts to compare these cause areas overall, but since that’s sometimes too vague, I specifically mean the strategies within each cause area that seem most promising. I think this is basically equal to “what EAs working on MCE most strongly prioritize” and “what EAs working on AIA most strongly prioritize.”

[2] There’s a sense in which AIA is a form of MCE simply because AIA will tend to lead to certain values. I’m excluding that form of MCE from my analysis here to avoid overlap between these two cause areas.

[3] Depending on how close we’re talking about, this could be quite unlikely. If we’re discussing the range of outcomes from dystopia across the universe to utopia across the universe, then a range like “between modern Earth and the opposite value of modern Earth” seems like a very tiny fraction of the total possible range.

[4] I mean “good” in a “positive impact” sense here, so it includes not just rationality according to the decision-maker but also value alignment, luck, being empirically well-informed, being capable of doing good things, etc.

[5] One reason for optimism is that you might think most extinction risk is in the next few years, such that you and other EAs you know today will still be around to do this research yourselves and make good decisions after those risks are avoided.

[6] Tech­ni­cally one could be­lieve the far fu­ture is nega­tive but also that hu­mans will make good de­ci­sions about ex­tinc­tion, such as if one be­lieves the far fu­ture (given non-ex­tinc­tion) will be bad only due to non­hu­man forces, such as aliens or evolu­tion­ary trends, but has op­ti­mism about hu­man de­ci­sion-mak­ing, in­clud­ing both that hu­mans will make good de­ci­sions about ex­tinc­tion and that they will be lo­gis­ti­cally able to make those de­ci­sions. I think this is an un­likely view to set­tle on, but it would make op­tion value a good thing in a “close to zero” sce­nario.
Non-ex­tinct civ­i­liza­tions could be max­i­mized for hap­piness, max­i­mized for in­ter­est­ing­ness, set up like Star Wars or an­other sci-fi sce­nario, etc. while ex­tinct civ­i­liza­tions would all be de­void of sen­tient be­ings, per­haps with some vari­a­tion in phys­i­cal struc­ture like differ­ent planets or rem­nant struc­tures of hu­man civ­i­liza­tion.
My views on this are cur­rently largely qual­i­ta­tive, but if I had to put a num­ber on the word “sig­nifi­cant” in this con­text, it’d be some­where around 5-30%. This is a very in­tu­itive es­ti­mate, and I’m not pre­pared to jus­tify it.
Paul Chris­ti­ano made a gen­eral ar­gu­ment in fa­vor of hu­man­ity reach­ing good val­ues in the long run due to re­flec­tion in his post “Against Mo­ral Ad­vo­cacy” (see the “Op­ti­mism about re­flec­tion” sec­tion) though he doesn’t speci­fi­cally ad­dress con­cern for all sen­tient be­ings as a po­ten­tial out­come, which might be less likely than other good val­ues that are more driven by co­op­er­a­tion.”
Nick Bostrom has con­sid­ered some of these risks of ar­tifi­cial suffer­ing us­ing the term “mind crime,” which speci­fi­cally refers to harm­ing sen­tient be­ings cre­ated in­side a su­per­in­tel­li­gence. See his book, Su­per­in­tel­li­gence.
The Foun­da­tional Re­search In­sti­tute has writ­ten about risks of as­tro­nom­i­cal suffer­ing in “Re­duc­ing Risks of Astro­nom­i­cal Suffer­ing: A Ne­glected Pri­or­ity.” The TV se­ries Black Mir­ror is an in­ter­est­ing dra­matic ex­plo­ra­tion of how the far fu­ture could in­volve vasts amounts of suffer­ing, such as the epi­sodes “White Christ­mas” and “USS Cal­lister.” Of course, the de­tails of these situ­a­tions of­ten veer to­wards en­ter­tain­ment over re­al­ism, but their ex­plo­ra­tion of the po­ten­tial for dystopias in which peo­ple abuse sen­tient digi­tal en­tities is thought-pro­vok­ing.
I’m highly un­cer­tain about what sort of mo­ti­va­tions (like hap­piness and suffer­ing in hu­mans) fu­ture digi­tal sen­tient be­ings will have. For ex­am­ple, is pun­ish­ment be­ing a stronger mo­ti­va­tor in earth-origi­nat­ing life just an evolu­tion­ary fluke that we can ex­pect to dis­si­pate in ar­tifi­cial be­ings? Could they be just as mo­ti­vated to at­tain re­ward as we are to avoid pun­ish­ment? I think this is a promis­ing av­enue for fu­ture re­search, and I’m glad it’s be­ing dis­cussed by some EAs.
Brian To­masik dis­cusses this in his es­say on “Values Spread­ing is Often More Im­por­tant than Ex­tinc­tion Risk,” sug­gest­ing that, “there’s not an ob­vi­ous similar mechanism push­ing or­ganisms to­ward the things that I care about.” How­ever, Paul Chris­ti­ano notes in “Against Mo­ral Ad­vo­cacy” that he ex­pects “[c]on­ver­gence of val­ues” be­cause “the space of all hu­man val­ues is not very broad,” though this seems quite de­pen­dent on how one defines the pos­si­ble space of val­ues.
This effi­ciency ar­gu­ment is also dis­cussed in Ben West’s ar­ti­cle on “An Ar­gu­ment for Why the Fu­ture May Be Good.”
The term “re­sources” is in­ten­tion­ally quite broad. This means what­ever the limi­ta­tions are on the abil­ity to pro­duce hap­piness and suffer­ing, such as en­ergy or com­pu­ta­tion.
[16] One can also cre­ate he­do­nium as a promise to get things from ri­vals, but promises seem less com­mon than threats be­cause threats tend to be more mo­ti­vat­ing and eas­ier to im­ple­ment (e.g it’s eas­ier to de­stroy than cre­ate). How­ever, some so­cial norms en­courage promises over threats be­cause promises are bet­ter for so­ciety as a whole. Ad­di­tion­ally, threats against pow­er­ful be­ings (e.g. other cit­i­zens in the same coun­try) do less than threats against less pow­er­ful, or more dis­tant be­ings, and the lat­ter cat­e­gory might be in­creas­ingly com­mon in the fu­ture. Ad­di­tion­ally, threats and promises mat­ter less when one con­sid­ers that they are of­ten un­fulfilled be­cause the other party doesn’t do the ac­tion that was the sub­ject of the threat or promise.
Paul Chris­ti­ano’s blog post on “Why might the fu­ture be good?” ar­gues that “the fu­ture will be char­ac­ter­ized by much higher in­fluence for al­tru­is­tic val­ues [than self-in­ter­est],” though he seems to just be dis­cussing the po­ten­tial of al­tru­ism and self-in­ter­est to cre­ate pos­i­tive value, rather than their po­ten­tial to cre­ate nega­tive value.

Brian To­masik dis­cusses Chris­ti­ano’s ar­gu­ment and oth­ers in “The Fu­ture of Dar­winism” and con­cludes, “Whether the fu­ture will be de­ter­mined by Dar­winism or the de­liber­ate de­ci­sions of a unified gov­ern­ing struc­ture re­mains un­clear.”
One dis­cus­sion of changes in moral­ity on a large scale is Robin Han­son’s blog post, “For­ager, Farmer Mo­rals.”

[19] Armchair research is relatively easy, in the sense that all it requires is writing and thinking rather than also digging through historical texts, running scientific studies, or engaging in substantial conversation with advocates, researchers, and/or other stakeholders. It's also more similar to the mathematical and philosophical work that most EAs are used to doing. And it's more attractive as a demonstration of personal prowess to think your way into a crucial consideration than to arrive at one through the tedious work of research. (These reasons are similar to the reasons I feel most far-future-focused EAs are biased towards AIA over MCE.)
These sentient beings probably won't be the biological animals we know today, but instead digital beings who can more efficiently achieve the AI's goals.
The neglectedness heuristic involves a similar messiness of definitions, but the choices seem less arbitrary to me, and the different definitions lead to more similar results.
Arguably this consideration should be under Tractability rather than Scale.
There's a related framing here of "leverage," with the basic argument being that AIA seems more compelling than MCE because AIA is specifically targeted at an important, narrow far future factor (the development of AGI) while MCE is not as specifically targeted. This also suggests that we should consider specific MCE tactics focused on important, narrow far future factors, such as ensuring the AI decision-makers have wide moral circles even if the rest of society lags behind. I find this argument fairly compelling, including the implication that MCE advocates should focus more on advocating for digital sentience and advocating in the EA community than they would otherwise.
Though plausibly MCE involves only influencing a few decision-makers, such as the designers of an AGI.
Brian Tomasik discusses this in "Values Spreading is Often More Important than Extinction Risk," arguing that, "Very likely our values will be lost to entropy or Darwinian forces beyond our control. However, there's some chance that we'll create a singleton in the next few centuries that includes goal-preservation mechanisms allowing our values to be 'locked in' indefinitely. Even absent a singleton, as long as the vastness of space allows for distinct regions to execute on their own values without take-over by other powers, then we don't even need a singleton; we just need goal-preservation mechanisms."
Brian Tomasik discusses the likelihood of value lock-in in his essay, "Will Future Civilization Eventually Achieve Goal Preservation?"

[27] The advent of AGI seems like it will have similar effects on the lock-in of values and alignment, so if you think AI timelines are shorter (i.e. advanced AI will be developed sooner), then that increases the urgency of both cause areas. If you think timelines are so short that we will struggle to successfully reach AI alignment, then that decreases the tractability of AIA, but MCE seems like it could more easily have a partial effect on AI outcomes than AIA could.
In the case of near-term, direct interventions, one might believe that "most social programmes don't work," which suggests that we should have low, strong priors for intervention effectiveness that we need robustness to overcome.
Caspar Oesterheld discusses the ambiguity of neglectedness definitions in his blog post, "Complications in evaluating neglectedness." Other EAs have also raised concern about this commonly-used heuristic, and I almost included this content in this post under the "Tractability" section for this reason.
This is a fairly intuitive sense of the word "matched." I'm taking the topic of ways to affect the far future, dividing it into population risk and quality risk categories, then treating AIA and MCE as subcategories of each. I'm also thinking in terms of each project (AIA and MCE) being in the category of "cause areas with at least pretty good arguments in their favor," and I think "put decent resources into all such projects until the arguments are rebutted" is a good approach for the EA community.

[31] I mean "advocate" quite broadly here, just anyone working to effect social change, such as people submitting op-eds to newspapers or trying to get pedestrians to look at their protest or take their leaflets.
It's unclear what the explanation is for this. It could just be demographic differences such as high IQ, going to elite universities, etc., but it could also be exceptional "rationality skills" like finding loopholes in the publishing system.
In Brian Tomasik's essay on "Values Spreading is Often More Important than Extinction Risk," he argues that "[m]ost people want to prevent extinction" while, "In contrast, you may have particular things that you value that aren't widely shared. These things might be easy to create, and the intuition that they matter is probably not too hard to spread. Thus, it seems likely that you would have higher leverage in spreading your own values than in working on safety measures against extinction."
This is just my personal impression from working in MCE, especially with my organization Sentience Institute. With indirect work, The Good Food Institute is a potential exception since they have struggled to quickly hire talented people after receiving large amounts of funding.
See “Su­per­ra­tional­ity” in “Rea­sons to Be Nice to Other Value Sys­tems” for an EA in­tro­duc­tion to the idea. See “In fa­vor of ‘be­ing nice’” in “Against Mo­ral Ad­vo­cacy” as ex­am­ple of co­op­er­a­tion as an ar­gu­ment against val­ues spread­ing. In “Mul­ti­verse-wide Co­op­er­a­tion via Cor­re­lated De­ci­sion Mak­ing,” Cas­par Oester­held ar­gues that su­per­ra­tional co­op­er­a­tion makes MCE more im­por­tant.
This discussion is complicated by the widely varying degrees of MCE. While, for example, most US residents seem perfectly okay with expanding concern to vertebrates, there would be more opposition to expanding to insects, and even more to some simple computer programs that some argue should fit into the edges of our moral circles. I do think the farthest expansions are much less cooperative in this sense, though if the message is just framed as, "expand our moral circle to all sentient beings," I still expect strong agreement.
One exception is a situation where everyone wants a change to happen, but nobody else wants it badly enough to put the work into changing the status quo.
My impression is that the AI safety community currently wants to avoid fixing these values, though they might still be trying to make them resistant to advocacy from other people, and in general I think many people today would prefer to fix the values of an AGI when they consider that they might not agree with potential future values.

