2016 AI Risk Literature Review and Charity Comparison


I’ve long been concerned about AI Risk. Now that there are a few charities working on the problem, it seems desirable to compare them, to determine where scarce donations should be sent. This is a similar role to that which GiveWell performs for global health charities, and somewhat similar to a securities analyst with regard to possible investments. However, while people have evaluated individual organisations, I haven’t seen anyone else attempt to compare them, so hopefully this is valuable to others.

I’ve attempted to do so. This is a very big undertaking, and I am very conscious of the many ways in which this is not up to the task. The only thing I wish more than the skill and time to do it better is that someone else would do it! If people find this useful enough to warrant doing again next year I should be able to do it much more efficiently, and spend more time on the underlying model of how papers translate into risk-reduction value.

My aim is basically to judge the output of each organisation in 2016 and compare it to their budget. This should give a sense for the organisations’ average cost-effectiveness. Then we can consider factors that might increase or decrease the marginal cost-effectiveness going forward.

This organisation-centric approach is in contrast to a researcher-centric approach, where we would analyse which researchers do good work, and then donate wherever they are. An extreme version of the other approach would be to simply give money directly to researchers—e.g. if I like Logical Induction, I would simply fund Scott Garrabrant directly and ignore MIRI. I favour the organisation-centric approach because it helps keep organisations accountable. Additionally, if researcher skill is the only thing that matters for research output, it doesn’t really matter which organisations end up getting the money and employing the researchers, assuming broadly the same researchers are hired. Different organisations might hire different researchers, but then we are back at judging institutional quality rather than individual researcher quality.

Judging organisations on their historical output is naturally going to favour more mature organisations. A new startup, whose value all lies in the future, will be disadvantaged. However, many of the newer organisations seem to have substantial funding, which should be sufficient for them to operate for a few years and produce a decent amount of research. This research can then form the basis of a more informed evaluation after a year or two of operation, where we decide whether it is the best place for further funding. If the initial endowment is insufficient to produce a track record of research, incremental funding seems unlikely to make a difference. You might disagree here if you thought there were strong threshold effects; maybe $6m will allow a critical mass of top talent to be hired, but $5.9m will not.

Unfortunately there is a lack of a standardised metric: as we can’t directly measure the change in risk, there is no equivalent of incremental life-years (as for GiveWell) or profits (as for investments). So this is going to involve a lot of judgement.

This judgement involves analysing a large number of papers relating to Xrisk that were produced during 2016. Hopefully the year-to-year volatility of output is sufficiently low that this is a reasonable metric. I also attempted to include papers from December 2015, to take into account the fact that I’m missing the last month’s worth of output from 2016, but I can’t be sure I did this successfully. I then evaluated them for their contribution to a variety of different issues—for example, how much new insight does this paper add to our thinking about the strategic landscape? How much progress does this paper make on the moral uncertainty issues an AGI is likely to face? I also attempted to judge how replaceable a paper was—is this a paper that would likely have been created anyway by non-safety-concerned AI researchers?

I have not made any attempt to compare AI Xrisk research to any other kind of research; this article is aimed at people who have already decided to focus on AI Risk. It also focuses on published articles, at the expense of both other kinds of writing (like blog posts or outreach) and non-textual output, like conferences. There are a lot of rabbit holes and it is hard enough to get to the bottom of even one!

This article focuses on AI risk work. Some of the organisations surveyed (FHI, FLI, GCRI etc.) also work on other risks. As such, if you think other types of Existential Risk research are similarly important to AI risk work, you should give organisations like GCRI or GPP some credit for this.

Prior Literature

The Open Philanthropy Project did a review of MIRI here. The verdict was somewhat equivocal—there were many criticisms, but they ended up giving MIRI $500,000, which, while less than they could have given, was nonetheless rather more than the default, zero. However, the disclosures section is painful to read. Typically we would hope that analysts and subjects would not live in the same house—or be dating coworkers. This is in accordance with their anti-principles, which explicitly de-prioritise external comprehensibility and avoiding conflicts of interest. Worse from our perspective, the report makes no attempt to compare donations to MIRI to donations to any other organisation.

Owen Cotton-Barratt recently wrote a piece explaining his choice of donating to MIRI. However, this also contained relatively little evaluation of alternatives to MIRI—while there is an implicit endorsement through his decision to donate, as a Research Fellow at FHI it is inappropriate for him to donate to FHI, so his decision has little information value with regard to the FHI-MIRI tradeoff.

Methodology & General Remarks

Here are some technical details which arose during this project.

Technical vs Non-Technical Papers

Public Policy / Public Outreach

In some ways AI Xrisk fits very naturally into a policy discussion. It’s basically concerned with a negative externality of AI research, which suggests the standard economics toolkit of Pigouvian taxes and property rights / liability allocation. The unusual aspects of the issue (like irreversibility) suggest outright regulation could be warranted. This is certainly close to many people’s intuitions about trying to use the state as a vector to solve massive coordination problems.
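As a toy illustration of the standard toolkit (the numbers here are my own, purely illustrative, and not drawn from any actual estimate of AI research externalities): if each unit of an activity yields a private benefit but imposes an external cost on others, the private optimum over-produces, and a Pigouvian tax equal to the marginal external cost makes the private optimum coincide with the social one.

```python
def private_payoff(q, tax=0.0):
    # Illustrative quadratic private benefit, minus any per-unit tax.
    return 10 * q - q ** 2 - tax * q

def social_payoff(q):
    # Society also bears a linear external cost of 4 per unit.
    return 10 * q - q ** 2 - 4 * q

def argmax(f, grid):
    # Brute-force maximiser over a grid of candidate quantities.
    return max(grid, key=f)

grid = [i / 100 for i in range(1001)]  # quantities 0.00 .. 10.00
print(argmax(private_payoff, grid))                      # 5.0: over-produces
print(argmax(social_payoff, grid))                       # 3.0: social optimum
print(argmax(lambda q: private_payoff(q, tax=4), grid))  # 3.0: tax aligns them
```

The point of the sketch is only that a correctly sized tax reproduces the social optimum; as the following paragraphs argue, whether this framing is actually useful for AI risk is a separate question.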

However, I now think this is a mistake.

My impression is that policy on technical subjects (as opposed to issues that attract strong views from the general population) is generally made by the government and civil servants in consultation with, and being lobbied by, outside experts and interests. Without expert (e.g. top ML researchers at Google, CMU & Baidu) consensus, no useful policy will be enacted.

Pushing directly for policy seems, if anything, likely to hinder expert consensus. Attempts to directly influence the government to regulate AI research seem very adversarial, and risk being pattern-matched to ignorant opposition to GM foods or nuclear power. We don’t want the ‘us-vs-them’ situation that has occurred with climate change to happen here. AI researchers who are dismissive of safety law, regarding it as an imposition and encumbrance to be endured or evaded, will probably be harder to convince of the need to voluntarily be extra-safe—especially as the regulations may actually be totally ineffective. The only case I can think of where scientists are relatively happy about punitive safety regulations, nuclear power, is one where many of those initially concerned were scientists themselves.

Given this, I actually think policy outreach to the general population is probably negative in expectation.

‘Papers to point people to’

Summary articles like Concrete Problems can be useful both for establishing which ideas are the most important to work on (vs a previous informal understanding thereof) and for providing something to point new researchers towards, with actionable research topics. However, progress here is significantly less additive than for object-level research: new theorems don’t generally detract from old theorems, but new pieces-to-be-pointed-to often replace older pieces-to-be-pointed-to.

Technical papers

Generally I value technical papers—mathematics research—highly. I think these are a credible signal of quality, both to me as a donor and also to mainstream ML researchers, and additive—each result can build on previous work. What is more, they are vital—mankind cannot live on strategy white papers alone!

Marginal Employees

In their recent evaluation of MIRI, the Open Philanthropy Project asked to focus on papers written by recent employees, to get an impression of the quality of marginal employees, whom incremental donations would presumably fund. However, I think this is plausibly a mistake. When evaluating public companies, we are always concerned about threats to their core business which may be obscured by smaller, more rapidly growing segments. We don’t want companies to ‘buy growth’ in an attempt to cover up core weakness. A similar principle seems plausible here: an organisation might hire new employees who are all individually productive, thereby covering up a reduction in productivity/focus/output from the founders or earlier employees. In the absence of these additional hires, the (allegedly sub-marginal) existing employees would be more productive, in order to ensure the organisation did not fall below a minimum level of output. I think this is why a number of EA organisations seem to have seen sublinear returns to scale.

Paper authorship allocation

Virtually all AI Xrisk related papers are co-authored, frequently between organisations. This raises the question of how to allocate credit between institutions. In general in academia the first author has done most of the work, with a sharp drop-off (though this is not the case in fields like economics, where an alphabetical ordering is used).
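One simple way to operationalise this kind of allocation is sketched below. The geometric drop-off and the `decay` parameter are my own illustrative choices, not the weighting actually used in this review:

```python
def author_credit(num_authors, decay=0.5, alphabetical=False):
    """Split one unit of credit among co-authors.

    With alphabetical ordering (as in economics) credit is split evenly;
    otherwise the first author gets the largest share, with a geometric
    drop-off controlled by `decay`.
    """
    if alphabetical or num_authors == 1:
        return [1.0 / num_authors] * num_authors
    raw = [decay ** i for i in range(num_authors)]
    total = sum(raw)
    return [w / total for w in raw]

# Three-author ML-style paper: first author gets the majority of credit.
print(author_credit(3))                     # first share ≈ 0.57
# Three-author economics paper: even split.
print(author_credit(3, alphabetical=True))  # each share ≈ 0.33
```

Any such scheme is crude, but it makes the implicit judgement explicit and easy to vary.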

In cases where authors had multiple affiliations, I assigned credit to X-risk organisations over universities. In a few cases where an author was affiliated with multiple organisations I used my judgement, e.g. assigning Stuart Armstrong to FHI not MIRI.

This policy could be criticized if you thought external co-authors were a good thing, for expanding the field, and hence should not be discounted.

Ability to scale

No organisation I asked to review this document suggested they were not looking for more donations.

My impression is that very small organisations actually scale worse than larger organisations, because very small organisations have a number of advantages that are lost as they grow—large organisations have already lost these advantages. These include a lack of internal coordination problems and highly motivated founders who are willing to work for little or nothing.

Notably, for both MIRI and FHI I think their best work for the year was produced by non-founders, suggesting this point has been passed for them, which is a positive.

This means I am relatively more keen to fund either very small or relatively large organisations, unless it is necessary to prevent the small organisation from going under.

Professionalism & Reputational Risks

Organisations should consider reputational risks. To the extent that avoiding AI Risk largely consists in persuading a wide range of actors to take the threat seriously and act appropriately, actions that jeopardize this by making the movement appear silly or worse should be avoided.

In the past EA organisations have been wanting in this regard. There have been at least two major negative PR events, and a number of near misses.

One contributor to this risk is that, from a PR perspective, there is no clear distinction between the actions organisational leaders take in their role representing their organisations and the actions they take as private persons. If a politician or CEO behaves immorally in their personal life, this is taken (perhaps unfairly) as a mark against the organisation—claims that statements or actions do not reflect the views of their employer are simply not credible. Indeed, leaders are often judged by unusually stringent standards—they should behave in a way that not merely *is not immoral*, but is *unquestionably not immoral*. This includes carefully controlling political statements, as even mainstream views can have negative ramifications. I think Jaan’s policy is prudent:

as a general rule, i try to steer clear of hot political debates (and signalling tribal affiliations), because doing that seems instrumentally counter-productive for my goal of x-risk reduction. source

Xrisk organisations should consider having policies in place to prevent senior employees from espousing controversial political opinions on Facebook or otherwise publishing materials that might bring their organisation into disrepute. They should also ensure that senior employees do not attempt to take advantage of their position. This requires organisations to bear in mind that they need to maintain the respect of the world at large, and that actions which appear acceptable within the Bay Area may not be so to the wider world.

Financial Controls

If money is to be spent wisely it must first not be lost or stolen. A number of organisations have had worrying financial mismanagement in the past. For example, MIRI suffered a major theft by an ex-employee in 2009, though apparently they have recovered the money.

However, I’m not sure this information is all that useful to a potential donor—the main predictor of whether I’m aware of some financial mismanagement in an organisation’s past is simply how familiar I am with the organisation, followed by how old they are, which it is unfair to punish them for.

It might be worth giving FHI some credit here: they are both old enough, and I am familiar enough with them, that the absence of evidence of mismanagement may actually constitute some non-trivial evidence of its absence. Generalising this point, university affiliation might be a protective factor—though not always, as shown when SCI’s parent, Imperial College, misplaced $333,000—and it can also raise fungibility issues.

Communication with donors

All organisations should write a short document annually (preferably in early December), laying out their top goals for the coming year in a clear and succinct manner, and then briefly describing how successful they thought they were at achieving the previous year’s goals. The document should also contain a simple table showing total income and expenditure for the last 3 years, projections for the next year, and the number of employees. Some organisations have done a good job of this; others could improve.

Public companies frequently do Non-Deal Roadshows (NDRs), where some combination of CEO, CFO and IR will travel to meet with investors, answering their questions as well as giving investors a chance to judge management quality.

While it would be unduly expensive, both in terms of time and money, for Xrisk organisations to host such tours, when senior members are visiting cities with major concentrations of potential donors (e.g. NYC, London, Oxford, Bay Area) they should consider hosting informal events where people can ask questions; GiveWell already hosts such an annual meeting in NYC. This could help improve accountability and reduce donor alienation.

Literature and Organisational Review

Technical Safety Work Focused Organisations


MIRI is the largest pure-play AI existential risk group. Based in Berkeley, it focuses on mathematics research that is unlikely to be produced by academics, trying to build the foundations for the development of safe AIs. Their agent foundations work is basically trying to work out the correct way of thinking about agents and learning/decision making, by spotting areas where our current models fail and seeking to improve there.

Much of their work this year seems to involve trying to address self-reference in some way—how can we design, or even just model, agents that are smart enough to think about themselves? This work is technical, abstract, and requires a considerable belief in their long-term vision, as it is rarely locally applicable.

During the year they announced something of a pivot, towards spending more of their time on ML work, in addition to their previous agent-foundations focus. I think this is possibly a mistake; while more directly relevant, this work seems significantly more addressable by mainstream ML researchers than their agent-foundations work, though to be fair mainstream ML researchers have generally not actually done it. Additionally, this work seems somewhat outside of their expertise. In any event, at this early stage the new research direction has not produced (and would not have been expected to produce) any research to judge it by.

Virtually all of MIRI’s work, especially on the agent foundations side, does very well on replaceability; it seems unlikely that anyone not motivated by AI safety would produce this work. Even within those concerned about friendly AI, few not at MIRI would produce this work.

Parametric Bounded Löb’s Theorem and Robust Cooperation of Bounded Agents offers a cool and substantive result. Basically, by proving a bounded version of Löb’s theorem we can ensure that proof-finding agents will be able to utilise Löbian reasoning. This is especially useful for agents that need to model other agents, as it allows two ‘equally powerful’ agents to come to conclusions about each other. In terms of improving our understanding of general reasoning agents, this seems like a reasonable step forward, especially for self-improving agents, who need to reason about *even more powerful agents*. It could also help game-theoretic approaches at getting useful work from unfriendly AIs, as it shows the danger that separate AIs could acausally cooperate in this fashion, though I don’t think MIRI would necessarily agree on that point.
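For reference, Löb’s theorem, and a schematic of the bounded variant the paper works with (the second line is my own rough paraphrase, not the paper’s precise statement):

```latex
% Löb's theorem, where \Box P means "P is provable":
\Box(\Box P \rightarrow P) \rightarrow \Box P
% Parametric bounded version, schematically: \Box_{k} P means
% "P is provable by a proof of length at most k"; for suitable
% length bounds g(k) growing sufficiently faster than k:
\Box_{k}\!\left(\Box_{k} P \rightarrow P\right) \rightarrow \Box_{g(k)} P
```

The substance is in making the length bounds explicit, which is what lets two bounded proof-searchers reason about each other without an infinite regress.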

Inductive Coherence was an interesting attempt to solve the problem of reasoning about logical uncertainty. I didn’t really follow the maths. Asymptotic Convergence in Online Learning with Unbounded Delays was another attempt to solve the issue from another angle. According to MIRI it’s basically superseded by the next paper anyway, so I didn’t invest too much time in these papers.

Logical Induction is a very impressive paper. Basically, they make a lot of progress on the problem of logical uncertainty by setting up a financial market of Arrow-Debreu securities for logical statements, and then specifying that you shouldn’t be very exploitable in this market. From this, a huge number of desirable, and somewhat surprising, properties follow. The paper provides a model of a logical agent that we can work with to prove other results, before we actually have a practical implementation of that agent. Hopefully this also helps cause some differential progress towards more transparent AI techniques.
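The criterion itself is compact; the following is my own loose paraphrase rather than the paper’s formal statement:

```latex
% A market \overline{\mathbb{P}} = (\mathbb{P}_1, \mathbb{P}_2, \ldots)
% of prices for logical sentences satisfies the logical induction
% criterion if, for every efficiently (polynomial-time) computable
% trader T, the plausible values of T's accumulated holdings are
% bounded above, i.e. T cannot exploit the market:
\sup_{n \in \mathbb{N}} \; \mathrm{Value}_n\!\left(T, \overline{\mathbb{P}}\right) < \infty
```

The surprising part is how much follows from this one non-exploitability condition: calibration, learning statistical patterns among theorems, and self-trust properties all drop out.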

A Formal Solution to the Grain of Truth Problem provides a class of Bayesian agents whose priors assign positive probability to the other agents in the class. The mathematics behind the paper seems pretty impressive, and the result seems useful—ultimately AIs will have to be able to locate themselves in the world, and to think about other AIs. Producing an abstract formal way of modelling these issues now helps us make progress before such AI is actually developed—and thinking about abstract general systems is often easier than messy particular instantiations. The lead author on this paper was Jan Leike, who was at ANU (now at DeepMind/FHI), so MIRI only gets partial credit here.

Alignment for Advanced ML Systems is a high-level strategy / ‘point potential researchers towards this so they understand what to work on’ piece. I’d say it’s basically midway between Concrete Problems and Value Learning Problem; more explicit about the Xrisk / Value Learning problems than the former, but more ML than the latter. It discusses a variety of issues and summarises some of the literature, including Reward Hacking, Scalable Oversight, Domesticity, Ambiguity Identification, Robustness to Distributional Shift, and Value Extrapolation.

Formalizing Convergent Instrumental Goals is a cute paper that basically formalises the classic Omohundro paper on the subject, showing that AGI won’t by default leave humans alone and co-exist with us from the Oort Cloud. Apparently some people didn’t find Omohundro’s initial argument intuitively obvious—this nice formalisation hopefully renders the conclusion even clearer. However, I wouldn’t consider the model developed here (which is purposefully very bare-bones in the interests of generality) as a foundation for future work; this is a one-and-done paper.

Defining Human Values for Value Learners produces a model of human values, basically as concepts that abstract from lower experiences like pain or hunger in order to better promote them—pain in turn abstracting from evolutionary goals in order to better promote the germline. It’s a nice idea, but I doubt we will get to the correct model this way—as opposed to more ML-inspired routes.

MIRI also sponsored a series of MIRIx workshops, helping external researchers engage with MIRI’s ideas. One of these led to Self-Modification in Rational Agents, where Tom Everitt et al. basically formalise an intuitive result, from LessWrong and no doubt elsewhere—that Gandhi does not want to want to murder—in nice ML style. Given how much good work has come out of ANU, however, perhaps the MIRIx workshop should not get that much counterfactual credit.

MIRI submitted a document to the White House’s Request for Information on AI safety. The submission seems pretty good, but it’s hard to tell what impact, if any, it had. The submission was not referenced in the final White House report, but I don’t think that’s much evidence.

MIRI’s lead researcher is heavily involved as an advisor (and partial owner) in a startup that is trying to develop more intuitive mathematical explanations; MIRI also paid him to develop content about AI risk for that platform. He also published a short eBook, which was very funny but somewhat pornographic and not very related to AI. I think this is probably not very helpful for MIRI’s reputation as a serious research institution.

MIRI spent around $1,650,000 in 2015, and $1,750,000 in 2016.


FHI is a well-established research institute, affiliated with Oxford and led by Nick Bostrom. Compared to the other groups we are reviewing, they have a large staff and large budget. As a relatively mature institution, they produced a decent amount of research over the last year that we can evaluate.

Their research is more varied than MIRI’s, including strategic work, work directly addressing the value-learning problem, and corrigibility work.

Stuart Armstrong has two notable papers this year, Safely Interruptible Agents and Off-policy Monte Carlo agents with variable behaviour policies, both on the theme of interruptibility—how to design an AI such that it can be interrupted after being launched and its behaviour altered. In the long run this won’t be enough—we will have to solve the AI alignment problem eventually—but this might help provide more time, or maybe a saving throw. Previously I had thought I understood this research agenda—it was about making an AI indifferent to a red button through cleverly designed utility functions or priors. With these latest two papers I’m less sure, as they seem to be concerned only with interruptions during the training phase, and do not prevent the AI from predicting or trying to prevent interruption. However, they seem to be working on a coherent programme, so I trust that this research direction makes sense. Also importantly, one of the papers was coauthored with Laurent Orseau of DeepMind. I think these sorts of collaborations with leading AI researchers are incredibly valuable.

Jan Leike, who recently joined DeepMind, is also affiliated with FHI, and as a fan of his work (including the Grain of Truth paper described above) I am optimistic about what he will produce, if also pessimistic about my ability to judge it. Exploration Potential seems to provide a metric to help AIs explore in a goal-aware fashion, which is desirable, but there seems to be still a long way to go before this problem is solved.

Strategic Openness returns to a question FHI has addressed before, namely what are the costs and benefits of open AI research, as opposed to secretive or proprietary research. This paper is as comprehensive as one would expect from Bostrom. Much of the material is obvious when you read it, but collecting a sufficient number of individually trivial things can produce a valuable result. The paper seems like a valuable strategic contribution; unfortunately it may simply be ignored by people who want to set up Open AI groups. It does well on replaceability; it seems unlikely that anyone not motivated by AI safety would produce this work. It might benefit from the extra prestige of being published in a journal.

FHI also published Learning the Preferences of Ignorant, Inconsistent Agents, on how to infer values from ignorant and inconsistent agents. This provides simple functional forms for hyperbolically discounting or ignorant agents, some cute examples of learning what combinations of preferences and bias could have yielded behaviour, and a survey to show the model agrees with ordinary people’s intuitions. However, while it and the related paper from 2015, Learning the Preferences of Bounded Agents, are some of the best work I have seen on the subject, they have no solution to the problem of too many free parameters; the model shows possible combinations of bias and value that could account for actions, but no way to differentiate between the two. FHI had the lead author (Owain Evans) but the second and third authors were not FHI. The paper came out in December 2015, but we are offsetting our year by one month so this still falls within the time period. Presumably the work was done earlier in 2015, but equally presumably there is other research FHI is working on now that I can’t see.
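The functional forms involved really are simple. As a sketch (my own illustration with made-up numbers, not the paper’s code), a hyperbolic discounter values a reward r at delay t as r/(1 + kt), versus the exponential r·δ^t, and this produces the preference reversals that make bias and value hard to disentangle:

```python
def hyperbolic(r, t, k=1.0):
    """Hyperbolic discounting: present value of reward r at delay t."""
    return r / (1.0 + k * t)

def exponential(r, t, delta=0.9):
    """Exponential discounting, for comparison."""
    return r * delta ** t

# A hyperbolic discounter prefers $50 now over $100 five steps away...
print(hyperbolic(50, 0), hyperbolic(100, 5))    # 50.0 vs ~16.7
# ...but reverses when both options are pushed 10 steps into the future,
# a preference-reversal pattern an exponential discounter never shows.
print(hyperbolic(50, 10), hyperbolic(100, 15))  # ~4.5 vs 6.25
```

The identification problem the review mentions is visible here: the same observed choices could come from a patient agent with hyperbolic bias or an impatient unbiased one, and nothing in the functional forms alone separates them.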

The other FHI research consists of three main collaborations with the Global Priorities Project on strategic policy-orientated research, led by Owen Cotton-Barratt. Underprotection of Unpredictable Statistical Lives Compared to Predictable Ones basically lays the groundwork for regulation / internalisation of low-probability, high-impact risks. The core idea is pretty obvious, but obvious things still need stating, and the point about competition from irrational competitors is good and probably non-intuitive to many; the same issue occurs when profit-motivated western firms attempt to compete with ‘strategic’ Asian competitors (e.g. the Chinese steel industry, or various Japanese firms). While it discusses using insurance to solve some issues, it doesn’t mention how setting the attachment point equal to total assets can solve some incentive alignment problems. More notably, it does not address the danger that industry-requested regulation can lead to regulatory capture. The article is also behind a paywall, which seems likely to reduce its impact. The working paper, Beyond risk-benefit analysis: pricing externalities for gain-of-function research of concern, deals with a similar issue. Overall, while I think this work is quite solid economic theory, and addresses a neglected topic, I think it is unlikely that this approach will make much difference to AI risk, though it could be useful for biosecurity or the like. They also produced Global Catastrophic Risks, which, mea culpa, I have not read.

FHI spent £1.1m in 2016 (they were unable to provide me with 2015 numbers due to a staff absence). Assuming their cost structure is fundamentally sterling-based, this corresponds to around $1,380,000.


OpenAI, Musk’s AI research company, apparently has $1bn pledged. I doubt incremental donations are best spent here, even though they seem to be doing some good work, like Concrete Problems in AI Safety.

If this document proves useful enough to produce again next year, I’ll aim to include a longer section on OpenAI.

Center for Human-Compatible AI

The Center for Human-Compatible AI, founded by Stuart Russell in Berkeley, launched in August.

As they are extremely new, there is no track record to judge and compare—the publications on their publications page appear to all have been produced prior to the founding of the institute. I think there is a good chance that they will do good work—Russell has worked on relevant papers before, like Cooperative Inverse Reinforcement Learning, which addresses how an AI should explore if it has a human teaching it the value of different outcomes. I think the two Evans et al. papers (one, two) offer a more promising approach to this specific question, because they do not assume the AI can directly observe the value of an outcome, but the Russell paper may be useful for corrigibility—see for example Dylan Hadfield-Menell’s The Off Switch. Information Gathering Actions over Human Internal State also seems potentially relevant to the value learning problem.

However, they have adequate (over $5m) initial funding from the Open Philanthropy Project, the Leverhulme Trust, CITRIS and the Future of Humanity Institute. If they cannot produce a substantial amount of research over the next year with this quantity of funding, it seems unlikely that any more would help (though if you believed in convex returns to funding it might, for example due to threshold / critical mass effects), and if they do we can review them then. As such I wish them good luck.

Strategy / Outreach focused organisations


The Future of Life Institute was founded to do outreach, including running the Puerto Rico conference. Elon Musk donated $10m for the organisation to re-distribute; given the size of the donation, it has rightfully come to somewhat dominate their activity. The 2015 grant program recommended around $7m in grants to a variety of researchers and institutions. Some of the grants were spread over several years.

My initial intention was to evaluate FLI as an annual grant-making organisation, judging them by their research portfolio. I have read over 26 papers thus supported. However, I am now sceptical that this is the correct way to think about FLI.

In terms of directly funding the most important research, donating to FLI is unlikely to be the best strategy. Their grant pool is allocated by a board of (anonymous) AI researchers. While grants are made in accordance with the excellent research priorities document, much of the money has historically funded, and is likely to continue to fund, shorter-term research projects than donors may otherwise prioritise. The valuable longer-term projects they do fund tend to be at institutes like MIRI or FHI, so donors wishing to support this could simply donate directly. Of course some of their funding does support valuable long-term research that is not done at the other organisations. For example they supported the Steinhardt et al. paper Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction on how to get useful work from unreliable agents, which I do not see how donors could have supported except through FLI.

Also, unfortunately, I simply don’t have time to review all the research they supported, which is entirely my own fault. Hopefully the pieces I read (which include many by MIRI and other institutes mentioned in this piece) are representative of their overall portfolio.

Rather, I think FLI’s main work is in consensus-building. They successfully got a large number of leading AI researchers to sign the Open Letter, which was among other things referenced in the White House report on AI, though the letter predates the 2016 time frame we are looking at. The ‘mainstreaming’ of the cause over the last two years is plausibly partly attributable to FLI; unfortunately it is very hard to judge to what extent.

They also work on non-AI existential risks, in particular nuclear war. In keeping with the focus of this document and my own limitations, I will not attempt to evaluate their output here, but other potential donors should keep it in mind.


CSER is an existential-risk-focused group located in Cambridge. Its founding was announced in late 2012, and the organisation existed in some form by March 2014.

They were significantly responsible for the award of £10m by the Leverhulme Trust to fund a new research institute in Cambridge, the Leverhulme Centre for the Future of Intelligence. To the extent CSER is responsible, this is good leverage of financial resources. However, other organisations are also performing outreach, and there is a limit to how many child organisations you can spawn in the same city: if the first had trouble hiring, might not the second?

In August 2016 CSER published an update, including a list of research they currently have underway, to be published online shortly.

While they have held a large number of events, including conferences, as of December 2016 there is no published research on their website. When I reached out to CSER they said they had various pieces in the process of peer review, but were not in a position to publicly share them—hopefully next year or later in December.

In general I think it is best for research to be as open as possible. If not shared publicly it cannot exert very much influence, and we cannot evaluate the impact of the organisation. It is somewhat disappointing that CSER has not produced any public-facing research over the course of multiple years; apparently they have had trouble hiring.

As such I encourage CSER to publish (even if not peer reviewed, though not at the expense of peer review) so it can be considered for future donations.


The Global Catastrophic Risks Institute is run by Seth Baum and Tony Barrett. They have produced work on a variety of existential risks.

This includes strategic work, for example On the Promotion of Safe and Socially Beneficial Artificial Intelligence, which provides a nuanced discussion of how to frame the issue so as not to alienate key stakeholders. For example, it argues that an ‘AI arms race’ is bad framing, inimical to creating a culture of safety. It also highlights the military and auto industry as possible forces for a safety culture. This paper significantly informed my own thinking on the subject.

Another strategic piece is A Model of Pathways to Artificial Superintelligence Catastrophe for Risk and Decision Analysis, which applies risk tree analysis to AI. This seems to be a very methodical approach to the problem.
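To give a sense of what risk-tree analysis involves, here is a toy sketch. The gate functions and all probabilities are made up for illustration; the actual Barrett and Baum model is far more detailed:

```python
def and_gate(*probs):
    """Probability that all independent preconditions on a pathway occur."""
    p = 1.0
    for q in probs:
        p *= q
    return p

def or_gate(*probs):
    """Probability that at least one of several independent pathways occurs."""
    p = 1.0
    for q in probs:
        p *= (1.0 - q)
    return 1.0 - p

# Hypothetical pathway 1: superintelligence is built AND containment fails
# AND its goals are unsafe (numbers are purely illustrative).
p_pathway_1 = and_gate(0.1, 0.5, 0.8)
# A second, independent hypothetical pathway with two preconditions.
p_pathway_2 = and_gate(0.05, 0.9)

# Overall catastrophe probability is the OR of the pathways.
p_catastrophe = or_gate(p_pathway_1, p_pathway_2)
print(p_pathway_1, p_pathway_2, p_catastrophe)
```

The value of the tree structure is less the final number than the decomposition: it makes explicit which interventions block which pathways.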

In both cases the work does not seem replaceable—it seems unlikely that industry participants would produce such work.

They also produced some work on ensuring adequate food supply in the event of disaster and on the ethics of space exploration, which both seem valuable, but I’m not qualified to judge.

Previously Seth Baum suggested that one of their main advantages lay in skill at stakeholder engagement; while I certainly agree this is very important, it’s hard to evaluate from the outside.

GCRI operates on a significantly smaller budget than some of the other organisations; they spent $98,000 in 2015 and approximately $170,000 in 2016.

Global Priorities Project

The Global Priorities Project is a small group at Oxford focusing on strategic work, including advising governments and publishing research. For ease of reference, here is that section again:

Underprotection of Unpredictable Statistical Lives Compared to Predictable Ones basically lays the groundwork for regulation / internalisation of low-probability, high-impact risks. The core idea is pretty obvious, but obvious things still need stating, and the point about competition from irrational competitors is good and probably non-intuitive to many; the same issue occurs when profit-motivated western firms attempt to compete with ‘strategic’ Asian competitors (e.g. the Chinese steel industry, or various Japanese firms). While it discusses using insurance to solve some issues, it doesn’t mention how setting the attachment point equal to total assets can solve some incentive-alignment problems. More notably, it does not address the danger that industry-requested regulation can lead to regulatory capture. The article is also behind a paywall, which seems likely to reduce its impact. The working paper Beyond Risk-Benefit Analysis: Pricing Externalities for Gain-of-Function Research of Concern deals with a similar issue. Overall, while I think this work is quite solid economic theory, and addresses a neglected topic, I think it is unlikely that this approach will make much difference to AI risk, though it could be useful for biosecurity or the like. They also produced Global Catastrophic Risks 2016, which, mea culpa, I have not read.
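To unpack the attachment-point remark (my own gloss, not an argument from the paper): with a policy attaching below its assets, a firm can externalise losses it could in fact pay; with the attachment point set equal to total assets, the firm bears every loss up to bankruptcy, while the insurer prices (and so charges the firm for) the tail beyond that. A minimal numeric sketch, with hypothetical figures:

```python
def firm_share(loss, assets, attach):
    """Loss borne by the firm: capped by the attachment point, and by the
    fact that it cannot pay out more than its total assets."""
    return min(loss, attach, assets)

def insurer_share(loss, attach):
    """Loss paid by the insurer: the layer above the attachment point."""
    return max(0, loss - attach)

assets = 100
# attach < assets: the firm externalises part of a loss it could have paid.
# attach == assets: full skin in the game up to bankruptcy; the premium for
# the layer above internalises the tail risk.
for attach in (50, assets):
    for loss in (40, 100, 500):
        print(attach, loss, firm_share(loss, assets, attach),
              insurer_share(loss, attach))
```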

Its operations have now been absorbed into CEA, which raises donor-fungibility concerns. Historically CEA partially addressed this by allocating unrestricted donations in proportion to restricted donations. However, they have rescinded this policy. As a result, I am not sure how donors could ensure that GPP actually counterfactually benefited from increased donations.

GPP has successfully produced nuanced pieces of research aimed at providing a foundation for future policy. Doing so requires an unemotional evaluation of the situation, and a certain apolitical attitude, to ensure your work can influence both political parties. While the GPP people seem well suited to this task, CEA executives have on a number of occasions promoted a partisan view of EA, which hopefully will not affect the work of the GPP.

GPP clearly collaborates closely with FHI. GPP’s noteworthy publications this year were coauthored with FHI, and lead author Owen Cotton-Barratt lists a dual affiliation. CEA and FHI also share offices. As such it seems likely that, if all GPP’s supporters decided to donate to FHI instead of CEA, GPP’s researchers might simply end up being employed by FHI. This would entail some restructuring costs, but the long-term impact on research output does not seem very large.

AI Impacts

note: this section added 2016-12-14.

AI Impacts is a small group, loosely associated with MIRI, that does high-level strategy work, especially on AI timelines.

They seem to have done some quite interesting work—for example the article on the intelligence capacity of current hardware, which argues that current global computing hardware could only support a relatively small number of EMs. This was quite surprising to me, and would make me much more confident about the prospects for humanity if we developed EMs soon; a small initial number would allow us to adapt before their impact became overwhelming. They also found a significant error in previously published Xrisk work that significantly undermined its conclusion (which had been that the forecasts of experts and non-experts did not significantly differ).

Unfortunately they do not seem to have a strong record of publishing. My impression is their work has received relatively little attention, partly because of this, though as the intended end-users of the research appear to be people who are already very interested in AI safety, maybe they do not need much distribution.

They were supported by an FLI grant, and apparently do not need additional funding at this time.

Xrisks Institute

The X-Risks Institute appears to be mainly involved in publishing magazine articles, as opposed to academic research. As I think popular outreach—as opposed to academic outreach—is quite low value for AI Risk, and potentially counterproductive if done poorly, I have not reviewed their work in great detail.

X-Risks Net

The X-risks net have produced a variety of strategic maps that summarise the landscape around various existential risks.


All three organisations are ‘meta’ in some way: CFAR attempts to help equip people to do the required research; 80k helps people choose effective careers; and REG spends money to raise even more.

I think CFAR’s mission, especially SPARC, is very interesting; I have donated there in the past. Nate Soares (head of MIRI) credits them with producing at least one counterfactual incremental researcher, though as MIRI now claims to be dollar-constrained, presumably absent CFAR they would have hired the current marginal candidate earlier instead.

“it led directly to MIRI hires, at least one of which would not have happened otherwise” source

They also recently announced a change of strategy, towards a more direct AI focus.

However I do not know how to evaluate them, so choose to say nothing rather than do a bad job.

Other Papers

Arguably the best paper of the year, Concrete Problems in AI Safety, was not published by any of the above organisations—it was a collaboration between researchers at Google Brain, Stanford, Berkeley and Paul Christiano (who is now at OpenAI). It is a high-level strategy / literature review / ‘point potential researchers towards this so they understand what to work on’ piece, focusing on problems that are relevant to, and addressable by, mainstream ML researchers. It discusses a variety of issues and summarises some of the literature, including Reward Hacking, Scalable Oversight (including original work by Paul Christiano), Domesticity / Low Impact, Safe Exploration, and Robustness to Distributional Shift. It was mentioned in the White House policy document.

The AGI Containment Problem, on AI box design, is also interesting, and again not produced by any of the above organisations. It goes through in some detail many problems that a box would have to address, in a significantly more concrete and organised way than previous treatments of the subject.


I think the most valuable papers this year were basically

In general I am more confident the FHI work will be useful than the MIRI work, as it more directly addresses the issue. It seems quite likely that general AI could be developed via a path that renders the MIRI roadmap unworkable (e.g. if the answer is just to add enough layers to your neural net), though MIRI’s recent pivot towards ML work seems intended to address this.

However, the MIRI work is significantly less replaceable—and FHI is already pretty irreplaceable! I basically believe that if MIRI were not pursuing it, no-one else would. And if MIRI is correct, their work is more vital than FHI’s.

To achieve this output, MIRI spent around $1,750,000, while FHI spent around $1,400,000.

Hopefully my deliberations above prove useful to some readers. Here is my eventual decision, rot13’d so you can come to your own conclusions first if you wish:

Qbangr gb obgu gur Zn­pu­var Va­gryyv­trapr Erfrnepu Vafgvghgr naq gur Shgher bs Uhz­navgl Vafgvghgr, ohg fbzr­jung ovn­frq gb­jneqf gur sbezre. V jvyy nyfb znxr n fznyyre qbangvba gb gur Ty­bony Pngn­fge­bcuvp Evfxf Vafgvghgr.

However, I wish to emphasise that all the above organisations seem to be doing good work on the most important issue facing mankind. It is the nature of making decisions under scarcity that we must prioritise some over others, and I hope that all organisations will understand that this necessarily involves negative comparisons at times.

Neglected questions

Here are some issues that seem not to have been addressed much by research during 2016:

Neglected Problems that have to be solved eventually

The problem of Reward Hacking / wireheading—but see Self-Modification of Policy and Utility Function in Rational Agents

Ontology Identification—how can an AI match up its goal structure with its representation of the world?

Normative Uncertainty—how should an AI act if it is uncertain about the true value (except inasmuch as this is implicitly addressed by Value Learning papers like Owain’s)?

Value Extrapolation—how do we go from a person’s actual values, which might be contradictory, to some sort of reflective equilibrium? And how do we combine the values of multiple people?

Stolen Future—how do we ensure first-mover advantages don’t allow a small group of people, whose values do not reflect those of wider humanity, past and present, to gain undue influence?

Neglected Problems that it would probably be helpful to solve

Domesticity—how to design an AI that tries not to affect the world much, voluntarily staying in its box.

Differential progress—is it advantageous to promote a certain type of AI development above others?


I shared a draft of this document with representatives of MIRI, FHI, FLI, GPP, CSER and GCRI. CSER was unable to review the document due to their annual conference. I’m very grateful to Greg Lewis, Alex Flint and Jess Riedel for helping review this document. Any remaining inadequacies and mistakes are my own.

I interned at MIRI back when it was SIAI, volunteered very briefly at GWWC (part of CEA), and once applied for a job at FHI. I am personal friends with people at many of the above organisations.

I added the section on AI Impacts and corrected some typos 2016-12-14.


Amran Siddiqui, Alan Fern, Thomas Dietterich and Shubhomoy Das; Finite Sample Complexity of Rare Pattern Anomaly Detection; http://auai.org/uai2016/proceedings/papers/226.pdf

Andrew Critch; Parametric Bounded Löb's Theorem and Robust Cooperation of Bounded Agents; http://arxiv.org/abs/1602.04184

Anthony Barrett and Seth Baum; A Model of Pathways to Artificial Superintelligence Catastrophe for Risk and Decision Analysis; http://sethbaum.com/ac/fc_AI-Pathways.html

Anthony M. Barrett and Seth D. Baum; Risk analysis and risk management for the artificial superintelligence research and development process; http://sethbaum.com/ac/fc_AI-RandD.html

Aristide C Y Tossou and Christos Dimitrakakis; Algorithms for Differentially Private Multi-Armed Bandits; https://arxiv.org/pdf/1511.08681.pdf

Bas Steunebrink, Kristinn Thorisson, Jurgen Schmidhuber; Growing Recursive Self-Improvers; http://people.idsia.ch/~steunebrink/Publications/AGI16_growing_recursive_self-improvers.pdf

Carolyn Kim, Ashish Sabharwal and Stefano Ermon; Exact sampling with integer linear programs and random perturbations; https://cs.stanford.edu/~ermon/papers/kim-sabharwal-ermon.pdf

Chang Liu, Jessica Hamrick, Jaime Fisac, Anca Dragan, Karl Hedrick, Shankar Sastry and Thomas Griffiths; Goal Inference Improves Objective and Perceived Performance in Human Robot Collaboration; http://www.jesshamrick.com/publications/pdf/Liu2016-Goal_Inference_Improves_Objective.pdf

Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané; Concrete Problems in AI Safety; https://arxiv.org/abs/1606.06565

David Silk; Limits to Verification and validation and artificial intelligence; https://arxiv.org/abs/1604.06963

Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell; Cooperative Inverse Reinforcement Learning; https://arxiv.org/abs/1606.03137

Dylan Hadfield-Menell; The Off Switch; https://intelligence.org/files/csrbai/hadfield-menell-slides.pdf

Ed Felten and Terah Lyons; The Administration's Report on the Future of Artificial Intelligence; https://www.whitehouse.gov/blog/2016/10/12/administrations-report-future-artificial-intelligence

Federico Pistono, Roman V Yampolskiy; Unethical Research: How to Create a Malevolent Artificial Intelligence; https://arxiv.org/ftp/arxiv/papers/1605/1605.02817.pdf

Fereshte Khani, Martin Rinard and Percy Liang; Unanimous prediction for 100% precision with application to learning semantic mappings; https://arxiv.org/abs/1606.06368

Jacob Steinhardt, Percy Liang; Unsupervised Risk Estimation with only Structural Assumptions; cs.stanford.edu/~jsteinhardt/publications/risk-estimation/preprint.pdf

Jacob Steinhardt, Gregory Valiant and Moses Charikar; Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction; https://arxiv.org/abs/1606.05374

James Babcock, Janos Kramar, Roman Yampolskiy; The AGI Containment Problem; https://arxiv.org/pdf/1604.00545v3.pdf

Jan Leike, Jessica Taylor, Benya Fallenstein; A Formal Solution to the Grain of Truth Problem; http://www.auai.org/uai2016/proceedings/papers/87.pdf

Jan Leike, Tor Lattimore, Laurent Orseau and Marcus Hutter; Thompson Sampling is Asymptotically Optimal in General Environments; https://arxiv.org/abs/1602.07905

Jan Leike; Exploration Potential; https://arxiv.org/abs/1609.04994

Jan Leike; Nonparametric General Reinforcement Learning; https://jan.leike.name/publications/Nonparametric%20General%20Reinforcement%20Learning%20-%20Leike%202016.pdf

Jessica Taylor, Eliezer Yudkowsky, Patrick LaVictoire, Andrew Critch; Alignment for Advanced Machine Learning Systems; https://intelligence.org/files/AlignmentMachineLearning.pdf

Jessica Taylor; Quantilizers: A Safer Alternative to Maximizers for Limited Optimization; http://www.aaai.org/ocs/index.php/WS/AAAIW16/paper/view/12613

Joshua Greene, Francesca Rossi, John Tasioulas, Kristen Brent Venable, Brian Williams; Embedding Ethical Principles in Collective Decision Support Systems; http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12457

Joshua Greene; Our driverless dilemma; http://science.sciencemag.org/content/352/6293/1514

Kaj Sotala; Defining Human Values for Value Learners; http://www.aaai.org/ocs/index.php/WS/AAAIW16/paper/view/12633

Kristinn R. Thórisson, Jordi Bieger, Thröstur Thorarensen, Jóna S. Sigurðardóttir and Bas R. Steunebrink; Why Artificial Intelligence Needs a Task Theory (And What It Might Look Like); http://people.idsia.ch/~steunebrink/Publications/AGI16_task_theory.pdf

Kristinn Thorisson; About Understanding;

Laurent Orseau and Stuart Armstrong; Safely Interruptible Agents; http://www.auai.org/uai2016/proceedings/papers/68.pdf

Lun-Kai Hsu, Tudor Achim and Stefano Ermon; Tight variational bounds via random projections and i-projections; https://arxiv.org/abs/1510.01308

Marc Lipsitch, Nicholas Evans, Owen Cotton-Barratt; Underprotection of Unpredictable Statistical Lives Compared to Predictable Ones; http://onlinelibrary.wiley.com/doi/10.1111/risa.12658/full

Nate Soares and Benya Fallenstein; Aligning Superintelligence with Human Interests: A Technical Research Agenda; https://intelligence.org/files/TechnicalAgenda.pdf

Nate Soares; MIRI OSTP submission; https://intelligence.org/2016/07/23/ostp/

Nate Soares; The Value Learning Problem; https://intelligence.org/files/ValueLearningProblem.pdf

Nathan Fulton, Andre Plater; A logic of proofs for differential dynamic logic: Toward independently checkable proof certificates for dynamic logics; http://nfulton.org/papers/lpdl.pdf

Nick Bostrom; Strategic Implications of Openness in AI Development; http://www.nickbostrom.com/papers/openness.pdf

Owain Evans, Andreas Stuhlmuller, Noah Goodman; Learning the Preferences of Bounded Agents; https://www.fhi.ox.ac.uk/wp-content/uploads/nips-workshop-2015-website.pdf

Owain Evans, Andreas Stuhlmuller, Noah Goodman; Learning the Preferences of Ignorant, Inconsistent Agents; https://arxiv.org/abs/1512.05832

Owen Cotton-Barratt, Sebastian Farquhar, Andrew Snyder-Beattie; Beyond Risk-Benefit Analysis: Pricing Externalities for Gain-of-Function Research of Concern; http://globalprioritiesproject.org/2016/03/beyond-risk-benefit-analysis-pricing-externalities-for-gain-of-function-research-of-concern/

Owen Cotton-Barratt, Sebastian Farquhar, John Halstead, Stefan Schubert, Andrew Snyder-Beattie; Global Catastrophic Risks 2016; http://globalprioritiesproject.org/2016/04/global-catastrophic-risks-2016/

Peter Asaro; The Liability Problem for Autonomous Artificial Agents; https://www.aaai.org/ocs/index.php/SSS/SSS16/paper/view/12699

Phil Torres; Agential Risks: A Comprehensive Introduction; http://jetpress.org/v26.2/torres.pdf

Roman V. Yampolskiy; Artificial Intelligence Safety and Cybersecurity: a Timeline of AI Failures; https://arxiv.org/ftp/arxiv/papers/1610/1610.07997.pdf

Roman V. Yampolskiy; Taxonomy of Pathways to Dangerous Artificial Intelligence; https://arxiv.org/abs/1511.03246

Roman Yampolskiy; Verifier Theory from Axioms to Unverifiability of Mathematics; https://arxiv.org/abs/1609.00331

Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares and Jessica Taylor; Logical Induction; https://intelligence.org/2016/09/12/new-paper-logical-induction/

Scott Garrabrant, Benya Fallenstein, Abram Demski, Nate Soares; Inductive Coherence; https://arxiv.org/abs/1604.05288

Scott Garrabrant, Benya Fallenstein, Abram Demski, Nate Soares; Uniform Coherence; https://arxiv.org/abs/1604.05288

Scott Garrabrant, Nate Soares and Jessica Taylor; Asymptotic Convergence in Online Learning with Unbounded Delays; https://arxiv.org/abs/1604.05280

Scott Garrabrant, Siddharth Bhaskar, Abram Demski, Joanna Garrabrant, George Koleszarik and Evan Lloyd; Asymptotic Logical Uncertainty and the Benford Test; http://arxiv.org/abs/1510.03370

Seth Baum and Anthony Barrett; The most extreme risks: Global catastrophes; http://sethbaum.com/ac/fc_Extreme.html

Seth Baum; On the Promotion of Safe and Socially Beneficial Artificial Intelligence; https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2816323

Seth Baum; The Ethics of Outer Space: A Consequentialist Perspective; http://sethbaum.com/ac/2016_SpaceEthics.html

Shengjia Zhao, Sorathan Chaturapruek, Ashish Sabharwal and Stefano Ermon; Closing the gap between short and long xors for model counting; https://arxiv.org/abs/1512.08863

Soenke Ziesche and Roman V. Yampolskiy; Artificial Fun: Mapping Minds to the Space of Fun; https://arxiv.org/abs/1606.07092

Stephanie Rosenthal, Sai Selvaraj, Manuela Veloso; Verbalization: Narration of Autonomous Robot Experience; http://www.ijcai.org/Proceedings/16/Papers/127.pdf

Stephen M. Omohundro; The Basic AI Drives; https://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf

Stuart Armstrong; Off-policy Monte Carlo agents with variable behaviour policies; https://www.fhi.ox.ac.uk/wp-content/uploads/monte_carlo_arXiv.pdf

Tom Everitt and Marcus Hutter; Avoiding Wireheading with Value Reinforcement Learning; https://arxiv.org/abs/1605.03143

Tom Everitt, Daniel Filan, Mayank Daswani and Marcus Hutter; Self-Modification of Policy and Utility Function in Rational Agents; http://www.tomeveritt.se/papers/AGI16-sm.pdf

Tsvi Benson-Tilsen, Nate Soares; Formalizing Convergent Instrumental Goals; http://www.aaai.org/ocs/index.php/WS/AAAIW16/paper/view/12634

Tudor Achim, Ashish Sabharwal, Stefano Ermon; Beyond parity constraints: Fourier analysis of hash functions for inference; http://www.jmlr.org/proceedings/papers/v48/achim16.html

Vincent Conitzer, Walter Sinnott-Armstrong, Jana Schaich Borg, Yuan Deng and Max Kramer; Moral Decision Making Frameworks for Artificial Intelligence; https://users.cs.duke.edu/~conitzer/moralAAAI17.pdf

Vincent Muller and Nick Bostrom; Future Progress in Artificial Intelligence: A Survey of Expert Opinion; www.nickbostrom.com/papers/survey.pdf

Vittorio Perera, Sai P. Selveraj, Stephanie Rosenthal, Manuela Veloso; Dynamic Generation and Refinement of Robot Verbalization; http://www.cs.cmu.edu/~mmv/papers/16roman-verbalization.pdf

Zuhe Zhang, Benjamin Rubinstein, Christos Dimitrakakis; On the Differential Privacy of Bayesian Inference; https://arxiv.org/abs/1512.06992