Long-Term Future Fund: August 2019 grant recommendations

Note: The Q4 dead­line for ap­pli­ca­tions to the Long-Term Fu­ture Fund is Fri­day 11th Oc­to­ber. Ap­ply here.

We opened up an ap­pli­ca­tion for grant re­quests ear­lier this year, and it was open for about one month. This post con­tains the list of grant re­cip­i­ents for Q3 2019, as well as some of the rea­son­ing be­hind the grants. Most of the fund­ing for these grants has already been dis­tributed to the re­cip­i­ents.

In the write­ups be­low, we ex­plain the pur­pose for each grant and sum­ma­rize our rea­son­ing be­hind their recom­men­da­tion. Each sum­mary is writ­ten by the fund man­ager who was most ex­cited about recom­mend­ing the rele­vant grant (with a few ex­cep­tions that we’ve noted be­low). Th­ese differ a lot in length, based on how much available time the differ­ent fund mem­bers had to ex­plain their rea­son­ing.

When we’ve shared ex­cerpts from an ap­pli­ca­tion, those ex­cerpts may have been lightly ed­ited for con­text or clar­ity.

Grant Recipients

Grants Made By the Long-Term Fu­ture Fund

Each grant re­cip­i­ent is fol­lowed by the size of the grant and their one-sen­tence de­scrip­tion of their pro­ject. All of these grants have been made.

  • Sa­muel Hil­ton, on be­half of the HIPE team ($60,000): Plac­ing a staff mem­ber within the gov­ern­ment, to sup­port civil ser­vants to do the most good they can.

  • Stag Lynn ($23,000): To spend the next year lev­el­ing up var­i­ous tech­ni­cal skills with the goal of be­com­ing more im­pact­ful in AI safety.

  • Roam Re­search ($10,000): Work­flowy, but with much more power to or­ga­nize your thoughts and col­lab­o­rate with oth­ers.

  • Alexan­der Gietelink Ol­den­z­iel ($30,000): In­de­pen­dent AI Safety think­ing, do­ing re­search in as­pects of self-refer­ence in us­ing tech­niques from type the­ory, topos the­ory and cat­e­gory the­ory more gen­er­ally.

  • Alexan­der Sie­gen­feld ($20,000): Char­ac­ter­iz­ing the prop­er­ties and con­straints of com­plex sys­tems and their ex­ter­nal in­ter­ac­tions.

  • Sören Mindermann ($36,982): Additional funding for an AI strategy PhD at Oxford / FHI to improve my research productivity.

  • AI Safety Camp ($41,000): A re­search ex­pe­rience pro­gram for prospec­tive AI safety re­searchers.

  • Miranda Dixon-Luinen­burg ($13,500): Writ­ing EA-themed fic­tion that ad­dresses X-risk top­ics.

  • David Man­heim ($30,000): Multi-model ap­proach to cor­po­rate and state ac­tors rele­vant to ex­is­ten­tial risk miti­ga­tion.

  • Joar Skalse ($10,000): Up­skil­ling in ML in or­der to be able to do pro­duc­tive AI safety re­search sooner than oth­er­wise.

  • Chris Cham­bers ($36,635): Com­bat pub­li­ca­tion bias in sci­ence by pro­mot­ing and sup­port­ing the Registered Re­ports jour­nal for­mat.

  • Jess Whit­tle­stone ($75,080): Re­search on the links be­tween short- and long-term AI policy while skil­ling up in tech­ni­cal ML.

  • Lynette Bye ($23,000): Pro­duc­tivity coach­ing for effec­tive al­tru­ists to in­crease their im­pact.

To­tal dis­tributed: $439,197

Other Recommendations

The fol­low­ing peo­ple and or­ga­ni­za­tions were ap­pli­cants who got al­ter­na­tive sources of fund­ing, or de­cided to work on a differ­ent pro­ject. The Long-Term Fu­ture Fund recom­mended grants to them, but did not end up fund­ing them.

The fol­low­ing recom­men­da­tion still has a writeup be­low:

  • Cen­ter for Ap­plied Ra­tion­al­ity ($150,000): Help promis­ing peo­ple to rea­son more effec­tively and find high-im­pact work, such as re­duc­ing x-risk.

We did not write up the fol­low­ing recom­men­da­tions:

  • Jake Coble, who re­quested $10,000 to con­duct re­search alongside Si­mon Beard of CSER. This grant re­quest came with an early dead­line, so we made the recom­men­da­tion ear­lier in the grant cy­cle. How­ever, af­ter our recom­men­da­tion went out, Jake found a differ­ent pro­ject he preferred, and no longer re­quired fund­ing.

  • We recom­mended an­other in­di­vi­d­ual for a grant, but they wound up ac­cept­ing fund­ing from an­other source. (They re­quested that we not share their name; we would have shared this in­for­ma­tion had they re­ceived fund­ing from us.)

Wri­te­ups by He­len Toner

Sa­muel Hil­ton, on be­half of the HIPE team ($60,000)

Plac­ing a staff mem­ber within the gov­ern­ment, to sup­port civil ser­vants to do the most good they can.

This grant sup­ports HIPE (https://​​hipe.org.uk), a UK-based or­ga­ni­za­tion that helps civil ser­vants to have high-im­pact ca­reers. HIPE’s pri­mary ac­tivi­ties are re­search­ing how to have a pos­i­tive im­pact in the UK gov­ern­ment; dis­sem­i­nat­ing their find­ings via work­shops, blog posts, etc.; and pro­vid­ing one-on-one sup­port to in­ter­ested in­di­vi­d­u­als.

HIPE has so far been en­tirely vol­un­teer-run. This grant funds part of the cost of a full-time staff mem­ber for two years, plus some office and travel costs.

Our rea­son­ing for mak­ing this grant is based on our im­pres­sion that HIPE has already been able to gain some trac­tion as a vol­un­teer or­ga­ni­za­tion, and on the fact that they now have the op­por­tu­nity to place a full-time staff mem­ber within the Cabi­net Office. We see this both as a promis­ing op­por­tu­nity in its own right, and also as a pos­i­tive sig­nal about the en­gage­ment HIPE has been able to cre­ate so far. The fact that the Cabi­net Office is will­ing to provide desk space and cover part of the over­head cost for the staff mem­ber sug­gests that HIPE is en­gag­ing suc­cess­fully with its core au­di­ences.

HIPE does not yet have ro­bust ways of track­ing its im­pact, but they ex­pressed strong in­ter­est in im­prov­ing their im­pact track­ing over time. We would hope to see a more fleshed-out im­pact eval­u­a­tion if we were asked to re­new this grant in the fu­ture.

I’ll add that I (He­len) per­son­ally see promise in the idea of ser­vices that offer ca­reer dis­cus­sion, coach­ing, and men­tor­ing in more spe­cial­ized set­tings. (Other fund mem­bers may agree with this, but it was not part of our dis­cus­sion when de­cid­ing whether to make this grant, so I’m not sure.)

Wri­te­ups by Alex Zhu

Stag Lynn ($23,000)

To spend the next year leveling up various technical skills with the goal of becoming more impactful in AI safety.

Stag’s current intention is to spend the next year improving his skills in a variety of areas (e.g. programming, theoretical neuroscience, and game theory) with the goal of contributing to AI safety research, meeting relevant people in the x-risk community, and helping out in EA/rationality-related contexts wherever he can (e.g. at rationality summer camps like SPARC and ESPR).

Two pro­jects he may pur­sue dur­ing the year:

  • Work­ing to im­ple­ment cer­tifi­cates of im­pact in the EA/​X-risk com­mu­nity, in the hope of en­courag­ing co­or­di­na­tion be­tween fun­ders with differ­ent val­ues and in­creas­ing trans­parency around the con­tri­bu­tions of differ­ent peo­ple to im­pact­ful pro­jects.

  • Work­ing as an un­paid per­sonal as­sis­tant to some­one in EA who is suffi­ciently busy for this form of as­sis­tance to be use­ful, and suffi­ciently pro­duc­tive for the as­sis­tance to be valuable.

I recom­mended fund­ing Stag be­cause I think he is smart, pro­duc­tive, and al­tru­is­tic, has a track record of do­ing use­ful work, and will con­tribute more use­fully to re­duc­ing ex­is­ten­tial risk by di­rectly de­vel­op­ing his ca­pa­bil­ities and em­bed­ding him­self in the EA com­mu­nity than he would by finish­ing his un­der­grad­u­ate de­gree or work­ing a full-time job. While I’m not yet clear on what pro­jects he will pur­sue, I think it’s likely that the end re­sult will be very valuable — pro­jects like im­pact cer­tifi­cates re­quire sub­stan­tial work from some­one with tech­ni­cal and ex­e­cu­tional skills, and Stag seems to me to fit the bill.

More on Stag’s back­ground: In high school, Stag had top finishes in var­i­ous Lat­vian and Euro­pean Olympiads, in­clud­ing a gold medal in the 2015 Lat­vian Olympiad in Math­e­mat­ics. Stag has also pre­vi­ously taken the ini­ti­a­tive to work on EA causes—for ex­am­ple, he joined two other peo­ple in Latvia in at­tempt­ing to cre­ate the Lat­vian chap­ter of Effec­tive Altru­ism (which reached the point of cre­at­ing a Lat­vian-lan­guage web­site), and he has vol­un­teered to take on ma­jor re­spon­si­bil­ities in fu­ture iter­a­tions of the Euro­pean Sum­mer Pro­gram in Ra­tion­al­ity (which in­tro­duces promis­ing high-school stu­dents to effec­tive al­tru­ism).

Po­ten­tial con­flict of in­ter­est: at the time of mak­ing the grant, Stag was liv­ing with me and helping me with var­i­ous odd jobs, as part of his plan to meet peo­ple in the EA com­mu­nity and help out where he could. This ar­range­ment lasted for about 1.5 months. To com­pen­sate for this po­ten­tial is­sue, I’ve in­cluded notes on Stag from Oliver Habryka, an­other fund man­ager.

Oliver Habryka’s com­ments on Stag Lynn

I’ve interacted with Stag in the past and have broadly positive impressions of him, in particular his capacity for independent strategic thinking.

Stag has achieved a high level of suc­cess in Lat­vian and Galois Math­e­mat­i­cal Olympiads. I gen­er­ally think that suc­cess in these com­pe­ti­tions is one of the best pre­dic­tors we have of a per­son’s fu­ture perfor­mance on mak­ing in­tel­lec­tual progress on core is­sues in AI safety. See also my com­ments and dis­cus­sion on the grant to Misha Yagudin last round.

Stag has also contributed significantly to improving both ESPR and SPARC, both of which introduce talented pre-college students to core ideas in EA and AI safety. In particular, he’s helped the programs find and select strong participants, while suggesting curriculum changes that gave them more opportunities to think independently about important issues. This gives me a positive impression of Stag’s ability to contribute to other projects in the space. (I also consider ESPR and SPARC to be among the most cost-effective ways to get more excellent people interested in working on topics of relevance to the long-term future, and take this as another signal of Stag’s talent at selecting and/or improving projects.)

Roam Re­search ($10,000)

Work­flowy, but with much more power to or­ga­nize your thoughts and col­lab­o­rate with oth­ers.

Roam is a web ap­pli­ca­tion which au­to­mates the Zet­telkas­ten method, a note-tak­ing /​ doc­u­ment-draft­ing pro­cess based on phys­i­cal in­dex cards. While it is difficult to start us­ing the sys­tem, those who do of­ten find it ex­tremely helpful, in­clud­ing a re­searcher at MIRI who claims that the method dou­bled his re­search pro­duc­tivity.

On my in­side view, if Roam suc­ceeds, an ex­pe­rienced user of the note-tak­ing app Work­flowy will get at least as much value switch­ing to Roam as they got from us­ing Work­flowy in the first place. (Many EAs, my­self in­cluded, see Work­flowy as an in­te­gral part of our in­tel­lec­tual pro­cess, and I think Roam might be­come even more in­te­gral than Work­flowy. See also Sarah Con­stantin’s re­view of Roam, which de­scribes Roam as be­ing po­ten­tially as “profound a men­tal pros­thetic as hy­per­text”, and her more re­cent en­dorse­ment of Roam.)

Over the course of the last year, I’ve had intermittent conversations with Conor White-Sullivan, Roam’s CEO, about the app. I started out in a position of skepticism: I doubted that Roam would ever have active users, let alone succeed at its stated mission. After a recent update call with Conor about his LTF Fund application, I was encouraged enough by Roam’s most recent progress, and sufficiently convinced of the potential upside of its success, that I decided to recommend a grant to Roam.

Since then, Roam has de­vel­oped enough as a product that I’ve per­son­ally switched from Work­flowy to Roam and now recom­mend Roam to my friends. Roam’s progress on its product, com­bined with its grow­ing base of ac­tive users, has led me to feel sig­nifi­cantly more op­ti­mistic about Roam suc­ceed­ing at its mis­sion.

(This fund­ing will sup­port Roam’s gen­eral op­er­at­ing costs, in­clud­ing ex­penses for Conor, one em­ployee, and sev­eral con­trac­tors.)

Po­ten­tial con­flict of in­ter­est: Conor is a friend of mine, and I was once his house­mate for a few months.

Alexander Gietelink Oldenziel ($30,000)

Independent AI Safety thinking, doing research in aspects of self-reference in using techniques from type theory, topos theory and category theory more generally.

In our previous round of grants, we funded MIRI as an organization: see our April report for a detailed explanation of why we chose to support their work. I think Alexander’s research directions could lead to significant progress on MIRI’s research agenda; in fact, MIRI was sufficiently impressed by his work that they offered him an internship. I have also spoken to him in some depth, and was impressed both by his research taste and clarity of thought.

After the in­tern­ship ends, I think it will be valuable for Alexan­der to have ad­di­tional fund­ing to dig deeper into these top­ics; I ex­pect this grant to sup­port roughly 1.5 years of re­search. Dur­ing this time, he will have reg­u­lar con­tact with re­searchers at MIRI, re­port­ing on his re­search progress and re­ceiv­ing feed­back.

Alexan­der Sie­gen­feld ($20,000)

Char­ac­ter­iz­ing the prop­er­ties and con­straints of com­plex sys­tems and their ex­ter­nal in­ter­ac­tions.

Alexan­der is a 5th-year grad­u­ate stu­dent in physics at MIT, and he wants to con­duct in­de­pen­dent de­con­fu­sion re­search for AI safety. His goal is to get a bet­ter con­cep­tual un­der­stand­ing of multi-level world mod­els by com­ing up with bet­ter for­mal­isms for an­a­lyz­ing com­plex sys­tems at differ­ing lev­els of scale, build­ing off of the work of Ya­neer Bar-Yam. (Ya­neer is Alexan­der’s ad­vi­sor, and the pres­i­dent of the New England Com­plex Science In­sti­tute.)

I de­cided to recom­mend fund­ing to Alexan­der be­cause I think his re­search di­rec­tions are promis­ing, and be­cause I was per­son­ally im­pressed by his tech­ni­cal abil­ities and his clar­ity of thought. Tsvi Ben­son-Tilsen, a MIRI re­searcher, was also im­pressed enough by Alexan­der to recom­mend that the Fund sup­port him. Alexan­der plans to pub­lish a pa­per on his re­search; it will be eval­u­ated by re­searchers at MIRI, helping him de­cide how best to pur­sue fur­ther work in this area.

Po­ten­tial con­flict of in­ter­est: Alexan­der and I have been friends since our un­der­grad­u­ate years at MIT.

Wri­te­ups by Oliver Habryka

I have a sense that fun­ders in EA, usu­ally due to time con­straints, tend to give lit­tle feed­back to or­ga­ni­za­tions they fund (or de­cide not to fund). In my write­ups be­low, I tried to be as trans­par­ent as pos­si­ble in ex­plain­ing the rea­sons for why I came to be­lieve that each grant was a good idea, my great­est un­cer­tain­ties and/​or con­cerns with each grant, and some back­ground mod­els I use to eval­u­ate grants. (I hope this last item will help oth­ers bet­ter un­der­stand my fu­ture de­ci­sions in this space.)

I think that there ex­ist more pub­li­cly defen­si­ble (or eas­ier to un­der­stand) ar­gu­ments for some of the grants that I recom­mended. How­ever, I tried to ex­plain the ac­tual mod­els that drove my de­ci­sions for these grants, which are of­ten hard to sum­ma­rize in a few para­graphs. I apol­o­gize in ad­vance that some of the ex­pla­na­tions be­low are prob­a­bly difficult to un­der­stand.

Thoughts on grant se­lec­tion and grant incentives

Some higher-level points on many of the grants be­low, as well as many grants from last round:

For al­most ev­ery grant we make, I have a lot of opinions and thoughts about how the ap­pli­cant(s) could achieve their aims bet­ter. I also have a lot of ideas for pro­jects that I would pre­fer to fund over the grants we are ac­tu­ally mak­ing.

How­ever, in the cur­rent struc­ture of the LTFF, I pri­mar­ily have the abil­ity to se­lect po­ten­tial grantees from an es­tab­lished pool, rather than en­courag­ing the cre­ation of new pro­jects. Alongside my time con­straints, this means that I have a very limited abil­ity to con­tribute to the pro­jects with my own thoughts and mod­els.

Ad­di­tion­ally, I spend a lot of time think­ing in­de­pen­dently about these ar­eas, and have a broad view of “ideal pro­jects that could be made to ex­ist.” This means that for many of the grants I am recom­mend­ing, it is not usu­ally the case that I think the pro­jects are very good on all the rele­vant di­men­sions; I can see how they fall short of my “ideal” pro­jects. More fre­quently, the pro­jects I fund are among the only available pro­jects in a refer­ence class I be­lieve to be im­por­tant, and I recom­mend them be­cause I want pro­jects of that type to re­ceive more re­sources (and be­cause they pass a mod­er­ate bar for qual­ity).

Some ex­am­ples:

  • Our grant to the Kocherga com­mu­nity space club last round. I see Kocherga as the only promis­ing pro­ject try­ing to build in­fras­truc­ture that helps peo­ple pur­sue pro­jects re­lated to x-risk and ra­tio­nal­ity in Rus­sia.

  • I recom­mended this round’s grant to Miranda partly be­cause I think Miranda’s plans are good and I think her past work in this do­main and oth­ers is of high qual­ity, but also be­cause she is the only per­son who ap­plied with a pro­ject in a do­main that seems promis­ing and ne­glected (us­ing fic­tion to com­mu­ni­cate oth­er­wise hard-to-ex­plain ideas re­lat­ing to x-risk and how to work on difficult prob­lems).

  • In the Novem­ber 2018 grant round, I recom­mended a grant to Or­pheus Lum­mis to run an AI safety un­con­fer­ence in Mon­treal. This is be­cause I think he had a great idea, and would cre­ate a lot of value even if he ran the events only mod­er­ately well. This isn’t the same as be­liev­ing Or­pheus has ex­cel­lent skills in the rele­vant do­main; I can imag­ine other ap­pli­cants who I’d have been more ex­cited to fund, had they ap­plied.

I am, overall, still very excited about the grants below, and I think they are a much better use of resources than what I think of as the most common counterfactuals to donating to the LTFF (e.g. donating to the largest organizations in the space, donating based on time-limited personal research).

How­ever, re­lated to the points I made above, I will have many crit­i­cisms of al­most all the pro­jects that re­ceive fund­ing from us. I think that my crit­i­cisms are valid, but read­ers shouldn’t in­ter­pret them to mean that I have a nega­tive im­pres­sion of the grants we are mak­ing — which are strong de­spite their flaws. Ag­gre­gat­ing my in­di­vi­d­ual (and fre­quently crit­i­cal) recom­men­da­tions will not give read­ers an ac­cu­rate im­pres­sion of my over­all (highly pos­i­tive) view of the grant round.

(If I ever come to think that the pool of valuable grants has dried up, I will say so in a high-level note like this one.)

I can imagine that in the future I might want to invest more resources into writing up lists of potential projects that I would be excited about. However, it is not clear to me that I want people to optimize too much for what I am excited about, and I think the current balance of “things that I think are exciting, and that people feel internally motivated to do and generated their own plans for” seems pretty decent.

To fol­low up the above with a high-level as­sess­ment, I am slightly less ex­cited about this round’s grants than I am about last round’s, and I’d es­ti­mate (very roughly) that this round is about 25% less cost-effec­tive than the pre­vi­ous round.


For both this round and the last round, I wrote the write­ups in col­lab­o­ra­tion with Ben Pace, who works with me on LessWrong and the Align­ment Fo­rum. After an ex­ten­sive dis­cus­sion about the grants and the Fund’s rea­son­ing for them, we split the grants be­tween us and in­de­pen­dently wrote ini­tial drafts. We then iter­ated on those drafts un­til they ac­cu­rately de­scribed my think­ing about them and the rele­vant do­mains.

I am also grate­ful for Aaron Gertler’s help with edit­ing and re­fin­ing these write­ups, which has sub­stan­tially in­creased their clar­ity.

Sören Min­der­mann ($36,982)

Ad­di­tional fund­ing to im­prove my re­search pro­duc­tivity dur­ing an AI strat­egy PhD pro­gram at Oxford /​ FHI.
I’m look­ing for ad­di­tional fund­ing to sup­ple­ment my 15k pound/​y PhD stipend for 3-4 years from Septem­ber 2019. I am hop­ing to roughly dou­ble this. My PhD is at Oxford in ma­chine learn­ing, but co-su­per­vised by Allan Dafoe from FHI so that I can fo­cus on AI strat­egy. We will have mul­ti­ple joint meet­ings each month, and I will have a desk at FHI.
The purpose is to increase my productivity and happiness. Given my expected financial situation, I currently have to make compromises on e.g. Ubers, Soylent, eating out with colleagues, accommodation, quality and waiting times for health care, spending time comparing prices, travel durations and stress, and eating less healthily.
I ex­pect that more fi­nan­cial se­cu­rity would in­crease my own pro­duc­tivity and the effec­tive­ness of the time in­vested by my su­per­vi­sors.

I think that when FHI or other or­ga­ni­za­tions in that refer­ence class have trou­ble do­ing cer­tain things due to lo­gis­ti­cal ob­sta­cles, we should usu­ally step in and fill those gaps (e.g. see Ja­cob Lager­ros’ grant from last round). My sense is that FHI has trou­ble with pro­vid­ing fund­ing in situ­a­tions like this (due to bud­getary con­straints im­posed by Oxford Univer­sity).

I’ve in­ter­acted with Sören in the past (dur­ing my work at CEA), and gen­er­ally have pos­i­tive im­pres­sions of him in a va­ri­ety of do­mains, like his ba­sic think­ing about AI Align­ment, and his gen­eral com­pe­tence from run­ning pro­jects like the EA Newslet­ter.

I have a lot of trust in the judgment of Nick Bostrom and several other researchers at FHI. I am not currently very excited about the work at GovAI (the team that Allan Dafoe leads), but still have enough trust in many of the relevant decision makers to think that it is very likely that Sören should be supported in his work.

In gen­eral, I think many of the salaries for peo­ple work­ing on ex­is­ten­tial risk are low enough that they have to make ma­jor trade­offs in or­der to deal with the re­sult­ing fi­nan­cial con­straints. I think that in­creas­ing salaries in situ­a­tions like this is a good idea (though I am hes­i­tant about in­creas­ing salaries for other types of jobs, for a va­ri­ety of rea­sons I won’t go into here, but am happy to ex­pand on).

This fund­ing should last for about 2 years of Sören’s time at Oxford.

AI Safety Camp ($41,000)

A re­search ex­pe­rience pro­gram for prospec­tive AI safety re­searchers.
We want to organize the 4th AI Safety Camp (AISC) - a research retreat and program for prospective AI safety researchers. Compared to past iterations, we plan to change the format to include a 3 to 4-day project generation period and team formation workshop, followed by a several-week period of online team collaboration on concrete research questions, a 6 to 7-day intensive research retreat, and ongoing mentoring after the camp. The target capacity is 25-30 participants, with projects that range from technical AI safety (majority) to policy and strategy research. More information about past camps is at https://aisafetycamp.com/
Early-career entry stage seems to be a less well-covered part of the talent pipeline, especially in Europe. Individual mentoring is costly from the standpoint of expert advisors (esp. compared to guided team work), while internships and e.g. MSFP have limited capacity and are US-centric. After the camp, we advise and encourage participants on future career steps and help connect them to other organizations, or direct them to further individual work and learning if they are pursuing an academic track.
Overviews of pre­vi­ous re­search pro­jects from the first 2 camps can be found here:
1- http://​​bit.ly/​​2FFFcK1
2- http://​​bit.ly/​​2KKjPLB
Pro­jects from AISC3 are still in progress and there is no pub­lic sum­mary.
To eval­u­ate the camp, we send out an eval­u­a­tion form di­rectly af­ter the camp has con­cluded and then in­for­mally fol­low the ca­reer de­ci­sions, pub­li­ca­tions, and other AI safety/​EA in­volve­ment of the par­ti­ci­pants. We plan to con­duct a larger sur­vey from past AISC par­ti­ci­pants later in 2019 to eval­u­ate our mid-term im­pact. We ex­pect to get a more com­pre­hen­sive pic­ture of the im­pact, but it is difficult to eval­u­ate coun­ter­fac­tu­als and in­di­rect effects (e.g. net­work­ing effects). The (anec­do­tal) pos­i­tive ex­am­ples we at­tribute to past camps in­clude the ac­cel­er­a­tion of en­trance of sev­eral peo­ple in the field, re­search out­puts that in­clude 2 con­fer­ence pa­pers, sev­eral SW pro­jects, and about 10 blog­posts.
The main di­rect costs of the camp are the op­por­tu­nity costs of par­ti­ci­pants, or­ga­niz­ers and ad­vi­sors. There are also down­side risks as­so­ci­ated with per­sonal con­flicts at multi-day re­treats and dis­cour­ag­ing ca­pa­ble peo­ple from the field if the camp is run poorly. We ac­tively work to pre­vent this by pro­vid­ing both on-site and ex­ter­nal anony­mous con­tact points, as well as ac­tively at­tend­ing to par­ti­ci­pant well-be­ing, in­clud­ing dur­ing the on­line phases.

This grant is for the AI Safety Camp, to which we made a grant in the last round. Of the grants I recom­mended this round, I am most un­cer­tain about this one. The pri­mary rea­son is that I have not re­ceived much ev­i­dence about the perfor­mance of ei­ther of the last two camps [1], and I as­sign at least some prob­a­bil­ity that the camps are not fa­cil­i­tat­ing very much good work. (This is mostly be­cause I have low ex­pec­ta­tions for the qual­ity of most work of this kind and haven’t looked closely enough at the camp to over­ride these — not be­cause I have pos­i­tive ev­i­dence that they pro­duce low-qual­ity work.)

My biggest con­cern is that the camps do not provide a suffi­cient level of feed­back and men­tor­ship for the at­ten­dees. When I try to pre­dict how well I’d ex­pect a re­search re­treat like the AI Safety Camp to go, much of the im­pact hinges on putting at­ten­dees into con­tact with more ex­pe­rienced re­searchers and hav­ing a good men­tor­ing setup. Some of the prob­lems I have with the out­put from the AI Safety Camp seem like they could be ex­plained by a lack of men­tor­ship.

From the evidence I observe on their website, I see that the attendees of the second camp all produced an artifact of their research (e.g. an academic writeup or code repository). I think this is a very positive sign. That said, it doesn’t look like any alignment researchers have commented on any of this work (this may in part have been because most of it was presented in formats that require a lot of time to engage with, such as GitHub repositories), so I’m not sure the output actually led to the participants getting any feedback on their research directions, which is one of the most important things for people new to the field.

After some fol­lowup dis­cus­sion with the or­ga­niz­ers, I heard about changes to the up­com­ing camp (the tar­get of this grant) that ad­dress some of the above con­cerns (in­de­pen­dent of my feed­back). In par­tic­u­lar, the camp is be­ing re­named to “AI Safety Re­search Pro­gram”, and is now split into two parts — a topic se­lec­tion work­shop and a re­search re­treat, with ex­pe­rienced AI Align­ment re­searchers at­tend­ing the work­shop. The for­mat change seems likely to be a good idea, and makes me more op­ti­mistic about this grant.

I gen­er­ally think hackathons and re­treats for re­searchers can be very valuable, al­low­ing for fo­cused think­ing in a new en­vi­ron­ment. I think the AI Safety Camp is held at a rel­a­tively low cost, in a part of the world (Europe) where there ex­ist few other op­por­tu­ni­ties for po­ten­tial new re­searchers to spend time think­ing about these top­ics, and some promis­ing peo­ple have at­tended. I hope that the camps are go­ing well, but I will not fund an­other one with­out spend­ing sig­nifi­cantly more time in­ves­ti­gat­ing the pro­gram.


[1] After sign­ing off on this grant, I found out that, due to over­lap be­tween the or­ga­niz­ers of the events, some feed­back I got about this camp was ac­tu­ally feed­back about the Hu­man Aligned AI Sum­mer School, which means that I had even less in­for­ma­tion than I thought. In April I said I wanted to talk with the or­ga­niz­ers be­fore re­new­ing this grant, and I ex­pected to have at least six months be­tween ap­pli­ca­tions from them, but we re­ceived an­other ap­pli­ca­tion this round and I ended up not hav­ing time for that con­ver­sa­tion.

Miranda Dixon-Luinen­burg ($13,500)

Writ­ing EA-themed fic­tion that ad­dresses X-risk top­ics.
I want to spend three months eval­u­at­ing my abil­ity to pro­duce an origi­nal work that ex­plores ex­is­ten­tial risk, ra­tio­nal­ity, EA, and re­lated themes such as co­or­di­na­tion be­tween peo­ple with differ­ent be­liefs and back­grounds, han­dling burnout, plan­ning on long timescales, growth mind­set, etc. I pre­dict that com­plet­ing a high-qual­ity novel of this type would take ~12 months, so 3 months is just an ini­tial test.
In 3 months, I would hope to pro­duce a de­tailed out­line of an origi­nal work plus sev­eral com­pleted chap­ters. Si­mul­ta­neously, I would be eval­u­at­ing whether writ­ing full-time is a good fit for me in terms of mo­ti­va­tion and per­sonal wellbe­ing.
I have spent the last 2 years writ­ing an EA-themed fan­fic­tion of The Last Her­ald-Mage tril­ogy by Mercedes Lackey (on­line at https://​​archive­ofourown.org/​​se­ries/​​936480). In this pe­riod I have com­pleted 9 “books” of the se­ries, to­tal­ling 1.2M words (av­er­age of 60K words/​​month), mostly while I was also work­ing full-time. (I am cur­rently writ­ing the fi­nal arc, and when I finish, hope to cre­ate a shorter abridged/​​ed­ited ver­sion with a more solid be­gin­ning and bet­ter pac­ing over­all.)
In the writ­ing pro­cess, I re­searched key back­ground top­ics, in par­tic­u­lar AI safety work (I read a num­ber of Ar­bital ar­ti­cles and most of this MIRI pa­per on de­ci­sion the­ory: https://​​arxiv.org/​​pdf/​​1710.05060v1.pdf), as well as ethics, men­tal health, or­ga­ni­za­tional best prac­tices, me­dieval his­tory and eco­nomics, etc. I have ac­cu­mu­lated a very ded­i­cated group of around 10 beta read­ers, all EAs, who read early drafts of each sec­tion and give feed­back on how well it ad­dresses var­i­ous top­ics, which gives me more con­fi­dence that I am por­tray­ing these con­cepts ac­cu­rately.

One nat­u­ral de­com­po­si­tion of whether this grant is a good idea is to first ask whether writ­ing fic­tion of this type is valuable, then whether Miranda is ca­pa­ble of ac­tu­ally cre­at­ing that type of fic­tion, and last whether fund­ing Miranda will make a sig­nifi­cant differ­ence in the amount/​qual­ity of her fic­tion.

I think that many peo­ple read­ing this will be sur­prised or con­fused about this grant. I feel fairly con­fi­dent that grants of this type are well worth con­sid­er­ing, and I am in­ter­ested in fund­ing more pro­jects like this in the fu­ture, so I’ve tried my best to sum­ma­rize my rea­son­ing. I do think there are some good ar­gu­ments for why we should be hes­i­tant to do so (partly sum­ma­rized by the sec­tion be­low that lists things that I think fic­tion doesn’t do as well as non-fic­tion), so while I think that grants like this are quite im­por­tant, and have the po­ten­tial to do a sig­nifi­cant amount of good, I can imag­ine chang­ing my mind about this in the fu­ture.

The track record of fiction

In a gen­eral sense, I think that fic­tion has a pretty strong track record of both be­ing suc­cess­ful at con­vey­ing im­por­tant ideas, and be­ing a good at­trac­tor of tal­ent and other re­sources. I also think that good fic­tion is of­ten nec­es­sary to es­tab­lish shared norms and shared lan­guage.

Here are some ex­am­ples of com­mu­ni­ties and in­sti­tu­tions that I think used fic­tion very cen­trally in their func­tion. Note that af­ter the first ex­am­ple, I am mak­ing no claim that the effect was good, I’m just es­tab­lish­ing the mag­ni­tude of the po­ten­tial effect size.

  • Harry Pot­ter and the Meth­ods of Ra­tion­al­ity (HPMOR) was in­stru­men­tal in the growth and de­vel­op­ment of both the EA and Ra­tion­al­ity com­mu­ni­ties. It is very likely the sin­gle most im­por­tant re­cruit­ment mechanism for pro­duc­tive AI al­ign­ment re­searchers, and has also drawn many other peo­ple to work on the broader aims of the EA and Ra­tion­al­ity com­mu­ni­ties.

  • Fiction was a core part of the strategy of the neoliberal movement; fiction writers were among the groups referred to by Hayek as “secondhand dealers in ideas.” An example of someone whose fiction played a large role both in the rise of neoliberalism and in its eventual spread would be Ayn Rand.

  • Al­most ev­ery ma­jor re­li­gion, cul­ture and na­tion-state is built on shared myths and sto­ries, usu­ally fic­tional (though the sto­ries are of­ten held to be true by the groups in ques­tion, mak­ing this data point a bit more con­fus­ing).

  • Fran­cis Ba­con’s (un­finished) utopian novel “The New At­lantis” is of­ten cited as the pri­mary in­spira­tion for the found­ing of the Royal So­ciety, which may have been the sin­gle in­sti­tu­tion with the great­est in­fluence on the progress of the sci­en­tific rev­olu­tion.

On a more con­cep­tual level, I think fic­tion tends to be par­tic­u­larly good at achiev­ing the fol­low­ing aims (com­pared to non-fic­tion writ­ing):

  • Teach­ing low-level cog­ni­tive pat­terns by dis­play­ing char­ac­ters that fol­low those pat­terns, al­low­ing the reader to learn from very con­crete ex­am­ples set in a fic­tional world. (Com­pare Ae­sop’s Fables to some non­fic­tion book of moral pre­cepts — it can be much eas­ier to re­mem­ber good habits when we at­tach them to char­ac­ters.)

  • Estab­lish­ing norms, by hav­ing sto­ries that dis­play the con­se­quences of not fol­low­ing cer­tain norms, and the re­wards of fol­low­ing them in the right way

  • Estab­lish­ing a com­mon lan­guage, by not only ex­plain­ing con­cepts, but also show­ing con­cepts as they are used, and how they are brought up in con­ver­sa­tional context

  • Estab­lish­ing com­mon goals, by cre­at­ing con­crete utopian vi­sions of pos­si­ble fu­tures that mo­ti­vate peo­ple to work to­wards them together

  • Reach­ing a broader au­di­ence, since we nat­u­rally find sto­ries more ex­cit­ing than ab­stract de­scrip­tions of concepts

(I wrote in more de­tail about how this works for HPMOR in the last grant round.)

In con­trast, here are some things that fic­tion is gen­er­ally worse at (though a lot of these de­pend on con­text; since fic­tion of­ten con­tains em­bed­ded non-fic­tion ex­pla­na­tions, some of these can be over­come):

  • Care­fully eval­u­at­ing ideas, in par­tic­u­lar when eval­u­at­ing them re­quires em­piri­cal data. There is a norm against show­ing graphs or ta­bles in fic­tion books, mak­ing any ex­pla­na­tion that rests on that kind of data difficult to ac­cess in fic­tion.

  • Con­vey­ing pre­cise tech­ni­cal definitions

  • En­gag­ing in di­alogue with other writ­ers and researchers

  • Deal­ing with top­ics in which read­ers tend to come to bet­ter con­clu­sions by men­tally dis­tanc­ing them­selves from the prob­lem at hand, in­stead of en­gag­ing with con­crete visceral ex­am­ples (I think some eth­i­cal top­ics like the trol­ley prob­lem qual­ify here, as well as prob­lems that re­quire math­e­mat­i­cal con­cepts that don’t neatly cor­re­spond to easy real-world ex­am­ples)

Overall, I think current writing about existential risk, rationality, and effective altruism skews too much towards non-fiction, so I’m excited about experimenting with funding fiction writing.

Miranda’s writing

The sec­ond ques­tion is whether I trust Miranda to ac­tu­ally be able to write fic­tion that lev­er­ages these op­por­tu­ni­ties and pro­vides value. This is why I think Miranda can do a good job:

  • Her cur­rent fic­tion pro­ject is read by a few peo­ple whose taste I trust, and many of them de­scribe hav­ing de­vel­oped valuable skills or in­sights as a re­sult (for ex­am­ple, bet­ter skills for crisis man­age­ment, a bet­ter con­cep­tion of moral philos­o­phy, an im­proved moral com­pass, and some in­sights about de­ci­sion the­ory)

  • She wrote fre­quently on LessWrong and her blog for a few years, pro­duc­ing con­tent of con­sis­tently high qual­ity that, while not fic­tional, of­ten dis­played some of the same use­ful prop­er­ties as fic­tion writ­ing.

  • I’ve seen her execute a large variety of difficult projects outside of her writing, which means I am a lot more optimistic about things like her ability to motivate herself on this project and to excel in the non-writing aspects of the work (e.g. promoting her fiction to audiences beyond the EA and rationality communities)

  • She worked in op­er­a­tions at CEA and re­ceived strong re­views from her cowork­ers

  • She helped CFAR run the op­er­a­tions for SPARC in two con­sec­u­tive years and performed well as a lo­gis­tics vol­un­teer for 11 of their other work­shops

  • I’ve seen her or­ga­nize var­i­ous events and provide use­ful help with lo­gis­tics and gen­eral prob­lem-solv­ing on a large num­ber of oc­ca­sions

My two biggest con­cerns are:

  • Miranda los­ing mo­ti­va­tion to work on this pro­ject, be­cause writ­ing fic­tion with a spe­cific goal re­quires a sig­nifi­cantly differ­ent mo­ti­va­tion than do­ing it for per­sonal enjoyment

  • The fic­tion be­ing well-writ­ten and en­gag­ing, but failing to ac­tu­ally help peo­ple bet­ter un­der­stand the im­por­tant is­sues it tries to cover.

I like the fact that this grant is for an ex­plo­ra­tory 3 months rather than a longer pe­riod of time; this al­lows Miranda to pivot if it doesn’t work out, rather than be­ing tied to a pro­ject that isn’t go­ing well.

The coun­ter­fac­tual value of funding

It would be rea­son­able to ask whether a grant is re­ally nec­es­sary, given that Miranda has pro­duced a huge amount of fic­tion in the last two years with­out re­ceiv­ing fund­ing ex­plic­itly ded­i­cated to that. I have two thoughts here:

  1. I generally think that we should avoid declining to pay people just because they’d be willing to do valuable work for free. It seems good to reward people for work even if this doesn’t make much of a difference in the quality/consistency of the work, because I expect this promise of reward to help people build long-term motivation and encourage exploration. To explain this a bit more, I think this grant will help other people build motivation towards pursuing similar projects in the future, by setting a precedent for potential funding in this space. For example, I think the possibility of funding (and recognition) was also a motivator for Miranda in starting to work on this project.

  2. I expect this grant to have a significant effect on Miranda’s productivity, because I think that there is often a qualitative difference between work someone produces in their spare time and work that someone can focus full-time on. In particular, I expect this grant to cause Miranda’s work to improve in the dimensions that she doesn’t naturally find very stimulating, which I expect will include editing, restructuring, and other forms of “polish”.

David Man­heim ($30,000)

Multi-model ap­proach to cor­po­rate and state ac­tors rele­vant to ex­is­ten­tial risk miti­ga­tion.
Work for 2-3 months on con­tin­u­ing to build out a multi-model ap­proach to un­der­stand­ing in­ter­na­tional re­la­tions and multi-stake­holder dy­nam­ics as it re­lates to risks of strong(er) AI sys­tems de­vel­op­ment, based on and ex­tend­ing similar work done on biolog­i­cal weapons risks done on be­half of FHI’s Biorisk group and sup­port­ing Open Philan­thropy Pro­ject plan­ning.
This work is likely to help policy and de­ci­sion anal­y­sis for effec­tive al­tru­ism re­lated to the deeply un­cer­tain and com­plex is­sues in in­ter­na­tional re­la­tions and long term plan­ning that need to be con­sid­ered for many ex­is­ten­tial risk miti­ga­tion ac­tivi­ties. While the pro­ject is fo­cused on un­der­stand­ing ac­tors and mo­ti­va­tions in the short term, the de­ci­sions be­ing sup­ported are ex­actly those that are crit­i­cal for ex­is­ten­tial risk miti­ga­tion, with long term im­pli­ca­tions for the fu­ture.

I feel a lot of skep­ti­cism to­ward much of the work done in the aca­demic study of in­ter­na­tional re­la­tions. Judg­ing from my mod­els of poli­ti­cal in­fluence and its effects on the qual­ity of in­tel­lec­tual con­tri­bu­tions, and my mod­els of re­search fields with lit­tle abil­ity to perform ex­per­i­ments, I have high pri­ors that work in in­ter­na­tional re­la­tions is of sig­nifi­cantly lower qual­ity than in most sci­en­tific fields. How­ever, I have en­gaged rel­a­tively lit­tle with ac­tual re­search on the topic of in­ter­na­tional re­la­tions (out­side of un­usual schol­ars like Nick Bostrom) and so am hes­i­tant in my judge­ment here.

I also have a fair bit of worry around biorisk. I haven’t really had the opportunity to engage with a good case for it, and neither have many of the people I would trust most in this space, in large part due to secrecy concerns from people who work on it (more on that below). Due to this, I am worried about information cascades. (An information cascade is a situation where people primarily share what they believe but not why, and because people update on each other’s beliefs you end up with a lot of people all believing the same thing precisely because everyone else does.)

I think it is valuable to work on biorisk, but this view is mostly based on individual conversations that are hard to summarize, and I feel uncomfortable with my level of understanding of possible interventions, or even just conceptual frameworks I could use to approach the problem. I don’t know how most people who work in this space came to decide it was important, and those I’ve spoken to have usually been reluctant to share details in conversation (e.g. about specific discoveries they think created risk, or types of arguments that convinced them to focus on biorisk over other threats).

I’m broadly sup­port­ive of work done at places like FHI and by the peo­ple at OpenPhil who care about x-risks, so I am in fa­vor of fund­ing their work (e.g. Soren’s grant above). But I don’t feel as though I can defer to the peo­ple work­ing in this do­main on the ob­ject level when there is so much se­crecy around their epistemic pro­cess, be­cause I and oth­ers can­not eval­u­ate their rea­son­ing.

How­ever, I am ex­cited about this grant, be­cause I have a good amount of trust in David’s judg­ment. To be more spe­cific, he has a track record of iden­ti­fy­ing im­por­tant ideas and in­sti­tu­tions and then work­ing on/​with them. Some con­crete ex­am­ples in­clude:

  • Wrote up a pa­per on Good­hart’s Law with Scott Garrabrant (af­ter see­ing Scott’s very terse post on it)

  • Works with the biorisk teams at FHI and OpenPhil

  • Completed his PhD in public policy and decision theory at the RAND Corporation, which is an unusually innovative institution (e.g. this study)

  • Writes in­ter­est­ing com­ments and blog posts on the in­ter­net (e.g. LessWrong)

  • Has offered mentoring in his fields of expertise to other people working or preparing to work on projects in the x-risk space; I’ve heard positive feedback from his mentees

Another major factor for me is the degree to which David shares his thinking openly and transparently on the internet, and participates in public discourse, so that other people interested in these topics can engage with his ideas. (He’s also a superforecaster, which I think is predictive of broadly good judgment.) If David didn’t have this track record of public discourse, I likely wouldn’t be recommending this grant, and if he suddenly stopped participating, I’d be fairly hesitant to recommend such a grant in the future.

As I said, I’m not ex­cited about the spe­cific pro­ject he is propos­ing, but have trust in his sense of which pro­jects might be good to work on, and I have em­pha­sized to him that I think he should feel com­fortable work­ing on the pro­jects he thinks are best. I strongly pre­fer a world where David has the free­dom to work on the pro­jects he judges to be most valuable, com­pared to the world where he has to take un­re­lated jobs (e.g. teach­ing at uni­ver­sity).

Joar Skalse ($10,000)

Up­skil­ling in ML in or­der to be able to do pro­duc­tive AI safety re­search sooner than oth­er­wise.
I am re­quest­ing grant money to up­skill in ma­chine learn­ing (ML).
Back­ground: I am an un­der­grad­u­ate stu­dent in Com­puter Science and Philos­o­phy at Oxford Univer­sity, about to start the 4th year of a 4-year de­gree. I plan to do re­search in AI safety af­ter I grad­u­ate, as I deem this to be the most promis­ing way of hav­ing a sig­nifi­cant pos­i­tive im­pact on the long-term fu­ture
What I’d like to do:
I would like to im­prove my skills in ML by read­ing liter­a­ture and re­search, repli­cat­ing re­search pa­pers, build­ing ML-based sys­tems, and so on.
To do this effec­tively, I need ac­cess to the com­pute that is re­quired to train large mod­els and run lengthy re­in­force­ment learn­ing ex­per­i­ments and similar.
It would also likely be very benefi­cial if I could live in Oxford dur­ing the va­ca­tions, as I would then be in an en­vi­ron­ment in which it is eas­ier to be pro­duc­tive. It would also make it eas­ier for me to speak with the re­searchers there, and give me ac­cess to the fa­cil­ities of the uni­ver­sity (in­clud­ing libraries, etc.).
It would also be use­ful to be able to at­tend con­fer­ences and similar events.

Joar was one of the co-au­thors on the Mesa-Op­ti­misers pa­per, which I found sur­pris­ingly use­ful and clearly writ­ten, es­pe­cially con­sid­er­ing that its au­thors had rel­a­tively lit­tle back­ground in al­ign­ment re­search or re­search in gen­eral. I think it is prob­a­bly the sec­ond most im­por­tant piece of writ­ing on AI al­ign­ment that came out in the last 12 months, af­ter the Embed­ded Agency se­quence. My cur­rent best guess is that this type of con­cep­tual clar­ifi­ca­tion /​ de­con­fu­sion is the most im­por­tant type of re­search in AI al­ign­ment, and the type of work I’m most in­ter­ested in fund­ing. While I don’t know ex­actly how Joar con­tributed to the pa­per, my sense is that all the au­thors put in a sig­nifi­cant effort (bar Scott Garrabrant, who played a su­per­vis­ing role).

This grant is for pro­jects dur­ing and in be­tween terms at Oxford. I want to sup­port Joar pro­duc­ing more of this kind of re­search, which I ex­pect this grant to help with. He’s also been writ­ing fur­ther thoughts on­line (ex­am­ple), which I think has many pos­i­tive effects (per­son­ally and as ex­ter­nal­ities).

My brief thoughts on the pa­per (non­tech­ni­cal):

  • The paper introduced me to a lot of terminology that I’ve continued to use over the past few months (which is not true for most terminology introduced in this space)

  • It helped me deconfuse my thinking on a bunch of concrete problems (in particular on the question of whether things like AlphaGo can be dangerous when “scaled up”)

  • I’ve seen mul­ti­ple other re­searchers and thinkers I re­spect re­fer to it positively

  • In ad­di­tion to be­ing pub­lished as a pa­per, it was writ­ten up as a se­ries of blog­posts in a way that made it a lot more accessible

More of my thoughts on the pa­per (tech­ni­cal):

Note: If you haven’t read the paper, or you don’t have other background in the subject, this section will likely be unclear. It’s not essential to the case for the grant, but I wanted to share it in case people with the requisite background are interested in more details about the research.

I was sur­prised by how helpful the con­cep­tual work in the pa­per was—helping me think about where the op­ti­miza­tion was hap­pen­ing in a sys­tem like AlphaGo Zero im­proved my un­der­stand­ing of that sys­tem and how to con­nect it to other sys­tems that do op­ti­miza­tion in the world. The pri­mary for­mal­ism in the pa­per was clar­ify­ing rather than ob­scur­ing (and the ra­tio of in­sight to for­mal­ism was very high—see my ad­den­dum be­low for more thoughts on that).

Once the ba­sic con­cepts were in place, clar­ify­ing differ­ent ba­sic tools that would en­courage op­ti­miza­tion to hap­pen in ei­ther the base op­ti­mizer or the mesa op­ti­mizer (e.g. con­strain­ing and ex­pand­ing space/​time offered to the base or mesa op­ti­miz­ers has in­ter­est­ing effects), plus clar­ify­ing the types of al­ign­ment /​ pseudo-al­ign­ment /​ in­ter­nal­iz­ing of the base ob­jec­tive, all helped me think about this is­sue very clearly. It largely used ba­sic tech­ni­cal lan­guage I already knew, and put it to­gether in ways that would’ve taken me many months to achieve on my own—a very helpful con­cep­tual piece of work.

Note on the write­ups for Chris, Jess, and Lynette

The following three grants were more exciting to one or more other fund managers than they were to me (Oliver). I think that for all three, if it had just been me on the grant committee, we might not have actually made them. However, I had more resources available to invest into these writeups, and as such I ended up summarizing my view on them, instead of someone else on the fund doing so. As such, they are probably less representative of the reasons for why we made these grants than the writeups above.

In the course of think­ing through these grants, I formed (and wrote out be­low) more de­tailed, ex­plicit mod­els of the top­ics. Although these mod­els were not coun­ter­fac­tual in the Fund’s mak­ing the grants, I think they are fairly pre­dic­tive of my fu­ture grant recom­men­da­tions.

Chris Cham­bers ($36,635)

Note: Ap­pli­ca­tion sent in by Ja­cob Hil­ton.

Com­bat pub­li­ca­tion bias in sci­ence by pro­mot­ing and sup­port­ing the Registered Re­ports jour­nal for­mat
I’m sug­gest­ing a grant to fund a teach­ing buy­out for Pro­fes­sor Chris Cham­bers, an aca­demic at the Univer­sity of Cardiff work­ing to pro­mote and sup­port Registered Re­ports. This fund­ing op­por­tu­nity was origi­nally iden­ti­fied and re­searched by Hauke Hille­brandt, who pub­lished a full anal­y­sis here. In brief, a Registered Re­port is a for­mat for jour­nal ar­ti­cles where peer re­view and ac­cep­tance de­ci­sions hap­pen be­fore data is col­lected, so that the re­sults are much less sus­cep­ti­ble to pub­li­ca­tion bias. The grant would free Chris of teach­ing du­ties so that he can work full-time on try­ing to get Registered Re­ports to be­come part of main­stream sci­ence, which in­cludes out­reach to jour­nal ed­i­tors and sup­port­ing them through the pro­cess of adopt­ing the for­mat for their jour­nal. More de­tails of Chris’s plans can be found here.
I think the main rea­son for fund­ing this is from a wor­ld­view di­ver­sifi­ca­tion per­spec­tive: I would ex­pect it to broadly im­prove the effi­ciency of sci­en­tific re­search by im­prov­ing the com­mu­ni­ca­tion of nega­tive re­sults, and to en­able peo­ple to make bet­ter-in­formed use of sci­en­tific re­search by re­duc­ing pub­li­ca­tion bias. I would ex­pect these effects to be pri­mar­ily within fields where em­piri­cal tests tend to be use­ful but not always defini­tive, such as clini­cal tri­als (one of Chris’s fo­cus ar­eas), which would have knock-on effects on health.
From an X-risk per­spec­tive, the key ques­tion to an­swer seems to be which tech­nolo­gies differ­en­tially benefit from this grant. I do not have a strong opinion on this, but to quote Brian Wang from a Face­book thread:
In terms of [...] bio-risk, my ini­tial thoughts are that re­pro­ducibil­ity con­cerns in biol­ogy are strongest when it comes to biomedicine, a field that can be broadly viewed as defense-en­abling. By con­trast, I’m not sure that re­pro­ducibil­ity con­cerns hin­der the more fun­da­men­tal, offense-en­abling de­vel­op­ments in biol­ogy all that much (e.g., the fal­ling costs of gene syn­the­sis, the dis­cov­ery of CRISPR).
As for why this particular intervention strikes me as a cost-effective way to improve science, it is shovel-ready, it may be the sort of thing that traditional funding sources would miss, it has been carefully vetted by Hauke, and I thought that Chris seemed thoughtful and intelligent from his videoed talk.

The Let’s Fund re­port linked in the ap­pli­ca­tion played a ma­jor role in my as­sess­ment of the grant, and I prob­a­bly would not have been com­fortable recom­mend­ing this grant with­out ac­cess to that re­port.

Thoughts on Registered Reports

The repli­ca­tion crisis in psy­chol­ogy, and the broad spread of “ca­reer sci­ence,” have made it (to me) quite clear that the method­olog­i­cal foun­da­tions of at least psy­chol­ogy it­self, but pos­si­bly also the broader life-sci­ences, are cre­at­ing a very large vol­ume of false and likely un­re­pro­ducible claims.

This is in large part caused by prob­le­matic in­cen­tives for in­di­vi­d­ual sci­en­tists to en­gage in highly bi­ased re­port­ing and statis­ti­cally du­bi­ous prac­tices.

I think pre­reg­is­tra­tion has the op­por­tu­nity to fix a small but sig­nifi­cant part of this prob­lem, pri­mar­ily by re­duc­ing file-drawer effects. To bor­row an ex­pla­na­tion from the Let’s Fund re­port (lightly ed­ited for clar­ity):

[Pre-reg­is­tra­tion] was in­tro­duced to ad­dress two prob­lems: pub­li­ca­tion bias and an­a­lyt­i­cal flex­i­bil­ity (in par­tic­u­lar out­come switch­ing in the case of clini­cal medicine).
Publi­ca­tion bias, also known as the file drawer prob­lem, refers to the fact that many more stud­ies are con­ducted than pub­lished. Stud­ies that ob­tain pos­i­tive and novel re­sults are more likely to be pub­lished than stud­ies that ob­tain nega­tive re­sults or re­port repli­ca­tions of prior re­sults. The con­se­quence is that the pub­lished liter­a­ture in­di­cates stronger ev­i­dence for find­ings than ex­ists in re­al­ity.
Out­come switch­ing refers to the pos­si­bil­ity of chang­ing the out­comes of in­ter­est in the study de­pend­ing on the ob­served re­sults. A re­searcher may in­clude ten vari­ables that could be con­sid­ered out­comes of the re­search, and — once the re­sults are known — in­ten­tion­ally or un­in­ten­tion­ally se­lect the sub­set of out­comes that show statis­ti­cally sig­nifi­cant re­sults as the out­comes of in­ter­est. The con­se­quence is an in­crease in the like­li­hood that re­ported re­sults are spu­ri­ous by lev­er­ag­ing chance, while nega­tive ev­i­dence gets ig­nored.
This is one of sev­eral re­lated re­search prac­tices that can in­flate spu­ri­ous find­ings when anal­y­sis de­ci­sions are made with knowl­edge of the ob­served data, such as se­lec­tion of mod­els, ex­clu­sion rules and co­vari­ates. Such data-con­tin­gent anal­y­sis de­ci­sions con­sti­tute what has be­come known as P-hack­ing, and pre-reg­is­tra­tion can pro­tect against all of these.
It also effec­tively blinds the re­searcher to the out­come be­cause the data are not col­lected yet and the out­comes are not yet known. This way the re­searcher’s un­con­scious bi­ases can­not in­fluence the anal­y­sis strat­egy
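
To make the outcome-switching point in the excerpt concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not something taken from the Let’s Fund report: the treatment has no real effect, the ten outcome variables are independent, and a result counts as “significant” when p < 0.05. The function names are hypothetical.

```python
import random


def analytic_rate(n_outcomes, alpha=0.05):
    """Chance that at least one of n independent null outcomes has p < alpha."""
    return 1 - (1 - alpha) ** n_outcomes


def simulated_rate(n_outcomes, alpha=0.05, n_studies=20000, seed=0):
    """Fraction of simulated null studies that can report a 'significant' outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Assumption: with no true effect, each p-value is uniform on [0, 1].
        p_values = [rng.random() for _ in range(n_outcomes)]
        # Outcome switching: the researcher reports whichever outcome looks best.
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


preregistered = simulated_rate(1)   # one outcome fixed in advance
switched = simulated_rate(10)       # best of ten outcomes chosen after the fact

print(f"one preregistered outcome: {preregistered:.3f}")
print(f"best of ten outcomes:      {switched:.3f}")
print(f"analytic value for ten:    {analytic_rate(10):.3f}")
```

Under these assumptions, a study with a single preregistered outcome yields a spurious positive about 5% of the time, while picking the best of ten outcomes does so about 40% of the time (1 − 0.95^10). Real outcome variables are usually correlated, which would make the inflation smaller than this toy figure.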

“Registered reports” refers to a specific protocol that journals are encouraged to adopt, which integrates preregistration into the journal acceptance process. The protocol is illustrated by a picture in the Let’s Fund report (image not reproduced here).

Of the many ways to im­ple­ment pre­reg­is­tra­tion prac­tices, I don’t think the one that Cham­bers pro­poses seems ideal, and I can see some flaws with it, but I still think that the qual­ity of clini­cal sci­ence (and po­ten­tially other fields) will sig­nifi­cantly im­prove if more jour­nals adopt the reg­istered re­ports pro­to­col. (Please keep this in mind as you read my con­cerns in the next sec­tion.)

The im­por­tance of band­width con­straints for journals

Cham­bers has the ex­plicit goal of mak­ing all clini­cal tri­als re­quire the use of reg­istered re­ports. That out­come seems po­ten­tially quite harm­ful, and pos­si­bly worse than the cur­rent state of clini­cal sci­ence. (How­ever, since that cur­rent state is very far from “uni­ver­sal reg­istered re­ports,” I am not very wor­ried about this grant con­tribut­ing to that sce­nario.)

The Let’s Fund re­port cov­ers the benefits of pre­reg­is­tra­tion pretty well, so I won’t go into much de­tail here. In­stead, I will men­tion some of my spe­cific con­cerns with the pro­to­col that Cham­bers is try­ing to pro­mote.

From the reg­istered re­ports web­site:

Manuscripts that pass peer re­view will be is­sued an in prin­ci­ple ac­cep­tance (IPA), in­di­cat­ing that the ar­ti­cle will be pub­lished pend­ing suc­cess­ful com­ple­tion of the study ac­cord­ing to the ex­act meth­ods and an­a­lytic pro­ce­dures out­lined, as well as a defen­si­ble and ev­i­dence-bound in­ter­pre­ta­tion of the re­sults.

This seems un­likely to be the best course of ac­tion. I don’t think that the most widely-read jour­nals should only pub­lish repli­ca­tions. The key rea­son is that many sci­en­tific jour­nals are solv­ing a band­width con­straint—shar­ing pa­pers that are worth read­ing, not merely pa­pers that say true things, to help re­searchers keep up to date with new find­ings in their field. A math jour­nal could have pa­pers for ev­ery true math­e­mat­i­cal state­ment, in­clud­ing triv­ial ones, but they in­stead need to fo­cus on true state­ments that are use­ful to sig­nal boost to the math­e­mat­ics com­mu­nity. (Re­lated con­cepts are the trade­off be­tween bias and var­i­ance in Ma­chine Learn­ing, or ac­cu­racy and cal­ibra­tion in fore­cast­ing.)

Ultimately, from a value of information perspective, it is totally possible for a study to only be interesting if it finds a positive result, and to be uninteresting when analyzed pre-publication from the perspective of the editor. It seems better to encourage preregistration, but still take into account the information value of a paper’s experimental results, even if this doesn’t fully prevent publication bias.

To give a con­crete (and highly sim­plified) ex­am­ple, imag­ine a world where you are try­ing to find an effec­tive treat­ment for a dis­ease. You don’t have great the­ory in this space, so you ba­si­cally have to test 100 plau­si­ble treat­ments. On their own, none of these have a high like­li­hood of be­ing effec­tive, but you ex­pect that at least one of them will work rea­son­ably well.

Cur­rently, you would pre­reg­ister those tri­als (as is re­quired for clini­cal tri­als), and then start perform­ing the stud­ies one by one. Each failure pro­vides rel­a­tively lit­tle in­for­ma­tion (since the prior prob­a­bil­ity was low any­ways), so you are un­likely to be able to pub­lish it in a pres­ti­gious jour­nal, but you can prob­a­bly still pub­lish it some­where. Not many peo­ple would hear about it, but it would be find­able if some­one is look­ing speci­fi­cally for ev­i­dence about the spe­cific dis­ease you are try­ing to treat, or the treat­ment that you tried out. How­ever, find­ing a suc­cess­ful treat­ment is highly valuable in­for­ma­tion which will likely get pub­lished in a jour­nal with a lot of read­ers, caus­ing lots of peo­ple to hear about the po­ten­tial new treat­ment.

In a world with manda­tory reg­istered re­ports, none of these stud­ies will be pub­lished in a high-read­er­ship jour­nal, since jour­nals will be forced to make a de­ci­sion be­fore they know the out­come of a treat­ment. Be­cause all 100 stud­ies are equally un­promis­ing, none are likely to pass the high bar of such a jour­nal, and they’ll wind up in ob­scure pub­li­ca­tions (if they are pub­lished at all) [1]. Thus, even if one of them finds a suc­cess­ful re­sult, few peo­ple will hear about it. High-read­er­ship jour­nals ex­ist in large part to spread news about valuable re­sults in a limited band­width en­vi­ron­ment; this no longer hap­pens in sce­nar­ios of this kind.
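
The toy scenario above can be written out as a short Python sketch. Every number and name in it is a hypothetical chosen for illustration (the readership figures, the 2% prior, the editor’s acceptance bar, and which trial succeeds); the only thing it is meant to show is the structural difference between deciding after the results are known and deciding before.

```python
from dataclasses import dataclass


@dataclass
class Study:
    prior: float    # editor's estimate, before results, that the treatment works
    success: bool   # whether the trial actually found an effective treatment


# All constants below are illustrative assumptions, not data.
HIGH_READERS = 100_000   # readership of a prestigious journal
LOW_READERS = 500        # readership of an obscure journal
PRIOR_BAR = 0.5          # bar a prestigious journal applies when it must decide before results


def readers_results_known(study):
    """Acceptance decided after the outcome is known: a success is big news."""
    return HIGH_READERS if study.success else LOW_READERS


def readers_registered_report(study):
    """Acceptance decided before the outcome is known: only the prior is available."""
    return HIGH_READERS if study.prior >= PRIOR_BAR else LOW_READERS


# 100 equally unpromising treatments, exactly one of which happens to work.
studies = [Study(prior=0.02, success=(i == 37)) for i in range(100)]
winner = next(s for s in studies if s.success)

print(readers_results_known(winner))      # 100000
print(readers_registered_report(winner))  # 500
```

In this sketch the failed trials reach the same small audience under both regimes; the only difference is that the one successful trial also ends up in the obscure venue when the acceptance decision cannot depend on its result.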

Because of dynamics like this, I think it is very unlikely that any major journals will ever switch towards only publishing registered report-based studies, even within clinical trials, since no journal would want to pass up the opportunity to publish a study that has the potential to revolutionize the field.

Im­por­tance of se­lect­ing for clarity

Here is the full set of crite­ria that pa­pers are be­ing eval­u­ated by for stage 2 of the reg­istered re­ports pro­cess:

1. Whether the data are able to test the au­thors’ pro­posed hy­pothe­ses by satis­fy­ing the ap­proved out­come-neu­tral con­di­tions (such as qual­ity checks or pos­i­tive con­trols)
2. Whether the In­tro­duc­tion, ra­tio­nale and stated hy­pothe­ses are the same as the ap­proved Stage 1 sub­mis­sion (re­quired)
3. Whether the au­thors ad­hered pre­cisely to the reg­istered ex­per­i­men­tal pro­ce­dures
4. Whether any un­reg­istered post hoc analy­ses added by the au­thors are jus­tified, method­olog­i­cally sound, and in­for­ma­tive
5. Whether the au­thors’ con­clu­sions are jus­tified given the data

The above list is comprehensive, and does not include any mention of the clarity of the authors’ writing, the quality/rigor of the explanation provided by the paper’s methodology, or the implications of the paper’s findings on underlying theory. (All of these are very important to how journals currently evaluate papers.) This means that journals can only filter for those characteristics in the first stage of the registered reports process, when large parts of the paper haven’t yet been written. As a result, large parts of the paper basically have no selection applied to them for conceptual clarity or for thoughtful analysis of implications for future theory, likely resulting in those qualities getting worse.

I think the goal of registered reports is to split research in two halves where you publish two separate papers, one that is empirical, and another that is purely theoretical, which takes the results of the first paper as given and explores their consequences. We already see this split a good amount in physics, in which there exists a pretty significant divide between experimental and theoretical physics, the latter of which rarely performs experiments. I don’t know whether encouraging this split in a given field is a net improvement, since I generally think that a lot of good science comes from combining the gathering of good empirical data with careful analysis and explanations. I am particularly worried that the analysis of the results in papers published via registered reports will be of particularly low quality, which encourages the spread of bad explanations and misconceptions that can cause a lot of damage (though some of that is definitely offset by reducing the degree to which scientists can fit hypotheses post-hoc, due to preregistration). The costs here seem related to Chris Olah’s article on research debt.

Again, I think both of these prob­lems are un­likely to be­come se­ri­ous is­sues, be­cause at most I can imag­ine get­ting to a world where some­thing be­tween 10% and 30% of top jour­nal pub­li­ca­tions in a given field have gone through reg­istered re­ports-based pre­reg­is­tra­tion. I would be deeply sur­prised if there weren’t al­ter­na­tive out­lets for pa­pers that do try to com­bine the gath­er­ing of em­piri­cal data with high-qual­ity ex­pla­na­tions and anal­y­sis.

Failures due to bureaucracy

I should note that clinical science is not something I have spent large amounts of time thinking about, but I am quite concerned about adding more red tape and necessary logistical hurdles to jump through when registering clinical trials. I have high uncertainty about the effect of registered reports on the costs of doing small-scale clinical experiments, but it seems more likely than not that they will lengthen the review process and add additional methodological constraints.

(There is also a chance that it will re­duce these bur­dens by giv­ing sci­en­tists feed­back ear­lier in the pro­cess and let­ting them be more cer­tain of the value of run­ning a par­tic­u­lar study. How­ever, this effect seems slightly weaker to me than the ad­di­tional costs, though I am very un­cer­tain about this.)

In the current scientific environment, running even a simple clinical study may require millions of dollars of overhead (a related example is detailed in Scott Alexander’s “My IRB nightmare”). I believe this barrier is a substantial drag on progress in medical science. In this context, I think that requiring even more mandatory documentation, and adding even more upfront costs, seems very costly. (Though again, it seems highly unlikely for the registered reports format to ever become mandatory on a large scale, and giving more researchers the option to publish a study via the registered reports protocol, depending on their local tradeoffs, seems likely net-positive.)

To sum­ma­rize these three points:

  • If jour­nals have to com­mit to pub­lish­ing stud­ies, it’s not ob­vi­ous to me that this is good, given that they would have to do so with­out ac­cess to im­por­tant in­for­ma­tion (e.g. whether a sur­pris­ing re­sult was found) and only a limited num­ber of slots for pub­lish­ing pa­pers.

  • It seems quite im­por­tant for jour­nals to be able to se­lect pa­pers based on the clar­ity of their ex­pla­na­tions, both for ease of com­mu­ni­ca­tion and for con­cep­tual re­fine­ment.

  • Excessive red tape in clinical research seems like one of the main problems with medical science today, so adding more is worrying, though the sign of the registered reports protocol’s effect on this is a bit ambiguous.

Differ­en­tial tech­nolog­i­cal progress

Let’s Fund cov­ers differ­en­tial tech­nolog­i­cal progress con­cerns in their writeup. Key quote:

One might worry that funding meta-research indiscriminately speeds up all research, including research which carries a lot of risks. However, for the above reasons, we believe that meta-research improves predominantly social science and applied clinical science (“p-value science”) and so has a strong differential technological development element, that hopefully makes society wiser before more risks from technology emerge through innovation. However, there are some reproducibility concerns in harder sciences such as basic biological research and high energy physics that might be sped up by meta-research and thus carry risks from emerging technologies[110].

My sense is that fur­ther progress in so­ciol­ogy and psy­chol­ogy seems net pos­i­tive from a global catas­trophic risk re­duc­tion per­spec­tive. The case for clini­cal sci­ence seems a bit weaker, but still pos­i­tive.

In gen­eral, I am more ex­cited about this grant in wor­lds in which global catas­tro­phes are less im­me­di­ate and less likely than my usual mod­els sug­gest, and I’m think­ing of this grant in some sense as a hedg­ing bet, in case we live in one of those wor­lds.

Over­all, a rea­son­able sum­mary of my po­si­tion on this grant would be “I think pre­reg­is­tra­tion helps, but is prob­a­bly not re­ally at­tack­ing the core is­sues in sci­ence. I think this grant is good, be­cause I think it ac­tu­ally makes pre­reg­is­tra­tion a pos­si­bil­ity in a large num­ber of jour­nals, though I dis­agree with Chris Cham­bers on whether it would be good for all clini­cal tri­als to re­quire pre­reg­is­tra­tion, which I think would be quite bad. On the mar­gin, I sup­port his efforts, but if I ever come to change my mind about this, it’s likely for one or more of the above rea­sons.”


[1] The jour­nal could also pub­lish a ran­dom sub­set, though at scale that gives rise to the same dy­nam­ics, so I’ll ig­nore that case. It could also batch a large num­ber of the ex­per­i­ments un­til the ex­pected value of in­for­ma­tion is above the rele­vant thresh­old, though that sig­nifi­cantly in­creases costs.

Jess Whit­tle­stone ($75,080)

Note: Fund­ing from this grant will go to the Lev­er­hulme Cen­tre for the Fu­ture of In­tel­li­gence, which will fund Jess in turn. The LTF Fund is not re­plac­ing fund­ing that CFI would have sup­plied in­stead; with­out this grant, Jess would need to pur­sue grants from sources out­side CFI.

Re­search on the links be­tween short- and long-term AI policy while skil­ling up in tech­ni­cal ML.
I’m ap­ply­ing for fund­ing to cover my salary for a year as a post­doc at the Lev­er­hulme CFI, en­abling me to do two things:
-- Re­search the links be­tween short- and long-term AI policy. My plan is to start broad: think­ing about how to ap­proach, frame and pri­ori­tise work on ‘short-term’ is­sues from a long-term per­spec­tive, and then fo­cus­ing in on a more spe­cific is­sue. I en­vi­sion two main out­puts (pa­pers/​re­ports): (1) re­fram­ing var­i­ous as­pects of ‘short-term’ AI policy from a long-term per­spec­tive (e.g. high­light­ing ways that ‘short-term’ is­sues could have long-term con­se­quences, and ways of work­ing on AI policy to­day most likely to have a long-run im­pact); (2) tack­ling a spe­cific is­sue in ‘short-term’ AI policy with pos­si­ble long-term con­se­quences (tbd, but an ex­am­ple might be the pos­si­ble im­pact of micro­tar­get­ing on democ­racy and epistemic se­cu­rity as AI ca­pa­bil­ities ad­vance).
-- Skill up in tech­ni­cal ML by tak­ing courses from the Cam­bridge ML mas­ters.
Most work on long-term im­pacts of AI fo­cuses on is­sues aris­ing in the fu­ture from AGI. But is­sues aris­ing in the short term may have long-term con­se­quences: ei­ther by di­rectly lead­ing to ex­treme sce­nar­ios (e.g. au­to­mated surveillance lead­ing to au­thor­i­tar­i­anism), or by un­der­min­ing our ca­pa­bil­ity to deal with other threats (e.g. dis­in­for­ma­tion un­der­min­ing col­lec­tive de­ci­sion-mak­ing). Policy work to­day will also shape how AI gets de­vel­oped, de­ployed and gov­erned, and what is­sues will arise in the fu­ture. We’re at a par­tic­u­larly good time to in­fluence the fo­cus of AI policy, with many coun­tries de­vel­op­ing AI strate­gies and new re­search cen­tres emerg­ing.
There’s very little rigorous thinking about the best way to do short-term AI policy from a long-term perspective. My aim is to change that, and in doing so improve the quality of discourse in current AI policy. I would start with a focus on influencing UK AI policy, as I have experience and a strong network here (e.g. the CDEI and Office for AI). Since DeepMind is in the UK, I think it is worth at least some people focusing on UK institutions. I would also ensure this research was broadly relevant, by collaborating with groups working on US AI policy (e.g. FHI, CSET and OpenAI).
I’m also ask­ing for a time buy­out to skill up in ML (~30%). This would im­prove my own abil­ity to do high-qual­ity re­search, by helping me to think clearly about how is­sues might evolve as ca­pa­bil­ities ad­vance, and how tech­ni­cal and policy ap­proaches can best com­bine to in­fluence the fu­ture im­pacts of AI.

The main work I know of Jess’s is her early involvement in 80,000 Hours. In the first 1-2 years of their existence, she wrote dozens of articles for them, and contributed to their culture and development. Since then I’ve seen her make positive contributions to a number of projects over the years: she has helped in some form with every EA Global conference I’ve organized (two in 2015 and one in 2016), and she’s continued to write publicly in places like the EA Forum, the EA Handbook, and news sites like Quartz and Vox. This background means that members of the fund have had a lot of opportunities to judge Jess’s output. My sense is that this is the main reason that the other members of the fund were excited about this grant: they generally trust Jess’s judgment and value her experience (while being more hesitant about CFI’s work).

There are three things I looked into for this grant writeup: Jess’s policy re­search out­put, Jess’s blog, and the in­sti­tu­tional qual­ity of Lev­er­hulme CFI. The sec­tion on Lev­er­hulme CFI be­came longer than the sec­tion on Jess and was mostly un­re­lated to her work, so I’ve taken it out and in­cluded it as an ad­den­dum.

Im­pres­sions of Policy Papers

First is her policy re­search. The pa­pers I read were from those linked on her blog. They were:

On the first paper, about focusing on tensions: the paper said that many “principles of AI ethics” that people publicly talk about in industry, non-profit, government and academia are substantively meaningless, because they don’t come with the sort of concrete advice that actually tells you how to apply them, and specifically, how to trade them off against each other. The part of the paper I found most interesting was a set of four paragraphs pointing to specific tensions between principles of AI ethics. They were:

  • Us­ing data to im­prove the qual­ity and effi­ciency of ser­vices vs. re­spect­ing the pri­vacy and au­ton­omy of individuals

  • Us­ing al­gorithms to make de­ci­sions and pre­dic­tions more ac­cu­rate vs. en­sur­ing fair and equal treatment

  • Reap­ing the benefits of in­creased per­son­al­iza­tion in the digi­tal sphere vs. en­hanc­ing soli­dar­ity and citizenship

  • Us­ing au­toma­tion to make peo­ple’s lives more con­ve­nient and em­pow­ered vs. pro­mot­ing self-ac­tu­al­iza­tion and dignity

My sense is that while there is some good pub­lic dis­cus­sion about AI and policy (e.g. OpenAI’s work on re­lease prac­tices seems quite pos­i­tive to me), much con­ver­sa­tion that brands it­self as ‘ethics’ is of­ten not mo­ti­vated by the de­sire to en­sure this novel tech­nol­ogy im­proves so­ciety in ac­cor­dance with our deep­est val­ues, but in­stead by fac­tors like rep­u­ta­tion, PR and poli­tics.

There are many no­tions, like Peter Thiel’s “At its core, ar­tifi­cial in­tel­li­gence is a mil­i­tary tech­nol­ogy” or the com­mon ques­tion “Who should con­trol the AI?” which don’t fully ac­count for the de­tails of how ma­chine learn­ing and ar­tifi­cial in­tel­li­gence sys­tems work, or the ways in which we need to think about them in very differ­ent ways from other tech­nolo­gies; in par­tic­u­lar, that we will need to build new con­cepts and ab­strac­tions to talk about them. I think this is also true of most con­ver­sa­tions around mak­ing AI fair, in­clu­sive, demo­cratic, safe, benefi­cial, re­spect­ful of pri­vacy, etc.; they sel­dom con­sider how these val­ues can be grounded in mod­ern ML sys­tems or fu­ture AGI sys­tems. My sense is that much of the best con­ver­sa­tion around AI is about how to cor­rectly con­cep­tu­al­ize it. This is some­thing that (I was sur­prised to find) Henry Kiss­inger’s ar­ti­cle on AI did well; he spends most of the es­say try­ing to figure out which ab­strac­tions to use, as op­posed to us­ing already ex­ist­ing ones.

The rea­son I liked that bit of Jess’s pa­per is that I felt the pa­per used main­stream lan­guage around AI ethics (in a way that could ap­peal to a broad au­di­ence), but then:

  • Cor­rectly pointed out that AI is a suffi­ciently novel tech­nol­ogy that we’re go­ing to have to re­think what these val­ues ac­tu­ally mean, be­cause the tech­nol­ogy causes a host of fun­da­men­tally novel ways for them to come into tension

  • Pro­vided con­crete ex­am­ples of these tensions

In the con­text of a pub­lic con­ver­sa­tion that I feel is of­ten sub­stan­tially mo­ti­vated by poli­tics and PR rather than truth, see­ing some­one point clearly at im­por­tant con­cep­tual prob­lems felt like a breath of fresh air.

That said, given all of the poli­ti­cal in­cen­tives around pub­lic dis­cus­sion of AI and ethics, I don’t know how pa­pers like this can im­prove the con­ver­sa­tion. For ex­am­ple, com­pa­nies are wor­ried about los­ing in the court of Twit­ter’s pub­lic opinion, and also are wor­ried about things like gov­ern­men­tal reg­u­la­tion, which are strong forces push­ing them to pri­mar­ily take pop­u­lar but in­effec­tual steps to be more “eth­i­cal”. I’m not say­ing pa­pers like this can’t im­prove this situ­a­tion in prin­ci­ple, only that I don’t per­son­ally feel like I have much of a clue about how to do it or how to eval­u­ate whether some­one else is do­ing it well, in ad­vance of their hav­ing suc­cess­fully done it.

Per­son­ally, I feel much more able to eval­u­ate the con­cep­tual work of figur­ing out how to think about AI and its strate­gic im­pli­ca­tions (two stand­out ex­am­ples are this pa­per by Bostrom and this LessWrong post by Chris­ti­ano), rather than work on re­vis­ing pop­u­lar views about AI. I’d be ex­cited to see Jess con­tinue with the con­cep­tual side of her work, but if she in­stead pri­mar­ily aims to in­fluence pub­lic con­ver­sa­tion (the other goal of that pa­per), I per­son­ally don’t think I’ll be able to eval­u­ate and recom­mend grants on that ba­sis.

From the second paper I read sections 3 and 4, which list many safety and security practices in the fields of biosafety, computer information security, and institutional review boards (IRBs), then outline variables for analysing release practices in ML. I found it useful, even if it was shallow (i.e. did not go into much depth in the fields it covered). Overall, the paper felt like a fine first step in thinking about this space.

In both pa­pers, I was con­cerned with the level of in­spira­tion drawn from bioethics, which seems to me to be a ter­ribly bro­ken field (cf. Scott Alexan­der talk­ing about his IRB night­mare or medicine’s ‘cul­ture of life’). My un­der­stand­ing is that bioethics co­or­di­nated a suc­cess­ful power grab (cf. OpenPhil’s writeup) from the field of medicine, cre­at­ing hun­dreds of dys­func­tional and im­prac­ti­cal ethics boards that have formed a highly ad­ver­sar­ial re­la­tion­ship with doc­tors (whose prac­ti­cal in­volve­ment with pa­tients of­ten makes them bet­ter than ethi­cists at mak­ing trade­offs be­tween treat­ment, pain/​suffer­ing, and dig­nity). The for­ma­tion of an “AI ethics” com­mu­nity that has this sort of ad­ver­sar­ial, un­healthy re­la­tion­ship with ma­chine learn­ing re­searchers would be an in­cred­ible catas­tro­phe.

Over­all, it seems like Jess is still at the be­gin­ning of her re­search ca­reer (she’s only been in this field for ~1.5 years). And while she’s spent a lot of effort on ar­eas that don’t per­son­ally ex­cite me, both of her pa­pers in­clude in­ter­est­ing ideas, and I’m cu­ri­ous to see her fu­ture work.

Im­pres­sions of Other Writing

Jess also writes a blog, and this is one of the main things that makes me excited about this grant. On the topic of AI, she wrote three posts (1, 2, 3), all of which made good points on at least one important issue. I also thought the post on confirmation bias and her PhD was quite thoughtful. It correctly identified a lot of problems with discussions of confirmation bias in psychology, and came to a much more nuanced view of the trade-off between being open-minded versus committing to your plans and beliefs. Overall, the posts show independent thinking, are written with an intent to actually convey understanding to the reader, and do a good job of it. They share the vibe I associate with much of Julia Galef’s work: they notice true observations / conceptual clarifications, successfully move the conversation forward one or two steps, and avoid political conflict.

I do have some sig­nifi­cant con­cerns with the work above, in­clud­ing the pos­i­tive por­trayal of bioethics and the ab­sence of any crit­i­cism to­ward the AAAI safety con­fer­ence talks, many of which seem to me to have ma­jor flaws.

While I’m not ex­cited about Lev­er­hulme CFI’s work (see the ad­den­dum for de­tails), I think it will be good for Jess to have free rein to fol­low her own re­search ini­ti­a­tives within CFI. And while she might be able to ob­tain fund­ing el­se­where, this al­ter­na­tive seems con­sid­er­ably worse, as I ex­pect other fund­ing op­tions would sub­stan­tially con­strain the types of re­search she’d be able to con­duct.

Lynette Bye ($23,000)

Pro­duc­tivity coach­ing for effec­tive al­tru­ists to in­crease their im­pact.
I plan to con­tinue coach­ing high-im­pact EAs on pro­duc­tivity. I ex­pect to have 600+ ses­sions with about 100 clients over the next year, fo­cus­ing on peo­ple work­ing in AI safety and EA orgs. I’ve worked with peo­ple at FHI, Open Phil, CEA, MIRI, CHAI, Deep­Mind, the Forethought Foun­da­tion, and ACE, and will prob­a­bly con­tinue to do so. Half of my cur­rent clients (and a third of all clients I’ve worked with) are peo­ple at these orgs. I aim to in­crease my clients’ out­put by im­prov­ing pri­ori­ti­za­tion and in­creas­ing fo­cused work time.
I would use the fund­ing to: offer a sub­si­dized rate to peo­ple at EA orgs (e.g. be­tween $10 and $50 in­stead of $125 per call), offer free coach­ing for se­lect coachees referred by 80,000 Hours, and hire con­trac­tors to help me cre­ate ma­te­ri­als to scale coach­ing.
You can view my im­pact eval­u­a­tion (linked be­low) for how I’m mea­sur­ing my im­pact so far.

(Lynette’s pub­lic self-eval­u­a­tion is here.)
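As a rough illustration of the scale involved, the figures quoted in the application can be turned into per-session and per-client numbers. This is only a back-of-the-envelope sketch: it assumes the whole grant is spread evenly across the expected sessions, which the application does not claim (some of the money is meant for free coaching and for contractors), and the variable names are made up for this example.

```python
# Figures quoted in the application and the grant heading above.
GRANT_USD = 23_000
EXPECTED_SESSIONS = 600          # "600+ sessions", so treat as a lower bound
EXPECTED_CLIENTS = 100           # "about 100 clients"
FULL_RATE_USD = 125              # standard price per call
SUBSIDIZED_RATES_USD = (10, 50)  # subsidized price range per call

# Average number of sessions each client would receive.
sessions_per_client = EXPECTED_SESSIONS / EXPECTED_CLIENTS

# Simplifying assumption (not from the application): the entire grant
# is spread evenly over all expected sessions / clients.
grant_per_session = GRANT_USD / EXPECTED_SESSIONS
grant_per_client = GRANT_USD / EXPECTED_CLIENTS

# Discount per call implied by each subsidized rate.
subsidy_per_call = tuple(FULL_RATE_USD - rate for rate in SUBSIDIZED_RATES_USD)

print(f"Sessions per client: {sessions_per_client:.0f}")
print(f"Grant per session:   ${grant_per_session:.2f}")
print(f"Grant per client:    ${grant_per_client:.2f}")
print(f"Subsidy per call:    ${subsidy_per_call[1]} to ${subsidy_per_call[0]}")
```

Under that simplification the grant works out to a few tens of dollars per session, which is less than the per-call discount implied by the subsidized rates; the real split between subsidies, free coaching, and contractors is not specified in the application.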

I gen­er­ally think it’s pretty hard to do “pro­duc­tivity coach­ing” as your pri­mary ac­tivity, es­pe­cially when you are young, due to a lack of work ex­pe­rience. This means I have a high bar for it be­ing a good idea that some­one should go full-time into the “help other peo­ple be more pro­duc­tive” busi­ness.

My sense is that Lynette meets that bar, but only barely (to be clear, I con­sider it to be a high bar). The main thing that she seems to be do­ing well is be­ing very or­ga­nized about ev­ery­thing that she is do­ing, in a way that makes me con­fi­dent that her work has had a real im­pact — if not, I think she’d have no­ticed and moved on to some­thing else.

How­ever, as I say in the CFAR writeup, I have a lot of con­cerns with pri­mar­ily op­ti­mis­ing for leg­i­bil­ity, and Lynette’s work shows some signs of this. She has shared around 60 tes­ti­mo­ni­als on her web­site (linked here). Of these, not one of them men­tioned any­thing nega­tive, which clearly in­di­cates that I can’t straight­for­wardly in­ter­pret those tes­ti­mo­ni­als as pos­i­tive ev­i­dence (since any un­bi­ased sam­pling pro­cess would have re­sulted in at least some nega­tive dat­a­points). I much pre­fer what an­other ap­pli­cant did here: they asked peo­ple to send us in­for­ma­tion anony­mously, which in­creased the chance of our hear­ing opinions that weren’t se­lected to cre­ate a pos­i­tive im­pres­sion. As is, I think I ac­tu­ally shouldn’t up­date much on the tes­ti­mo­ni­als, in par­tic­u­lar given that none of them go into much de­tail on how Lynette has helped them, and al­most all of them share a similar struc­ture.
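The selection argument above can be made concrete with a small calculation. The sketch below is an illustration added here, not part of the original evaluation: it assumes each testimonial independently has some chance `p_negative` of mentioning something negative, and the example values of that chance are hypothetical rather than measured.

```python
def prob_no_negatives(n_testimonials: int, p_negative: float) -> float:
    """Chance that none of n independent testimonials mentions anything negative."""
    if not 0.0 <= p_negative <= 1.0:
        raise ValueError("p_negative must be between 0 and 1")
    return (1.0 - p_negative) ** n_testimonials


N_TESTIMONIALS = 60  # "around 60 testimonials" per the writeup

# Hypothetical per-testimonial chances of a negative remark (assumed values).
for p in (0.02, 0.05, 0.10):
    chance = prob_no_negatives(N_TESTIMONIALS, p)
    print(f"p_negative = {p:.2f}: P(no negatives in {N_TESTIMONIALS}) = {chance:.4f}")
```

Even if only 5% of unfiltered testimonials would contain a negative remark, the chance of seeing 60 in a row with none is below 5% under these assumptions, which is why a uniformly positive set points to a filtered sample rather than to uniformly positive experiences.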

Reflect­ing on the broader pic­ture, I think that Lynette’s mind­set re­flects how I think many of the best op­er­a­tions staff I’ve seen op­er­ate: aim to be pro­duc­tive by us­ing sim­ple out­put met­rics, and by do­ing things in a mind­ful, struc­tured way (as op­posed to, for ex­am­ple, try­ing to aim for deep trans­for­ma­tive in­sights more tra­di­tion­ally as­so­ci­ated with psy­chother­apy). There is a deep grounded-ness and prac­ti­cal na­ture to it. I have a lot of re­spect for that mind­set, and I feel as though it’s un­der­rep­re­sented in the cur­rent EA/​ra­tio­nal­ity land­scape. My in­side-view mod­els sug­gest that you can achieve a bunch of good things by helping peo­ple be­come more pro­duc­tive in this way.

I also think that this mind­set comes with a type of prag­ma­tism that I am more con­cerned about, and of­ten gives rise to what I con­sider un­healthy ad­ver­sar­ial dy­nam­ics. As I dis­cussed above, it’s difficult to get in­for­ma­tion from Lynette’s pos­i­tive tes­ti­mo­ni­als. My sense is that she might have pro­duced them by di­rectly op­ti­mis­ing for “get­ting a grant” and try­ing to give me lots of pos­i­tive in­for­ma­tion, lead­ing to sub­stan­tial bias in the se­lec­tion pro­cess. The tech­nique of ‘just op­ti­mize for the tar­get’ is valuable in lots of do­mains, but in this case was quite nega­tive.

That said, fram­ing her coach­ing as achiev­ing a se­ries of similar re­sults gen­er­ally moves me closer to think­ing about this grant as “coach­ing as a com­mod­ity”. Im­por­tantly, few peo­ple re­ported very large gains in their pro­duc­tivity; the tes­ti­mo­ni­als in­stead show a solid stream of small im­prove­ments. I think that very few peo­ple have ac­cess to good coach­ing, and the high var­i­ance in coach qual­ity means that ex­per­i­ment­ing is of­ten quite ex­pen­sive and time-con­sum­ing. Lynette seems to be able to con­sis­tently pro­duce pos­i­tive effects in the peo­ple she is work­ing with, mak­ing her ser­vices a lot more valuable due to greater cer­tainty around the out­come. (How­ever, I also as­sign sig­nifi­cant prob­a­bil­ity that the way the eval­u­a­tion ques­tions were asked re­duced the rate at which clients re­ported ei­ther nega­tive or highly pos­i­tive ex­pe­riences.)

I think that many pro­duc­tivity coaches fail to achieve Lynette’s level of re­li­a­bil­ity, which is one of the key things that makes me hope­ful about her work here. My guess is that the value-add of coach­ing is of­ten straight­for­wardly pos­i­tive un­less you im­pose sig­nifi­cant costs on your clients, and Lynette seems quite good at avoid­ing that by pri­mar­ily op­ti­miz­ing for pro­fes­sion­al­ism and re­li­a­bil­ity.

Fur­ther Recom­men­da­tions (not funded by the LTF Fund)

Cen­ter for Ap­plied Ra­tion­al­ity ($150,000)

This grant was recom­mended by the Fund, but ul­ti­mately was funded by a pri­vate donor, who (prior to CEA fi­nal­iz­ing its stan­dard due dili­gence checks) had per­son­ally offered to make this dona­tion in­stead. As such, the grant recom­men­da­tion was with­drawn.

Oliver Habryka had cre­ated a full writeup by that point, so it is in­cluded be­low.

Help promis­ing peo­ple to rea­son more effec­tively and find high-im­pact work, such as re­duc­ing x-risk.
The Cen­ter for Ap­plied Ra­tion­al­ity runs work­shops that pro­mote par­tic­u­lar epistemic norms—broadly, that be­liefs should be true, bugs should be solved, and that in­tu­itions/​aver­sions of­ten con­tain use­ful data. Th­ese work­shops are de­signed to cause po­ten­tially im­pact­ful peo­ple to rea­son more effec­tively, and to find peo­ple who may be in­ter­ested in pur­su­ing high-im­pact ca­reers (es­pe­cially AI safety).
Many of the peo­ple cur­rently work­ing on AI safety have been through a CFAR work­shop, such as 27% of the at­ten­dees at the 2019 FLI con­fer­ence on Benefi­cial AI in Puerto Rico, and for some of those peo­ple it ap­pears that CFAR played a causal role in their de­ci­sion to switch ca­reers. In the con­fi­den­tial sec­tion, we list some grad­u­ates from CFAR pro­grams who sub­se­quently de­cided to work on AI safety, along with our es­ti­mates of the coun­ter­fac­tual im­pact of CFAR on their de­ci­sion [16 at MIRI, 3 on the OpenAI safety team, 2 at CHAI, and one each at Ought, Open Phil and the Deep­Mind safety team].
Re­cruit­ment is the most leg­ible form of im­pact CFAR has, and is prob­a­bly its most im­por­tant—the top re­ported bot­tle­neck in the last two years among EA lead­ers at Lead­ers Fo­rum, for ex­am­ple, was find­ing tal­ented em­ploy­ees.
In 2019, we ex­pect to run or co-run over 100 days of work­shops, in­clud­ing our main­line work­shop (de­signed to grow/​im­prove the ra­tio­nal­ity com­mu­nity), work­shops de­signed speci­fi­cally to re­cruit pro­gram­mers (AIRCS) and math­e­mat­i­ci­ans (MSFP) to AI safety orgs, a 4-week­end in­struc­tor train­ing pro­gram (to in­crease our ca­pac­ity to run work­shops), and alumni re­unions in both the United States and Europe (to grow the EA/​ra­tio­nal­ity com­mu­nity and cause im­pact­ful peo­ple to meet/​talk with one an­other). Broadly speak­ing, we in­tend to con­tinue do­ing the sort of work we have been do­ing so far.

In our last grant round, I took an out­side view on CFAR and said that, in terms of out­put, I felt satis­fied with CFAR’s achieve­ments in re­cruit­ment, train­ing and the es­tab­lish­ment of com­mu­nal epistemic norms. I still feel this way about those ar­eas, and my writeup last round still seems like an ac­cu­rate sum­mary of my rea­sons for want­ing to grant to CFAR. I also said that most of my un­cer­tainty about CFAR lies in its long-term strate­gic plans, and I con­tinue to feel rel­a­tively con­fused about my thoughts on that.

I find it difficult to explain my thoughts on CFAR, and I think that a large fraction of this difficulty comes from CFAR being an organization that is intentionally not optimizing towards being easy to understand from the outside, having simple metrics, or more broadly being legible [1]. This decision is not obviously wrong, as I think it brings many positives, but I think it is the cause of my feeling particularly confused about how to talk coherently about CFAR.

Con­sid­er­a­tions around legibility

Sum­mary: CFAR’s work is varied and difficult to eval­u­ate. This has some good fea­tures — it can avoid fo­cus­ing too closely on met­rics that don’t mea­sure im­pact well — but also forces eval­u­a­tors to rely on fac­tors that aren’t easy to mea­sure, like the qual­ity of its in­ter­nal cul­ture. On the whole, while I wish CFAR were some­what more leg­ible, I ap­pre­ci­ate the benefits to CFAR’s work of not max­i­miz­ing “leg­i­bil­ity” at the cost of im­pact or flex­i­bil­ity.

To help me ex­plain my point, let’s con­trast CFAR with an or­ga­ni­za­tion like AMF, which I think of as ex­cep­tion­ally leg­ible. AMF’s work, com­pared to many other or­ga­ni­za­tions with tens of mil­lions of dol­lars on hand, is easy to un­der­stand: they buy bed­nets and give them to poor peo­ple in de­vel­op­ing coun­tries. As long as AMF con­tinues to carry out this plan, and pro­vides ba­sic data show­ing its suc­cess in bed­net dis­tri­bu­tion, I feel like I can eas­ily model what the or­ga­ni­za­tion will do. If I found out that AMF was spend­ing 10% of its money fund­ing re­li­gious lead­ers in de­vel­op­ing coun­tries to preach good eth­i­cal prin­ci­ples for so­ciety, or fund­ing the cam­paigns of gov­ern­ment offi­cials fa­vor­able to their work, I would be very sur­prised and feel like some ba­sic agree­ment or con­tract had been vi­o­lated — re­gard­less of whether I thought those de­ci­sions, in the ab­stract, were good or bad for their mis­sion. AMF claims to dis­tribute anti-malaria bed­nets, and it is on this ba­sis that I would choose whether to sup­port them.

AMF could have been a very differ­ent or­ga­ni­za­tion, and still could be if it wanted to. For ex­am­ple, it could con­duct re­search on var­i­ous ways to effect change, and give its core staff the free­dom to do what­ever they thought was best. This new AMF (“AMF 2.0”) might not be able to tell you ex­actly what they’ll do next year, be­cause they haven’t figured it out yet, but they can tell you that they’ll do what­ever their staff de­ter­mine is best. This could be dis­tribut­ing de­worm­ing pills, pur­su­ing spec­u­la­tive med­i­cal re­search, en­gag­ing in poli­ti­cal ac­tivism, fund­ing re­li­gious or­ga­ni­za­tions, etc.

If GiveWell wanted to eval­u­ate AMF 2.0, they would need to use a rad­i­cally differ­ent style of rea­son­ing. There wouldn’t be a straight­for­ward in­ter­ven­tion with RCTs to look into. There wouldn’t be a straight­for­ward track record of im­pact from which to ex­trap­o­late. Judg­ing AMF 2.0 would re­quire GiveWell to form much more nu­anced judg­ments about the qual­ity of think­ing and ex­e­cu­tion of AMF’s staff, to eval­u­ate the qual­ity of its in­ter­nal cul­ture, and to con­sider a host of other fac­tors that weren’t pre­vi­ously rele­vant.

I think that eval­u­at­ing CFAR re­quires a lot of that kind of anal­y­sis, which seems in­her­ently harder to com­mu­ni­cate to other peo­ple with­out sum­ma­riz­ing one’s views as: “I trust the peo­ple in that or­ga­ni­za­tion to make good de­ci­sions.”

The more gen­eral idea here is that or­ga­ni­za­tions are sub­ject to band­width con­straints—they of­ten want to do lots of differ­ent things, but their fun­ders need to be able to un­der­stand and pre­dict their be­hav­ior with limited re­sources for eval­u­a­tion. As I’ve writ­ten about re­cently, a key vari­able for any or­ga­ni­za­tion is the peo­ple and or­ga­ni­za­tions by which they are try­ing to be un­der­stood and held ac­countable. For char­i­ties that re­ceive most of their fund­ing in small dona­tions from a large pop­u­la­tion of peo­ple who don’t know much about them, this is a very strong con­straint; they must com­mu­ni­cate their work so that peo­ple can un­der­stand it very quickly with lit­tle back­ground in­for­ma­tion. If a char­ity in­stead re­ceives most of its fund­ing in large dona­tions from a small set of peo­ple who fol­low it closely, it can com­mu­ni­cate much more freely, be­cause the fun­ders will be able to spend a lot of their time talk­ing to the org, ex­chang­ing mod­els, and gen­er­ally com­ing to an un­der­stand­ing of why the org is do­ing what it’s do­ing.

This idea partly ex­plains why most or­ga­ni­za­tions tend to fo­cus on leg­i­bil­ity, in how they talk about their work and even in the work they choose to pur­sue. It can be difficult to at­tract re­sources and sup­port from ex­ter­nal par­ties if one’s work isn’t leg­ible.

I think that CFAR is still likely op­ti­miz­ing too lit­tle to­wards leg­i­bil­ity, com­pared to what I think would be ideal for it. Be­ing leg­ible al­lows an or­ga­ni­za­tion to be more con­fi­dent that its work is hav­ing real effects, be­cause it ac­quires ev­i­dence that holds up to a va­ri­ety of differ­ent view­points. How­ever, I think that far too many or­ga­ni­za­tions (non­profit and oth­er­wise) are try­ing too hard to make their work leg­ible, in a way that re­duces in­no­va­tion and also in­tro­duces a va­ri­ety of ad­ver­sar­ial dy­nam­ics. When you make sys­tems that can be gamed, and which carry re­wards for suc­cess (e.g. job sta­bil­ity, pres­tige, etc), peo­ple will re­li­ably turn up to game them [2].

(As Ja­cob Lager­ros has writ­ten in his post on Un­con­scious Eco­nomics, this doesn’t mean peo­ple are con­sciously gam­ing your sys­tem, but merely that this be­hav­ior will even­tu­ally tran­spire. The many causes of this in­clude se­lec­tion effects, re­in­force­ment learn­ing, and memetic evolu­tion.)

In my view, CFAR, by not try­ing to op­ti­mize for a sin­gle, easy-to-ex­plain met­ric, avoids play­ing the “game” many non­prof­its play of fo­cus­ing on work that will look ob­vi­ously good to donors, even if it isn’t what the non­profit be­lieves would be most im­pact­ful. They also avoid a va­ri­ety of other games that come from leg­i­bil­ity, such as job ap­pli­cants get­ting very good at fak­ing the sig­nals that they are a good fit for an or­ga­ni­za­tion, mak­ing it harder for them to find good ap­pli­cants.

Op­ti­miz­ing for com­mu­ni­ca­tion with the goal of be­ing given re­sources in­tro­duces ad­ver­sar­ial dy­nam­ics; some­one ask­ing for money may provide limited/​bi­ased in­for­ma­tion that raises the chance they’ll be given a grant but re­duces the ac­cu­racy of the grant­maker’s un­der­stand­ing. (See my com­ment in Lynette’s writeup be­low for an ex­am­ple of how this can arise.) This op­ti­miza­tion can also tie down your re­sources, forc­ing you to carry out com­mit­ments you made for the sake of leg­i­bil­ity, rather than do­ing what you think would be most im­pact­ful [3].

So I think that it’s im­por­tant that we don’t force all or­ga­ni­za­tions to­wards max­i­mal leg­i­bil­ity. (That said, we should en­sure that or­ga­ni­za­tions are en­couraged to pur­sue at least some de­gree of leg­i­bil­ity, since the lack of leg­i­bil­ity also gives rise to var­i­ous prob­lems.)

Do I trust CFAR to make good de­ci­sions?

As I mentioned in my initial comments on CFAR, I generally think that the current projects CFAR is working on are quite valuable and worth the resources they are consuming. But I have a lot of trouble modeling CFAR’s long-term planning, and I feel like I have to rely on my models of how much I trust CFAR to make good decisions in general, instead of being able to evaluate the merits of their actual plans.

That said, I do gen­er­ally trust CFAR’s de­ci­sion-mak­ing. It’s hard to ex­plain the ev­i­dence that causes me to be­lieve this, but I’ll give a brief overview any­way. (This ev­i­dence prob­a­bly won’t be com­pel­ling to oth­ers, but I still want to give an ac­cu­rate sum­mary of where my be­liefs come from):

  • I ex­pect that a large frac­tion of CFAR’s fu­ture strate­gic plans will con­tinue to be made by Anna Sala­mon, from whom I have learned a lot of valuable long-term think­ing skills, and who seems to me to have made good de­ci­sions for CFAR in the past.

  • I think CFAR’s cul­ture, while im­perfect, is still based on strong foun­da­tions of good rea­son­ing with deep roots in the philos­o­phy of sci­ence and the writ­ings of Eliezer Yud­kowsky (which I think serve as a good ba­sis for learn­ing how to think clearly).

  • I have made a lot of what I con­sider my best and most im­por­tant strate­gic de­ci­sions in the con­text of, and aided by, events or­ga­nized by CFAR. This sug­gests to me that at least some of that gen­er­al­izes to CFAR’s in­ter­nal abil­ity to think strate­gi­cally.

  • I am ex­cited about a num­ber of in­di­vi­d­u­als who in­tend to com­plete CFAR’s lat­est round of in­struc­tor train­ing, which gives me some op­ti­mism about CFAR’s fu­ture ac­cess to good tal­ent and its abil­ity to es­tab­lish and sus­tain a good in­ter­nal cul­ture.


[1] The focus on ‘legibility’ in this context I take from James C. Scott’s book “Seeing Like a State.” It was introduced to me by Elizabeth Van Nostrand in this blogpost discussing it in the context of GiveWell and good giving; Scott Alexander also discussed it in his review of the book. Here’s an example from Scott regarding centralized planning and governance:

The cen­tral­ized state wanted the world to be “leg­ible”, i.e. ar­ranged in a way that made it easy to mon­i­tor and con­trol. An in­tact for­est might be more pro­duc­tive than an evenly-spaced rec­t­an­gu­lar grid of Nor­way spruce, but it was harder to leg­is­late rules for, or as­sess taxes on.

[2] The er­rors that fol­low are all forms of Good­hart’s Law, which states that “any ob­served statis­ti­cal reg­u­lar­ity will tend to col­lapse once pres­sure is placed upon it for con­trol pur­poses.”

[3] The benefits of (and forces that en­courage) sta­bil­ity and re­li­a­bil­ity can maybe be most trans­par­ently un­der­stood in the con­text of menu costs and the prevalence of highly sticky wages.


Ad­den­dum: Thoughts on a Strat­egy Ar­ti­cle by the Lead­er­ship of Lev­er­hulme CFI and CSER

I wrote the fol­low­ing in the course of think­ing about the grant to Jess Whit­tle­stone. While the grant is to sup­port Jess’s work, the grant money will go to Lev­er­hulme CFI, which will main­tain dis­cre­tion about whether to con­tinue em­ploy­ing her, and will likely in­fluence what type of work she will pur­sue.

As such, it seems im­por­tant to not only look into Jess’s work, but also look into Lev­er­hulme CFI and its sister or­ga­ni­za­tion, the Cen­tre for the Study of Ex­is­ten­tial Risk (CSER). While my eval­u­a­tion of the or­ga­ni­za­tion that will sup­port Jess dur­ing her post­doc is rele­vant to my eval­u­a­tion of the grant, it is quite long and does not di­rectly dis­cuss Jess or her work, so I’ve moved it into a sep­a­rate sec­tion.

I’ve read a few pa­pers from CFI and CSER over the years, and heard many im­pres­sions of their work from other peo­ple. For this writeup, I wanted to en­gage more con­cretely with their out­put. I reread and re­viewed an ar­ti­cle pub­lished in Na­ture ear­lier this year called Bridg­ing near- and long-term con­cerns about AI, writ­ten by the Ex­ec­u­tive Direc­tors at Lev­er­hulme CFI and CSER re­spec­tively, Stephen Cave and Seán ÓhÉigeartaigh.

Sum­mary and aims of the article

The ar­ti­cle’s sum­mary:

De­bate about the im­pacts of AI is of­ten split into two camps, one as­so­ci­ated with the near term and the other with the long term. This di­vide is a mis­take — the con­nec­tions be­tween the two per­spec­tives de­serve more at­ten­tion, say Stephen Cave and Seán S. ÓhÉigeartaigh.

This is not a po­si­tion I hold, and I’m go­ing to en­gage with the con­tent be­low in more de­tail.

Over­all, I found the claims of the es­say hard to parse and of­ten am­bigu­ous, but I’ve at­tempted to sum­ma­rize what I view as its three main points:

  1. If ML is a pri­mary tech­nol­ogy used in AGI, then there are likely some de­sign de­ci­sions to­day that will cre­ate lock-in in the long-term and have in­creas­ingly im­por­tant im­pli­ca­tions for AGI safety.

  2. If we can pre­dict changes in so­ciety from ML that mat­ter in the long-term (such as au­toma­tion of jobs), then we can pre­pare policy for them in the short term (like prepar­ing ed­u­ca­tional re­train­ing for lorry drivers who will be au­to­mated).

  3. Norms and in­sti­tu­tions built to­day will have long-term effects, and so peo­ple who care about the long term should es­pe­cially care about near-term norms and in­sti­tu­tions.

They say: “These three points relate to ways in which addressing near-term issues could contribute to solving potential long-term problems.”

If I ask myself what Leverhulme/CSER’s goals are for this document, it feels to me like it is intended as a statement of diplomacy. It’s saying that near-term and long-term AI risk work are split into two camps, but that we should be looking for common ground (“the connections between the two perspectives deserve more attention”, “Learning from the long term”). It tries to emphasize shared values (“Connected research priorities”) and the importance of cooperation amongst many entities (“The challenges we will face are likely to require deep interdisciplinary and intersectoral collaboration between industries, academia and policymakers, alongside new international agreements”). The goal that I think it is trying to achieve is to negotiate trade and peace between the near-term and long-term camps by arguing that “This divide is a mistake”.

Draw­ing the defi­ni­tions does a lot of work

The au­thors define “long-term con­cerns” with the fol­low­ing three ex­am­ples:

wide-scale loss of jobs, risks of AI de­vel­op­ing broad su­per­hu­man ca­pa­bil­ities that could put it be­yond our con­trol, and fun­da­men­tal ques­tions about hu­man­ity’s place in a world with in­tel­li­gent ma­chines

Despite this broad definition, they only use concrete examples from the first category, which I would classify as something like “mid-term issues.” I think the possibility of even wide-scale loss of jobs, unless interpreted extremely broadly, does not belong in the same category as the other two, which are primarily concerned with stakes that are orders of magnitude higher (such as the future of the human species). I think this conflation of very different concerns causes the rest of the article to make an argument that is more likely to mislead than to inform.

After this defi­ni­tion, the ar­ti­cle failed to men­tion any is­sue that I would clas­sify as rep­re­sen­ta­tive of the long-term con­cerns of Nick Bostrom or Max Teg­mark, both of whom are cited by the ar­ti­cle to define “long-term is­sues.” (In Teg­mark’s book Life 3.0, he ex­plic­itly cat­e­go­rizes un­em­ploy­ment as a short-term con­cern, to be dis­t­in­guished from long-term con­cerns.)

Con­cep­tual con­fu­sions in short- and mid-term policy suggestions

The ar­ti­cle has the fol­low­ing policy idea:

Take ex­plain­abil­ity (the ex­tent to which the de­ci­sions of au­tonomous sys­tems can be un­der­stood by rele­vant hu­mans): if reg­u­la­tory mea­sures make this a re­quire­ment, more fund­ing will go to de­vel­op­ing trans­par­ent sys­tems, while tech­niques that are pow­er­ful but opaque may be de­pri­ori­tized.

(Let me be clear that this is not ex­plic­itly listed as a policy recom­men­da­tion.)

My naive prior is that there is no good AI regulation a government could establish today. I continue to feel this way after looking into this case (and the next example below). Let me explain why, in this case, the idea that regulation requiring explainability would encourage transparent and explainable systems is false.

Modern ML systems are not doing a type of reasoning that is amenable to explanation in the way human decisions often are. There is no principled explanation of their reasoning when deciding whether to offer you a bank loan; there is merely a mass of correlations between spending history and later reliability, which may or may not factorise into a small number of well-defined chunks like “how regularly someone pays their rent”. The main problem with the quoted paragraph is that it does not at all attempt to specify how to define explainability in an ML system to the point where it can be regulated, meaning that any regulation would either be meaningless and ignored or, worse, highly damaging. Policies formed in this manner will either be of no consequence or deeply antagonise the ML community. We currently don’t know how to think about explainability of ML systems, and ignoring that problem and mandating that they be ‘explainable’ will not work.

The ar­ti­cle also con­tains the fol­low­ing policy idea about au­tonomous weapons.

The de­ci­sions we make now, for ex­am­ple, on in­ter­na­tional reg­u­la­tion of au­tonomous weapons, could have an out­sized im­pact on how this field de­vel­ops. A firm prece­dent that only a hu­man can make a ‘kill’ de­ci­sion could sig­nifi­cantly shape how AI is used — for ex­am­ple, putting the fo­cus on en­hanc­ing in­stead of re­plac­ing hu­man ca­pac­i­ties.

Here and through­out the ar­ti­cle, re­peated uses of the con­di­tional ‘could’ make it un­clear to me whether this is be­ing en­dorsed or merely sug­gested. I can’t quite tell if they think that drone swarms are a long-term is­sue—they con­trast it with a short-term is­sue but don’t ex­plic­itly say that it is long-term. Nonethe­less, I think their sug­gest­ing it here is also a bit mis­guided.

Let me contrast this with Nick Bostrom, who explained on a recent episode of the Joe Rogan Experience that he thinks this specific rule has ambiguous value. Here’s a quote from a discussion of the campaign to ban lethal autonomous weapons:

Nick Bostrom: I’ve kind of stood a lit­tle bit on the sidelines on that par­tic­u­lar cam­paign, be­ing a lit­tle un­sure ex­actly what it is that… cer­tainly I think it’d be bet­ter if we re­frained from hav­ing some arms race to de­velop these than not. But if you start to look in more de­tail: What pre­cisely is the thing that you’re hop­ing to ban? So if the idea is the au­tonomous bit, that the robot should not be able to make its own firing de­ci­sion, well, if the al­ter­na­tive to that is there is some 19-year old guy sit­ting in some office build­ing and his job is when­ever the screen flashes ‘fire now’ he has to press a red but­ton. And ex­actly the same thing hap­pens. I’m not sure how much is gained by hav­ing that ex­tra step.
In­ter­viewer: But it feels bet­ter for us for some rea­son. If some­one is push­ing the but­ton.
Nick Bostrom: But what ex­actly does that mean. In ev­ery par­tic­u­lar firing de­ci­sion? Well, you gotta at­tack this group of sur­face ships here, and here are the gen­eral pa­ram­e­ters, and you’re not al­lowed to fire out­side these co­or­di­nates? I don’t know. Another is the ques­tion of: it would be bet­ter if we had no wars, but if there is gonna be a war, maybe it is bet­ter if it’s robots v robots. Or if there’s gonna be bomb­ing, maybe you want the bombs to have high pre­ci­sion rather than low pre­ci­sion—get fewer civilian ca­su­alties.
On the other hand you could imag­ine it re­duces the thresh­old for go­ing to war, if you think that you wouldn’t fear any ca­su­alties you would be more ea­ger to do it. Or if it pro­lifer­ates and you have these mosquito-sized kil­ler-bots that ter­ror­ists have. It doesn’t seem like a good thing to have a so­ciety where you have a fa­cial-recog­ni­tion thing, and then the bot flies out and you just have a kind of dystopia.

Over­all, it seems that in both situ­a­tions, the key open ques­tions are in un­der­stand­ing the sys­tems and how they’ll in­ter­face with ar­eas of in­dus­try, gov­ern­ment and per­sonal life, and that reg­u­la­tion based on in­ac­cu­rate con­cep­tu­al­iza­tions of the tech­nol­ogy would ei­ther be mean­ingless or harm­ful.

Po­lariz­ing ap­proach to policy coordination

I have two main con­cerns with what I see as the in­tent of the pa­per.

The first one can be sum­ma­rized by Robin Han­son’s ar­ti­cle To Op­pose Po­lariza­tion, Tug Side­ways:

The policy world can [be] thought of as con­sist­ing of a few Tug-O-War “ropes” set up in this high di­men­sional policy space. If you want to find a com­fortable place in this world, where the peo­ple around you are re­as­sured that you are “one of them,” you need to con­tinu­ally and clearly tele­graph your loy­alty by treat­ing each policy is­sue as an­other op­por­tu­nity to find more sup­port­ing ar­gu­ments for your side of the key di­men­sions. That is, pick a rope and pull on it.
If, how­ever, you ac­tu­ally want to im­prove policy, if you have a se­cure enough po­si­tion to say what you like, and if you can find a rele­vant au­di­ence, then [you should] pre­fer to pull policy ropes side­ways. Few will bother to re­sist such pulls, and since few will have con­sid­ered such moves, you have a much bet­ter chance of iden­ti­fy­ing a move that im­proves policy. On the few main di­men­sions, not only will you find it very hard to move the rope much, but you should have lit­tle con­fi­dence that you ac­tu­ally have su­pe­rior in­for­ma­tion about which way the rope should be pul­led.

I feel like the ar­ti­cle above is not pul­ling policy ropes side­ways, but is in­stead con­nect­ing long-term is­sues to spe­cific sides of ex­ist­ing policy de­bates, around which there is already a lot of ten­sion. The is­sue of tech­nolog­i­cal un­em­ploy­ment seems to me to be a highly po­lariz­ing topic, where tak­ing a po­si­tion seems ill-ad­vised, and I have very low con­fi­dence about the cor­rect di­rec­tion in which to pull policy. En­tan­gling long-term is­sues with these highly tense short-term is­sues seems like it will likely re­duce our fu­ture abil­ity to broadly co­or­di­nate on these is­sues (by hav­ing them as­so­ci­ated with highly po­larized ex­ist­ing de­bates).

Distinc­tion be­tween long- and short-term thinking

My sec­ond con­cern is that on a deeper level, I think that the type of think­ing that gen­er­ates a lot of the ar­gu­ments around con­cerns for long-term tech­nolog­i­cal risks is very differ­ent from that which sug­gests poli­cies around tech­nolog­i­cal un­em­ploy­ment and racial bias. I think there is some value in hav­ing these sep­a­rate ways of think­ing en­gage in “con­ver­sa­tion,” but I think the linked pa­per is con­fus­ing in that it seems to try to down-play the differ­ences be­tween them. An anal­ogy might be the differ­ences be­tween physics and ar­chi­tec­ture; both fields nom­i­nally work with many similar ob­jects, but the dis­tinc­tion be­tween the two is very im­por­tant, and the fields clearly re­quire differ­ent types of think­ing and prob­lem-solv­ing.

Some of my con­cerns are sum­ma­rized by Eliezer in his writ­ing on Pivotal Acts:

…com­pared to the much more difficult prob­lems in­volved with mak­ing some­thing ac­tu­ally smarter than you be safe, it may be tempt­ing to try to write pa­pers that you know you can finish, like a pa­per on robotic cars caus­ing un­em­ploy­ment in the truck­ing in­dus­try, or a pa­per on who holds le­gal li­a­bil­ity when a fac­tory ma­chine crushes a worker. But while it’s true that crushed fac­tory work­ers and un­em­ployed truck­ers are both, ce­teris paribus, bad, they are not as­tro­nom­i­cal catas­tro­phes that trans­form all galax­ies in­side our fu­ture light cone into pa­per­clips, and the lat­ter cat­e­gory seems worth dis­t­in­guish­ing…
…there will [...] be a temp­ta­tion for the grantseeker to ar­gue, “Well, if AI causes un­em­ploy­ment, that could slow world eco­nomic growth, which will make coun­tries more hos­tile to each other, which would make it harder to pre­vent an AI arms race.” But the pos­si­bil­ity of some­thing end­ing up hav­ing a non-zero im­pact on as­tro­nom­i­cal stakes is not the same con­cept as events that have a game-chang­ing im­pact on as­tro­nom­i­cal stakes. The ques­tion is what are the largest low­est-hang­ing fruit in as­tro­nom­i­cal stakes, not whether some­thing can be ar­gued as defen­si­ble by point­ing to a non-zero as­tro­nom­i­cal im­pact.

I cur­rently don’t think that some­one who is try­ing to un­der­stand how to deal with tech­nolog­i­cal long-term risk should spend much time think­ing about tech­nolog­i­cal un­em­ploy­ment or re­lated is­sues, but it feels like the pa­per is try­ing to ad­vo­cate for the op­po­site po­si­tion.

Con­clud­ing thoughts on the article

Many peo­ple in the AI policy space have to spend a lot of effort to gain re­spect and in­fluence, and it’s gen­uinely hard to figure out a way to do this while act­ing with in­tegrity. One com­mon difficulty in this area is nav­i­gat­ing the in­cen­tives to con­nect one’s ar­gu­ments to is­sues that already get a lot of at­ten­tion (e.g. on­go­ing poli­ti­cal de­bates). My read is that this es­say makes these con­nec­tions even when they aren’t jus­tified; it im­plies that many short- and medium-term con­cerns are a nat­u­ral ex­ten­sion of cur­rent long-term thought, while failing to ac­cu­rately por­tray what I con­sider to be the core ar­gu­ments around long-term risks and benefits from AI. It seems like the effect of this es­say will be to re­duce per­ceived differ­ences be­tween long-term, mid-term and short-term work on risks from AI, to cause con­fu­sion about the ac­tual con­cerns of Bostrom et al., and to make fu­ture com­mu­ni­ca­tions work in this space harder and more po­larized.

Broader thoughts on CSER and CFI

I only had the time and space to cri­tique one spe­cific ar­ti­cle from CFI and CSER. How­ever, from talk­ing to oth­ers work­ing in the global catas­trophic risk space, and from en­gage­ment with sig­nifi­cant frac­tions of the rest of CSER and CFI’s work, I’ve come to think that the prob­lems I see in this ar­ti­cle are mostly rep­re­sen­ta­tive of the prob­lems I see in CSER’s and CFI’s broader strat­egy and work. I don’t think what I’ve writ­ten suffi­ciently jus­tifies that claim; how­ever, it seems use­ful to share this broader as­sess­ment to al­low oth­ers to make bet­ter pre­dic­tions about my fu­ture grant recom­men­da­tions, and maybe also to open a di­alogue that might cause me to change my mind.

Over­all, based on the con­cerns I’ve ex­pressed in this es­say, and that I’ve had with other parts of CFI and CSER’s work, I worry that their efforts to shape the con­ver­sa­tion around AI policy, and to mend dis­putes be­tween those fo­cused on long-term and short-term prob­lems, do not ad­dress im­por­tant un­der­ly­ing is­sues and may have net-nega­tive con­se­quences.

That said, it’s good that these or­ga­ni­za­tions give some re­searchers a way to get PhDs/​post­docs at Cam­bridge with rel­a­tively lit­tle in­sti­tu­tional over­sight and an op­por­tu­nity to ex­plore a large va­ri­ety of differ­ent top­ics (e.g. Jess, and Sha­har Avin, a pre­vi­ous grantee whose work I’m ex­cited about).

Ad­den­dum: Thoughts on in­cen­tives in tech­ni­cal fields in academia

I wrote the fol­low­ing in the course of writ­ing about the AI Safety Camp. This is a model I use com­monly when think­ing about fund­ing for AI al­ign­ment work, but it ended up not be­ing very rele­vant to that writeup, so I’m leav­ing it here as a note of in­ter­est.

My un­der­stand­ing of many parts of tech­ni­cal academia is that there is a strong in­cen­tive to make your writ­ing hard to un­der­stand while ap­pear­ing more im­pres­sive by us­ing a lot of math. Eliezer Yud­kowsky de­scribes his un­der­stand­ing of it as such (and ex­pands on this fur­ther in the rocket al­ign­ment prob­lem):

The point of cur­rent AI safety work is to cross, e.g., the gap be­tween [. . . ] say­ing “Ha ha, I want AIs to have an off switch, but it might be dan­ger­ous to be the one hold­ing the off switch!” to, e.g., re­al­iz­ing that util­ity in­differ­ence is an open prob­lem. After this, we cross the gap to solv­ing util­ity in­differ­ence in un­bounded form. Much later, we cross the gap to a form of util­ity in­differ­ence that ac­tu­ally works in prac­tice with what­ever ma­chine learn­ing tech­niques are used, come the day.
Progress in mod­ern AI safety mainly looks like progress in con­cep­tual clar­ity — get­ting past the stage of “Ha ha it might be dan­ger­ous to be hold­ing the off switch.” Even though Stu­art Arm­strong’s origi­nal pro­posal for util­ity in­differ­ence com­pletely failed to work (as ob­served at MIRI by my­self and Benya), it was still a lot of con­cep­tual progress com­pared to the “Ha ha that might be dan­ger­ous” stage of think­ing.
Sim­ple ideas like these would be where I ex­pect the bat­tle for the hearts of fu­ture grad stu­dents to take place; some­body with ex­po­sure to Arm­strong’s first sim­ple idea knows bet­ter than to walk di­rectly into the whirling ra­zor blades with­out hav­ing solved the cor­re­spond­ing prob­lem of fix­ing Arm­strong’s solu­tion. A lot of the ac­tual in­cre­ment of benefit to the world comes from get­ting more minds past the “walk di­rectly into the whirling ra­zor blades” stage of think­ing, which is not com­plex-math-de­pen­dent.
Later, there’s a need to have real de­ploy­able solu­tions, which may or may not look like im­pres­sive math per se. But ac­tual in­cre­ments of safety there may be a long time com­ing. [. . . ]
Any prob­lem whose cur­rent MIRI-solu­tion looks hard (the kind of proof pro­duced by peo­ple com­pet­ing in an in­ex­ploitable mar­ket to look im­pres­sive, who grav­i­tate to prob­lems where they can pro­duce proofs that look like costly sig­nals of in­tel­li­gence) is a place where we’re flailing around and grasp­ing at com­pli­cated re­sults in or­der to marginally im­prove our un­der­stand­ing of a con­fus­ing sub­ject mat­ter. Tech­niques you can ac­tu­ally adapt in a safe AI, come the day, will prob­a­bly have very sim­ple cores — the sort of core con­cept that takes up three para­graphs, where any re­viewer who didn’t spend five years strug­gling on the prob­lem them­selves will think, “Oh I could have thought of that.” Some­day there may be a book full of clever and difficult things to say about the sim­ple core — con­trast the sim­plic­ity of the core con­cept of causal mod­els, ver­sus the com­plex­ity of prov­ing all the clever things Judea Pearl had to say about causal mod­els. But the plane­tary benefit is mainly from pos­ing un­der­stand­able prob­lems crisply enough so that peo­ple can see they are open, and then from the sim­pler ab­stract prop­er­ties of a found solu­tion — com­pli­cated as­pects will not carry over to real AIs later.

He also gives a concrete example here:

The journal paper that Stuart Armstrong coauthored on “interruptibility” is a far step down from Armstrong’s other work on corrigibility. It had to be dumbed way down (I’m counting obscuration with fancy equations and math results as “dumbing down”) to be published in a mainstream journal. It had to be stripped of all the caveats and any mention of explicit incompleteness, which is necessary meta-information for any ongoing incremental progress, not to mention important from a safety standpoint. The root cause can be debated but the observable seems plain. If you want to get real work done, the obvious strategy would be to not subject yourself to any academic incentives or bureaucratic processes. Particularly including peer review by non-“hobbyists” (peer commentary by fellow “hobbyists” still being potentially very valuable), or review by grant committees staffed by the sort of people who are still impressed by academic sage-costuming and will want you to compete against pointlessly obscured but terribly serious-looking equations.

(Here is a pub­lic ex­am­ple of Stu­art’s work on util­ity in­differ­ence, though I had difficulty find­ing the most rele­vant ex­am­ples of his work on this sub­ject.)

Some ex­am­ples that seem to me to use an ap­pro­pri­ate level of for­mal­ism in­clude: the Embed­ded Agency se­quence, the Mesa-Op­ti­mi­sa­tion pa­per, some posts by Deep­Mind re­searchers (thoughts on hu­man mod­els, clas­sify­ing speci­fi­ca­tion prob­lems as var­i­ants of Good­hart’s law), and many other blog posts by these au­thors and oth­ers on the AI Align­ment Fo­rum.

There’s a sense in which it’s fine to play around with the few for­mal­isms you have a grasp of when you’re get­ting to grips with ideas in this field. For ex­am­ple, MIRI re­cently held a re­treat for new re­searchers, which led to a num­ber of blog posts that fol­lowed this pat­tern (1, 2, 3, 4). But aiming for lots of tech­ni­cal for­mal­ism is not helpful—any con­cep­tion of use­ful work that fo­cuses pri­mar­ily on mold­ing the idea to the for­mat rather than mold­ing the for­mat to the idea, es­pe­cially for (nom­i­nally) im­pres­sive tech­ni­cal for­mats, is likely op­ti­miz­ing for the wrong met­ric and fal­ling prey to Good­hart’s law.