Alignment Newsletter One Year Retrospective

Crossposted from the Alignment Forum.

On April 9, 2018, the first Alignment Newsletter was sent out to me and one test recipient. A year later, it has 889 subscribers and two additional content writers, and is the thing for which I'm best known. In this post I look at the impact of the newsletter and try to figure out what, if anything, should be changed in the future.

(If you don't know about the newsletter, you can learn about it and/or sign up here.)

Summary

In which I badger you to take the 3-minute survey, and summarize some key points.

Actions I'd like you to take

  • If you have read at least one issue of the newsletter in the last two months, take the 3-minute survey! If you're going to read this post anyway, I'd prefer you first read the post and then take the survey; but it's much better to take the survey without reading this post than to not take it at all.

  • Bookmark or otherwise make sure to know about the spreadsheet of papers, which includes everything sent in the newsletter, and a few other papers as well.

  • Now that the newsletter is available in Mandarin (thanks Xiaohu!), I'd be excited to see the newsletter spread to AI researchers in China.

  • Give me feedback in the comments so that I can make the newsletter better! I've listed particular topics that I want input on at the end of the post (before the appendix).

Everything else

  • The number of subscribers dwarfs the number of people working in AI safety. I'm not sure who the other subscribers are, or what value they get from the newsletter.

  • The main benefits of the newsletter are: helping technical researchers keep up with the field, helping junior researchers skill up without mentorship, and reputational effects. The first of these is both the most important and the most uncertain.

  • I spent a counterfactual 300-400 hours on the newsletter over the last year.

  • Still, in expectation the newsletter seems well worth the time cost, but due to the high uncertainty on the benefits to researchers, it's plausible that the newsletter is not worthwhile.

  • There are a bunch of questions I'd like feedback on. Most notably, I want to get a better model of how the newsletter adds value to technical safety researchers.

Newsletter updates

In which I tell you about features of the newsletter that you probably didn't know about.

Spreadsheet

Many of you probably know me as the guy who summarizes a bunch of papers every week. I claim you should instead think of me as the guy who maintains a giant spreadsheet of alignment-related papers, and incidentally also sends out a changelog of the spreadsheet every week. You could use the spreadsheet by reading the changelog every week, but you could also use it in other ways:

  • Whenever you want to do a literature review, you find the relevant categories in the spreadsheet and use the summaries to decide which of the papers to read in full.

  • When you come across a new, interesting paper, you first Ctrl+F for it in the spreadsheet and read the summary and opinion if they are present, before deciding whether to read the paper in full. I expect most summaries to be more useful for this purpose than the abstract; the longer summaries can be more useful than reading the abstract, introduction, and conclusion. Perhaps you should try it right now, with (say) "Prosaic AI alignment", just to get an intuitive sense of how easy it is to do.

  • When you find an interesting idea or concept, search for related words in the spreadsheet to find other writing on the topic. (This is most useful for non-academic ideas; for academic ones, Google Scholar is the way to go.) A programmatic version of this search is sketched after this list.
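For the programmatically inclined, here is a minimal sketch of that kind of keyword search, assuming you have downloaded the spreadsheet as a local CSV. The file name and column names are my guesses at a schema, not the spreadsheet's actual layout:

```python
import csv

def search_spreadsheet(path, query):
    """Print the title, summary, and opinion of any row that mentions `query`."""
    query = query.lower()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Search across the (assumed) Title, Summary, and Opinion columns.
            text = " ".join(row.get(col) or "" for col in ("Title", "Summary", "Opinion"))
            if query in text.lower():
                print(row.get("Title"))
                print("  Summary:", (row.get("Summary") or "")[:200])
                print("  Opinion:", (row.get("Opinion") or "")[:200])

search_spreadsheet("alignment_newsletter.csv", "prosaic AI alignment")
```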

I find myself using the spreadsheet a couple of times a week, often to remind me of what I thought about a paper or post that I had read a long time ago, but also for literature reviews and for finding papers that I vaguely remember are relevant to what I'm currently thinking about. Of course, I have a better grasp of the spreadsheet, making search easy; the categories make intuitive sense to me; and I read far more than the typical researcher, so I'd expect it to be significantly more useful to me than to other people. (On the other hand, I don't benefit from discovering new material in the spreadsheet, since I'm usually the one who put it there.)

Translation

Xiaohu Zhu has offered to translate the Alignment Newsletter to Mandarin! His translations can be found here; I also copy them over to the main Alignment Newsletter page. I'd be excited to see more Chinese AI researchers reading the newsletter content.

Newsletter stats

In which I present raw data and questions of uncertainty. This might be useful for understanding newsletters broadly, but I won't be drawing any big conclusions. The main takeaway is that lots of people read the newsletter; in particular, there are more subscribers than researchers in the field. Knowing that, you can skip ahead to "Impact of the newsletter" and things should still make sense.

Growth

As of Friday April 5, according to Mailchimp, there are 889 subscribers to the newsletter. Typically, the open rate is just over 50%, and the click-through rate is 10-15%. My understanding is that this is very high relative to other online mailing lists; but that could be because of online shopping mailing lists, where you are incentivized to send lots of emails at the expense of open and click-through rates. There are probably also readers who read the newsletter on the Alignment Forum, LessWrong, or Twitter.

The newsletter typically gets a steady trickle of 0-25 new subscribers each week, and sometimes gets a large increase. Here are all of the weeks in which there were >25 new subscribers:

AN #1 → AN #2: 2 → 141 subscribers (+139), because of the initial announcement.

AN #3 → AN #4: 148 → 238 subscribers (+90), probably still because of the initial announcement, though I don't know why it grew so little between #2 and #3.

AN #14 → AN #15: 328 → 405 subscribers (+77), don't know why (though I think I did know at the time).

AN #16 → AN #17: 412 → 524 subscribers (+112), because of Miles Brundage's tweet on July 23 about his favorite newsletters.

AN #17 → AN #18: 524 → 553 subscribers (+29), because of this SSC post on July 30 and the LessWrong curation of AN #13 on Aug 1.

AN #18 → AN #19: 553 → 590 subscribers (+37), because of residual effects from the past two weeks.

AN #30 → AN #31: 653 → 689 subscribers (+36), because of Rosie Campbell's blog post on Oct 29 about her favorite newsletters.

Over time, the opens and clicks have gone down as a percentage of subscribers, but have gone up in absolute numbers. I would guess that the biggest effect is that the most interested people subscribed early, and so as time goes on the marginal subscriber is less interested and ends up bringing down the percentages. Another effect would be that over time people get less interested in the newsletter, and stop opening/clicking on it, but don't unsubscribe. However, over the last few months, rates have been fairly stable, which suggests this effect is negligible.
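To make the "rate down, absolute numbers up" point concrete, here is a toy calculation; the numbers are purely illustrative, not the actual Mailchimp data:

```python
# Purely illustrative numbers (not real Mailchimp data): the open *rate* can
# fall even while the absolute number of opens rises, if growth comes from
# marginally less interested subscribers.
early_subscribers, early_open_rate = 400, 0.60
later_subscribers, later_open_rate = 889, 0.52

print(early_subscribers * early_open_rate)  # 240.0 opens
print(later_subscribers * later_open_rate)  # 462.28 opens: rate down, opens up
```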

On the other hand, during the last few months growth has been organic / word-of-mouth rather than through "publicity" like Miles's tweet and Rosie's blog post, so it's possible that organic growth leads to more interested subscribers who bring up the rates, and this effect approximately cancels the decrease in rates from people getting bored of the newsletter. I could test this with more fine-grained data about individual subscribers but I don't care enough.

So far, I have not been trying to publicize the newsletter beyond the initial announcement. I'm still not sure of the value of a marginal reader obtained via "publicity". The newsletter seems to me to be both technical and insider-y (i.e. it assumes familiarity with basic AI safety arguments), while the marginal reader from "publicity" seems not very likely to be either. That said, I have heard from a few readers that the newsletter is reasonably easy to follow, so maybe I'm putting too much weight on this concern. I'd love to hear thoughts in the comments.

Composition of subscribers

I don't know who these 889 subscribers are; it's much larger than the size of the field of AI safety. Even if most of the technical safety researchers and strategy/policy researchers have subscribed, that would only get us to 100-200 subscribers. Some guesses on who the remaining people are:

  • There are lots of people who are intellectually interested in AI safety but don't work on it full time; maybe a lot of them have subscribed.

  • A lot of technical researchers are interested in AI ethics, fairness, bias, explanations and so on. I occasionally cover these topics. In addition, if you're interested in the short-term effects of AI, you might be more likely to be interested in the long-term effects as well. (Mostly I'm putting this down because I've met a few people in this category who expressed interest in the newsletter.)

  • Non-technical researchers interested in the effects of AI might plausibly find it useful to read the newsletter to get a sense of what AI is capable of and how technical researchers are thinking about safety.

Regardless of the answer, I'm surprised that these people find the newsletter valuable. Most of the time I'm writing to technical safety researchers, relying on an assumption of shared jargon and underlying intuitions that I don't explain. It's not as bad as it could be, since I try to make my explanations accessible both to people working in traditional AI and to people at MIRI, but I would have guessed that it was still not easy to understand from the outside. Some hypotheses, only the first of which seems plausible:

  • I'm wrong about how difficult it is to understand the newsletter. Perhaps people can understand everything, or maybe they can still get a useful gist from summaries even if they don't understand everything.

  • People use it only as a source of interesting papers, and ignore the summaries and opinions (because they are hard to understand).

  • Reading the summaries and opinions gives the illusion of understanding even though people don't actually understand what I'm saying.

  • People like to feel like part of an elite group who can understand the technical jargon, and reading the newsletter gives them that feeling. (This would not be a conscious decision on their part.)

I sampled 25 people uniformly at random from the subscribers. Of these, I have met 8 of them, and have heard of 2 more. I would categorize the 25 people into the following rough categories: x-risk community (4), AI researchers sympathetic to x-risk (2), students (3), people interested in AI and x-risk (3), people involved with AI startups (2), researchers with no publicly obvious interest in x-risk (6), and could not be found easily (5). But really the most salient outcome was that for anyone I didn't already know, I found it very hard to figure out why they were subscribed to the newsletter.
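For concreteness, here is a minimal sketch of that sampling step, assuming the subscriber emails have been exported to a plain text file with one address per line (the file name is hypothetical):

```python
import random

# Load the exported subscriber list (hypothetical file name), skipping blanks.
with open("subscribers.txt", encoding="utf-8") as f:
    subscribers = [line.strip() for line in f if line.strip()]

# Draw 25 subscribers uniformly at random, without replacement.
sample = random.sample(subscribers, k=25)
for email in sample:
    print(email)
```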

Impact of the newsletter

In which I try and fail to figure out whether the benefits outweigh the costs.

Benefits

Here are the main sources of value from the newsletter that I see:

  • Causing technical researchers to know more about other areas of the field besides their own subfield.

  • Field building, by giving new entrants into AI safety a way to build up their knowledge without requiring mentorship.

  • Improving the reputation of the field of AI safety (especially among the wider AI research community), by demonstrating a level of discourse above the norm, particularly in conjunction with good writing about current AI topics. There's a mixture of reasoning about current AI and speculative future predictions that clearly demonstrates that I'm not some random outsider critiquing AI researchers.

  • Creating a strong reputation for myself and CHAI, such that people will have justified reason to listen to CHAI and/or me in the future.

  • Providing some sort of value to the subscribers who are not in long-term AI safety or AI strategy/policy.

When I started the newsletter, I was aiming primarily for the first one, by telling researchers what they should be reading. I continue to optimize mainly for that, though now I often try to provide enough information that researchers don't have to read the original paper/post. I knew about the second source of value, but didn't think it would be very large; I'm now more uncertain about how important it is. The reputational effects were more unexpected, since I didn't think the newsletter would become as large as it currently is. I don't know much about the last source of value and am basically ignoring it (i.e. pretending it is zero) in the rest of the analysis.

I'm actually quite uncertain about how much value comes from each of these subpoints, mainly because there's a striking lack of comments or feedback on the newsletter. Excluding one person at CHAI who I talk to frequently, I get a comment on the content of the newsletter maybe once every 3-4 weeks. I can understand that people who get it as an email newsletter may not see an obvious way to comment (replying to a newsletter email is an unusual thing to do), but the newsletter is crossposted to LessWrong, the Alignment Forum, and Twitter. Why aren't there comments there?

One possibility is that people treat the newsletter as a curation of interesting papers and posts, in which case there isn't much need to comment. However, I'm fairly confident that many readers also find value in the summaries and opinions. You could instead interpret this as evidence that the things I'm saying are reasonable—after all, if I was wrong on the Internet, surely someone would let me know. On the other hand, if I'm only saying things that people already believe, am I actually accomplishing anything? It's hard to say.

I think the most likely story is that I say things that people didn't know but agree with once I say them—but I share Raemon's intuition that people aren't really learning much if that's the case. (The rest of that post has many more thoughts on comments that apply to the newsletter.)

Overall it still feels like in expectation most of the value comes from widening the set of fields that any individual technical researcher is following, but it seems entirely possible that the newsletter does not do that at all and as a result only has reputational benefits. (I am fairly confident that the reputational benefits are positive and non-zero.) I'd really like to get more clarity on this, so if you read the newsletter, please take the survey!

Costs

The main cost of the newsletter is the opportunity cost of our time. Each newsletter takes about 15 hours of my time. The newsletter has gotten more detailed over time, but this isn't reflected in the total hours I put in, because it has been approximately offset by new content writers (Richard Ngo and Dan Hendrycks) who took some of the burden of summarizing off of me. Currently I'd estimate that the newsletter takes 15-20 hours in total (with 2-5 hours from Richard and Dan). This can be broken down into time I would have spent reading and summarizing papers anyway, and time that I spent only because the newsletter exists, which we could call "extra hours". Initially, I wanted to read and summarize a lot of papers for my own benefit, so the newsletter took about 4-5 extra hours per week. Now, I'm less inclined to read a ton of papers, and it takes 8-10 extra hours per week.

This means in aggregate I've spent 700-800 hours on the newsletter, of which about 300-400 were hours that I wouldn't have spent otherwise. Even only counting the 300-400 hours, this is comparable to the time I spent on my "state of the world" and "learning biases" projects together, including all of the time spent on paper writing, blog posts, and talks in addition to the research itself.
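As a rough sanity check on these aggregate figures, assuming about one issue per week over the year; the split between the early and late regimes is my own guess, not a number from the post:

```python
issues = 52                      # roughly one issue per week for a year
my_hours = issues * 15           # ~15 of my hours per issue -> ~780, within 700-800

# Guessed split between the early (4-5 extra hrs/wk) and late (8-10 extra
# hrs/wk) regimes; the half-and-half split is an assumption.
early_weeks, late_weeks = 26, 26
extra_hours = early_weeks * 4.5 + late_weeks * 9

print(my_hours, extra_hours)     # 780 351.0 -> consistent with 300-400 extra hours
```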

In addition to time costs, the newsletter could do harm. While there are many ways this could happen, the only one that feels sufficiently important to consider is the risk of causing information cascades. Since nearly everyone in the field is reading the newsletter, we may all end up with some belief B just because it was in a newsletter. We might then have way too much confidence in B, since everyone else also believes B.

Overall I'm not too worried. There's so much content in the newsletter that I seriously doubt a single idea could spread widely as a result of the newsletter—inevitably some people won't remember that particular idea. So we only need to worry about "big" ideas that are repeated often in the newsletter. The most salient example of that would be my general opposition to the Bostrom/Yudkowsky paradigm of AI safety, which nonetheless still seems quite prevalent amongst researchers. In addition, I'd be really surprised if existing researchers were convinced of a "big" idea or paradigm solely because other researchers believed it (though they might put undue weight on it).

Is the newsletter worth it?

If the only benefit of the newsletter were the reputational effects, it would not be worth my time (even ignoring Richard and Dan's time). However, I get enough thanks from people in the field that the newsletter must be providing value to them, even though I don't have a great model of what the value is. My current best guess is that there is a lot of value, which makes the newsletter worth the cost, but I think there is a non-negligible chance that this would be reversed if I had a good model of what value everyone was getting from it.

Going forward

In which I figure out what about the newsletter should change in the future.

Structure of the newsletter

So far I've only talked about whether the newsletter is worthwhile as a whole. But of course we can also analyze individual aspects of the newsletter and figure out how important they are.

Opinions are probably the key feature of the newsletter. Many papers and blog posts are aimed more at appearing impressive than at conveying facts. Even the ones that are truth-seeking are subject to publication bias: they are written by people who think the ideas within are important, and so will be biased towards positivity. As a result, an opinion from a researcher who didn't do the work helps contextualize the results, making it easier for less involved readers to figure out the importance of the ideas. (As a corollary, I worry about the lack of a fresh perspective on posts that I write, but don't see an obvious easy solution to that problem.) I think this also contributes to the success of Import AI and ChinAI, which are also quite heavy on opinions.

I think the summaries are also quite important. I aim for the longer summaries to be sufficiently informative that you don't have to read the blog post / paper unless you want to do a deep dive and really understand the results. For papers, I often roughly aim for it to be more useful to read my summary than to read the abstract, intro, and conclusion of the paper. In the world where the newsletter didn't have summaries, I think researchers would not keep up as much with the state of the field.

Overall, I'm pretty happy with the current structure of the newsletter, and don't currently intend to change it. But if I get more clarity on what value the newsletter provides to researchers, I wouldn't be surprised if I changed the structure as a result.

Scaling up

In the year that I've been writing the newsletter, the amount of writing that I want to cover has gone up quite a lot, especially with the launch of the Alignment Forum. I expect this will continue, and I won't be able to keep up.

By default, I would cover less and less of it. However, it would be nice for the spreadsheet to be a somewhat comprehensive database of the AI safety literature. It currently is not: I often don't cover good Agent Foundations work (it's hard for me to understand), and I don't have pre-2018 content. But it is pretty good for the subfields of AI safety that I'm most knowledgeable about.

There has been some outsourcing of work as Richard Ngo and Dan Hendrycks have joined, but it still does not seem sustainable to continue this long-term, due to coordination challenges and challenges with maintaining quality. That said, it's not impossible that this could work:

  • Perhaps I could pay people to do this summarization, with the hope that this would help me find people who could put in more time. This would allow more work to get done while keeping the team small (which keeps coordination costs and quality maintenance costs small).

  • I could create a system that allows random people to easily contribute summaries of papers and posts they have read, while writing the opinions myself. It may be easier to vet and fix summaries than to write them myself.

  • I could invest in developing good guides for new summarizers, in order to decrease the cost of onboarding and ongoing coordination.

That said, in all of these cases, it feels better to instead summarize a smaller fraction of all the work (especially since the newsletter is already long enough that people probably don't read all of it), while still adding links to papers that I haven't read to the spreadsheet. The main value of summarizing everything is having a more comprehensive spreadsheet, but I don't think this is sufficiently valuable to warrant the approaches above. However, I could imagine this conclusion being overturned by a better model of how the newsletter adds value for technical safety researchers.

Sourcing

So far, I have found papers and articles from newsletters, blogs, Arxiv Sanity and Twitter. However, Twitter has become worse over time, possibly because it has learned to show me non-academic stuff that is more attention-grabbing or controversial, despite me trying not to click on those sorts of things. Arxiv Sanity was my main source for academic work, but recently it's been getting worse, and is basically not working any more, and I'm not sure why. So I'm now trying to figure out a new way to find relevant literature—does anyone have suggestions?

If I continue to have trouble, I might summarize random academic papers I'm interested in, instead of only the ones that have come out very recently.

Appearance

It's rather annoying that the newsletter is a giant wall of text; it's probably not fun to read as a result. In addition to the categories, which were partly meant to give structure to the wall of text, I've been trying to break things into more paragraphs, but really it needs something much more drastic. However, I also don't want it to be even more work to get a newsletter out.

So, if anyone wants to volunteer to make the newsletter visually nicer, that would be appreciated, but it shouldn't cost me too much more time (maybe half an hour a week, if it was significantly nicer). One easy possibility would be to include an image at the beginning of the newsletter—any suggestions for what should go there?

Future of the newsletter

Given the uncertainty about the value of the newsletter, it's not inconceivable that I decide to stop writing it in the future, or scale back significantly. That said, I think there is value in stability. It is generally bad for a project to have "fits and starts" where its quality varies with the motivation of the person running it, or for the project to potentially be cancelled solely based on how valuable the creator thinks it is. (I'm aware I haven't argued for this; feel free to ask me about it if it seems wrong.)

Due to this and related reasons, when I started the newsletter, I made an internal commitment to continue writing it for at least six months, as long as most other people thought it was still valuable. Obviously, if everyone agreed that the newsletter was not useful or was actively harmful, then I'd stop writing it: the commitment is more to deal with the case where I no longer think the newsletter is useful, even though other people think it is.

Now I'm treating it as an ongoing three-month commitment: that is, I am always committing to continue writing the newsletter for at least three months, as long as most other people think it is valuable. At any point I can decide to end the ongoing commitment (presumably when I think it is no longer worth my time to write it); there would then be three months where I would continue to write the newsletter for stability, and figure out what would happen with the newsletter after those three months.

Feedback I'd like

There are a bunch of questions I have that I'd love to get opinions on, either anonymously in the 3-minute survey (which you should fill out!) or in the comments. (Comments are preferred, because then other people can build off of them.) I've listed the questions roughly in order of importance:

  • What is the value of the newsletter for you?

  • What is the value of the newsletter for other people?

  • How should I deal with the growing amount of AI safety research?

  • What can I do to get more feedback on the newsletter on an ongoing basis (rather than having to survey people at fixed times)?

  • Am I underestimating the risk of causing information cascades? Regardless, how can I mitigate this risk?

  • How can I make the newsletter more visually appealing / less of a wall of text, without expending too much weekly effort?

  • Should I publicize the newsletter on Twitter? How valuable is the marginal reader?

  • Should I publicize the newsletter to AI researchers? How valuable is the marginal reader?

  • How can I find good papers out of academia now that Arxiv Sanity isn't working as well as it used to?

Appendix: Alignment Newsletter FAQ

All of this is in the appendix because I don't particularly care whether people read it or not. It's not very relevant to any of the content in the main post, but it is relevant to anyone who might want to start their own newsletter, or their own project more generally.

What's the history of the Alignment Newsletter?

During one of the CHAI seminars, someone suggested that we each take turns finding and collecting new research papers and sending them out to each other. I already had a system in place doing exactly this, so I volunteered to do it myself (rather than taking turns). I also figured that to save even more CHAI-researcher-time, it would make sense to give a quick summary and then tell people under what circumstances they should read each paper. (I was already summarizing papers for my own notes.)

This pretty quickly proved to be valuable, and I thought about making it public for even more time savings. However, it still seemed pretty nascent and in flux, so I continued iterating on it within CHAI, while thinking about how it could be made public-facing. (See also the "Things done right" section.) After a little under two months of writing the newsletter within CHAI, I made it public. At that time, the goal was to provide a list of relevant readings for technical AI safety researchers that had been published each week, and to help them decide whether or not they should read them.

Over time, my summaries and opinions became longer and more detailed. I don't know exactly why this happened. Regardless, at some point I started aiming for some of my summaries to be detailed enough that researchers could just read the summary and not read the paper/post itself.

In September, Richard Ngo volunteered to contribute summaries to the newsletter on a variety of topics, and Dan Hendrycks joined soon after, focusing on robustness and uncertainty.

Why do you never have strong negative opinions?

One of the design decisions made at the beginning of the newsletter was to avoid strong critiques of any particular piece of research. This was for a few reasons:

  • As a general rule, criticisms I have of a paper are often too strong or based on a misunderstanding. If I have a negative impression of a paper or research agenda, I would predict that with ~90% probability my opinion of the work will improve after I talk to the author(s). I don't think this is particular to me—this should be expected of any summarizer, since the authors have much more intuition about why their particular approach will be useful, beyond what is written in the blog post or paper.

  • The newsletter probably shapes the views of a significant fraction of people thinking about AI safety, and so leads to a risk of information cascades. Mitigating this means giving space to views that I disagree with, summarizing them as best I can, and not attacking what will inevitably be a strawman of their view.

  • Regardless of the accuracy of the criticism, I would like to avoid alienating people.

Of course, this decision has downsides as well:

  • Since I'm not accurately saying everything I believe, it becomes more likely that I accidentally say false things, convey wrong impressions, or otherwise make it harder to get to the truth.

  • Disagreements are one of the main ways in which intellectual progress is made. They help identify points of confusion, and allow people to merge their models in order to get something (hopefully) better.

While the first downside seems like a real cost, the second downside is about inhibiting intellectual progress in AI safety research. I think this is okay: intellectual progress does not need to happen in the newsletter. In most of these cases I express stronger disagreements in channels more conducive to intellectual progress (e.g. the Alignment Forum, emails/messages, talking in person, the version of the newsletter internal to CHAI).

Another probable effect of avoiding negativity is reduced readership, since it is likely much more interesting to read a newsletter with active disagreements and arguments than one that dryly summarizes research papers. I don't yet know whether this is a pro or a con (even ignoring the other effects of negativity).

Mistakes

I don't know of very many mistakes, even in hindsight. I think this is primarily because I don't get feedback on the newsletter, not because everything has gone perfectly. It seems quite likely that some things were mistakes; I just don't know it yet, because I don't have the data to tell.

Analyzing other newsletters. The one thing that I wish I had done was to analyze other newsletters like Import AI in more detail before starting this one. I think it's plausible that I could have realized the value of opinions and more detailed summaries right at the beginning, rather than evolving in that direction over a couple of months.

Delays. I did fall over a week behind on the newsletter over the last month or two. While this is bad, I wouldn't really call it a Mistake: I don't think of the newsletter as a weekly commitment or obligation. I very much value the flexibility to allocate time to whatever seems most pressing; if the newsletter were more of a commitment (such that falling behind is a Mistake), I think I would have to be much more careful about what I agree to do, and this would prevent me from doing other important things. Instead, my approach is to treat the newsletter as a fairly important goal that I try to schedule enough time for, but if I find myself running out of time and have to cut something, it's not a tragedy if the newsletter is delayed. That's essentially what happened over the last month or two.

Things done right

I spent a decent amount of time thinking about the design of the newsletter before implementing it, and in hindsight I think this was a very good idea. Here I list a few things that worked out well.

A polished product. I was particularly conscious of the fact that at launch the newsletter would be using up the limited common resource of "people's willingness to try out new things". Both to make sure people stuck with the project, and to avoid using up the common resource unnecessarily, I wanted to be fairly confident that this would be a good product before launching. As a result, I iterated for a little under two months within CHAI, in order to figure out product-market fit. You can see the evolution over time—this is the first internal newsletter, whereas this is the first public newsletter. (They're all available here.)

  • By the fourth internal newsletter, I realized that I couldn't actually summarize all the links I found, so I switched to a version where some links would be sent without summaries.

  • Categorization seemed important, so I did more of it.

This is not to say that the newsletter has been static since launch; it has changed significantly. Most notably, while originally I was aiming to give people enough information to decide whether or not to read the paper/post, I now sometimes aim to include enough detail that people don't need to read the paper/post. But the point is that a lot of the early improvements happened within CHAI without consuming the common resource.

I'm not sure to what extent this is different from the standard startup advice of iterating quickly and testing product-market fit: it depends on whether trialing the newsletter within CHAI counts as testing for product-market fit. To the extent that there is a difference, it's mainly that I'm arguing for more planning, especially before consuming common resources (whereas with startups, the fierce competition means that you do not worry about consuming common resources).

Considered stability and commitment. As I mentioned above, I had an internal commitment to continue writing the newsletter for at least six months, as long as other people thought it was valuable. In addition to the value of stability, I viewed this as part of cooperatively using the common resource of people's willingness to try things. If you're going to use the resource and fail, ideally the failure should show that it is actually infeasible to succeed in that domain, as opposed to e.g. reflecting a lack of motivation on the author's part.

Here's another way to see this. I think it would have been a lot harder for the newsletter to be successful if there had been 2-5 attempts to create a newsletter in the past that had then fizzled out, because people would expect newsletters to fail and wouldn't subscribe. My initial commitment helps prevent me from being one of those failures for "bad" reasons (e.g. me losing motivation) while still allowing me to fail for "good" reasons (e.g. no one actually wants to read a newsletter about AI alignment).

I can't point to any actually good outcomes that resulted from this policy; nonetheless I think it was a good thing to have done.

Investing in flexible automated systems. I had created the private version of the spreadsheet before the first public newsletter, in order to have a database of readings for myself (replacing my previous Google Doc database), and I wrote a script to generate the email from this database. While lots of ink has been spilled on the value of automation, it doesn't usually emphasize flexibility. By not using a technology meant for one specific purpose, I was able to do a few things that I wouldn't expect to be able to do with a more specialized version:

  • Create consistency checks. For example, throwing an error when there's an opinion but no summary, or when the name of the summarizer is not "Richard", "Dan H" or "" (indicating me). A sketch of what these checks might look like follows this list.

  • Create a private and public version of the newsletter. (Any strong critiques go into the private version, which is internal to CHAI, and are removed from the public version.)
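As an illustration only, here is a minimal sketch of what those two features could look like in the generation script, assuming each spreadsheet row is loaded as a Python dict; the field names and the private_critique field are my assumptions, not the actual implementation:

```python
VALID_SUMMARIZERS = {"Richard", "Dan H", ""}  # "" indicates me

def check_entry(entry):
    """Raise an error for inconsistent rows before the email is generated."""
    if entry.get("opinion") and not entry.get("summary"):
        raise ValueError(f"{entry['title']}: has an opinion but no summary")
    if entry.get("summarizer", "") not in VALID_SUMMARIZERS:
        raise ValueError(f"{entry['title']}: unknown summarizer {entry['summarizer']!r}")

def public_version(entry):
    """Drop the fields that should only appear in the CHAI-internal version."""
    public = dict(entry)
    public.pop("private_critique", None)  # hypothetical field for strong critiques
    return public
```

Failing loudly at generation time means an inconsistent row can never silently make it into a sent email.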

But really, the key value of flexibility is that it allows you to adapt to circumstances that you had never even considered when creating the system:

  • When Richard Ngo joined, I added a "Summarizer" column to the sheet, changed a few lines of code, and was done. (Note how I needed flexibility over both the data format and the analysis code.)

  • I've found myself linking to a bunch of previous newsletter entries and having to copy a lot of links. Recently I added a new tag that I can use in summaries and opinions, which automatically extracts and links the entry I'm referring to; a possible implementation is sketched below. (I'm a bit embarrassed at how long it took me to realize that this was a thing I could do; I could have saved a lot of tedious work if I had realized it was a possibility the first time I got annoyed at this process.)
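As an illustration of how such a tag could work (the @Title@ syntax and the title-to-URL lookup are my assumptions, not the actual implementation):

```python
import re

# Hypothetical title -> URL database; in practice this would come from the spreadsheet.
LINKS = {"Prosaic AI alignment": "https://example.com/prosaic-ai-alignment"}

def expand_tags(text):
    """Replace @Title@ tags with an HTML link to the referenced entry."""
    def repl(match):
        title = match.group(1)
        return f'<a href="{LINKS[title]}">{title}</a>'  # fail loudly on unknown titles
    return re.sub(r"@([^@]+)@", repl, text)

print(expand_tags("See @Prosaic AI alignment@ for more."))
```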

Thought about potential negative effects. I'm pretty sure I thought of most of the points about negativity (listed above) before publicizing the newsletter. This sort of thing is discussed a lot; I don't think I have anything significant to add.

This section seems to indicate that I thought of everything initially and that it was all important—this is almost certainly not the case. I'm sure I'm rationalizing some of these with hindsight and didn't actually think of all the benefits then, and I also probably thought of other considerations that didn't end up being important that I've now forgotten.