Alignment Newsletter One Year Retrospective

Rohin Shah, 10 Apr 2019 7:00 UTC

Crossposted from the Alignment Forum.

On April 9, 2018, the first Alignment Newsletter was sent out to me and one test recipient. A year later, it has 889 subscribers and two additional content writers, and is the thing for which I’m best known. In this post I look at the impact of the newsletter and try to figure out what, if anything, should be changed in the future.

(If you don’t know about the newsletter, you can learn about it and/or sign up here.)

Summary

In which I badger you to take the 3-minute survey, and summarize some key points.

Actions I’d like you to take

If you have read at least one issue of the newsletter in the last two months, take the 3-minute survey! If you’re going to read this post anyway, I’d prefer you first read the post and then take the survey; but it’s much better to take the survey without reading this post than to not take it at all.
Bookmark or otherwise make sure to know about the spreadsheet of papers, which includes everything sent in the newsletter, and a few other papers as well.
Now that the newsletter is available in Mandarin (thanks Xiaohu!), I’d be excited to see the newsletter spread to AI researchers in China.
Give me feedback in the comments so that I can make the newsletter better! I’ve listed particular topics that I want input on at the end of the post (before the appendix).

Everything else

The number of subscribers dwarfs the number of people working in AI safety. I’m not sure who the other subscribers are, or what value they get from the newsletter.
The main benefits of the newsletter are: helping technical researchers keep up with the field, helping junior researchers skill up without mentorship, and reputational effects. The first of these is both the most important one, and the most uncertain one.
I spent a counterfactual 300-400 hours on the newsletter over the last year.
Still, in expectation the newsletter seems well worth the time cost, but due to the high uncertainty on the benefits to researchers, it’s plausible that the newsletter is not worthwhile.
There are a bunch of questions I’d like feedback on. Most notably, I want to get a better model of how the newsletter adds value to technical safety researchers.

Newsletter updates

In which I tell you about features of the newsletter that you probably didn’t know about.

Spreadsheet

Many of you probably know me as the guy who summarizes a bunch of papers every week. I claim you should instead think of me as the guy who maintains a giant spreadsheet of alignment-related papers, and incidentally also sends out a changelog of the spreadsheet every week. You could use the spreadsheet by reading the changelog every week, but you could also use it in other ways:

Whenever you want to do a literature review, you find the relevant categories in the spreadsheet and use the summaries to decide which of the papers to read in full.
When you come across a new, interesting paper, you first Ctrl+F for it in the spreadsheet and read the summary and opinion if they are present, before deciding whether to read the paper in full. I expect most summaries to be more useful for this purpose than reading the abstract; the longer summaries can be more useful than reading the abstract, introduction and conclusion. Perhaps you should do it right now, with (say) “Prosaic AI alignment”, just to intuitively get how trivial it is to do.
When you find an interesting idea or concept, search for related words in the spreadsheet to find other writing on the topic. (This is most useful for non-academic ideas—for academic ones, Google Scholar is the way to go.)

I find myself using the spreadsheet a couple of times a week, often to remind me of what I thought about a paper or post that I had read a long time ago, but also for literature reviews and finding papers that I vaguely remember that are relevant to what I’m currently thinking about. Of course, I have a better grasp of the spreadsheet, which makes search easy; the categories make intuitive sense to me; and I read far more than the typical researcher, so I’d expect it to be significantly more useful to me than to other people. (On the other hand, I don’t benefit from discovering new material in the spreadsheet, since I’m usually the one who put it there.)

Translation

Xiaohu Zhu has offered to translate the Alignment Newsletter to Mandarin! His translations can be found here; I also copy them over to the main Alignment Newsletter page. I’d be excited to see more Chinese AI researchers reading the newsletter content.

Newsletter stats

In which I present raw data and questions of uncertainty. This might be useful to understand newsletters broadly, but I won’t be drawing any big conclusions. The main takeaway is that lots of people read the newsletter; in particular, there are more subscribers than researchers in the field. Knowing that, you can skip ahead to “Impact of the newsletter” and things should still make sense.

Growth

As of Friday April 5, according to Mailchimp, there are 889 subscribers to the newsletter. Typically, the open rate is just over 50%, and the click-through rate is 10-15%. My understanding is that this is very high relative to other online mailing lists; but that could be because of online shopping mailing lists, where you are incentivized to send lots of emails at the expense of open and click-through rates. There are probably also readers who read the newsletter on the Alignment Forum, LessWrong, or Twitter.

The newsletter typically gets a steady trickle of 0-25 new subscribers each week, and sometimes gets a large increase. Here are all of the weeks in which there were >25 new subscribers:

AN #1 → AN #2: 2 → 141 subscribers (+139), because of the initial announcement.

AN #3 → AN #4: 148 → 238 subscribers (+90), probably still because of the initial announcement, though I don’t know why it grew so little between #2 and #3.

AN #14 → AN #15: 328 → 405 subscribers (+77), don’t know why (though I think I did know at the time)

AN #16 → AN #17: 412 → 524 subscribers (+112), because of Miles Brundage’s tweet on July 23 about his favorite newsletters.

AN #17 → AN #18: 524 → 553 subscribers (+29), because of this SSC post on July 30 and the LessWrong curation of AN #13 on Aug 1.

AN #18 → AN #19: 553 → 590 subscribers (+37), because of residual effects from the past two weeks.

AN #30 → AN #31: 653 → 689 subscribers (+36), because of Rosie Campbell’s blog post on Oct 29 about her favorite newsletters.
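As a quick sanity check, the gains listed above can be recomputed from the before/after counts. The sketch below is illustrative only: the tuple layout and the names `SPIKES`, `THRESHOLD`, and `weekly_gains` are made up for this example, and the numbers are copied from the list above rather than pulled from Mailchimp.

```python
# Subscriber counts around each spike, copied from the list above.
# Hypothetical tuple layout: (earlier issue, later issue, subscribers before, subscribers after)
SPIKES = [
    (1, 2, 2, 141),
    (3, 4, 148, 238),
    (14, 15, 328, 405),
    (16, 17, 412, 524),
    (17, 18, 524, 553),
    (18, 19, 553, 590),
    (30, 31, 653, 689),
]

THRESHOLD = 25  # the ">25 new subscribers" cutoff used in the text


def weekly_gains(spikes):
    """Return {(earlier issue, later issue): subscriber gain} for each entry."""
    return {(a, b): after - before for a, b, before, after in spikes}


gains = weekly_gains(SPIKES)

# Keep only the weeks that exceed the cutoff (all listed weeks should qualify).
large = {pair: gain for pair, gain in gains.items() if gain > THRESHOLD}

for (a, b), gain in large.items():
    print(f"AN #{a} -> AN #{b}: +{gain}")
```

Every listed week clears the cutoff, and the recomputed gains match the figures in parentheses above.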

Over time, the opens and clicks have gone down as a percentage of subscribers, but have gone up in absolute numbers. I would guess that the biggest effect is that the most interested people subscribed early, and so as time goes on the marginal subscriber is less interested and ends up bringing down the percentages. Another effect would be that over time people get less interested in the newsletter, and stop opening/clicking on it, but don’t unsubscribe. However, over the last few months, rates have been fairly stable, which suggests this effect is negligible.

On the other hand, during the last few months growth has been organic / word-of-mouth rather than through “publicity” like Miles’s tweet and Rosie’s blog post, so it’s possible that organic growth leads to more interested subscribers who bring up the rates, and this effect approximately cancels the decrease in rates from people getting bored of the newsletter. I could test this with more fine-grained data about individual subscribers but I don’t care enough.

So far, I have not been trying to publicize the newsletter beyond the initial announcement. I’m still not sure of the value of a marginal reader obtained via “publicity”. The newsletter seems to me to be both technical and insider-y (i.e. it assumes familiarity with basic AI safety arguments), while the marginal reader from “publicity” seems not very likely to be either. That said, I have heard from a few readers that the newsletter is reasonably easy to follow, so maybe I’m putting too much weight on this concern. I’d love to hear thoughts in the comments.

Composition of subscribers

I don’t know who these 889 subscribers are; it’s much larger than the size of the field of AI safety. Even if most of the technical safety researchers and strategy/policy researchers have subscribed, that would only get us to 100-200 subscribers. Some guesses on who the remaining people are:

There are lots of people who are intellectually interested in AI safety but don’t work on it full time; maybe a lot of them have subscribed.
A lot of technical researchers are interested in AI ethics, fairness, bias, explanations and so on. I occasionally cover these topics. In addition, if you’re interested in short-term effects of AI, you might be more likely to be interested in the long-term effects as well. (Mostly I’m putting this down because I’ve met a few people in this category who expressed interest in the newsletter.)
Non-technical researchers interested in the effects of AI might plausibly find it useful to read the newsletter to get a sense of what AI is capable of and how technical researchers are thinking about safety.

Regardless of the answer, I’m surprised that these people find the newsletter valuable. Most of the time I’m writing to technical safety researchers, and relying on an assumption of shared jargon and underlying intuitions that I don’t explain. It’s not as bad as it could be, since I try to make my explanations accessible both to people working in traditional AI and to people at MIRI, but I would have guessed that it was still not easy to understand from the outside. Some hypotheses, only the first of which seems plausible:

I’m wrong about how difficult it is to understand the newsletter. Perhaps people can understand everything, or maybe they can still get a useful gist from summaries even if they don’t understand everything.
People use it only as a source of interesting papers, and ignore the summaries and opinions (because they are hard to understand).
Reading the summaries and opinions gives the illusion of understanding even though people don’t actually understand what I’m saying.
People like to feel like a part of an elite group who can understand the technical jargon, and reading the newsletter gives them that feeling. (This would not be a conscious decision on their part.)

I sampled 25 people uniformly at random from the subscribers. Of these, I have met 8 and have heard of 2 more. I would categorize the 25 people in the following rough categories: x-risk community (4), AI researchers sympathetic to x-risk (2), students (3), people interested in AI and x-risk (3), people involved with AI startups (2), researchers with no publicly obvious interest in x-risk (6), and could not be found easily (5). But really the most salient outcome was that for anyone I didn’t already know, I found it very hard to figure out why they were subscribed to the newsletter.

Impact of the newsletter

In which I try and fail to figure out whether the benefits outweigh the costs.

Benefits

Here are the main sources of value from the newsletter that I see:

Causing technical researchers to know more about other areas of the field besides their own subfield.
Field building, by giving new entrants into AI safety a way to build up their knowledge without requiring mentorship.
Improving the reputation of the field of AI safety (especially among the wider AI research community), by demonstrating a level of discourse above the norm, particularly in conjunction with good writing about current AI topics. There’s a mixture of reasoning about current AI and speculative future predictions that clearly demonstrates that I’m not some random outsider critiquing AI researchers.
Creating a strong reputation for myself and CHAI, such that people will have justified reason to listen to CHAI and/or me in the future.
Providing some sort of value to the subscribers who are not in long-term AI safety or AI strategy/policy.

When I started the newsletter, I was aiming primarily for the first one, by telling researchers what they should be reading. I continue to optimize mainly for that, though now I often try to provide enough information that researchers don’t have to read the original paper/post. I knew about the second source of value, but didn’t think it would be very large; I’m now more uncertain about how important it is. The reputational effects were more unexpected, since I didn’t think the newsletter would become as large as it currently is. I don’t know much about the last source of value and am basically ignoring it (i.e. pretending it is zero) in the rest of the analysis.

I’m actually quite uncertain about how much value comes from each of these subpoints, mainly because there’s a striking lack of comments or feedback on the newsletter. Excluding one person at CHAI who I talk to frequently, I get a comment on the content of the newsletter maybe once every 3-4 weeks. I can understand that people who get it as an email newsletter may not see an obvious way to comment (replying to a newsletter email is an unusual thing to do), but the newsletter is crossposted to LessWrong, the Alignment Forum, and Twitter. Why aren’t there comments there?

One possibility is that people treat the newsletter as a curation of interesting papers and posts, in which case there isn’t much need to comment. However, I’m fairly confident that many readers also find value in the summaries and opinions. You could instead interpret this as evidence that the things I’m saying are reasonable—after all, if I was wrong on the Internet, surely someone would let me know. On the other hand, if I’m only saying things that people already believe, am I actually accomplishing anything? It’s hard to say.

I think the most likely story is that I say things that people didn’t know but agree with once I say them—but I share Raemon’s intuition that people aren’t really learning much if that’s the case. (The rest of that post has many more thoughts on comments that apply to the newsletter.)

Overall it still feels like in expectation most of the value comes from widening the set of fields that any individual technical researcher is following, but it seems entirely possible that the newsletter does not do that at all and as a result only has reputational benefits. (I am fairly confident that the reputational benefits are positive and non-zero.) I’d really like to get more clarity on this, so if you read the newsletter, please take the survey!

Costs

The main cost of the newsletter is the opportunity cost of our time. Each newsletter takes about 15 hours of my time. The newsletter has gotten more detailed over time, but this isn’t reflected in the total hours I put in because it has been approximately offset by new content writers (Richard Ngo and Dan Hendrycks) who took some of the burden of summarizing off of me. Currently I’d estimate that the newsletter takes 15-20 hours in total (with 2-5 hours from Richard and Dan). This can be broken down into time I would have spent reading and summarizing papers anyway, and time that I spent only because the newsletter exists, which we could call “extra hours”. Initially, I wanted to read and summarize a lot of papers for my own benefit, so the newsletter took about 4-5 extra hours per week. Now, I’m less inclined to read a ton of papers, and it takes 8-10 extra hours per week.

This means in aggregate I’ve spent 700-800 hours on the newsletter, of which about 300-400 were hours that I wouldn’t have spent otherwise. Even only counting the 300-400 hours, this is comparable to the time I spent on the state of the world and learning biases projects together, including all of the time spent on paper writing, blog posts, and talks in addition to the research itself.
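A rough back-of-the-envelope check of these totals, as a sketch. It assumes 52 weekly issues over the year, and that the extra hours ran at the early rate (4-5 per week) for the first half of the year and at the current rate (8-10 per week) for the second half. Neither assumption comes from tracked data, and the names below are invented for illustration.

```python
ISSUES = 52            # assumption: one issue per week for a year
HOURS_PER_ISSUE = 15   # from the text: about 15 hours of my time per newsletter

# Extra hours per week as (low, high) ranges, taken from the text.
EARLY_EXTRA = (4, 5)   # initially
LATE_EXTRA = (8, 10)   # now


def total_hours(issues, hours_per_issue):
    """Total hours spent across all issues."""
    return issues * hours_per_issue


def extra_hours_range(issues, early, late):
    """Low/high estimate of extra hours, assuming an even early/late split."""
    first_half = issues // 2
    second_half = issues - first_half
    low = first_half * early[0] + second_half * late[0]
    high = first_half * early[1] + second_half * late[1]
    return low, high


total = total_hours(ISSUES, HOURS_PER_ISSUE)
low, high = extra_hours_range(ISSUES, EARLY_EXTRA, LATE_EXTRA)

print(f"total hours: {total}")         # 780
print(f"extra hours: {low} to {high}")  # 312 to 390
```

Under these assumptions the total lands inside the 700-800 range and the extra hours inside the 300-400 range quoted above. A different split between the early and current rates would shift the extra-hours estimate.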

In addition to time costs, the newsletter could do harm. While there are many ways this could happen, the only one that feels sufficiently important to consider is the risk of causing information cascades. Since nearly everyone in the field is reading the newsletter, we may all end up with some belief B just because it was in a newsletter. We might then have way too much confidence in B since everyone else also believes B.

Overall I’m not too worried. There’s so much content in the newsletter that I seriously doubt a single idea could spread widely as a result of the newsletter—inevitably some people won’t remember that particular idea. So we only need to worry about “big” ideas that are repeated often in the newsletter. The most salient example of that would be my general opposition to the Bostrom/Yudkowsky paradigm of AI safety, but it still seems quite prevalent amongst researchers. In addition I’d be really surprised if existing researchers were convinced of a “big” idea or paradigm solely because other researchers believed it (though they might put undue weight on it).

Is the newsletter worth it?

If the only benefit of the newsletter were the reputational effects, it would not be worth my time (even ignoring Richard and Dan’s time). However, I get enough thanks from people in the field that the newsletter must be providing value to them, even though I don’t have a great model of what the value is. My current best guess is that there is a lot of value, which makes the newsletter worth the cost, but I think there is a non-negligible chance that this would be reversed if I had a good model of what value everyone was getting from it.

Going forward

In which I figure out what about the newsletter should change in the future.

Structure of the newsletter

So far I’ve only talked about whether the newsletter is worthwhile as a whole. But of course we can also analyze individual aspects of the newsletter and figure out how important they are.

Opinions are probably the key feature of the newsletter. Many papers and blog posts are aimed more at appearing impressive than at conveying facts. Even the ones that are truth seeking are subject to publication bias: they are written by people who think that the ideas within are important, and so will be biased towards positivity. As a result, an opinion from a researcher who didn’t do the work can help contextualize the results in a way that makes it easier for less involved readers to figure out the importance of the ideas. (As a corollary, I worry about the lack of a fresh perspective on posts that I write, but don’t see an obvious easy solution to that problem.) I think this also contributes to the success of Import AI and ChinAI, which are also quite heavy on opinions.

I think the summaries are also quite important. I aim for the longer summaries to be sufficiently informative that you don’t have to read the blog post / paper unless you want to do a deep dive and really understand the results. For papers, I often roughly aim for it to be more useful to read my summary than to read the abstract, intro, and conclusion of the paper. In the world where the newsletter didn’t have summaries, I think researchers would not keep up as much with the state of the field.

Overall, I think I’m pretty happy with the current structure of the newsletter, and don’t currently intend to change it. But if I get more clarity on what value the newsletter provides to researchers, I wouldn’t be surprised if I would change the structure as a result.

Scaling up

In the year that I’ve been writing the newsletter, the amount of writing that I want to cover has gone up quite a lot, especially with the launch of the Alignment Forum. I expect this will continue, and I won’t be able to keep up.

By default, I would cover less and less of it. However, it would be nice for the spreadsheet to be a somewhat comprehensive database of the AI safety literature. This is not what we currently have: I often don’t cover good Agent Foundations work because it’s hard for me to understand, and I don’t have pre-2018 content. But it is pretty good for the subfields of AI safety that I’m most knowledgeable about.

There has been some outsourcing of work as Richard Ngo and Dan Hendrycks have joined, but it still does not seem sustainable to continue this long-term, due to coordination challenges and challenges with maintaining quality. That said, it’s not impossible that this could work:

Perhaps I could pay people to do this summarization, with the hope that this would help me find people who could put in more time. This would allow more work to get done while keeping the team small (which keeps coordination costs and quality maintenance costs small).
I could create a system that allows random people to easily contribute summaries of papers and posts they have read, while writing the opinions myself. It may be easier to vet and fix summaries than to write them myself.
I could invest in developing good guides for new summarizers, in order to decrease the cost of onboarding and ongoing coordination.

That said, in all of these cases, it feels better to instead just summarize a smaller fraction of all the work, especially since the newsletter is already long enough that people probably don’t read all of it, while still adding links to papers that I haven’t read to the spreadsheet. The main value of summarizing everything is having a more comprehensive spreadsheet, but I don’t think this is sufficiently valuable to warrant the approaches above. That said, I could imagine this conclusion being overturned by having a better model of how the newsletter adds value for technical safety researchers.

Sourcing

So far, I have found papers and articles from newsletters, blogs, Arxiv Sanity and Twitter. However, Twitter has become worse over time, possibly because it has learned to show me non-academic stuff that is more attention-grabbing or controversial, despite me trying not to click on those sorts of things. Arxiv Sanity was my main source for academic work, but recently it’s been getting worse, and is basically not working any more, and I’m not sure why. So I’m now trying to figure out a new way to find relevant literature—does anyone have suggestions?

If I continue to have trouble, I might summarize random academic papers I’m interested in instead of the ones that have come out very recently.

Appearance

It’s rather annoying that the newsletter is a giant wall of text; it’s probably not fun to read as a result. In addition to the categories, which were partly meant to give structure to the wall of text, I’ve been trying to break things into more paragraphs, but really it needs something much more drastic. However, I also don’t want it to be even more work to get a newsletter out.

So, if anyone wants to volunteer to make the newsletter visually nicer that would be appreciated, but it shouldn’t cost me too much more time (maybe half an hour a week, if it was significantly nicer). One easy possibility would be to include an image at the beginning of the newsletter—any suggestions for what should go there?

Future of the newsletter

Given the uncertainty of the value of the newsletter, it’s not inconceivable that I decide to stop writing it in the future, or scale back significantly. That said, I think there is value in stability. It is generally bad for a project to have “fits and starts” where its quality varies with the motivation of the person running it, or for the project to potentially be cancelled solely based on how valuable the creator thinks it is. (I’m aware I haven’t argued for this; feel free to ask me about it if it seems wrong.)

Due to this and related reasons, when I started the newsletter, I had an internal commitment to continue writing it for at least six months, as long as most other people thought it was still valuable. Obviously, if everyone agreed that the newsletter was not useful or actively harmful, then I’d stop writing it: this is more to deal with the case where I no longer think the newsletter is useful, even though other people think it is useful.

Now I’m treating it as an ongoing three-month commitment: that is, I am always committing to continue writing the newsletter for at least three months as long as most other people think it is valuable. At any point I can decide to stop the ongoing commitment (presumably when I think it is no longer worth my time to write it); there would then be three months where I would continue to write the newsletter for stability, and figure out what would happen with the newsletter after the three months.

Feedback I’d like

There are a bunch of questions I have, that I’d love to get opinions on either anonymously in the 3-minute survey (which you should fill out!) or in the comments. (Comments preferred because then other people can build off of them.) I’ve listed the questions roughly in order of importance:

What is the value of the newsletter for you?
What is the value of the newsletter for other people?
How should I deal with the growing amount of AI safety research?
What can I do to get more feedback on the newsletter on an ongoing basis (rather than having to survey people at fixed times)?
Am I underestimating the risk of causing information cascades? Regardless, how can I mitigate this risk?
How can I make the newsletter more visually appealing / less of a wall of text, without expending too much weekly effort?
Should I publicize the newsletter on Twitter? How valuable is the marginal reader?
Should I publicize the newsletter to AI researchers? How valuable is the marginal reader?
How can I find good papers out of academia now that Arxiv Sanity isn’t working as well as it used to?

Appendix: Alignment Newsletter FAQ

All of this is in the appendix because I don’t particularly care whether people read it or not. It’s not very relevant to any of the content in the main post. It is relevant to anyone who might want to start their own newsletter, or their own project more generally.

What’s the history of the Alignment Newsletter?

During one of the CHAI seminars, someone suggested that we each take turns finding and collecting new research papers and sending them out to each other. I already had a system in place doing exactly this, so I volunteered to do this myself (rather than taking turns). I also figured that to save even more CHAI-researcher-time, it would make sense to give a quick summary and then tell people under what circumstances they should read the paper. (I was already summarizing papers for my own notes.)

This pretty quickly proved to be valuable, and I thought about making it public for even more time savings. However, it still seemed pretty nascent and in flux, so I continued iterating on it within CHAI, while thinking about how it could be made to be public-facing. (See also the “Things done right” section.) After a little under two months of writing the newsletter within CHAI, I made it public. At that time, the goal was to provide a list of relevant readings for technical AI safety researchers that had been published each week; and help them decide whether or not they should read them.

Over time, my summaries and opinions became longer and more detailed. I don’t know exactly why this happened. Regardless, at some point I started aiming for some of my summaries to be detailed enough that researchers could just read the summary and not read the paper/post itself.

In September, Richard Ngo volunteered to contribute summaries to the newsletter on a variety of topics, and Dan Hendrycks joined soon after, focusing on robustness and uncertainty.

Why do you never have strong negative opinions?

One of the design decisions made at the beginning of the newsletter was to avoid strong critiques of any particular piece of research. This was for a few reasons:

As a general rule, any criticism I have of a paper is often too strong or based on a misunderstanding. If I have a negative impression of a paper or research agenda, I would predict that with ~90% probability after I talk to the author(s) my opinion of the work will have improved. I don’t think this is particular to me—this should be expected of any summarizer since the authors have much more intuition about why their particular approach will be useful, beyond what is written in the blog post or paper.
The newsletter probably shapes the views of a significant fraction of people thinking about AI safety, and so leads to a risk of information cascades. Mitigating this means giving space to views that I disagree with, summarizing them as best I can, and not attacking what will inevitably be a strawman of their view.
Regardless of the accuracy of the criticism, I would like to avoid alienating people.

Of course, this decision has downsides as well:

Since I’m not accurately saying everything I believe, it becomes more likely that I accidentally say false things, convey wrong impressions, or otherwise make it harder to get to the truth.
Disagreements are one of the main ways in which intellectual progress is made. They help identify points of confusion, and allow people to merge their models in order to get something (hopefully) better.

While the first downside seems like a real cost, the second downside is about inhibiting intellectual progress in AI safety research. I think this is okay: intellectual progress does not need to happen in the newsletter. In most of these cases I express stronger disagreements in channels more conducive to intellectual progress (e.g. the Alignment Forum, emails/messages, talking in person, the version of the newsletter internal to CHAI).

Another probable effect of avoiding negativity is reduced readership, since it is likely much more interesting to read a newsletter with active disagreements and arguments than one that dryly summarizes a research paper. I don’t yet know whether this is a pro or a con (even ignoring other effects of negativity).

Mistakes

I don’t know of very many mistakes, even in hindsight. I think this is primarily because I don’t get feedback on the newsletter, not because everything has gone perfectly. It seems quite likely that there are still things that are mistakes, but I don’t know about them yet because I don’t have the data to tell.

Analyzing other newsletters. The one thing that I wish I had done was to analyze other newsletters like Import AI in more detail before starting this one. I think it’s plausible that I could have realized the value of opinions and more detailed summaries right at the beginning, rather than evolving in that direction over a couple of months.

Delays. I did fall over a week behind on the newsletter over the last month or two. While this is bad, I wouldn’t really call it a Mistake: I don’t think of the newsletter as a weekly commitment or obligation. I very much value the flexibility to allocate time to whatever seems most pressing; if the newsletter was more of a commitment (such that falling behind is a Mistake), I think I would have to be much more careful about what I agree to do, and this would prevent me from doing other important things. Instead, my approach is to have the newsletter as a fairly important goal that I try to schedule enough time for, but if I find myself running out of time and have to cut something, it’s not a tragedy if it means the newsletter is delayed. That’s essentially what happened over the last month or two.

Things done right

I spent a decent amount of time thinking about the design of the newsletter before implementing it, and I think this was in hindsight a very good idea. Here I list a few things that worked out well.

A polished product. I was particularly conscious of the fact that at launch the newsletter would be using up the limited common resource of “people’s willingness to try out new things”. Both in order to make sure people stuck with the project, and in order to not use up the common resource unnecessarily, I wanted to be fairly confident that this would be a good product before launching. As a result, I iterated for a little under two months within CHAI, in order to figure out product-market fit. You can see the evolution over time—this is the first internal newsletter, whereas this is the first public newsletter. (They’re all available here.)

By the fourth internal newsletter, I realized that I couldn’t actually summarize all the links I found, so I switched to a version where some links would be sent without summaries.
Categorization seemed important, so I did more of it.

This is not to say that the newsletter has been static since launch; it has changed significantly. Most notably, while originally I was aiming to give people enough information to decide whether or not to read the paper/post, I now sometimes aim for including enough detail that people don’t need to read the paper/post. But the point is that a lot of the early improvements happened within CHAI without consuming the common resource.

I’m not sure to what extent this is different from standard startup advice of iterating quickly and testing product-market fit: it depends on whether it counts as testing for product-market fit to trial the newsletter within CHAI. To the extent that there is a difference, it’s mainly that I’m arguing for more planning, especially before consuming common resources (whereas with startups, the fierce competition means that you do not worry about consuming common resources).

Considered stability and commitment. As I mentioned above, I had an internal commitment to continue writing the newsletter for at least six months, as long as other people thought it was valuable. In addition to the value of stability, I viewed this as part of cooperatively using the common resource of people’s willingness to try things. If you’re going to use the resource and fail, ideally the failure should show that it is actually infeasible to succeed in that domain, rather than stemming from e.g. lack of motivation on the author’s part.

Here’s another way to see this. I think it would have been a lot harder for the newsletter to be successful if there had been 2-5 attempts to create a newsletter in the past that had then fizzled out, because people would expect newsletters to fail and wouldn’t subscribe. My initial commitment helps prevent me from being one of those failures for “bad” reasons (e.g. me losing motivation) while still allowing me to fail for “good” reasons (e.g. no one actually wants to read a newsletter about AI alignment).

I can’t point to any actually good outcomes that resulted from this policy; nonetheless I think it was a good thing to have done.

Investing in flexible automated systems. I had created the private version of the spreadsheet before the first public newsletter, in order to have a database of readings for myself (replacing my previous Google Doc database), and I wrote a script to generate the email from this database. While lots of ink has been spilled on the value of automation, it doesn’t usually emphasize flexibility. By not using a technology meant for one specific purpose, I was able to do a few things that I wouldn’t expect to be able to do with a more specialized version:

Create consistency checks. For example, throwing an error when there’s an opinion but no summary, or when the name of the summarizer is not “Richard”, “Dan H” or “” (indicating me).
Create a private and a public version of the newsletter. (Any strong critiques go into the private version, which is internal to CHAI, and are removed from the public version.)
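
As a rough illustration of the two points above, here is a minimal Python sketch, not the actual script. The field names (including the `private_critique` column), the function names, and the output layout are all assumptions made for the example; only the two rules being checked and the private/public split come from the description above.

```python
from dataclasses import dataclass

# Summarizer names accepted by the check; "" stands for the newsletter's author.
KNOWN_SUMMARIZERS = {"Richard", "Dan H", ""}


@dataclass
class Entry:
    """One spreadsheet row (all field names are hypothetical)."""
    title: str
    summary: str = ""
    opinion: str = ""
    summarizer: str = ""
    private_critique: str = ""  # assumed column for CHAI-internal critiques


def check_entry(entry: Entry) -> None:
    """Raise ValueError if the row violates one of the consistency rules."""
    if entry.opinion and not entry.summary:
        raise ValueError(f"{entry.title!r}: opinion without a summary")
    if entry.summarizer not in KNOWN_SUMMARIZERS:
        raise ValueError(
            f"{entry.title!r}: unknown summarizer {entry.summarizer!r}"
        )


def render_entry(entry: Entry, public: bool) -> str:
    """Format one entry; the private critique is left out of the public version."""
    check_entry(entry)
    parts = [entry.title]
    if entry.summary:
        byline = f" (summarized by {entry.summarizer})" if entry.summarizer else ""
        parts.append(f"Summary{byline}: {entry.summary}")
    if entry.opinion:
        parts.append(f"Opinion: {entry.opinion}")
    if entry.private_critique and not public:
        parts.append(f"Private critique: {entry.private_critique}")
    return "\n".join(parts)
```

Calling `render_entry(entry, public=True)` and `render_entry(entry, public=False)` on the same row would then yield the two versions from a single database, and a malformed row fails before any email is generated.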

But really, the key value of flexibility is that it allows you to adapt to circumstances that you had never even considered when creating the system:

When Richard Ngo joined, I added a “Summarizer” column to the sheet, changed a few lines of code, and was done. (Note how I needed flexibility over both the data format and the analysis code.)
I’ve found myself linking to a bunch of previous newsletter entries and having to copy a lot of links. Recently I added a new tag that I can use in summaries and opinions that automatically extracts and links the entry I’m referring to. (I’m a bit embarrassed at how long it took me to realize that this was a thing I could do; I could have saved a lot more tedious work if I had realized it was a possibility the first time I got annoyed at this process.)
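
To make the cross-linking idea concrete, here is one way such a tag could be expanded, as a sketch only. The `[[Title]]` syntax, the `link_entries` name, and the title-to-URL lookup are invented for this example; the real tag format and the email's markup are not described above.

```python
import html
import re

# Hypothetical tag syntax: [[Exact Entry Title]]
TAG = re.compile(r"\[\[(.+?)\]\]")


def link_entries(text: str, urls_by_title: dict) -> str:
    """Replace each [[Title]] tag with an HTML link to that entry."""

    def replace(match):
        title = match.group(1)
        if title not in urls_by_title:
            # Fail loudly rather than send out a dangling reference.
            raise KeyError(f"no entry titled {title!r} in the database")
        url = html.escape(urls_by_title[title], quote=True)
        return f'<a href="{url}">{html.escape(title)}</a>'

    return TAG.sub(replace, text)
```

Because the lookup goes through the same database that generates the email, a reference to an entry that does not exist raises an error at generation time instead of producing a broken link.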

Thought about potential negative effects. I’m pretty sure I thought of most of the points about negativity (listed above) before publicizing the newsletter. This is discussed a lot; I don’t think I have anything significant to add.

This section seems to indicate that I thought of things initially and they were all important—this is almost certainly not the case. I’m sure I’m rationalizing some of these with hindsight and didn’t actually think of all the benefits then, and I also probably thought of other considerations that didn’t end up being important that I’ve now forgotten.

22 comments

Milan Griffes 10 Apr 2019 17:15 UTC
5 points
Now that the newsletter is available in Mandarin (thanks Xiaohu!)
dope.
SoerenMind 14 Apr 2019 16:12 UTC
2 points
To cover more content that’s not new but important, you could use a new source on one topic to summarize the state of that topic. I like that papers do this in the introduction and literature review and I think more posts and the like should do it.
- Rohin Shah 14 Apr 2019 17:25 UTC
  1 point
  Yeah, I’ve been doing this occasionally (though that started recently).
SoerenMind 14 Apr 2019 16:06 UTC
2 points
Google Scholar also lists recommended new papers on its homepage.
Rohin Shah 10 Apr 2019 7:04 UTC
1 point
Comment thread for the question: What is the value of the newsletter for you?
- Milan Griffes 10 Apr 2019 17:31 UTC
  3 points
  Helps maintain my situational awareness of the AI alignment ecosystem. This is quite valuable to me.
Rohin Shah 10 Apr 2019 7:04 UTC
1 point
Comment thread for the question: What is the value of the newsletter for other people?
Rohin Shah 10 Apr 2019 7:03 UTC
1 point
Comment thread for the question: How should I deal with the growing amount of AI safety research?
- matthew.vandermerwe 10 Apr 2019 8:33 UTC
  10 points
  General comment: Huge fan of the newsletter, and think it’s awesome you’re doing this sort of review. I should also caveat that I’m not an AIS researcher, so not exactly target audience.
  My first guess is that there’s significant value in someone maintaining an open, exhaustive database of AIS research. My main uncertainty is whether you are the best positioned to do this as things ramp up. It is plausible to me that an org with a safety team (e.g. DeepMind/OpenAI) is already doing this in-house, or planning to do so. It’s less clear that they would be willing to maintain a public resource. I’d want to verify this, and make sure that you’re coordinating with them to avoid any unnecessary duplication. More broadly, these labs might have some good systems in place for maintaining databases of new research in areas with a much higher volume than AIS, so could potentially share some best-practices.
  - Rohin Shah 10 Apr 2019 22:44 UTC
    1 point
    My first guess is that there’s significant value in someone maintaining an open, exhaustive database of AIS research.
    Yeah, I agree. But there’s also significant value in doing more AIS research, and I suspect that on the current margin for a full-time researcher (such as myself) it’s better to do more AIS research compared to writing summaries of everything.
    Note that I do intend to keep adding all of the links to the database; it’s the summaries that won’t keep up.
    It is plausible to me that an org with a safety team (e.g. DeepMind/OpenAI) is already doing this in-house, or planning to do so.
    I’m 95% confident that no one is already doing this, and if they were seriously planning to do so I’d expect they would check in with me first. (I do know multiple people at all of these orgs.)
    More broadly, these labs might have some good systems in place for maintaining databases of new research in areas with a much higher volume than AIS, so could potentially share some best-practices.
    You know, that would make sense as a thing to exist, but I suspect it does not. Regardless, that’s a good idea; I should make sure to check.
- Milan Griffes 10 Apr 2019 17:23 UTC
  2 points
  My intuition is that this would be a good time to formalize the structure of the newsletter somewhat, especially given that there are multiple contributors & you are starting to function more as an editor.
  Could do this by incorporating a small publishing organization that produces the newsletter, or by housing the newsletter in an existing organization. The former would be more work, but also seems better (less worry that we’re getting DeepMind’s (or whoever’s) spin if it’s coming out of an independent org).
  Plausibly it’s fine to keep it as an informal research product, but I’d guess that “AI alignment newsletter editor” could basically be (or soon become) a full-time job.
  - Milan Griffes 10 Apr 2019 17:27 UTC
    2 points
    incorporating a small publishing organization that produces the newsletter...
    Could get a grant to fund this, or could do a pay-per-subscription model (a la Ben Thompson’s Stratechery, which I believe has > $1 million in annual revenue entirely from $10/month subscribers).
  - Rohin Shah 10 Apr 2019 22:50 UTC
    1 point
    My intuition is that this would be a good time to formalize the structure of the newsletter somewhat, especially given that there are multiple contributors & you are starting to function more as an editor.
    Certainly more systems are being put into place, which is kind of like “formalizing the structure”. Creating an organization feels like a high fixed cost for not much benefit—what do you think the main benefits would be? (Maybe this is combined with paying content writers and editors, in which case an organization might make more sense?)
    Plausibly it’s fine to keep it as an informal research product, but I’d guess that “AI alignment newsletter editor” could basically be (or soon become) a full-time job.
    If I were to make this my full-time job, the newsletter would approximately double in length (assuming I found enough content to cover), and I’d expect that people wouldn’t read most of it. (People already don’t read all of it, I’m pretty sure.) What do you think would be the value of more time put into the newsletter?
    - Milan Griffes 10 Apr 2019 22:59 UTC
      2 points
      (Maybe this is combined with paying content writers and editors, in which case an organization might make more sense?)
      Right, that’s what I was gesturing towards.
      What do you think would be the value of more time put into the newsletter?
      This was in response to “the growing amount of AI safety research.”
      Presumably as there is more research, it takes more time to read & assess the forthcoming literature to figure out what’s important / worth including in the newsletter.
      - Rohin Shah 11 Apr 2019 1:50 UTC
        3 points
        This was in response to “the growing amount of AI safety research.”
        Yeah, I think I phrased that question poorly. The question is both “should all of it be summarized” and “if yes, how can that be done”.
        Presumably as there is more research, it takes more time to read & assess the forthcoming literature to figure out what’s important / worth including in the newsletter.
        I feel relatively capable of that—I think I can figure out for any given reading whether I want to include it in ~5 minutes or so with relatively high accuracy. It’s actually reading and summarizing it that takes time.
Rohin Shah 10 Apr 2019 7:03 UTC
1 point
Comment thread for the question: What can I do to get more feedback on the newsletter on an ongoing basis (rather than having to survey people at fixed times)?
- Milan Griffes 10 Apr 2019 17:28 UTC
  3 points
  Periodically shoot emails to people you respect, asking for feedback.
  Probably tailor these somewhat so they don’t become formulaic.
Rohin Shah 10 Apr 2019 7:03 UTC
1 point
Comment thread for the question: Am I underestimating the risk of causing information cascades? Regardless, how can I mitigate this risk?
- Milan Griffes 10 Apr 2019 17:30 UTC
  2 points
  Seems okay so far, from my very ill-informed perspective.
  Interesting to think about what governance the newsletter should have in place re: info hazards, confidentiality, etc.
  What did you guys do for GPT-2?
  - Rohin Shah 10 Apr 2019 22:56 UTC
    3 points
    Interesting to think about what governance the newsletter should have in place re: info hazards, confidentiality, etc.
    Currently we only write about public documents, so I don’t think these concerns arise. I suppose you could imagine that someone writes about something they shouldn’t have and we amplify it, but I suspect this is a rare case and one that should be up to my discretion.
    What did you guys do for GPT-2?
    Not sure what specifically you’re asking about here. You can see the relevant newsletter here.
    - Milan Griffes 10 Apr 2019 23:01 UTC
      3 points
      … you could imagine that someone writes about something they shouldn’t have and we amplify it, but I suspect this is a rare case and one that should be up to my discretion.
      A crux here is probably how rare a case we think this is.
      From my present vantage, the AI alignment newsletter is becoming a pretty prominent clearinghouse for academic AI alignment research updates. (I wouldn’t be surprised if it were the primary source of such for a sizable portion of newsletter subscribers.)
      To the extent that’s true, the amplification effects seem possibly strong.
      - Rohin Shah 11 Apr 2019 2:00 UTC
        3 points
        From my present vantage, the AI alignment newsletter is becoming a pretty prominent clearinghouse for academic AI alignment research updates. (I wouldn’t be surprised if it were the primary source of such for a sizable portion of newsletter subscribers.)
        To the extent that’s true, the amplification effects seem possibly strong.
        I agree that’s true and that the amplification effects for AI safety researchers are strong; the amplification effect is much weaker for any other category. My current model is that info hazards are most worrisome when they spread outside the AI safety community.
        On confidentiality, the downsides of the newsletter failing to preserve confidentiality seem sufficiently small that I’m not worried (if you ignore info hazards). Failures of confidentiality seem bad in that they harm your reputation and make it less likely that people are willing to talk to you—it’s similar to the reason you wouldn’t break a promise even if superficially the consequences of the thing you’re doing seem slightly negative. But in the case of the newsletter, we would amplify someone else’s failure to preserve confidentiality, which shouldn’t reflect all that poorly on us. (Obviously if we knew that the information was supposed to be confidential we wouldn’t publish it.)