(I’ll use this comment to also discuss some aspects of some other questions that have been asked.)
I think there are currently something like three categories of bottlenecks on alignment research:
1. Having many tractable projects to work on that we expect will help (this may be limited by theoretical understanding / lack of an end-to-end alignment solution)
2. Institutional structures that make it easy to coordinate to work on alignment
3. People who will attack the problem if they’re given some good institutional framework
Regarding 1 (“tractable projects / theoretical understanding”): Maybe in the next few years we will come to have clearer and more concrete schemes for aligning superhuman AI, and this might make it easier to scope engineering-requiring research projects that implement or test parts of those plans. ARC, Paul Christiano’s research organization, is one group that is working towards this.
Regarding 2 (“institutional structures”), I think of there being 5 major categories of institutions that could house AI alignment researchers:
Alignment-focused research organizations (such as ARC or Redwood Research)
Industry labs (such as OpenAI or DeepMind)
Academia
Independent work
Government agencies (none exist currently that I’m aware of, but maybe they will in the future)
Redwood Research is currently focused on 2 (“institutional structures”). One of the hypotheses behind Redwood’s current organizational structure is “it’s important for organizations to focus closely on alignment research if they want to produce a lot of high-quality alignment research” (see, for example, common startup advice such as “The most important thing for startups to do is to focus” (Paul Graham)). My guess is that it’s generally tricky to stay focused on the problems that are most likely to be core alignment problems, and I’m not sure how to do it well in some institutions. I’m excited about the prospect of alignment-focused research organizations that are carefully focused on x-risk-reducing alignment work and willing to deploy resources and increase headcount toward this work.
At Redwood, our current plan is to
solicit project ideas that are theoretically motivated (i.e. they have some compelling story for how they either are analogous to, or directly solve, x-risk-associated problems in aligning superintelligent systems) from researchers across the field of x-risk-motivated AI alignment,
hire researchers and engineers who we expect to help execute on those projects, and
provide the managerial and operational support for them to successfully complete those projects.
There are various reasons why a focus on focus might not be the right call, such as “it’s important to have close contact with top ML researchers, even if they don’t care about working on alignment right now, otherwise you’ll be much worse at doing ML research” or “it’s important to use the latest technology, which could require developing that technology in house”. This is why I think industry labs may be a reasonable bet. My guess is that (with respect to quality-adjusted output of alignment research) they have lower variance but also lower upside. Roughly speaking, I am currently somewhat less excited about academia, independent work, and government agencies, but I’m also fairly uncertain, and there are definitely people and types of work for which these homes might be a much better fit.
To wildly speculate, I could imagine a good and achievable distribution across institutions being 500 in alignment-focused research organizations (which might be much more willing and able to productively absorb people for alignment research), 300 in industry labs, 100 in academia, 50 independent researchers, and 50 in government agencies (but plausibly these numbers should be very different in particular circumstances). Of course “number of people working in the field” is far from an ideal proxy for total productivity, so I’ve tried to adjust for the targetedness and quality of their output in my discussion here.
I estimate the current size of the field of x-risk-reduction-motivated AI alignment research is about 100 people (very roughly speaking, rounded to an order of magnitude), so 1000 people would constitute something like a 10x increase. (My guess for the current distribution is 30 in alignment orgs, 30 in industry labs, 30 in academia, 10 independent researchers, and 0 in government (very rough numbers, rounded to the nearest half order of magnitude).) I’d guess there are at this time something like 30–100 people who, though they are not currently working on x-risk-motivated AI alignment research, would start working on this if the right institutions existed. I would like this number (of potential people) to grow a lot in the future.
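To make the arithmetic behind these guesses explicit, here is a minimal sketch that just tabulates the rough numbers above (guesses, not measurements) and prints the implied growth multiple for each category of institution:

```python
# Rough illustrative tabulation of the guessed current distribution versus the
# speculative ~1000-person distribution described above. The numbers are the
# guesses from the text, not data.

current = {
    "alignment-focused orgs": 30,
    "industry labs": 30,
    "academia": 30,
    "independent researchers": 10,
    "government agencies": 0,
}

speculative = {
    "alignment-focused orgs": 500,
    "industry labs": 300,
    "academia": 100,
    "independent researchers": 50,
    "government agencies": 50,
}

print("current total:", sum(current.values()))          # ~100 people
print("speculative total:", sum(speculative.values()))  # 1000 people, ~10x overall

for category, later in speculative.items():
    now = current[category]
    growth = f"{later / now:.0f}x" if now else "new"
    print(f"{category}: {now} -> {later} ({growth})")
```

Running this just confirms that the speculative distribution sums to 1000 (roughly 10x the guessed current total of ~100), and that the implied growth is far from uniform across categories (roughly 17x for alignment-focused orgs versus roughly 3x for academia).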
Regarding 3 (“people”), the spread of the idea that it would be good to reduce x-risks from TAI (and maybe general growth of the EA movement) could increase the size and quality of the pool of people who would develop and execute on alignment projects. I am excited for the work that Open Philanthropy and university student groups such as Stanford EA are doing towards this end.
I’m currently unsure what an appropriate fraction of the technical staff of alignment-focused research organizations should be people who understand and care a lot about x-risk-motivated alignment research. I could imagine that ratio being something like 10%, or like 90%, or in between.
I think there’s a case to be made that alignment research is bottlenecked by current ML capabilities, but I (unconfidently) don’t think this is a bottleneck right now; I think there is a bunch more alignment research that could be done with current capabilities (e.g., my guess is that less than 50% of the alignment work that could be done at current levels of capability has been done, and I could imagine there being something like 10 or more projects that are as helpful as “Deep RL from human preferences” or “Learning to summarize from human feedback”).