This is an anonymous account.
Omega
Critiques of prominent AI safety labs: Redwood Research
Hi Akash,
Thank you for sharing your thoughts & those concrete action items—I agree it would be nice to have a set of recommendations in an ideal world.
This post took at least 50 hours (collectively) to write, and its publication was delayed by a few days due to busy schedules. I think if we had had more time, I would have shared the final version with a small set of non-Redwood beta reviewers for comments, which would have caught things like this (and e.g. Nuno’s comment).
We plan to do this for future posts (if you’re reading this and would like to give comments on future posts, please DM us!).
We’ll consider adding an intervention section to future reports, time permitting. (We still think there is value in sharing our observations, as a lot of this information is not available to people without relevant networks.)
(Time permitting, I may come back at a later stage and respond to your point on Redwood having many problems to deal with.)
Thanks for mentioning the $20M point, Nate—I’ve edited the post to make this a little clearer and would suggest people use $14M as the number instead.
Hi Bill, yes your understanding is correct—we will be writing a post about Constellation in the future, and we will share a draft ahead of time with you / Redwood.
Hi Dawn!
What do you count as software engineering experience? From the linked LinkedIn profile, it looks like he has >10 years of experience in the field.
Our critique regarding the lack of senior ML staff is focused specifically on the lack of machine learning expertise (as opposed to general TAIS work). We are counting substantive software engineering experience, such as his work at PayPal and TripleByte.
On the topic of general TAIS experience, I think Buck has at most 7 years of experience, as he joined MIRI in 2017. (It is our understanding that a decent portion of his time at MIRI was spent recruiting.) That being said, years of experience are not the only relevant measure: Jacob Steinhardt comments above that he believes Buck is “a stronger researcher than most people with ML PhDs. He is weaker at empirical ML than this baseline, but very strong conceptually in ways that translate well to machine learning.”
Can you confirm that Redwood really fired them as opposed to them quitting? (The first is unusual in my experience; the second very common.) You mention employees quitting in various places but because they’re anonymous, I can’t tell whether that refers to the same people. Thanks!
To our knowledge, their more experienced ML research staff were let go. We refer to different employees quitting at later stages. In an earlier draft we had named a few of them, but decided to remove the names due to anonymity concerns.
We will edit this section to make it clearer, but the MIRI critique is the MIRI hyperlink—Paul Christiano’s critique of Eliezer.
Thanks for this detailed comment, Jacob. We’re in agreement with your first point, but on re-reading the post we can see why it seems like we think the problem selection was also wrong—we don’t believe this. We will clarify the distinction between problem selection and execution in the main post soon.
Our main concern was that, when working on a problem where a lot of prior research has been done, it is important to come into it with a novel approach or insight. We think it’s possible the team could have done this via a more thorough literature review or by engaging with domain experts. Where we may disagree is whether our suggestion of doing more desk research beforehand might result in researchers dismissing ideas too easily, and thus experimenting and learning less.
We think this is definitely possible, but feel it can be less costly in some cases, and in particular could have been useful for the adversarial training project. As we write later in the passage you quoted above, we think the problem with the adversarial training project was that Redwood focused on an unusually challenging threat model (unrestricted adversarial examples), and although some aspects of the textual domain make the problem easier, the large number of textual adversarial attacks indicated this was unlikely to be sufficient.
Hi Fay, thank you for engaging with the post. We appreciate you taking the time to check the claims we make.
1) Redwood Funding
Regarding OP’s investment in OpenAI—you are correct that OpenAI received a larger amount of money. We didn’t include this because, since the grant in 2017, OpenAI has transitioned to a capped for-profit. I (the author of this particular comment) was actually not aware that OpenAI had at one point been a research non-profit. I will be updating the original post to add this information—we appreciate you flagging it.
In general, we disagree that the correct reference class for evaluating Redwood’s funding is for-profit alignment labs like OpenAI, Anthropic or DeepMind because they have significantly more funding from (primarily non-EA) investors, and have different core objectives and goals. We think the correct reference class for Redwood is other TAIS labs (academic and research nonprofit) such as CHAI, CAIS, FAR AI and so on. I will add some clarification to the original post with more context.
(We will discuss the point on OP having board seats at Redwood in a separate comment)
Field Experience
Many research scientist roles at AI research labs (e.g. DeepMind and Google Brain[1]) expect researchers to have PhDs in ML—this would be a minimum of 5 years doing relevant research.
Not all labs have a strict requirement for an ML PhD. Many people at OpenAI and Anthropic don’t have PhDs in ML either, but often have PhDs in related fields like Maths or Physics. There are a decent number of people at OpenAI without PhDs (Anthropic is relatively stricter on this than OpenAI). Labs like MIRI don’t require this, but they are doing more conceptual research, and relatively little, if any, ML research (to the best of our knowledge; they are private by default).
[1] Note that while we think for-profit AI labs are not the right reference class for comparing funding, we do think that all AI labs (academic, non-profit or for-profit) are the correct reference class when considering credentials for research scientists.
Hi Jakub, these are standard rates for EECS PhD students (PhD students in other disciplines get paid less). Here are a couple of examples:
Berkeley EECS PhD students are paid $45K per year at the PhD level. (from personal acquaintances in the Berkeley EECS program)
MIT EECS PhD students are paid ~$49.2K per year at the PhD level. (source)
Update: this has now been edited in the original post.
I will be updating the original post to add this information in—we appreciate you flagging it.
Update: This has now been edited in the original post.
My personal judgment is that Buck is a stronger researcher than most people with ML PhDs. He is weaker at empirical ML than this baseline, but very strong conceptually in ways that translate well to machine learning. I do think Buck will do best in a setting where he’s either paired with a good empirical ML researcher or gains more experience there himself (he’s already gotten a lot better in the past year). But overall I view Buck as on par with a research scientist at a top ML university.
Thank you for this comment; some of the contributors to this post have updated their views of Buck as a researcher as a result.
This section has now been updated.
Meta note: We believe this response is the 80/20 in terms of quality vs time investment. We think it’s likely we could improve the comment with more work, but wanted to share our views earlier rather than later.
One thing we didn’t spell out very explicitly in this post was the distinction between 1) how effectively we believe Redwood spent their resources and 2) whether we think OP should have funded them (and at what amount). As this post is focused on Redwood, I’ll focus more on 1) and comment briefly on 2)—but note that we plan to expand on this further in a follow-up post. We will add a paragraph which distinguishes these two points more clearly.
Argument 1): We think Redwood could produce at least the same quality and quantity of research, with fewer resources (~$4-8 million over 2 years)
The key reasons we think 1) are:
If they had had more senior ML staff or advisors, they could have avoided some of the mistakes in their agenda that we see as avoidable. This wouldn’t necessarily have come at a large monetary cost given their overall budget (around $200-300K for 1 FTE).
We estimate as much as 25-30% of their spending went towards scaling up projects (e.g. REMIX) before they had a clear research agenda they were confident in. To be fair to Redwood, this premature scaling was more defensible prior to the FTX collapse, when the general belief was that there was a “funding overhang”. Nate also mentions in his comment that scaling was raised by both Holden and Ajeya (at OP), and that he now sees this as an error on their part.
Argument 2): OP should have spent less on Redwood, and 2a) there were other comparable funding opportunities
The key reasons we think 2) are:
There are other TAIS labs (academic and not) that we believe could absorb and spend considerably more funding than they currently receive. Example non-profits include CAIS and FAR AI, and underfunded safety-interested academic groups include David Krueger’s and Dylan Hadfield-Menell’s groups. Opportunities are more limited if focusing specifically on interpretability, but there are still a number of promising options. For example, Neel Nanda mentioned three academics he considers to do good interpretability work: OP has funded one of them (David Bau) but, as far as we know, not the other two (of course, they may not have room for more funding, or OP may have investigated and decided not to fund them for other reasons).
A key reason OP may not think some of these labs are worth funding on the margin is that they are substantially more bullish on certain safety research agendas than others. We have some concerns about how the OP LT team decides which agendas to support, but will explore this further in our Constellation post, so won’t comment in more depth at this point. As one of the main funders of TAIS work, in a field which is very speculative and new, we think OP should be more open to a broad range of research agendas than they are.

We think that small, young organizations without a track record beyond founder reputation should in general be given smaller grants and build up a track record before trying to scale. We think it’s plausible that several of the issues we pointed out could have been mitigated by this funding structure.
(written in first person because one post author wrote it)
As Nuno notes, I can’t see how else to spend $20M to get more good interp work (naively, I’m not claiming no such ways exist)
I think this is the area we disagree on the most. Examples of other ideas:
1. Generously fund the academics who you do think are doing good work (as far as I can tell, two of them—Christopher Potts and Martin Wattenberg—get no funding from OP, and David Bau gets an order of magnitude less). This is probably more on OP than Redwood, but Redwood could also explore funding academics and working on projects in collaboration with them.
2. Poach experienced researchers who are executing well on interpretability but working on what (by Redwood’s lights) are less important problems, and redirect them to more important problems. Not everyone would want to be “redirected”, but there’s a decent fraction of people who would love to work on more ambitious problems but are currently not incentivized to do so, and a broader range of people are open to working on a wide range of problems so long as they are interesting. I would expect these individuals to cost a comparable amount to what Redwood currently pays (somewhat less if poaching from academia, somewhat more if poaching from industry) but be able to execute more quickly as well as spread valuable expertise around the organization.
3. Make one-year seed grants of around $100k to 20 early-career researchers (PhD students, independent researchers) to work on interpretability, nudging them towards a list of problems viewed as important by Redwood. Provide low-touch mentorship (e.g. a once-a-month call). Scale up the grants and/or hire people from the projects that did well after the one-year trial.
I wouldn’t confidently claim that any of these approaches would necessarily best Redwood, but there’s a large space of possibilities that could be explored and largely has not been. Notably, the ideas above differ from Redwood’s high-level strategy to date by: (a) making bets on a broad portfolio of agendas; (b) starting small and evaluating projects before scaling; (c) bringing in external expertise and talent.
I also broadly think that publishing and engaging with the broader ML community is less obviously good for interpretability; as noted, I just don’t think most work is very relevant. I think it’s a bet worth making (and am excited about interp in the wild and my grokking work getting into ICLR!), but definitely not obviously worth the effort, e.g. I think it’s probably the right call that Anthropic doesn’t try to publish their work. Putting pre-prints on Arxiv seems pretty cheap, and I’m pro that, but I think seriously aiming for academic publications is a lot of work (more than 10-20% of a project IMO) and I feel pretty good about Redwood only trying for this when they have employees who are particularly excited about it.
I think I largely agree that the percentage of interpretability papers relevant to large-scale alignment is disappointingly low. However, the denominator is very large, so I still expect the majority of TAIS-relevant interpretability work to happen outside TAIS organizations. Given this, I’d argue there’s considerable value in communicating to this subset of the ML research community. Perhaps a peer-reviewed publication is not the best way to do this: I’d be happy to see Redwood staff e.g. giving talks at a select subset of academic labs, but to the best of our knowledge this hasn’t happened.
I agree that getting from the stage of “scrappy preprint / blog post that your close collaborators can understand” to “peer-reviewed publication” can be 10-20% of a project’s time. However, in my experience the clarity of the write-up and rigor of the results often increase considerably in that 10-20%. There are some parts of the publication process that are complete wastes of time (reformatting from single to double column, running an experiment that you already know the results of but that reviewer 2 really wants to see), but in my experience these have been a minority of the work—no more than 5% of the overall project time. I’m curious if you view this as being significantly more costly than I do, or the improvements to the project from peer-review as being less significant.
Regarding 3) “publishing is relative to productivity”: we are not entirely sure what you mean, but we can try to clarify our point a little more.
We think it’s plausible that Redwood’s total volume of publicly available output is appropriate relative to the quantity of high-quality research they have produced. We have heard from some Redwood staff that there are important insights that have not been made publicly available outside of Redwood, but to some extent this is true of all labs, and it’s difficult for us to judge without further information whether these insights would be worth staff time to write up.
The main area we are confident in suggesting Redwood change is making their output more legible to the broader ML research community. Many of their research projects, including what Redwood considers their most notable project to date—causal scrubbing—are only available as Alignment Forum blog posts. We believe there is significant value in writing them up more rigorously and following a standard academic format, and releasing them as arXiv preprints. We would also suggest Redwood more frequently submit their results to peer-reviewed venues, as the feedback from peer review can be valuable for honing the communication of results, but acknowledge that it is possible to effectively disseminate findings without this: e.g. many of OpenAI and Anthropic’s highest-profile results were never published in a peer-reviewed venue.
Releasing arXiv preprints would have two benefits. First, it would make the work significantly more likely to be noticed, read and cited by the broader ML community, which in turn makes it more likely that others build upon the work and point out deficiencies in it. Second, the more structured nature of an academic paper forces a more detailed exposition, making it easier for readers to judge, reproduce and build upon. If, for example, we compare Neel’s original grokking blog post to the grokking paper, it is clear the paper is significantly more detailed and rigorous. This level of rigor may not be worth the time for every project, but we would at least expect it for an organization’s flagship projects.
Hi Joseph, that quote is meant to be facetious. The scientist who originally said it was making the opposite point to his students—that researching before experimenting can save them time.
Yep, that’s right. This is probably an underestimate, but we would need to spend some time figuring it out. We’ve spent at least 10 hours replying to comments.
Thanks Nuno, I’m sharing this comment with the other contributors and will respond in depth soon. I think you’re right that we could be more explicit on 3).