I’ll briefly comment on a few parts of this post since my name was mentioned (lack of comment on other parts does not imply any particular position on them). Also, thanks to the authors for their time writing this (and future posts)! I think criticism is valuable, and having written criticism myself in the past, I know how time-consuming it can be.
I’m worried that your method for evaluating research output would make any ambitious research program look bad, especially early on. Specifically:
The failure of Redwood’s adversarial training project is unfortunately wholly unsurprising given almost a decade of similarly failed attempts at adversarial robustness defenses from hundreds or even thousands of ML researchers.
I think for any ambitious research project that fails, you could tell a similarly convincing story about how it’s “obvious in hindsight” it would fail. A major point of research is to find ideas that other people don’t think will work and then show that they do work! For many of my most successful research projects, people gave me advice not to work on them because they thought it would predictably fail, and if I had failed then they could have said something similar to what you wrote above.
I think Redwood’s failures here are ones of execution and not of problem selection—I thought the problem they picked was pretty interesting but they could have much more quickly realized the particular approaches they were taking to it were unlikely to pan out. If they had done that, perhaps they would have switched to other approaches that ended up succeeding, or just pivoted to interpretability faster. In any case, I definitely wouldn’t want to discourage them or future organizations from using a similar problem selection process.
(If you asked a random ML researcher if the problem seemed feasible, they would have said no. But I wouldn’t have used that as a reason not to work on the project.)
CTO Buck Shlegeris has 3 years of software engineering experience and a limited ML research background.
My personal judgment is that Buck is a stronger researcher than most people with ML PhDs. He is weaker at empirical ML than this baseline, but very strong conceptually in ways that translate well to machine learning. I do think Buck will do best in a setting where he’s either paired with a good empirical ML researcher or gains more experience there himself (he’s already gotten a lot better in the past year). But overall I view Buck as on par with a research scientist at a top ML university.
Thank you for this comment; some of the contributors to this post have updated their views of Buck as a researcher as a result.
Thanks for this detailed comment, Jacob. We agree with your first point, but on re-reading the post we can see why it reads as though we also think the problem selection was wrong; we don’t believe this. We will clarify the distinction between problem selection and execution in the main post soon.
Our main concern was that, when working on a problem where a lot of prior research has been done, it is important to come into it with a novel approach or insight. We think it’s possible the team could have done this via a more thorough literature review or by engaging with domain experts. Where we may disagree is whether our suggestion of doing more desk research beforehand would result in researchers dismissing ideas too easily, and thus experimenting and learning less.
We think this is definitely possible, but feel it can be less costly in some cases, and in particular could have been useful in the case of the adversarial training project. As we write later in the passage you quoted above, we think the problem with the adversarial training project was that Redwood focused on an unusually challenging threat model (unrestricted adversarial examples), and although some aspects of the textual domain make the problem easier, the large number of existing textual adversarial attacks indicated that this was unlikely to be sufficient.
Thanks for this! I think we still disagree though. I’ll elaborate on my position below, but don’t feel obligated to update the post unless you want to.
* The adversarial training project had two ambitious goals, which were the unrestricted threat model and also a human-defined threat model (e.g. in contrast to the synthetic L-infinity threat models that are usually considered); a rough sketch of this contrast follows the list.
* I think both of these were pretty interesting goals to aim for and at roughly the right point on the ambition-tractability scale (at least a priori). Most research projects are less ambitious and more tractable, but I think that’s mostly a mistake.
* Redwood was mostly interested in the first goal and the second was included somewhat arbitrarily iirc. I think this was a mistake and it would have been better to start with the simplest case possible to examine the unrestricted threat model. (It’s usually a mistake to try to do two ambitious things at once rather than nailing one, more so if one of the things is not even important to you.)
* After the original NeurIPS paper Redwood moved in this direction and tried a bunch of simpler settings with unrestricted threat models. I was an advisor on this work. After several months with less progress than we wanted, we stopped pursuing this direction. It would have been better to get to a point where we could make this call sooner (after 1-2 months). Some of the slowness was indeed due to unfamiliarity with the literature, e.g. being stuck for a few weeks on something that was isomorphic to a standard gradient hacking issue. My impression (not 100% certain) is that Redwood updated quite a bit in the direction of caring about related literature as a result of this, and I’d guess they’d be a lot faster doing this a second time, although still with room to improve.
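[Editor's note: to make the contrast between the two kinds of threat model concrete, here is a minimal formal sketch; the notation ($x$, $x'$, $y$, $f$, $H$, $\varepsilon$) is chosen purely for illustration and is not taken from the post or the paper.]

Synthetic $L_\infty$ threat model: the adversary wins if
$$\exists\, x' \;:\; \|x' - x\|_\infty \le \varepsilon \;\text{ and }\; f(x') \neq y,$$
i.e. some small, norm-bounded perturbation of a clean input $x$ flips the classifier $f$.

Unrestricted, human-defined threat model: the adversary wins if
$$\exists\, x' \;:\; H(x') = \text{bad} \;\text{ and }\; f(x') = \text{safe},$$
i.e. any input whatsoever on which a human judge $H$ says the failure the defender cares about occurs, while the classifier $f$ misses it. There is no distance constraint and no synthetic label, which is what makes this setting both more realistic and much harder to defend against.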
Note that by academic standards the project was a “success” in the sense of getting into NeurIPS, although the reviewers seemed to like the human-defined aspect of the threat model more than the unrestricted aspect.
This section has now been updated