I agree with this.
Thanks for this! I think we still disagree though. I’ll elaborate on my position below, but don’t feel obligated to update the post unless you want to.
* The adversarial training project had two ambitious goals: the unrestricted threat model, and a human-defined threat model (in contrast to the synthetic L-infinity threat models that are usually considered; see the rough formalization after this list).
* I think both of these were pretty interesting goals to aim for and at roughly the right point on the ambition-tractability scale (at least a priori). Most research projects are less ambitious and more tractable, but I think that’s mostly a mistake.
* Redwood was mostly interested in the first goal, and the second was included somewhat arbitrarily iirc. I think this was a mistake: it would have been better to start with the simplest possible case for examining the unrestricted threat model. (It’s usually a mistake to try to do two ambitious things at once rather than nailing one, more so if one of the things is not even important to you.)
* After the original NeurIPS paper, Redwood moved in this direction and tried a bunch of simpler settings with unrestricted threat models. I was an advisor on this work. After several months with less progress than we wanted, we stopped pursuing this direction. It would have been better to get to a point where we could make this call sooner (after 1-2 months). Some of the slowness was indeed due to unfamiliarity with the literature, e.g. being stuck for a few weeks on something that was isomorphic to a standard gradient masking issue. My impression (not 100% certain) is that Redwood updated quite a bit in the direction of caring about related literature as a result of this, and I’d guess they’d be a lot faster doing this a second time, although still with room to improve. Note that by academic standards the project was a “success” in the sense of getting into NeurIPS, although the reviewers seemed to like the human-defined aspect of the threat model more than the unrestricted aspect.
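For readers less familiar with the two threat models, here is a rough formalization of the contrast (this is my gloss, not Redwood’s own definition; the symbols $x$, $x'$, $\epsilon$, and $\mathcal{A}$ are purely illustrative):

```latex
% Synthetic L-infinity threat model: the adversary may perturb a clean
% input x by at most epsilon in every coordinate.
\mathcal{A}_{\infty}(x) = \{\, x' : \|x' - x\|_{\infty} \le \epsilon \,\}

% Unrestricted threat model: the adversary may submit any valid input,
% and failure is judged by a human-defined notion of unacceptable output
% rather than by proximity to some clean input.
\mathcal{A}_{\mathrm{unr}} = \{\, x' : x' \text{ is a valid input} \,\}
```

The practical upshot is that an L-infinity defense can exploit the perturbation being small and structured, whereas the unrestricted, human-defined threat model hands the attacker the entire input space, which is much of what made the project ambitious.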
I’ll briefly comment on a few parts of this post since my name was mentioned (lack of comment on other parts does not imply any particular position on them). Also, thanks to the authors for their time writing this (and future posts)! I think criticism is valuable, and having written criticism myself in the past, I know how time-consuming it can be.
I’m worried that your method for evaluating research output would make any ambitious research program look bad, especially early on. Specifically:
> The failure of Redwood’s adversarial training project is unfortunately wholly unsurprising given almost a decade of similarly failed attempts at defenses to adversarial robustness from hundreds or even thousands of ML researchers.
I think for any ambitious research project that fails, you could tell a similarly convincing story about how it’s “obvious in hindsight” it would fail. A major point of research is to find ideas that other people don’t think will work and then show that they do work! For many of my most successful research projects, people gave me advice not to work on them because they thought it would predictably fail, and if I had failed then they could have said something similar to what you wrote above.
I think Redwood’s failures here are ones of execution, not of problem selection: I thought the problem they picked was pretty interesting, but they could have realized much more quickly that the particular approaches they were taking were unlikely to pan out. If they had done that, perhaps they would have switched to other approaches that ended up succeeding, or just pivoted to interpretability faster. In any case, I definitely wouldn’t want to discourage them or future organizations from using a similar problem selection process.
(If you asked a random ML researcher if the problem seemed feasible, they would have said no. But I wouldn’t have used that as a reason not to work on the project.)
> CTO Buck Shlegeris has 3 years of software engineering experience and a limited ML research background.
My personal judgment is that Buck is a stronger researcher than most people with ML PhDs. He is weaker at empirical ML than this baseline, but very strong conceptually in ways that translate well to machine learning. I do think Buck will do best in a setting where he’s either paired with a good empirical ML researcher or gains more experience there himself (he’s already gotten a lot better in the past year). But overall I view Buck as on par with a research scientist at a top ML university.
Thanks for this thoughtful and excellently written post. I agree with the large majority of what you had to say, especially regarding collective vs. individual epistemics (and more generally on the importance of good institutions vs. individual behavior), as well as concerns about insularity, conflicts of interest, and underrating expertise and overrating “value alignment”. I have similarly been concerned about these issues for a long time, but especially concerned over the past year.
I am personally fairly disappointed by the extent to which many commenters seem to be dismissing the claims or disagreeing with them in broad strokes, as they generally seem true and important to me. I would value the opportunity to convince anyone in a position of authority in EA that these critiques are both correct and critical to address. I don’t read this forum often (was linked to this thread by a friend), but feel free to e-mail me (jacob.steinhardt@gmail.com) if you’re in this position and want to chat.
Also, to the anonymous authors, if there is some way I can support you please feel free to reach out (also via e-mail). I promise to preserve your anonymity.
This is kind of tangential, but anyone who is FODMAP-sensitive would be unable to eat any of Soylent, Huel, or Mealsquares as far as I’m aware.
Relevant blog post I wrote: https://bounded-regret.ghost.io/film-study/
Thanks for writing this! One thing that might help would be more examples of Phase 2 work. For instance, I think that most of my work is Phase 2 by your definition (see here for a recent round-up). But I am not entirely sure, especially given the claim that very little Phase 2 work is happening. Other stuff in the “I think this counts but not sure” category would be work done by Redwood Research, Chris Olah at Anthropic, or Rohin Shah at DeepMind (apologies to any other people who I’ve unintentionally left out).
Another advantage of examples is that they could help highlight what you want to see more of.
I’m teaching a class on forecasting this semester! The notes will all be online: http://www.stat157.com/
It seems clear that none of the content in the paper comes anywhere close to your examples. These are also more like “instructions” than “arguments”, and Rubi was calling for suppressing arguments on the grounds that they might be believed.
> At the same time, what occurred mostly sounded reasonable to me, even if it was unpleasant. Strong opinions were expressed, concerns were made salient, people may have been defensive or acted with some self-interest, but no one was forced to do anything. Now the paper and your comments are out, and we can read and react to them. I have heard much worse in other academic and professional settings.
I don’t think “the work got published, so the censorship couldn’t have been that bad” really makes sense as a reaction to claims of censorship. You won’t see work that doesn’t get published, so this is basically a catch-22 (either the work gets published, in which case there supposedly wasn’t censorship, or it doesn’t get published, in which case no one ever hears about it).
Also, most censorship is soft rather than hard, and comes via chilling effects.
(I’m not intending this response to make any further object-level claims about the current situation, just that the quoted argument is not a good argument.)
I also agree with you. I would find it very problematic if anyone was trying to “ensure harmful and wrong ideas are not widely circulated”. Ideas should be argued against, not suppressed.
Re: Bayesian thinking helping one to communicate more clearly. I agree that this is a benefit, but I don’t think it’s the fastest route or the one with the highest marginal value. For instance, when you write:
> A lot of expressed beliefs are “fake beliefs”: things people say to express solidarity with some group (“America is the greatest country in the world”), to emphasize some value (“We must do this fairly”), to let the listener hear what they want to hear (“Make America great again”), or simply to sound reasonable (“we will balance costs and benefits”) or wise (“I don’t see this issue as black or white”).
I’m immediately reminded of Orwell’s essay *Politics and the English Language*. I would generally expect people to learn more about clear, truth-seeking communication from reading Orwell (and other good books on writing) than from being Bayesian. Indeed, I find many Bayesian rationalists to be highly obscurantist in practice, perhaps more so than the average similarly-educated person, and I feel that rationalist community norms tend to reward rather than punish this, because many people are drawn to deep but difficult-to-understand truths.
I would say that the value of the rationalist project so far has been in generating important hypotheses, rather than in clear communication around those hypotheses.
I just don’t think this is very relevant to whether outreach to debaters is good. A better metric would be to look at life outcomes of top debaters in high school. I don’t have hard statistics on this but the two very successful debaters I know personally are both now researchers at the top of their respective fields, and certainly well above average in truth-seeking.
I also think the above arguments are common tropes in the “maths vs fuzzies” culture war, and given EA’s current dispositions I suspect we’re systematically more likely to hear and be receptive to anti-debate than to pro-debate talking points. (I say this as someone who loved to hate on debate in high school, especially as it was one of the main competitors with math team for recruiting smart students. But with hindsight from seeing my classmates’ life outcomes I think most of the arguments I made were overrated.)
Thanks, and sorry for not responding to this earlier (was on vacation at the time). I really appreciated this and agree with willbradshaw’s comment below :).
I think we just disagree about what a downvote means, but I’m not really that excited to argue about something that meta :).
As another data point, I appreciated Dicentra’s comment elsewhere in the thread. I haven’t decided whether I agree with it, but I thought it demonstrated empathy for all sides of a difficult issue even while disagreeing with the OP, and articulated an important perspective.
I think your characterization of my thought process is completely false for what it’s worth. I went out of my way multiple times to say that I was not expressing disapproval of Dale’s comment.
Edit: Maybe it’s helpful for me to clarify that I think it’s both good for Dale to write his comment, and for Khorton to write hers.
I didn’t downvote Dale, nor do I wish to express social disapproval of his post (I worry that the length of this thread might lead Dale to feel otherwise, so I want to be explicit that I don’t feel that way).
To your question, if I were writing a post similar to Dale, what I would do differently is be more careful to make sure I was responding to the actual content of the post. The OP asked people to support Asian community members who were upset, while at least the last paragraph of Dale’s post seemed to assume that OP was arguing that we should be searching for ways to reduce violence against Asians. Whenever I engage on an emotionally charged topic I re-read the original post and my draft response to make sure that I actually understood the original post’s argument, and I think this is good practice.
Another mistake I think Dale’s post makes is assuming that whether the Atlanta attacks were racially motivated is a crux for most people’s emotional response. I think Dale’s claim may well be correct (I could see both arguments), but the larger context is a significant increase in violent incidents against Asians, at least some of which seem obviously racially motivated (the increase is also larger than for other races). These incidents have taken a constant emotional toll on Asians for a while now, and the Atlanta shootings are simply the first instance in which this penetrated the broader public consciousness.
I can’t think of an easy-to-implement rule that would avoid this mistake. The best would be “try harder to think from the perspective of the listener”, but this is of course very difficult especially when there is a large gap in experience between the speaker and the listener. If I were trying super-hard I would run the post by an Asian friend to see if they felt like it engaged with the key arguments, but I think it would be unreasonable to expect, or expend, that level of effort for forum comments.
Again, I think people make communication mistakes like this all the time and do not find them particularly blameworthy and would normally not bother to comment on them. I am only pointing them out in detail because you asked me to.
I think it’s good for people to point out ways that criticism can be phrased more sympathetically; doing so is even aligned with your goal of encouraging more critical discussion (which I am also in favor of). As someone who often gives criticism, sometimes unpopular criticism, I both appreciate it when people point out ways I could phrase it better and strongly desire that people be forgiving when I fail to do so. If no one took the time to point these out to me, I would be less capable of offering effective criticism.
Along these lines, my guess is that you and Khorton are interpreting downvotes differently? I didn’t take Khorton’s downvote to be claiming “You should not be posting this on the forum” but instead “Next time you post something like this I wish you’d spend a bit more effort exercising empathy”. And if Dale totally ignores this advice, the penalty is… mild social disapproval from Khorton and lots of upvotes from other people, as far as I can tell.
They being Laaunch? I agree they do a lot of different things. Hate is a Virus seemed to be doing even more scattered things, some of which didn’t make sense to me. Everything Laaunch was doing seemed at least plausibly reasonable to me, and some, like the studies and movement-building, seemed pretty exciting.
My guess is that even within Asian advocacy, Laaunch is not going to look as mission-focused and impact-driven as say AMF. But my guess is no such organization exists—it’s a niche cause compared to global poverty, so there’s less professionalization—though I wouldn’t be surprised if I found a better organization with more searching. I’m definitely in the market for that if you have ideas.
To push back on this point, presumably even if grantmaker time is the binding resource and not money, Redwood also took up grantmaker time from OP (indeed I’d guess that OP’s grantmaker time on RR is much higher than for most other grants given the board member relationship). So I don’t think this really negates Omega’s argument—it is indeed relevant to ask how Redwood looks compared to grants that OP hasn’t made.
Personally, I am pretty glad Redwood exists and think their research so far is promising. But I am also pretty disappointed that OP hasn’t funded some academics that seem like slam dunks to me and think this reflects an anti-academia bias within OP (note they know I think this and disagree with me). Presumably this is more a discussion for the upcoming post on OP, though, and doesn’t say whether OP was overvaluing RR or undervaluing other grants (mostly the latter imo, though it seems plausible that OP should have been more critical about the marginal $1M to RR especially if overhiring was one of their issues).