Hi Nuno, thanks for the question. Not sure if I am fully answering it, so feel free to let us know if this doesn’t address your point.
We added this paragraph as a result of Conjecture’s feedback; it acknowledges Connor’s changes of mind:
Although this section focuses on the negatives, there are of course positive aspects to Connor’s character. He is clearly a highly driven individual, who has built a medium-sized organization in his early twenties. He has shown a willingness to engage with arguments and change his mind on safety concerns, for example delaying the release of his GPT-2 replication. Moreover, in recent years Connor has been a vocal public advocate for safety: although we disagree in some cases with the framing of the resulting media articles, in general we are excited to see greater public awareness of safety risks.
We also modified this paragraph:
We don’t want to unfairly hold people’s mistakes from their college days against them – many people exaggerate or overestimate (intentionally or not) their own accomplishments. Even a partial replica of GPT-2 is an impressive technical accomplishment for an undergraduate, so this project does attest to Connor’s technical abilities. It is also positive that he admitted his mistake publicly. However, overall we do believe the project demonstrates a lack of attention to detail and rigor. Moreover, we haven’t seen signs that his behavior has dramatically changed.
We haven’t attempted to quantify or sum up the positives and negatives in a way that would make our overall perception easier to judge. On net we are still fairly concerned. Some things that might update us positively in favor of Connor are:
Seeing a solid plan for balancing profit and safety motives
Hearing about significant improvements in Conjecture and Connor’s representation of themselves to external parties
Connor acknowledging his role in the formation of Stability AI and its contribution to race dynamics
Critiques of prominent AI safety labs: Conjecture
Thank you for the kind words Linch!
We’ve fixed the footnotes! It seems there was an issue when we converted from Markdown to the EA Forum Docs editor.
It would be nice to exclude some text (e.g. the appendix / preamble / introduction) from the “time to read” estimate.
E.g. in our upcoming post the time to read the core pieces is 22 minutes, but the total read time is showing as 39 minutes (almost double) because of our lengthy appendix and some introductory context.
If you’d like to help edit our posts (incl. copy-editing—basic grammar etc, but also tone & structure suggestions and fact-checking/steel-manning), please email us at anonymouseaomega@gmail.com!
We’d like to improve the pace of our publishing, and think this is an area where external perspectives could help us:
Make sure our content & tone is neutral & fair
Save us time so we can focus more on research and data gathering
(We missed this submission, apologies to the poster for not sharing this in a more timely fashion).
A male Constellation member (current or former Redwood staff) & MLAB / REMIX program participant writes:
One thing I think was missed: the spending culture seemed a little over the top. There were some servers that had been unused racking up $10k+ bills that weren’t wound down with any urgency.
Quick updates:
Our next critique (on Conjecture) will be published in 2 weeks.
The critique after that will be on Anthropic. If you’d like to be a reviewer, or have critiques you’d like to share, please message us or email anonymouseaomega@gmail.com.
Some quick thoughts from writing the critique post (from the perspective of the main contributor / writer w/o a TAIS background)
If you’re a non-subject-matter expert (SME) who can write, but you know that SMEs have good/thoughtful critiques, I think it’s worth sitting down with them and helping them write it. Often SMEs lack the time and energy to write a critique. I think not being an SME gave me a bit of an outsider’s perspective, and I pushed back more on pieces that weren’t obvious to non-technical people, which I think made some of the technical critiques more specific.
Overall, we are all really happy with the response this post has gotten, the quality of critiques / comments, and the impact it seems to be making in relevant circles. I would be happy to give feedback on others’ critiques, if they share similar goals (improving information asymmetry, genuinely truth seeking).
Writing anonymously has made this post better quality because I feel less ego / attachment to the critiques we made, and feel like I can be more in truth-seeking mode rather than worrying about protecting my status / reputation. On the flip side, we put a lot of effort into this post and I feel sad that this won’t be recognized, because I’m proud of this work.
Things we will change in future posts (keen to get feedback on this!)
We will have a section which states our bottom-line opinions very explicitly and clearly (e.g. org X should receive less funding, we don’t recommend people work at org Y) and then cites which reasons we think support each critique. I think a handful of comments raised points that we had thought about, but which weren’t made clear on the page. I feel a little hesitant to state the bottom-line view because I worry people will think we are being overly negative, but I think if we can communicate our uncertainties and caveat them, it could be okay.
There were several contributors to this post. Partly due to time constraints and not wanting to delay publishing or be bottlenecked on a contributor getting back to me, I didn’t scrutinize some contributions as thoroughly as I should have prior to publishing. I will aim to avoid that in future posts.
I will be sharing all future drafts with 5-10 other SME reviewers (both people we think would agree with us and people we think would disagree) prior to publication, because I think the comments on this post improved it substantially.
(minor) I would add a little more context on the flavor of feedback we are aiming to get from the org we are critiquing.
Omega’s Quick takes
Update: the post has been edited.
Hi Larks, thanks for the pushback here. We agree that this is hard to judge. Unfortunately, some of this was about the general atmosphere of the place, which is inherently a bit fuzzy.
People said they feel pressure to conform to / defer to these people, for example in lunchtime conversations. People have also said they can’t act as freely or as loosely as they would like in Constellation. So it’s something like feeling that you have to behave in a certain way, in line with what you perceive the funders and senior leadership to want, in order to fit in.
Although this may be present in other offices, we think this pressure is more pronounced at Constellation than at other coworking spaces like the Open Phil offices or Lightcone, where we think there is more of an ability to say and do what you want.
We know this probably isn’t as satisfying as it could be, but appreciate you taking the time to point this out and we will edit the post to acknowledge this.
Yep that’s right. This is probably an underestimate, but we would need to spend some time figuring it out. We’ve spent at least 10 hours replying to cc
Hi Joseph, that quote is meant to be facetious. The scientist who originally said it was trying to encourage the opposite in his students: that researching before experimenting can save them time.
Regarding 3) “Publishing is relative to productivity”: we are not entirely sure what you mean, but we can try to clarify our point a little more.
We think it’s plausible that Redwood’s total volume of publicly available output is appropriate relative to the quantity of high-quality research they have produced. We have heard from some Redwood staff that there are important insights that have not been made publicly available outside of Redwood, but to some extent this is true of all labs, and it’s difficult for us to judge without further information whether these insights would be worth staff time to write up.
The main area where we are confident in suggesting Redwood change is making their output more legible to the broader ML research community. Many of their research projects, including what Redwood considers their most notable project to date (causal scrubbing), are only available as Alignment Forum blog posts. We believe there is significant value in writing them up more rigorously in a standard academic format and releasing them as arXiv preprints. We would also suggest Redwood more frequently submit their results to peer-reviewed venues, as the feedback from peer review can be valuable for honing the communication of results, though we acknowledge that it is possible to effectively disseminate findings without this: e.g. many of OpenAI and Anthropic’s highest-profile results were never published in a peer-reviewed venue.
Releasing arXiv preprints would have two benefits. First, the work would be significantly more likely to be noticed, read and cited by the broader ML community, making it more likely that others build upon it and point out deficiencies. Second, the more structured nature of an academic paper forces a more detailed exposition, making it easier for readers to judge, reproduce and build upon. If, for example, we compare Neel’s original grokking blog post to the grokking paper, it is clear the paper is significantly more detailed and rigorous. This level of rigor may not be worth the time for every project, but we would at least expect it for an organization’s flagship projects.
(written in first person because one post author wrote it)
As Nuno notes, I can’t see how else to spend $20M to get more good interp work (naively, I’m not claiming no such ways exist)
I think this is the area we disagree on the most. Examples of other ideas:
1. Generously fund the academics who you do think are doing good work (as far as I can tell, two of them, Christopher Potts and Martin Wattenberg, get no funding from OP, and David Bau gets an order of magnitude less). This is probably more on OP than Redwood, but Redwood could also explore funding academics and working on projects in collaboration with them.
2. Poach experienced researchers who are executing well on interpretability but working on what (by Redwood’s lights) are less important problems, and redirect them to more important problems. Not everyone would want to be “redirected”, but there’s a decent fraction of people who would love to work on more ambitious problems but are currently not incentivized to do so, and a broader range of people are open to working on a wide range of problems so long as they are interesting. I would expect these individuals to cost a comparable amount to what Redwood currently pays (somewhat less if poaching from academia, somewhat more if poaching from industry) but be able to execute more quickly as well as spread valuable expertise around the organization.
3. Make one-year seed grants of around $100k to 20 early-career researchers (PhD students, independent researchers) to work on interpretability, nudging them towards a list of problems Redwood views as important. Provide low-touch mentorship (e.g. a once-a-month call). Scale up the grants and/or hire people from the projects that did well after the one-year trial.
I wouldn’t confidently claim that any of these approaches would necessarily best Redwood, but there’s a large space of possibilities that could be explored and largely has not been. Notably, the ideas above differ from Redwood’s high-level strategy to date by: (a) making bets on a broad portfolio of agendas; (b) starting small and evaluating projects before scaling; (c) bringing in external expertise and talent.
I also broadly think that publishing and engaging with the broader ML community is less obviously good for interpretability; as noted, I just don’t think most work is very relevant. I think it’s a bet worth making (and am excited about interp in the wild and my grokking work getting into ICLR!), but definitely not obviously worth the effort, e.g. I think it’s probably the right call that Anthropic doesn’t try to publish their work. Putting pre-prints on arXiv seems pretty cheap, and I’m pro that, but I think seriously aiming for academic publications is a lot of work (more than 10-20% of a project IMO) and I feel pretty good about Redwood only trying for this when they have employees who are particularly excited about it.
I think I largely agree that the percentage of interpretability papers relevant to large-scale alignment is disappointingly low. However, the denominator is very large, so I still expect the majority of TAIS-relevant interpretability work to happen outside TAIS organizations. Given this, I’d argue there is considerable value in communicating to this subset of the ML research community. Perhaps a peer-reviewed publication is not the best way to do this: I’d be happy to see Redwood staff e.g. giving talks at a select subset of academic labs, but to the best of our knowledge this hasn’t happened.
I agree that getting from the stage of “scrappy preprint / blog post that your close collaborators can understand” to “peer-reviewed publication” can be 10-20% of a project’s time. However, in my experience the clarity of the write-up and rigor of the results often increase considerably in that 10-20%. There are some parts of the publication process that are complete wastes of time (reformatting from single to double column, running an experiment that you already know the results of but that reviewer 2 really wants to see), but in my experience these have been a minority of the work—no more than 5% of the overall project time. I’m curious if you view this as being significantly more costly than I do, or the improvements to the project from peer-review as being less significant.
Meta note: We believe this response is the 80/20 in terms of quality vs time investment. We think it’s likely we could improve the comment with more work, but wanted to share our views earlier rather than later.
We think one thing we didn’t spell out very explicitly in this post was the distinction between 1) how effectively we believe Redwood spent their resources and 2) whether we think OP should have funded them (and at what amount). As this post is focused on Redwood, I’ll focus more on 1) and comment briefly on 2), but note that we plan to expand on this further in a follow-up post. We will add a paragraph which disambiguates between these two points more clearly.
Argument 1): We think Redwood could produce at least the same quality and quantity of research, with fewer resources (~$4-8 million over 2 years)
The key reasons we think 1) are:
If they had more senior ML staff or advisors, they could have avoided some of the mistakes we see in their agenda. This wouldn’t necessarily come at a large monetary cost given their overall budget (around $200-300K for 1 FTE).
We estimate as much as 25-30% of their spending went towards scaling up projects (e.g. REMIX) before they had a clear research agenda they were confident in. To be fair to Redwood, this premature scaling was more defensible prior to the FTX collapse, when the general belief was that there was a “funding overhang”. Nate also mentions in his comment that scaling was raised by both Holden and Ajeya (at OP), and that he now sees this as an error on their part.
Argument 2): OP should have spent less on Redwood, and 2a) there were other comparable funding opportunities
The key reasons we think 2) are:
There are other TAIS labs (academic and not) that we believe could absorb and spend considerably more funding than they currently receive. Example non-profits include CAIS and FAR AI, and underfunded safety-interested academic groups include David Krueger’s and Dylan Hadfield-Menell’s groups. Opportunities are more limited if focusing specifically on interpretability, but there are still a number of promising options. For example, Neel Nanda mentioned three academics he considers to do good interpretability work: OP has funded one of them (David Bau) but, as far as we know, not the other two (of course, they may not have room for more funding, or OP may have investigated and decided not to fund them for other reasons).
A key reason OP may not think some of these labs are worth funding on the margin is that they are substantially more bullish on certain safety research agendas than others. We have some concerns about how the OP LT team decides which agendas to support, but will explore this further in our Constellation post, so won’t comment in more depth at this point. As one of the main funders of TAIS work, in a field which is very speculative and new, we think OP should be more open to a broad range of research agendas than they are. We also think that small, young organizations without a track record beyond founder reputation should in general be given smaller grants and build up a track record before trying to scale. We think it’s plausible that several of the issues we pointed out could have been mitigated by this funding structure.
This section has now been updated
My personal judgment is that Buck is a stronger researcher than most people with ML PhDs. He is weaker at empirical ML than this baseline, but very strong conceptually in ways that translate well to machine learning. I do think Buck will do best in a setting where he’s either paired with a good empirical ML researcher or gains more experience there himself (he’s already gotten a lot better in the past year). But overall I view Buck as on par with a research scientist at a top ML university.
Thank you for this comment; some of the contributors to this post have updated their views of Buck as a researcher as a result.
I will be updating the original post to add this information in—we appreciate you flagging it.
Update: This has now been edited in the original post.
(written by the non-technical contributor to the critique posts)
One challenge of writing critiques (understandably) is that they are really time-consuming, and my technical co-author has a lot of counterfactual uses of their time. I have a lot of potential posts that would be pretty valuable, but many of the critiques need to be fleshed out by someone more technical.
I would love to find someone who has a slightly lower opportunity cost, but still has the technical knowledge to make meaningful contributions. It’s hard to find someone who can do that, who cares deeply about the effects of high-effort critiques on the broader EA / TAIS ecosystem, and who can also be trusted and to whom we can de-anonymize ourselves.