Critiques of prominent AI safety labs: Redwood Research
Crossposted to LessWrong.
This is the first post in our sequence and covers Redwood Research (Redwood). We recommend reading our brief introduction to the sequence for added context on our motivations, who we are, and our overarching views on alignment research.
Redwood is a non-profit started in 2021 working on technical AI safety (TAIS) alignment research. Their approach is heavily informed by the work of Paul Christiano, who runs the Alignment Research Center (ARC), and previously ran the language model alignment team at OpenAI. Paul proposed one of Redwood’s original projects and is on Redwood’s board. Redwood has strong connections with central EA leadership and funders, has received significant funding since its inception, recruits almost exclusively from the EA movement, and partly acts as a gatekeeper to central EA institutions.
We shared a draft of this document with Redwood prior to publication and are grateful for their feedback and corrections (we recommend others also reach out similarly). We’ve also invited them to share their views in the comments of this post.
We would also like to invite you to share your thoughts openly in the comments if you feel comfortable, or to contribute anonymously via this form. We will add inputs from there to the comments section of this post, but will likely not update the main body of the post as a result (unless comments catch errors in our writing).
Summary of our views
We believe that Redwood has some serious flaws as an org, yet has received a significant amount of funding from a central EA grantmaker (Open Philanthropy). Inadequately managed conflicts of interest (COIs) might be partly responsible for funders giving a relatively immature org a large amount of money, with some negative effects on the field and EA community. We will share our critiques of Constellation (and Open Philanthropy) in a follow-up post. We also have some suggestions for Redwood that we believe might help them achieve their goals.
Redwood is a young organization that has room to improve. While there may be flaws in their current approach, it is possible for them to learn and adapt in order to produce more accurate and reliable results in the future. Many successful organizations made significant pivots while at a similar scale to Redwood, and we remain cautiously optimistic about Redwood’s future potential.
An Overview of Redwood Research
Grants: Redwood has received just over $21 million in funding that we are aware of, for their own operations (2/3, or $14 million) and for running Constellation (1/3, or $7 million). Redwood received $20 million from Open Philanthropy (OP) (grant 1 & 2) and $1.27 million from the Survival and Flourishing Fund. They were also granted (but never received) $6.6 million from the FTX Future Fund.
Output:
Research: Redwood lists six research projects on their website: causal scrubbing, interpretability in the wild, polysemanticity and capacity in neural networks, adversarial training for high-stakes reliability, language models seem to be much better than humans at next-token prediction, and one-layer transformers aren’t equivalent to a set of skip-trigrams.
Field Building: Redwood has run two iterations of the Machine Learning Alignment Bootcamp (MLAB) and a mini-internship, the Redwood Mechanistic Interpretability Experiment (REMIX). Both programs are primarily focused on junior TAIS researchers.
Longtermist Office: Redwood runs the Constellation office space, an approximately 30,000 square foot office hosting staff from several technical AI safety focused and longtermist EA-aligned organizations such as OP, ARC, the Atlas Fellowship, CEA and OpenAI.
Relationships with primary funder: Two members of Redwood’s leadership team have or have had relationships with an OP grantmaker. A Redwood board member is married to a different OP grantmaker. A co-CEO of OP is one of the other three board members of Redwood. Additionally, many OP staff work out of Constellation, the office that Redwood runs, and OP pays Redwood for use of the space.
Research Team: Redwood is notable for hiring almost exclusively from the EA community and having few senior ML researchers. Redwood’s most experienced ML researcher spent 4 years working at OpenAI prior to joining Redwood. This is comparable experience to someone straight out of a PhD program, which is typically the minimum experience level of research scientists at most major AI labs.[1] CTO Buck Shlegeris has 3 years of software engineering experience and a limited ML research background. He also worked as a researcher at MIRI for two years, but MIRI’s focus is quite distinct from contemporary ML. CEO Nate Thomas has a Physics PhD and published some papers on ML during his PhD. Redwood previously employed an individual with an ML PhD, but he recently left. Jacob Steinhardt (Assistant Professor at UC Berkeley) and Paul Christiano (CEO at ARC) have significant experience but are involved only in a part-time advisory capacity. At its peak, Redwood’s research team had 15 researchers (including people on work trials, 20 including interns). They currently have 10 researchers (including people on work trials).
Redwood has scaled rapidly and then gone through several rounds of substantial layoffs and other attrition, with around 10 people having departed. For example, two of the authors of the causal scrubbing technical report have departed Redwood.
Research Agenda: Their research agenda has pivoted several times. An initial focus was adversarial training, but our understanding is that this project has been largely shelved. Circuit-style interpretability was a major focus of much of their published research (interpretability in the wild, polysemanticity and capacity in neural networks), but our understanding is that Redwood is currently moving away from this.
Endorsements: Redwood received some strong endorsements from prominent members of the EA and TAIS community when they launched. The endorsements were focused on Redwood’s value alignment and technical potential. Paul Christiano (ARC) wrote that Redwood was “unusually focused on finding problems that are relevant to alignment and unusually aligned with my sense of what is important. I think there is a good chance that they’ll significantly increase the total amount of useful applied alignment work that happens over the next 5-10 years.” Ajeya Cotra (Open Philanthropy) wrote that the org was “...experienced and competent at software engineering and engineering management”. Nate Soares (ED of MIRI) wrote that the Redwood team possessed “the virtue of practice, and no small amount of competence.” and that he was “excited about their ability to find and execute impactful plans that involve modern machine learning techniques. In my estimation, Redwood is among the very best places to do machine-learning based alignment research that has a chance of mattering.”
Criticisms and Suggestions
Lack of Senior ML Research Staff
Prima facie, the lack of experienced ML researchers at any ML research org is a cause for concern. We struggle to think of research organizations that have produced substantial results without strong senior leadership with ML experience (see our notes on Redwood’s team above). Redwood leadership does not seem to be attempting to address this gap. Instead, they have terminated some of their more experienced ML research staff.
To Redwood’s credit, their leadership does contain individuals with significant alignment experience, which is important for evaluating theories of change. The ideal set-up would be to have someone who’s experienced in both alignment and ML as part of the senior leadership, but we recognize that there are only a handful of such people and compromises are sometimes necessary. In this instance, we think it would be valuable for Redwood to have some experienced ML researchers on staff (and to prioritize recruiting those). These experienced ML researchers could then work closely with leadership to evaluate tractability and low-level research directions, complementing the leadership’s existing skills.
We think the lack of senior researchers at Redwood is partly responsible for at least two unnecessarily disruptive research pivots. Each pivot has resulted in multiple staff being let go, and a major shift in the focus of the org’s work. We have mixed feelings about Redwood’s agenda being in flux. It is commendable that they are willing to make major pivots to their agenda when they feel an existing approach is not leading to sufficiently high impact; we’ve seen many other organizations, and especially academic labs, ossify behind a single sub-par agenda. However, we think that Redwood would have achieved a higher hit-rate and avoided such major and disruptive pivots if they had de-risked their agenda by involving more senior researchers and soliciting feedback from a broader group of researchers before scaling up each new direction.
For example, in September 2022, Redwood staff wrote that:
Our original aim was to use adversarial training to make a system that (as far as we could tell) never produced injurious completions. If we had accomplished that, we think it would have been the first demonstration of a deep learning system avoiding a difficult-to-formalize catastrophe with an ultra-high level of reliability.
[...]
Alas, we fell well short of that target. We still saw failures when just randomly sampling prompts and completions.
The failure of Redwood’s adversarial training project is unfortunately wholly unsurprising given almost a decade of similarly unsuccessful attempts at adversarial defenses by hundreds or even thousands of ML researchers. For example, the RobustBench benchmark shows the best known robust accuracy on ImageNet is still below 50% for adversarial attacks with a barely perceptible perturbation.
Moreover, Redwood’s project focuses on an even more challenging threat model: unrestricted adversarial examples. There has been an almost complete lack of progress towards solving that problem in the image domain. Although there may be some aspects of the textual domain that make the problem easier, the large number of textual adversarial attacks indicates that this is unlikely to be sufficient. In the absence of any major new insight, we would expect this project to fail. It is likely that considerable time and money could have been saved by simply conducting a more thorough literature review, and engaging with domain experts from the adversarial robustness community.[2]
Our main concern with this project was not the problem selection, but that with more background research the team could plausibly have brought a novel approach or insight to the problem. That said, we also think effort could have been saved if, as Jacob Steinhardt points out, they had realized sooner that their approaches were unlikely to work and pivoted more quickly.[3]
To Redwood’s credit, they have at least partially learned from the mistake of attempting extremely ambitious projects with limited guidance, and have brought in external experienced ML researchers such as Jacob Steinhardt to advise for recent mechanistic interpretability projects. However, they have nonetheless continued to quickly scale these projects, for example temporarily bringing in 30-50 junior staff as part of their REMIX program to apply some of these mechanistic interpretability methods. From conversations, it seems that many of these projects had inadequate research mentorship resulting in predictable (but avoidable) failure. Furthermore, Redwood itself does not intend to pursue this agenda further beyond the next 6 months, raising questions as to whether this program was justified under even optimistic assumptions.
Our suggestions: We would encourage Redwood leadership to seek to recruit and retain senior ML researchers, giving senior researchers more autonomy and stability and producing more externally legible work. We recognize that some research prioritization decisions have been informed by advisors such as Paul Christiano and Jacob Steinhardt, which partially offsets the limited in-house ML expertise. However, unless Paul or Jacob is able to invest significantly more time in providing detailed feedback, it would be judicious to build a broader group of advisors, which could include experts in relevant topics (e.g. ML interpretability) from outside the TAIS community.
Lack of Communication & Engagement with the ML Community
Redwood has deprioritized communicating their findings, with many internal research projects that have never been disseminated to the outside world. Moreover, existing communication is targeted to the effective altruism and rationality audience, not the broader ML research community.
Our understanding is that a significant fraction of Redwood’s research has never been written up and/or disseminated to the outside world. On the positive side, a substantial body of unpublished research could make Redwood’s cost-effectiveness significantly better than we would otherwise assess. On the negative side, the research may have limited impact if it is never published. This certainly makes evaluation of Redwood more challenging: our impression is that much of the unpublished research is of relatively low quality (and that this is part of the reason Redwood has not published it), but this is difficult to objectively evaluate as an outsider.
Many of the research results are only available on the Alignment Forum, a venue that ML researchers outside the EA or TAIS communities rarely frequent or cite. Only two Redwood papers have been accepted into conferences: “Adversarial training for high-stakes reliability” (NeurIPS 2022) and “Interpretability in the wild” (ICLR 2023).[4] We have heard that Redwood is planning to submit more papers in the next year, though this seems to be a lower priority than other projects.
The choice of whether to publish and communicate research at all, and whether to communicate it to the broader ML community, is often debated in the TAIS community. Below, we summarize the strongest considerations for and against publishing ML safety research, how they apply to Redwood, and our take on those reasons:
AGAINST: Publishing may disseminate research advances that inadvertently contribute to capabilities. If research findings could enable harmful activities, then it might be appropriate to suppress them entirely or avoid publicizing them in communities that might take advantage. (See Conjecture’s Infohazards Policy for more on this topic.) To the best of our knowledge, this consideration does not factor into Redwood’s decisions about publishing, and (in our view) it is not applicable to Redwood’s research.
FOR: Making work legible to the broader ML community working on closely related topics creates independent feedback loops. The broader ML research community is orders of magnitude larger than the x-risk community, includes many people with deep technical expertise in areas where the current AI safety community is lacking, and provides a fresh and independent perspective. There are two main concerns: 1) engaging with the broader community could be a waste of limited TAIS researcher time, e.g. some early-stage work can be harder for the broader ML community to engage with productively; 2) engaging externally could worsen research quality (see MIRI’s (2019) view). We believe 1) and 2) are not relevant here, because Redwood’s historical focus areas of adversarial robustness and mechanistic interpretability are similar to much existing academic work (see above regarding the RobustBench benchmark). This means Redwood’s work is more accessible to outside researchers, mitigating (1), and that (2) is unlikely because many mainstream ML researchers already have relevant domain expertise.
FOR: Engaging with the broader ML community is also a route to potential hires. Our impression is that there is an acute talent bottleneck in technical AI safety (and at Redwood in particular) for senior staff who can effectively manage teams or develop research agendas. Given the relative success and positive reviews of Redwood’s prior recruiting efforts, such as its alignment bootcamp (MLAB), which focuses on developing basic ML engineering skills in (frequently though not exclusively) junior staff, we think that if Redwood improves its communications it is well placed to recruit more senior ML engineers to work on alignment.
In our view, it’s unlikely that Redwood will focus on this because, from what we have observed, they are more bullish on influencing younger individuals to switch careers to AI safety (informed by e.g. the results of this OP survey). Redwood also already has a strong reputation amongst some longtermist community builders and organizations,[5] so may not feel the need for such outreach for hiring. Nevertheless, doing more outreach and hiring outside experts in full-time or consulting positions seems like a highly tractable area for Redwood to improve on.
AGAINST: Sharing research in an externally-legible format (especially publishing in academic venues) takes time and effort away from other endeavors (e.g. research or strategizing). We believe Redwood’s primary concern is the cost of investing time in making work legible when a lot of the current research is intended primarily to inform future research strategy rather than to directly solve alignment. There is certainly some merit to this view: communicating well takes significant time, and there is little point in attempting to disseminate preliminary results that are still rapidly changing. However, in our experience the cost of writing up is only a modest part of the overall research effort, perhaps taking 10-20% of the time of the project. Moreover, a write-up is invaluable even internally for onboarding new people to a project, and for soliciting more detailed feedback than is practical in an informal presentation. The marginal cost of making a write-up externally available as a preprint is even lower than the initial cost. We think that if a research project was worth doing, it is most likely worth disseminating as well.
FOR: Publishing (or making unpublished research legible) helps external evaluators (e.g. funders) that you don’t have personal relationships with to make accurate judgements of your work. Redwood doesn’t have an incentive to publish for external evaluators because Redwood’s primary funder, Open Philanthropy, already has access to Redwood’s plans, current thinking and more, since OP staff have close connections to Redwood’s staff and board and work out of the Constellation office space. We have concerns that the relationship between Redwood and Open Philanthropy may prevent Open Philanthropy from making unbiased evaluations, and may write more about this separately.
FOR: Published research is more likely to get adopted, whether on its own merits or from the organization’s reputation: A solution to the alignment problem will not help if AGI researchers do not adopt it: disseminating research results allows other researchers to incorporate discoveries and techniques into their own work. You may not think this is a good idea if:
1. you don’t actually think the research you’re doing will directly help to solve alignment, or you think adoption will not be useful;
2. you do think publishing will help, but only if the research is above a certain bar because:
   a. it’s better for building the organization’s reputation; or
   b. you intend to improve the overall quality of alignment and interpretability research;
3. you think you’ll be able to influence relevant actors via personal relationships and connections such that building a public reputation is less important.
We think Redwood is motivated by 1, 2 and 3. For 1), our impression is much of their research is intended to test ideas at a high level to inform future research directions. Regarding 2b) in particular, we’ve heard that some of Redwood’s senior leadership are concerned that papers by other alignment organizations (e.g. Anthropic) make false claims, and so they may be particularly concerned that publishing low quality papers could be net negative. Finally, part of Redwood’s theory of change is to build relationships with researchers at existing labs (via Constellation), so they may be further deprioritizing building a public reputation and focusing on building those relationships instead.
We are sympathetic to consideration 1, but would argue that other labs could benefit from these results to inform their own prioritization decisions, and that Redwood would benefit from external feedback. We also partially agree with consideration 2b: it is important to publish high-quality results, and we are concerned by the low signal-to-noise ratio of e.g. the Alignment Forum. However, we believe much of this concern can be addressed by communicating appropriate uncertainty in the write-up, such as discussing concerns in a Limitations section and avoiding over-claiming of results.
We are more skeptical of 2a. In particular, we believe many prestigious labs, such as DeepMind, produce papers of varying significance. Although heavily publicizing an insignificant result might hurt Redwood’s reputation, in general publishing more work (and communicating appropriately about its significance) is likely to only help gain an audience and credibility.
Consideration 3 argues that reputation and public profile are of limited importance if key actors can be directly influenced through personal connections. Firstly, we’re reluctant to rely on this: even senior staff at an organization cannot always influence its strategy, as illustrated by the many OpenAI staff who left to found Anthropic in protest of some of OpenAI’s decisions. Redwood building its own reputation and influence could therefore be of significant value.
Secondly, we expect much of the benefit of publication to come from others building on Redwood’s research in ways that Redwood staff could not have performed themselves. Ultimately, even if Redwood could reliably cause other organizations to adopt its plans, they’re unlikely to be able to solve alignment all by themselves. Publishing allows researchers with different backgrounds, skills and resources to contribute to progress on the problems Redwood feels are most important.
As another reference, Dan Hendrycks has written more about publishing research.
Underwhelming Research Output
As external evaluators, we find it hard to assess Redwood’s research output since there is not much public work. Our impression is that Redwood has produced several useful research outputs, but that the quantity and quality of output is underwhelming given the amount of money and staff time invested.
Of Redwood’s published research, we were impressed by Redwood’s interpretability in the wild paper, but would consider it to be no more impressive than progress measures for grokking via mechanistic interpretability, executed primarily by two independent researchers, or latent knowledge in language models without supervision, performed by two PhD students.[6] These examples are cherry-picked to be amongst the best of academia and independent research, but we believe this is a valid comparison because we also picked what we consider the best of Redwood’s research, and Redwood’s funding is very high relative to other labs.
Some of our reviewers have seen Redwood’s unpublished research. From our observations, we believe that the published work is generally of higher quality than the unpublished work. Given this, we do not believe that focusing on the published work significantly underestimates the research progress made, although it is of course possible that there is significant unpublished work that we’re unaware of.
Redwood has one of the largest annual budgets of any non-profit AI safety research lab. This is particularly striking as it is a relatively young organization, and its first grant from Open Philanthropy, in 2021, was for $10 million. At that point, Redwood had little to no track record beyond the reputation of its founders. Redwood received another $10 million grant in 2022. Each year, about $6.6 million (⅔ of this) goes towards their internal operations and research, and the remaining ⅓ goes towards the Constellation office space for non-Redwood staff.
Redwood has fluctuated between 6 and 15 full-time equivalent research staff over the past 2 years. As a non-academic lab, its staff salaries are about 2-3x those of academic labs, but low relative to for-profit labs in SF. A junior-mid level research engineer at Redwood would have a salary in the range of $150,000-$250,000 per year, compared to an academic researcher in the Bay, who would earn around $40,000-50,000 per year, and a comparable researcher in a for-profit lab, who would earn $200,000-500,000.
Overall, Redwood’s funding is much higher than that of any other non-profit lab that OP funds, with the exception of their 2017 OpenAI grant of $30 million. We do not believe this grant is the right reference class, as OpenAI later transitioned to a capped for-profit model in 2019 and has significant funding from other investors.[7] Among other non-profit labs, the org with the closest level of funding is CAIS ($5 million), although this grant is intended to last for more than a year. In comparison, the alignment-focused academic lab CHAI at UC Berkeley[8] received ~$3 million per year including non-OP grants ($2.2 million per year / $11 million over five years from OP), and Jacob Steinhardt’s lab received a $1.1 million grant over three years. These labs have 24 and 10 full-time equivalent research staff respectively. In other words, Redwood’s funding level is 3.5 times that of CHAI, but we do not think its output is 3.5 times better than CHAI’s annual output. However, this comparison looks better when comparing headcount, since Redwood and CHAI’s alignment-focused contingent are of comparable size. (While Redwood is entirely focused on x-risk, our understanding is that around half of CHAI’s research staff work on x-risk relevant topics.)
Work Culture Issues
Redwood’s culture affects their research environment and impacts the broader TAIS community, so we believe that it is important to discuss it as best we can here. The following points are based on conversations with current and former Redwood staff, as well as people who have spent time in and around the Constellation office. Unfortunately we can’t go into some relevant details without compromising the confidentiality of people we have spoken with.
It’s important to note that Redwood is not the only organization in the Bay (or in EA) which has these problems with its culture. Since this post is about Redwood, we are focused on them, but we don’t mean to imply that other organizations are free from the same criticisms, or make a relative judgment of how good Redwood is on these issues compared to similar orgs.
Redwood’s leadership operates under the assumption that humanity will soon develop transformative AI technologies.[9] Based on conversations with Redwood leadership, we believe that they don’t see work culture or diversity as a priority. We aren’t saying that leadership doesn’t think it matters, just that it doesn’t feel pressing. (Redwood commented privately that they disagree with this statement.) Some of their leadership believe that rapid turnover and multiple employees burning out are not significant issues. However, we believe that both of these issues impede Redwood’s ability to be effective and achieve their stated goals.
Redwood is missing out on talent. We know of at least 4 people who have left, considered leaving, or turned down work opportunities with Redwood because of the work culture and lack of (primarily gender) diversity. We think it’s likely that there are others who have made similar decisions who we do not know.
Redwood has ambitious goals and if they want to be taken seriously in Silicon Valley and the ML research community, it’s likely that being a fringe, non-diverse and non-representative group will hurt their chances of doing this.
We are concerned that Redwood’s actions are more consequential than those of most EA organizations because they are a large, prominent organization in the EA ecosystem and serve as a gatekeeper through running Constellation and MLAB. In other words, their actions have second-order effects on the broader field of AI safety.
Below, we go into more detail on each of the issues and our recommendations:
Creating an intense work culture where management sees few responsibilities towards employees
We believe that Redwood has an obligation to create a healthier working environment for employees than they have done to date. Much of Julia Wise’s advice to a new EA org employee applies to Redwood. We’ve heard multiple cases of people being fired after something negative happens in their life (personal, conflict at work, etc) that causes them to be temporarily less productive at work. While Redwood management have made some efforts to offer support to staff (e.g. offering unpaid leave on some occasions), we believe it may not have been done consistently, and are aware of cases where termination happened with little warning. We also think it is somewhat alarming that the rate of burnout is so high at Redwood, resulting in multiple cases where taking unpaid leave is one of the employee’s better options. In defense of Redwood, this is likely only partially due to management style. It may also be due to the pool of people Redwood is recruiting from, and the fact that many people who work there have a shared belief in short timelines and a high probability of x-risk.
Redwood is known for offering work-trials before full-time jobs. While this is common amongst many EA-aligned organizations, such work-trials are usually brief, lasting for weeks rather than months. However, we have heard of Redwood work-trials lasting several months (the longest we are aware of is 4 months), and several work-trialers feeling stressed by the pressure and uncertainty. Work trials can create job insecurity and be stressful because trialling employees always feel like they’re under evaluation. We do recognize that work trials remain one of the more reliable methods of gauging mutual fit, so it is possible this cost is justified, but we would encourage placing more emphasis on supporting people during trial periods and keeping the period as short as practical.
This is a problem for two reasons:
First and foremost, people deserve to be treated better. Having an intense and mission-oriented work culture is not an excuse for hurtful behavior. We are concerned that management has in the past actively created unhealthy work environments, with some behaviors leading to negative consequences and contributing to burn-out.
It’s not productive or effective. We aren’t against working long hours or having an intense work culture in general—there are situations when it can be needed, or can be done in a sustainable way. However, we do believe that providing support, enabling people to improve, and building a healthy culture is generally more productive over time, even though it may increase some costs in the short term and add some ongoing maintenance costs. We do not believe that Redwood is making a well-calculated tradeoff that is increasing its productivity, and believe that it’s instead making short-sighted decisions that contribute to burnout and a bad overall culture. This is especially impactful given that Redwood also runs Constellation, which hosts other TAIS research organizations, major EA funders, and the Atlas team (which recruits and develops junior talent), and MLAB, which trains junior TAIS researchers.
Even if you believe that AI timelines are short, we still need people to be working on alignment for years to come. It doesn’t seem like the optimal strategy is to have them burn out in under a year. The cost of staff burning out is not just imposed on an individual organization, but on the technical AI safety ecosystem as a whole, and sets a bad precedent. And, as noted above, we do not think Redwood has been unusually productive as an organization, especially relative to the resources it has received. (Providing too much support to employees is also a failure mode, but we believe Redwood is very far away from erring in this direction).
We recommend that:
Redwood leadership read and consider this article by an MIT CS professor which is partly about how creating a sustainable work culture can actually increase productivity.
Redwood standardize work trial length, communicate clear expectations, have a well-formalized review process and consider offering work trial candidates jobs more quickly.
Redwood invest in on-the-job training and mentorship, and help connect employees who transition out of Redwood to find other jobs and opportunities.
We have heard that Redwood leadership is aware of the issues around letting people go and are shifting the responsibility of who manages the research team, as well as establishing norms such as giving people two months’ notice or feedback rather than abruptly letting them go. While this is an improvement, we are nevertheless concerned that the leadership team allowed this kind of behavior to happen in the first place.
Not prioritizing creating an inclusive workplace
Multiple EA community members have told us they feel uncomfortable using/visiting Constellation because of the unhealthy work culture, a lack of gender and racial diversity, and specific cultural concerns. We have heard multiple concerns around Constellation’s culture. About 10+ people (5 of them Constellation members) have mentioned that they feel pressure to conform to / defer to these people as well, for example in lunchtime conversations. They have also said they can’t act as freely or as loosely as they would like in Constellation. We recognize that these critiques about atmosphere are harder to evaluate because they are vaguer and less concrete than they could be, but we think they are worth raising because our impression is that these issues are more pronounced in Constellation than in other coworking spaces like the Open Phil offices or Lightcone (although these issues may be present there as well).
We know this probably isn’t as satisfying as it could be, but appreciate you taking the time to point this out and we will edit the post to acknowledge this. We’ve heard from 5-10 people (2-3 of them Constellation members) who feel they are viewed more in terms of their dating potential and less as colleagues, both in and out of Constellation. It is sometimes hard to distinguish between instances like this, especially with the personal and work overlap in the Bay Area EA community, and we recognize that this isn’t Redwood’s fault and can make the situation more challenging. The people we spoke to are also concerned at the lack of attention that is paid to these issues by the leadership of these offices.
Some of these concerns are not exclusive to Constellation / Redwood. Technical AI safety is predominantly male, even more so than similar technical disciplines like software engineering. It isn’t Redwood’s fault that the ecosystem and talent pool it draws on are not diverse; however, we believe Redwood is exacerbating this problem through its culture. Ultimately, we believe that organizations should strive to create environments that are inclusive to people from minority groups even if demographic parity is not reached.
We recommend creating formal and informal avenues for making complaints and generally encourage the leadership team to consider investing the time to create a culture where people feel they will be listened to if they raise concerns.
Conclusion
In sum, Redwood has produced some useful research, but much less than one would expect given the amount of funding and mindspace it has occupied. There are many labs that have produced equally good work, so funders might consider whether some of the money invested in Redwood at an early stage, when most orgs have growing pains, would have been better used by scaling existing labs and supporting a greater diversity of new labs instead.
We have discussed a number of significant problems with Redwood including a lack of senior ML researchers, lack of communication with the broader community and serious work culture issues. We don’t think it’s too late for Redwood to make some significant changes and improve on these issues. We hope this post may help spur change at Redwood, as well as inform the broader community, including potential employees, collaborators and funders.
Edit Log
[March 31st at 9:15am]: We made several grammar edits and fixed broken links / footnotes. We also clarified that we are only talking about inputs we received about Constellation in the section on creating an inclusive workplace, as the previous phrasing implied we were talking about Lightcone as well.
[March 31st at 10:39am]: We ran the workspace comment by the primary contributor and updated it to be more accurate. Specifically, we clarify which instances happened at Constellation (or related events) and which were actions taken by Constellation members in other spaces as well. We removed one point (about people feeling uncomfortable about being flirted with) since the contributor mentioned this instance did not take place at Constellation, and we didn’t think it was fair to include this.
[March 31st at 1:59pm & 3:22pm]: More grammar edits.
[April 1st at 12:00am]: Cleaned up links, clarified the section on Redwood’s funding and the correct reference classes for it, and clarified a point about Redwood’s adversarial training model.
[April 4th at 11:23am]: We clarified the section on the atmosphere of Constellation based on the comment from Larks.
- ^
We go into more detail on this in a follow-up comment.
- ^
We cannot help but be reminded of Frank H. Westheimer’s advice to his research students: “Why spend a day in the library when you can learn the same thing by working in the laboratory for a month?”
- ^
Thanks to Jacob Steinhardt for helping us clarify this point.
- ^
As a benchmark example, Sergey Levine’s lab at UC Berkeley published 5 papers of comparable quality to the Redwood papers in 2022 (and 30 papers total, although the others were substantially lower quality, and note that the papers aren’t as relevant to alignment). Sergey Levine’s lab has a substantially lower budget than Redwood’s. However, in defense of Redwood, Sergey’s lab does have a head count comparable to or larger than Redwood: it is currently listed as comprising 2 post-docs, 22 graduate students and 29 (part-time) undergraduate researchers.
- ^
For example, a speaker at the ML Winter Camp that took place in Berkeley in winter 2022-2023 stated that they believed that the only person with a good research agenda was Paul Christiano, and he sent all his research ideas to Redwood. They then went on to say that the best thing for the participants to aim for was working for Redwood (or ARC, if they were smart enough, which they weren’t). This reminds us a lot of the rhetoric from individuals talking to EA groups, and at AIRCS and CFAR workshops, about MIRI’s research around 2015-2017. MIRI had not produced much legible work (eventually announcing that its research would be nondisclosed by default), and people would essentially base their recommendations on trusting the MIRI staff. Eventually MIRI said that they had failed at their then-current research directions, and there was a general switch in focus to large language models.
- ^
Redwood Research commented that they view their causal scrubbing work as more significant. We view this work as substantially more novel and as addressing an important problem (evaluating mechanistic interpretability explanations), but we’re unsure as to the degree to which causal scrubbing will provide a tractable solution to it.
- ^
More in this comment, thank you to @FayLadybug for pointing this out.
- ^
8/20 grad students / postdoc researchers at CHAI are mostly x-risk focused, plus a few ops staff and Stuart Russell.
- ^
We couldn’t find a public statement on the topic (this post briefly mentions it), but this is common knowledge amongst the TAIS community.
I’ll briefly comment on a few parts of this post since my name was mentioned (lack of comment on other parts does not imply any particular position on them). Also, thanks to the authors for their time writing this (and future posts)! I think criticism is valuable, and having written criticism myself in the past, I know how time-consuming it can be.
I’m worried that your method for evaluating research output would make any ambitious research program look bad, especially early on. Specifically:
I think for any ambitious research project that fails, you could tell a similarly convincing story about how it’s “obvious in hindsight” it would fail. A major point of research is to find ideas that other people don’t think will work and then show that they do work! For many of my most successful research projects, people gave me advice not to work on them because they thought it would predictably fail, and if I had failed then they could have said something similar to what you wrote above.
I think Redwood’s failures here are ones of execution and not of problem selection—I thought the problem they picked was pretty interesting but they could have much more quickly realized the particular approaches they were taking to it were unlikely to pan out. If they had done that, perhaps they would have switched to other approaches that ended up succeeding, or just pivoted to interpretability faster. In any case, I definitely wouldn’t want to discourage them or future organizations from using a similar problem selection process.
(If you asked a random ML researcher if the problem seemed feasible, they would have said no. But I wouldn’t have used that as a reason not to work on the project.)
My personal judgment is that Buck is a stronger researcher than most people with ML PhDs. He is weaker at empirical ML than this baseline, but very strong conceptually in ways that translate well to machine learning. I do think Buck will do best in a setting where he’s either paired with a good empirical ML researcher or gains more experience there himself (he’s already gotten a lot better in the past year). But overall I view Buck as on par with a research scientist at a top ML university.
Thank you for this comment, some of the contributors of this post have updated their views of Buck as a researcher as a result.
Thanks for this detailed comment Jacob. We’re in agreement with your first point, but on re-reading the post we can see why it seems like we think the problem selection was also wrong—we don’t believe this. We will clarify the distinction between problem selection and execution in the main post soon.
Our main concern was that we think it is important, when working on a problem where a lot of prior research has been done, to come into it with a novel approach or insight. We think it’s possible the team could have done this via a more thorough literature review or by engaging with domain experts. Where we may disagree is on whether our suggestion of doing more desk research beforehand might result in researchers dismissing ideas too easily, and thus experimenting and learning less.
We think this is definitely possible, but feel it can be less costly in some cases, and in particular could have been useful in the case of the adversarial training project. As we write later on in the passage you quoted above, we think the problem with the adversarial training project was that Redwood focused on an unusually challenging threat model (unrestricted adversarial examples), and although some aspects of the textual domain make the problem easier, the large number of textual adversarial attacks indicated that this was unlikely to be sufficient.
Thanks for this! I think we still disagree though. I’ll elaborate on my position below, but don’t feel obligated to update the post unless you want to.
* The adversarial training project had two ambitious goals, which were the unrestricted threat model and also a human-defined threat model (e.g. in contrast to synthetic L-infinity threat models that are usually considered).
* I think both of these were pretty interesting goals to aim for and at roughly the right point on the ambition-tractability scale (at least a priori). Most research projects are less ambitious and more tractable, but I think that’s mostly a mistake.
* Redwood was mostly interested in the first goal and the second was included somewhat arbitrarily iirc. I think this was a mistake and it would have been better to start with the simplest case possible to examine the unrestricted threat model. (It’s usually a mistake to try to do two ambitious things at once rather than nailing one, moreso if one of the things is not even important to you.)
* After the original NeurIPS paper Redwood moved in this direction and tried a bunch of simpler settings with unrestricted threat models. I was an advisor on this work. After several months with less progress than we wanted, we stopped pursuing this direction. It would have been better to get to a point where we could make this call sooner (after 1-2 months). Some of the slowness was indeed due to unfamiliarity with the literature, e.g. being stuck on something for a few weeks that was isomorphic to a standard gradient hacking issue. My impression (not 100% certain) is Redwood updated quite a bit in the direction of caring about related literature as a result of this, and I’d guess they’d be a lot faster doing this a second time, although still with room to improve.
Note by academic standards the project was a “success” in the sense of getting into NeurIPS, although the reviewers seemed to most like the human-defined aspect of the threat model rather than the unrestricted aspect.
This section has now been updated
I’m missing a lot of context here, but my impression is that this argument doesn’t go through, or at least is missing some steps:
1. We think that the best Redwood research is of similar quality to work by [Neel Nanda, Tom Lieberum and others, mentored by Jacob Steinhardt]
2. Work by those others doesn’t cost $20M
3. Therefore the work by Redwood shouldn’t cost $20M
Instead, the argument which would go through would be:
1. Open Philanthropy spent $20M on Redwood Research
2. That $20M produced [such and such research]
3. This is how you could have spent $20M to produce [better research]
4. Therefore, Open Philanthropy shouldn’t have spent $20M on Redwood Research, but instead on [alternatives]
   (or spent $20M on [alternatives] and on Redwood Research, if the value of Redwood Research is still above the bar)
But you haven’t shown step 3, the tradeoff against the counterfactual. It seems likely that the situation is such that producing good AI safety research depends on somewhat idiosyncratic non-monetary factors. Sometimes you will find a talented independent researcher or a PhD student that will produce quality research for relatively small amounts of money, sometimes you will spend $20M to get an outcome of a similar quality. I could see that being the case if the bottleneck isn’t money, which seems plausible.
Also note that building an institution is potentially much more scalable than funding one-off independent researchers.
As I said, I’m missing lots of context (i.e., I haven’t read Redwood’s research, and it seems within the normal range of possibility that it wouldn’t be worth $20M), but I thought I’d give my two cents.
To clarify my personal case: I did the grokking work as an independent research project, and Jacob only became involved after I had done the core research. His mentorship was specifically about the process of distillation and writing up the results. (To be clear, his mentorship here was high value! But I think the paper benefited less from his mentorship than is implied by the reference class of having him as the final author.)
I agree with this.
Cheers
Also, no reputational harm intended, sorry.
Re your point about “building an institution” and step 3: We think the majority of our expected value comes from futures in which we produce more research value per dollar than in the past.
(Also, just wanted to note again that $20M isn’t the right number to use here, since around 1/3rd of that funding is for running Constellation, as mentioned in the post.)
Thanks for mentioning the $20M point Nate—I’ve edited the post to make this a little more clear and would suggest people use $14M as the number instead.
Cheers
Meta note: We believe this response is the 80/20 in terms of quality vs time investment. We think it’s likely we could improve the comment with more work, but wanted to share our views earlier rather than later.
One thing we didn’t spell out very explicitly in this post was the distinction between 1) how effectively we believed Redwood spent their resources and 2) whether we think OP should have funded them (and at what amount). As this post is focused on Redwood, I’ll focus more on 1) and comment briefly on 2), but note that we plan to expand on this further in a follow-up post. We will add a paragraph which disambiguates between these two points more clearly.
Argument 1): We think Redwood could produce at least the same quality and quantity of research, with fewer resources (~$4-8 million over 2 years)
The key reasons we think 1) are:
If they had more senior ML staff or advisors, they could have avoided some mistakes on their agenda that we see as avoidable. This wouldn’t necessarily come at a large monetary cost given their overall budget (around $200-300K for 1 FTE).
We estimate as much as 25-30% of their spending went towards scaling up projects (e.g. REMIX) before they had a clear research agenda they were confident in. To be fair to Redwood, this premature scaling was more defensible prior to the FTX collapse, when the general belief was that there was a “funding overhang”. Nate in his comment also mentions that concerns about scaling were raised by both Holden and Ajeya (at OP), and that he now sees this as an error on Redwood’s part.
Argument 2): OP should have spent less on Redwood, 2a) and there were other comparable funding opportunities
The key reasons we think 2) are:
There are other TAIS labs (academic and not) that we believe could absorb and spend considerably more funding than they currently receive. Example non-profits include CAIS and FAR AI, and underfunded safety-interested academic groups include David Krueger’s and Dylan Hadfield-Menell’s groups. Opportunities are more limited if focusing specifically on interpretability, but there are still a number of promising options. For example, Neel Nanda mentioned three academics he considers to do good interpretability work: OP has funded one of them (David Bau) but as far as we know not the other two (of course, they may not have room for more funding, or OP may have investigated and decided not to fund them for other reasons).
A key reason OP may not think some of these labs are worth funding on the margin is that they are substantially more bullish on certain safety research agendas than others. We have some concerns about how the OP LT team decides which agendas to support, but will explore this further in our Constellation post, so won’t comment in more depth at this point. As one of the main funders of TAIS work, in a field which is very speculative and new, we think OP should be more open to a broad range of research agendas than they are.
We think that small, young organizations without a track record beyond founder reputation should in general be given smaller grants and build up a track record before trying to scale. We think it’s plausible that several of the issues we pointed out could have been mitigated by this funding structure.
My understanding is that, had Redwood not existed, OpenPhil would not have significantly increased their funding to these other places, and broadly has more money than they know what to do with (especially in the previous EA funding environment!). I don’t know whether those other places have applied for grants, or why they aren’t as funded as they could be, but this doesn’t seem that related to me. And more broadly there are a bunch of constraints on grant makers, like time to evaluate a grant, having enough context to competently evaluate it or external advisors with context who they trust, etc. E.g., I’m a bit hesitant about funding interpretability academics who I think will go full steam ahead on capabilities (I think it’s often worth doing anyway, but it’s not obvious to me, and the one time I recommended a grant here it did consume quite a lot of my time to evaluate the nuances).
And grant making is just really not an efficient market, and there are lots of good grants that don’t happen for dumb reasons.
Concretely, it’s plausible to me that taking the marginal 1 million given to Redwood and dividing it evenly among the other labs you mention seems good. But that doesn’t feel like the right counterfactual here.
To push back on this point, presumably even if grantmaker time is the binding resource and not money, Redwood also took up grantmaker time from OP (indeed I’d guess that OP’s grantmaker time on RR is much higher than for most other grants given the board member relationship). So I don’t think this really negates Omega’s argument—it is indeed relevant to ask how Redwood looks compared to grants that OP hasn’t made.
Personally, I am pretty glad Redwood exists and think their research so far is promising. But I am also pretty disappointed that OP hasn’t funded some academics that seem like slam dunks to me and think this reflects an anti-academia bias within OP (note they know I think this and disagree with me). Presumably this is more a discussion for the upcoming post on OP, though, and doesn’t say whether OP was overvaluing RR or undervaluing other grants (mostly the latter imo, though it seems plausible that OP should have been more critical about the marginal $1M to RR especially if overhiring was one of their issues).
My prior is that people who Jacob thinks are slam-dunks should basically always be getting funding, so I’m pretty surprised by this anecdote. (In general I also expect that there are a lot of complex details in cases like these, so it doesn’t seem implausible that it was the right call, but it seemed worth registering the surprise.)
I work at Open Philanthropy, and in the last few months I took on much of our technical AI safety grantmaking.
In November and December, Jacob sent me a list of academics he felt that someone at Open Phil should reach out to and solicit proposals from. I was interested in these opportunities, but at the time, I was full-time on processing grant proposals that came in through Open Philanthropy’s form for grantees affected by the FTX crash and wasn’t able to take them on.
This work tailed off in January, and since then I’ve focused on a few bigger grants, some writing projects, and thinking through how I should approach further grantmaking. I think I should have reached out to at least a few of the people Jacob suggested earlier (e.g. in February). I didn’t make any explicit decision to reject someone that Jacob thought was a slam dunk because I disagreed with his assessment — rather, I was slower to reach out to talk to people he thought I should fund than I could have been.
I plan to talk to several of the leads Jacob sent my way in Q2, and (while I would plan to think through the case for these grants myself to the extent I can) I expect to end up agreeing a lot with Jacob’s assessments.
With that said, Jacob and I do have more nebulous higher-level disagreements about things like how truth-tracking academic culture tends to be and how much academic research has contributed to AI alignment so far, and in some indirect way these disagreements probably contributed to me prioritizing these reach outs less highly than someone else might have.
This seems fair. I’m significantly pushing back on this as a criticism of Redwood, and on the focus on the “Redwood has been overfunded” narrative. I agree that they probably consumed a bunch of grant makers’ time, and am sympathetic to the idea that OpenPhil is making a bunch of mistakes here.
I’m curious which academics you have in mind as slam dunks?
Thanks Nuno, I’m sharing this comment with the other contributors and will respond in depth soon. I think you’re right that we could be more explicit on 3).
Cheers
Thanks to the authors for taking the time to think about how to improve our organization and the field of AI takeover prevention as a whole. I share a lot of the concerns mentioned in this post, and I’ve been spending a lot of my attention trying to improve some of them (though I also have important disagreements with parts of the post).
Here’s some information that perhaps supports some of the points made in the post and adds texture, since it seems hard to properly critique a small organization without a lot of context and inside information. (This is adapted from my notes over the past few months.)
Most importantly, I am eager to increase our rate of research output – and critically to have that increase be sustainable because it’s done by a more stable and well-functioning team. I don’t think we should be satisfied with the current output rate, and I think this rate being too low is in substantial part due to not having had the right organizational shape or sufficiently solid management practices (which, in empathy with the past selves of the Redwood leadership team, is often a tricky thing for young organizations to figure out, and is perhaps especially tricky in this field).
I think the most important error that we’ve made so far is trying to scale up too quickly. I feel bad about the ways in which this has contributed to people who’ve worked here having an unexpectedly bad experience. I believe this was upstream of other organizational mistakes and that it put stress on our relative inexperience in management. While having fewer staff gives fewer people a chance to have roles working on our type of AI alignment research, I expect it will help increase the management quality per person. For example, I think there will be more and better opportunities for researchers at Redwood to grow, which is something I’ve been excited to focus on. I think scaling too quickly was somewhat downstream of not having an extremely clear articulation of what specific flavor of research output we are aiming to produce and, in turn, having a tested organization that we believe reliably produces those outputs.
I think this was an unforced error on our part – for example, Holden and Ajeya expressed concerns to me about this multiple times. My thinking at the time was something like “this sure seems like a pretty confusing field in a lot of ways, and (something something act-omission bias) I’m worried that if we chose an unrealistically high standard for clarity to gate on for organizational growth, then we might learn more slowly than we might otherwise, and fail to give people opportunities to contribute to the field.” I now think that I was wrong about this.
With that said, I’ll also briefly note some of the ways I disagree with the content and framing of this post:
We think our “causal scrubbing” work is our most significant output so far – substantially more important than, for example, our “Interpretability in the Wild” work.
At the beginning of our adversarial training project, we reviewed the literature (including the papers in the list that the above post links to) and discussed the project proposal with relevant experts. I think we made important mistakes in that project, but I don’t think that we failed to understand the state of the field.
I am moderately optimistic about Redwood’s current trajectory and our potential to contribute to making the future go well. I feel substantially better about the place that we’re in now relative to where we were, say, 6 months ago. We remain a relatively young organization making an unusual bet.
I really appreciate feedback, and if anyone reading this wants to send feedback to us about Redwood, you can email info at rdwrs.com or, if you prefer anonymity, visit www.admonymous.co/redwood.
Ditto pseudonym. I recognize from another comment that there is an upcoming Constellation post from the original poster and a more effortful response forthcoming there, but given that Redwood received this piece in advance, I am still kind of surprised that the following were not responded to:
Lack of Senior ML Research Staff
Lack of Comm… w/ ML Community
Conflicts of interest with funders
I guess people are busy and this is not a priority—seems like people are mostly thinking about Underwhelming Research Output (and Nate himself seems to say as much here)
Hi Nate, can you comment a bit more about this section?
I feel like this would be among the more negative updates I would make about Redwood if true, but think it would be possible that there are differences in how a specific event is seen by different parties. Specifically, these seem to reflect weaker organizational or management practices that aren’t to do with Redwood making an “unusual bet” (though relevant to it being a young organization).
Specifically:
Has Redwood ever terminated someone for losing productivity that they otherwise wouldn’t have, due to a personal life event?
Does Redwood have a policy around leave that includes support for personal life events?
Does Redwood have a clear termination process including warnings before a termination where reasonable, and opportunities for an employee to course-correct with the support of the organization?
I think I’m an unusual case, but I found out a short-term contract had been ended early through an automated email, and I received no response when contacting several Redwood staff to check if I had been terminated. I think this is very uncharacteristic, though: they’re all good people and I’m net optimistic about Redwood’s future. I think they can improve their communication around hiring/trialling/firing processes, though.
Edit: I’ve chatted with Buck and it seems like this was a communication problem.
I know nothing about this organisation, and very little about this field, but this is an impressively humble and open response from a leader of an org in the face of a very critical article. No comment on content, but I appreciate the approach @Nate Thomas
Thanks for taking the time to write thoughtful criticism. Wanted to add a few quick notes (though note that I’m not really impartial as I’m socially very close with Redwood)
- I personally found MLAB extremely valuable. It was very well-designed and well-taught and was the best teaching/learning experience I’ve had by a fairly wide margin
- Redwood’s community building (MLAB, REMIX and people who applied to or worked at Redwood) has been a great pipeline for ARC Evals and our biggest single source for hiring (we currently have 3 employees and 2 work triallers who came via Redwood community building efforts).
- It was also very useful for ARC Evals to be able to use Constellation office space while we were getting started, rather than needing to figure this out by ourselves.
- As a female person I feel very comfortable in Constellation. I’ve never felt that I needed to defer or that I was viewed for my dating potential rather than my intellectual contributions. I do think I’m pretty happy to hold my ground and sometimes oblivious to things that bother other people, so that might not be very strong evidence that it isn’t an issue for other people. However, I have been bothered in the past by places that try to make up the gender balance by hiring a lot of women for non-technical roles. In these places, people assume that the women who are there are non-technical. I think it would make the environment worse for me personally if there were pressure for Constellation to balance the gender ratios.
- I think there have been various ways in which Redwood culture and management style were not great. I think some of this was due to difficult tradeoffs or normal challenges of being a new organization, and some of it was unforced errors. I think they are mostly aware of the issues and taking steps to fix them, although I don’t expect them to be excellent at management that soon. Some of my recommendations (which I’ve told them before and think they have mostly taken on board):
-- If Buck is continuing to manage people (and maybe also if not), he should get management coaching
-- Give employees lots of concrete positive feedback (at least once per week)
-- When letting people go, be very clear that hiring is noisy and people perform differently at different organizations; that Redwood is a challenging and often low-management environment that, like a PhD program, is not a good fit for everyone; and that they shouldn’t be too discouraged. (I think Redwood believes this but hasn’t been as clear as they could be about communicating it)
-- Make sure expectations are clear for work trials
-- Make growth for their employees a serious priority, especially for their top performers; this should be done deliberately, with time set aside for it
Strong +1, I was really impressed with the quality of MLAB. I got a moderate amount out of doing it over the summer, and would have gotten much, much more if I had done it a year or two earlier. I think that kind of outreach is high value, though plausibly a distraction from the core mission.
At least as written, this is so broad as to be effectively meaningless. All organisations exert social pressure on members to act in a certain way (e.g. to wipe down the exercise machines after use). Similarly, basically all employers require some degree of deference to management; typical practice is that management solicit feedback from workers but in turn compliance with instructions is mandatory.
What you describe could be bad… or it could be totally typical. There’s no real way for the reader to judge based on what you’ve written.
Hi Larks, thanks for the pushback here. We agree that this is hard to judge. Unfortunately, some of this was about the general atmosphere of the place, which is a bit fuzzy.
People said they also feel pressure to conform to / defer to these people, for example in lunchtime conversations. People have also said they can’t act as freely or as loosely as they would like in Constellation. So it’s maybe something like feeling that you have to behave in a certain way, or in line with what you perceive the funders and senior leadership want, in order to fit in.
Although this may be present in other offices, we think this pressure is more pronounced at Constellation than at other coworking spaces like the Open Phil offices or Lightcone, where we think there is more freedom to say and do what you want.
We know this probably isn’t as satisfying as it could be, but appreciate you taking the time to point this out and we will edit the post to acknowledge this.
Update: the post has been edited.
One quick point: I feel pretty confused about the “Lack of Senior ML Research Staff” criticism. Senior ML research staff are one of the biggest bottlenecks in alignment, and so this feels particularly un-actionable as a criticism, especially given that you’re leading with it. (That’s particularly true when it comes to hiring for full-time roles, but I expect also relevant when it comes to recruiting good advisors.)
You concretely note that Redwood “terminated some of their more experienced ML research staff”, but once you’ve hired somebody you get a huge amount of data on their performance on many different axes, which makes it hard to interpret this as a bias against experienced ML researchers.
Seems like to the degree it’s valid, it’s actionable for people who might consider working with or funding Redwood.
Good critique. My main conclusion is that Redwood seems reasonable overall and not far out of line with other AI safety orgs. Benchmarked against non-AI safety orgs, I would have my usual critique that Redwood (and other longtermist orgs) seems unreasonably expensive for reasons I don’t quite understand. Does salary really make that big a difference in attracting talent? If so, what does that say about our community’s values?
In any case, remember that every org has issues. Listing every issue an org has in a row can give the impression that things are worse than they really are. I would love for a similar critique to be made of the organization I co-founded once we grow to a similar size. More critique is good for the community.
We should be able to write scathing criticisms without getting mad at each other. We need to be able to read criticisms and not go completely ham and want to see the org and everyone associated guillotined.
Can you say more about what your implicit benchmark actually is here? Taken literally, “non-ai safety orgs” possibly describes almost all human organizations.
Startups would be another good reference class. VCs are incentivized to scale as fast as possible so they can cash out and reinvest their money, but they rarely give a new organization as much money as Redwood received.
Startups usually receive a seed round of ~$2M cash to cover the first year or two of business, followed by ~$10M for Series A to cover another year or two. Even Stripe, a VC wunderkind that’s raised billions privately while scaling to thousands of employees around the world, began with $2M for their first year, $38M for the next three years (2012-2014), and $70M for the next two years after that.
I’m not sure how long Redwood’s $21M is meant to cover, but if it’s less than 4 years, then they’re spending more than the typical $5M/year for a Series A startup. There’s a good argument to be made that OP can be more risk tolerant than most VCs and take a big swing on scaling Redwood quickly. But beyond cost-effectiveness, another downside of fast funding is that scaling organizations effectively is very difficult, and it could be counterproductive to hire quickly before you have senior management in place with clear lines of tractable work.
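A quick check of the break-even horizon, assuming (as a simplification) that the $21M is spent evenly over T years and taking $5M/year as the Series A benchmark:

```latex
\frac{\$21\,\text{M}}{T} > \$5\,\text{M/yr} \iff T < \tfrac{21}{5} = 4.2\ \text{years},
\qquad T < 4 \implies \frac{\$21\,\text{M}}{T} > \$5.25\,\text{M/yr}
```

So any horizon shorter than 4 years implies spending of more than $5.25M/year under these assumptions.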
Some numbers here (https://www.investopedia.com/articles/personal-finance/102015/series-b-c-funding-what-it-all-means-and-how-it-works.asp) and here (https://www.fundz.net/what-is-series-a-funding-series-b-funding-and-more). For Stripe funding numbers, google crunchbase Stripe Seed / Series A / Series B.
Thanks, this is helpful. One thing to flag is that I wouldn’t find the 2012-2014 numbers very convincing; my impression is that VC funding increased a lot until 2022, and 2021 was a year where capital was particularly cheap, for reasons that in hindsight were not entirely dissimilar to why longtermist EA was (relatively) well-funded in the last two years.
Yep, that’s a good point. Here’s one source on it; funding amounts definitely increased throughout the 2010s. An alternative explanation could be that valuations have increased more than funding amounts. There’s some data to support this, but you’d need a more careful comparison of startups within the same reference class to be sure.
Thanks, appreciate the concrete data!
I appreciate this comment for giving concrete data that improves my model of the world. Thanks.
Maybe his own org and other global development orgs? I think it’s almost always a mistake for a non-profit to get this much money this quickly, regardless of how much potential they have or the good reputation of their founders. It is difficult to gradually build an org and organically make the inevitable mistakes when you are given 10 million dollars in the first year.
I won’t speak for @MathiasKB, but these agree with some of my benchmarks outside the AI realm; he can share what he means :).
The Center for Effective Aid Policy (Mathias and co.’s org) is brand new, so there isn’t evidence of outputs or financials yet. They were given US$170,000 to start up. To many in the development world even $170k might still seem like a lot for an NGO to start with, but it’s still a lot less than $10 million.
Last year our org, OneDay Health (which has a decent chance of being effective), employed 43 staff, launched 8 health centers and treated 50,000 patients in the most remote rural parts of Uganda; our total expenditure for the year was US$104,000.
If we are looking at a development org with a budget on a similar scale, Last Mile Health has a 10-year track record, grew steadily, has won countless awards (social innovation, TED prizes, etc.), has been a crucial part of the global movement to roll out community health workers, improving health access for millions of people across 5+ countries, and employs hundreds of people both in the US and in the developing world. They spent about $26 million last year, which is a lot of money and in the ballpark of Redwood Research, but only after many years of high performance, proven recognition and growth.
Even as a global development guy, I think AI alignment research is important, but it is somewhat hard to understand why it’s a good idea for a new, small org like this to get this much money from the get-go. Perhaps start with $1 million in the first year, with the CEO and co-founder taking low-ish salaries while the org builds its reputation, then ramp things up after that?
Mind you if we really do only have 5-20 years before potentially dangerous GAI, maybe we have to sacrifice sustainable growth and stewardship of money at the altar of having a chance to save the world?
Ha, good point! I specifically had non-AI EA orgs in mind; I could have made that clearer!
The better reference class is adversarially mined examples for text models. Meta and other researchers were working on similar projects before Redwood started doing that line of research. https://github.com/facebookresearch/anli is an example. (Reader: evaluate your model’s consistency for what counts as alignment research. Does this mean non-x-risk-pilled Meta researchers do some alignment research, if we believe the RR project constituted exciting alignment research too?)
Separately, I haven’t seen empirical demonstrations that pursuing this line of research can have limited capabilities externalities or result in differential technological progress. Robustifying models against some kinds of automatic adversarial attacks (1,2) does seem to be separable from improving general capabilities though, and I think it’d be good to have more work on that.
This researcher’s work attitude is only one part of a spectrum. Many researchers find great returns working 80+ hours a week. Some labs differentiate themselves by keeping normal hours, but many successful labs have their members work a lot, and that works out well. For example, Dawn Song’s students work a ton, and some Berkeley grad students in other labs are intimidated by her lab’s hours, but that’s OK because her graduate students find that environment suitable. It’d be nice if this post were more specific about how much of the work culture discontent is about hours vs. other issues.
I agree that’s a good reference class. I don’t think Redwood’s project had identical goals, and would strongly disagree with someone saying it’s duplicative. But other work is certainly also relevant, and ex post I would agree that other work in the reference class is comparably helpful for alignment.
Of course! I’m a bit unusual amongst the EA crowd in how enthusiastic I am about “normal” robustness research, but I’m similarly unusual amongst the EA crowd in how enthusiastic I am about this proposed research direction for Redwood, and I suspect those things will typically go together.
I’m still not convinced by this perspective. I would frame the situation as:
There’s a task we really want future people to be good at—finding places where models behave in obviously-undesirable ways, and understanding the limitations of such evaluations and the consequences of training on adversarial inputs.
That task isn’t obviously improving automatically with model capabilities, it seems like something that requires knowledge and individual+institutional expertise.
So maybe we should practice a lot to get better at that task, sharing what we learn and building a larger community of researchers and engineers with relevant experience.
Your objection sounds like: “That may be true but there’s not a lot of evidence that this doesn’t also make models more capable, which would be bad.” And I don’t find that very persuasive—I don’t think there is such a strong default presumption that generic research accelerates capabilities enough to be a meaningful cost.
On the question of what generates differential technological progress, I think I’m comparably skeptical of all of the evidence on offer for claims of the form “doing research on X leads to differential progress on Y,” and the best guide we have (both in alignment and in normal academic research!) is basically common-sense arguments along the lines of “investigating and practicing doing X tends to make you better at doing X.”
I agree it is not duplicative. It’s been a while, but if I recall correctly the main difference seemed to be that they chose a task which gave them an extra nine of reliability (they started with an initially easier task) and pursued it more thoroughly.
I think if we find that improvement on X leads to improvement on Y, then that’s some evidence, but it doesn’t establish that it’s differential. If we find that improvement on X also leads to progress on thing Z that is highly indicative of general capabilities, then that’s evidence against. If we find that it mainly affects Y but not other things Z, then that’s reasonable evidence it’s differential. For example, so far, transparency hasn’t affected general capabilities, so I read that as evidence of differential technological progress. As another example, I think trojan defense research differentially improves our understanding of trojans; I don’t see it making models better at coding or gaining new general instrumental skills.
I think common sense is too unreliable a guide when thinking about deep learning; deep learning phenomena are often unintelligible even in hindsight (I still don’t understand why some of my research papers’ methods work). That’s why I’d prefer empirical evidence. Empirical research claiming to differentially improve safety should demonstrate a differential safety improvement empirically.
In my understanding, there was another important difference between Redwood’s project and the standard adversarial robustness literature: they were looking to eliminate only ‘competent’ failures (i.e. cases where the model probably ‘knows’ what the correct classification is), and would have counted it a success even if failures remained, as long as those failures were due to a lack of competence on the model’s part (e.g. ‘his mitochondria were liberated’ → implies harm, but only if you know enough biology).
I think in practice in their exact project this didn’t end up being a super clear conceptual line, but at the start it was plausible to me that only focusing on competent failures made the task feasible even if the general case is impossible.
Thanks for the comment Dan. I agree that the adversarially mined examples literature is the right reference class, of which the two that you mention (Meta’s Dynabench and ANLI) were the main examples (maybe the only examples? I forget) while we were working on this project.
I’ll note that Meta’s Dynabench sentiment model (the only model of theirs that I interacted with) seemed substantially less robust than Redwood’s classifier (e.g. I was able to defeat it manually in about 10 minutes of messing around, whereas I needed the tools we made to defeat the Redwood model).
I think the adversarial mining thing was hot in 2019. IIRC, Hellaswag and others did it; I’d venture maybe 100 papers did it before RR, but I still think it was underexplored at the time and I’m happy RR investigated it.
Thank you for the post!
I have long suspected that EA organizations in other cause areas (I am mainly referring to EA ones, but not only) have been held to higher standards of evaluation when getting funding than AI safety organizations. I have slightly updated upward on the likelihood of this view being right after reading this post.
To give more detail on the comparison I suspect and am updating on, I’ll use EA animal welfare organizations as an example, since I have some experience in this cause area. My suspicion is that, relative to AI safety grantees, animal welfare organizations receive much more scrutiny of their track records, staff experience, work culture, etc.
Also, my observation is that in animal welfare organizations, efforts to pay staff more sustainable and competitive salaries (starting from levels that are quite low and represent huge relative pay cuts) are not particularly welcomed by all donors. (To be fair to the donors, some EA animal welfare organizations pay very low salaries because their management refuses to pay more.) I am therefore puzzled why this kind of pressure doesn’t seem to exist as much in some other EA cause areas (and why it has to exist, to its current extent, in EA animal welfare). Granted, an underlying reason AI safety organizations pay high salaries is that the people who can work in AI safety organizations can earn higher salaries at for-profits, so they are already taking huge pay cuts to work at non-profit AI safety organizations. But judging from the salary levels stated in this post, it does seem to me that Redwood might be experiencing comparatively much less pressure to suppress salary levels. Note also that they earn significantly more than their peers who work in academia, which is something that isn’t generally seen in EA animal welfare.
I think I am not the only one with this kind of suspicion. At least 5 people from EA animal welfare have expressed to me their concerns, even complaints, that non-longtermist organizations are being treated unfairly relative to longtermist organizations, especially AI safety ones. From what I have observed (and I hope I am wrong), there seems to be some anti-longtermist/anti-AI safety sentiment circulating in the animal welfare cause area. I think this might be causing some community building problems within EA and may be worth addressing. (FWIW, I endorse some form of longtermism and I see a connection between animal welfare and longtermism. I now work on AI’s impact on animals.)
I find salary pretty confusing. My current guess is that EAs are too willing to flatten salary across different counterfactuals and experience levels, rather than too unwilling. In particular, one intuitive heuristic in my head is something like “many people are willing to give up 20-50% of salary to do the right thing, but relatively few people are willing to give up >>70%.”
Maybe this is wrong? I know there’s empirical research that people with more money benefit less from percentage increases in their spending, so I can see why e.g. someone with a 50k salary taking a 25% pay cut could be as costly as (or more costly than!) someone with a 300k salary taking a 70% pay cut. But it’s not very intuitive to me, and I’m confused why this point is not more often brought up when discussing questions of salary fairness.
Hey this is Bill—I help run Constellation. Thank you for sharing feedback.
It sounds like you’re planning to write a future post on Constellation that I imagine might have more specifics and that we will have an opportunity to engage with in advance, so maybe it makes sense to respond more then. At the risk of oversimplifying a complex topic, it’s important to Constellation’s mission that Constellation is a good place to work and talk with others, and we care a lot about the culture being welcoming and comfortable for members and visitors.
We really appreciate feedback. If anyone has feedback on Constellation that they’d feel comfortable sharing with me, you can email me at bill@rdwrs.com.
Hi Bill, yes, your understanding is correct. We will be writing a post about Constellation in the future, and we will share a draft ahead of time with you / Redwood.
Minor note: an anonymous feedback form might help to elicit negative feedback here. I appreciate the openness to criticism! (I don’t have significant negative feedback; I like Constellation a lot. This is just a general note.)
Agreed. We have a Constellation-internal anonymous form that isn’t set up well for external feedback, and I didn’t want to block on setting it up before replying.
I’m flattered, but as Nuno notes I think this is a poor and somewhat unfair argument.
I think that causal scrubbing is probably on par with interpretability in the wild as an example of good interp work, and has helped influence work at other interpretability labs (within TAIS).
Most academic interp work, in my biased opinion, is just not very good when it comes to genuinely scaling to large LLMs, or being relevant to my work (with exceptions: I think the labs of Christopher Potts, David Bau and Martin Wattenberg do good work).
AFAICT, interpretability in the wild was not an organisational priority, and was predominantly worked on by 3 junior staff (with advice from Jacob Steinhardt).
I personally really like interp in the wild, and it’s influenced my research much more than most other interp work.
As Nuno notes, I can’t see how else to spend $20M to get more good interp work (naively; I’m not claiming no such ways exist).
Redwood is mostly pursuing a different path to interp. I personally think this is less promising, but I like having a diverse range of agendas, and more power to them.
I also broadly think that publishing and engaging with the broader ML community is less obviously good for interpretability; as noted, I just don’t think most work is very relevant. I think it’s a bet worth making (and am excited about interp in the wild and my grokking work getting into ICLR!), but definitely not obviously worth the effort, eg I think it’s probably the right call that Anthropic doesn’t try to publish their work. Putting pre-prints on arXiv seems pretty cheap, and I’m pro that, but I think seriously aiming for academic publications is a lot of work (more than 10-20% of a project IMO) and I feel pretty good about Redwood only trying for this when they have employees who are particularly excited about it.
To be clear, I totally think Redwood’s output can increase substantially, that this is a worthy goal, and that some of the criticism here is directionally correct and could be high value. But I think this is a claim of the form “Redwood are already one of the top 5 interp labs in the world by my lights (but I have a pretty low bar...), and I’d love them to be even better”.
I’m commenting on the parts of this post that I most disagreed with and feel most qualified to opine upon. I broadly think this was a good post, agree with much of the criticism (though disagree with much of it too). Thanks for writing it! I hope it helps Redwood become a better org.
(written in first person because one post author wrote it)
I think this is the area we disagree on the most. Examples of other ideas:
1. Generously fund the academics who you do think are doing good work (as far as I can tell, two of them, Christopher Potts and Martin Wattenberg, get no funding from OP, and David Bau gets an order of magnitude less). This is probably more on OP than Redwood, but Redwood could also explore funding academics and working on projects in collaboration with them.
2. Poach experienced researchers who are executing well on interpretability but working on what (by Redwood’s lights) are less important problems, and redirect them to more important problems. Not everyone would want to be “redirected”, but there’s a decent fraction of people who would love to work on more ambitious problems but are currently not incentivized to do so, and a broader range of people are open to working on a wide range of problems so long as they are interesting. I would expect these individuals to cost a comparable amount to what Redwood currently pays (somewhat less if poaching from academia, somewhat more if poaching from industry) but be able to execute more quickly as well as spread valuable expertise around the organization.
3. Make one-year seed grants of around $100k to 20 early-career researchers (PhD students, independent researchers) to work on interpretability, nudging them towards a list of problems viewed as important by Redwood. Provide low-touch mentorship (e.g. a monthly call). Scale up the grants and/or hire people from the projects that did well after the one-year trial.
I wouldn’t confidently claim that any of these approaches would necessarily best Redwood, but there’s a large space of possibilities that could be explored and largely has not been. Notably, the ideas above differ from Redwood’s high-level strategy to date by: (a) making bets on a broad portfolio of agendas; (b) starting small and evaluating projects before scaling; (c) bringing in external expertise and talent.
I largely agree that the percentage of interpretability papers that are relevant to large-scale alignment is disappointingly low. However, the denominator is very large, so I still expect the majority of TAIS-relevant interpretability work to happen outside TAIS organizations. Given this, I’d argue there’s considerable value in communicating with this subset of the ML research community. Perhaps a peer-reviewed publication is not the best way to do this: I’d be happy to see Redwood staff e.g. giving talks at a select subset of academic labs, but to the best of our knowledge this hasn’t happened.
I agree that getting from the stage of “scrappy preprint / blog post that your close collaborators can understand” to “peer-reviewed publication” can be 10-20% of a project’s time. However, in my experience the clarity of the write-up and rigor of the results often increase considerably in that 10-20%. There are some parts of the publication process that are complete wastes of time (reformatting from single to double column, running an experiment that you already know the results of but that reviewer 2 really wants to see), but in my experience these have been a minority of the work—no more than 5% of the overall project time. I’m curious if you view this as being significantly more costly than I do, or the improvements to the project from peer-review as being less significant.
Sorry for the long + rambly comment! I appreciate the pushback, and found clarifying my thoughts on this useful.
I broadly agree that all of the funding ideas you point to seem decent. My biggest crux is that the counterfactual of not funding Redwood is not that one of those gets funded, and that the real constraints here are logistical effort, grantmaker time, etc. I wrote a comment downthread with further thoughts on these points.
And that it is not Redwood’s job to solve this—they’re pursuing a theory of change that does not depend on these, and it seems very unreasonable to suggest that they should pursue one of these other uses of money instead, even if you think that the use of money is a great idea.
Re 1, concretely, I’ve been trying to help one of those professors get more funding for his lab, and think this is a high impact use of money. But I think that evaluating professors is hard, thinking through capabilities externalities is hard, figuring out a lab’s room for more funding is hard, and it’s harder to burn a ton of money productively in academia, eg >$1mn (eg, it’s pretty hard to just hire a bunch of engineers, and interp doesn’t really need a ton of compute). There are also dumb network problems where the academics don’t know how to get funding, it’s not very legible how to apply to OpenPhil, not everyone is comfortable taking EA money, etc (I would like these problems to be solved, to be clear). I don’t think it’s a matter of just having more money.
I don’t know anyone like this. If you do, please intro me! (I met someone vaguely in this category and helped them get an FTX grant at the start of November... but they only tangentially fit your description.) I’m pretty unconvinced there are many people like this out there who could be redirected to productively do what I consider good interp work; beyond just motivation and interest in doing independent-ish work, there are also significant considerations of research taste, having mentorship to do work I think is important, etc.
Seems good, I’d be excited about this happening. I consider my MATS scholars to be vaguely in the spirit of this, and I’ve been very impressed with them. But, like, this is so not bottlenecked on money. It’s a substantial program that would take effort to run, it’s not clear to me that these people would do good work without mentorship (1/month might be sufficient), it’s not clear that this adds much value beyond existing independent researcher grants, etc. But I do think it’s a decent idea—if anyone is interested in making this happen, please reach out!
There’s some work I think is cool, but it tends to be concentrated in a small handful of actually good labs (eg I like ROME and Emergent World Representations a lot). There’s a bunch of work I think isn’t great, but sometimes has great gems in it. But honestly I think that well over a majority of impact-weighted TAIS work was done by the TAIS community (specifically, Chris Olah + collaborators’ work is quite possibly a majority in my mind). I’d be interested in being pointed to work that you think is great that I’m missing; I personally find literature reviews to be pretty tedious, and think I underinvest in this kind of thing.
More broadly, my position is that engaging with academia is a theory of change, but one of many. It’s a significant investment of time, some people are much better at it than others (eg, I personally just hate writing papers, and am much worse at it than just directly trying to do good research, or write blog posts/educational materials/good tooling), it’s hard to direct in targeted ways, benefits a bunch from legible signalling and credentials, etc. I also think Redwood are more pessimistic on it than I am, and eg I am personally not convinced that trying to get grokking into ICLR was a good use of time and effort (though I hope it was!). I think Redwood are making a pretty reasonable bet here.
As a negative example here, I think Distill was a major investment of effort into influencing academia, including on doing better interp work, and it basically failed as far as I can tell (despite, to my eyes, Distill papers being notably higher quality and more interesting than conference papers).
I want to distinguish two things—putting in the effort to make a write-up really good, and putting in the effort to eg get it accepted at ICLR/ICML/NeurIPS. I am pretty pro making write-ups really good (I personally am not very good at it and try to avoid it where possible, but this is a personal taste not a value judgement). Eg I really like Anthropic interp papers (though am biased) and think the effort put into presentation and clarity is pretty well spent. And I think that part of submitting to a top conference is making things tightly and clearly phrased, having good figures, making them well presented, having good evidence for your results.
IMO the biggest cost is shaping the results and narrative of your work to fit what reviewers look for and think is good. I broadly think this just isn’t that correlated with what good interp work looks like. I think this can be extremely expensive if you let “this would make a good publication” shape your research process, choice of projects, etc. In cases like grokking, I did the research I wanted to do, and we then decided to go for a publication, which I think was basically fine, and much less costly. But it did involve significant reshaping and optimisation of the narrative (I am personally sad that progress measures got into the title lol).
Idk, these are complex questions, and there are people I respect who are way more or less pro academia + publishing than me. I am personally pretty biased against academia and publishing, and this affects my value judgements here.
I think it’s great that you’re releasing some posts that criticize/red-team some major AIS orgs. It’s sad (though understandable) that you felt like you had to do this anonymously.
I’m going to comment a bit on the Work Culture Issues section. I’ve spoken to some people who work at Redwood, have worked at Redwood, or considered working at Redwood.
I think my main comment is something like: you’ve done a good job pointing at some problems, but it’s pretty hard to figure out what should be done about them. To be clear, I think the post may be useful to Redwood (or the broader community) even if you only “point at problems”, and I don’t think people should withhold these write-ups unless they’ve solved all the problems.
But in an effort to figure out how to make these critiques more valuable moving forward, here are some thoughts:
If I were at Redwood, I would probably have a reaction along the lines of “OK, you pointed out a list of problems. Great. We already knew about most of these. What you’re not seeing is that there are also 100 other problems that we are dealing with: lack of management experience, unclear models of what research we want to do, an ever-evolving AI progress landscape, complicated relationships we need to maintain, interpersonal problems, a bunch of random ops things, etc. This presents a tough bind: on one hand, we see some problems, and we want to fix them. On the other hand, we don’t know any easy ways to fix them that don’t trade-off against other extremely important priorities.”
As an example, take the “intense work culture” point. The most intuitive reaction is “make the work culture less intense: have people work fewer hours.” But this plausibly has trade-offs with things like research output. You could make the claim that “on the margin, if Redwood employees worked 10 fewer hours per week, we expect Redwood would be more productive in the long run because of reduced burnout and a better culture”, but this is a substantially different (and more complicated) claim to make. And it’s not obviously true.
As another example, take the “people feel pressure to defer” point. I personally agree that this is a big problem for Redwood/Constellation/the Bay Area scene. My guess is Buck/Nate/Bill agree. It’s possible that they don’t think it’s a huge deal relative to the other 100 things on their plate. And maybe they’re wrong about that, but I think that needs to be argued for if you want them to prioritize it. Alternatively, the problem might be that they simply don’t know what to do. Like, maybe they could put up a sign that says “please don’t defer—speak your mind!” Or maybe they could say “thank you” more when people disagree, or something. But I think often the problem is that people don’t know what interventions would be able to fix well-known problems (again, without trading off against something else that is valuable).
I’m also guessing that there are some low-hanging fruit interventions that external red-teamers could identify. For example, here are three things that I think Redwood should do:
Hire a full-time productivity coach/therapist for the Constellation offices. (I recommended this to Nate many months ago. He seemed to (correctly, imo) predict that burnout would be a big problem for Redwood employees, and he said he’d think about the therapist/coach suggestion. I believe they haven’t hired one.)
Hire an external red-teamer to interview current and former employees, identify work culture issues, and identify interventions to improve things. Conditional on this person/team identifying useful (and feasible) interventions, work with leadership to actually get them implemented. (I’m not sure if they’re doing this, and also maybe your group is already doing this, but the post focused on problems rather than interventions?)
Have someone red-team communications around employee expectations, work-trial expectations, and expectation-setting during the onboarding process. I think I’m fine with some people opting-in to a culture that expects them to work X hours a week and has Y intensity aspects. I’m less fine with people feeling misled or people feeling unable to communicate about their needs. It seems plausible to me that many of the instances of “Person gets fired or quits and then feels negatively toward Redwood & encourages people not to work there” (which happens, btw) could be avoided/lessened through really good communication/onboarding/expectation-setting. (I have no idea what Redwood’s current procedure is like, but I’d predict that a sharp red-teamer would be able to find 3+ improvements).
These are three examples of interventions that seem valuable and (relatively) low-cost to me. I’d be excited to see if your team came up with any intervention ideas, and I’d be excited to see a “proposed intervention” section in future reports. (Though again, I don’t think you should feel like you need to do this, and I think it’s good to get things out there even if they’re just raising awareness about problems).
Hi Akash,
Thank you for sharing your thoughts and those concrete action items. I agree it would be nice to have a set of recommendations in an ideal world.
This post took at least 50 hours (collectively) to write, and was delayed in publishing by a few days due to busy schedules. I think if we had had more time, I would have shared the final version with a small set of non-Redwood beta reviewers for comments, which would have caught things like this (and e.g. Nuno’s comment).
We plan to do this for future posts (if you’re reading this and would like to give comments on future posts, please DM us!).
We’ll consider adding an intervention section to future reports, time permitting (we still think there is value in sharing our observations, as a lot of this information is not available to people without relevant networks).
(I may come back (again, time permitting) and respond to your point on Redwood having many problems to deal with at a later stage)
I am sympathetic to several of the high level criticisms in this post but have a few relatively minor criticisms.
1) Redwood Funding
This post says “Redwood’s funding is very high relative to other labs.”
I think this is very false: OpenAI, Anthropic, and DeepMind have all received hundreds of millions of dollars, an order of magnitude above Redwood’s funding.
This post says “Redwood’s funding is much higher than any other non-profit lab that [OpenPhil funds].”
This is false: OpenAI was a non-profit when it received 30 million dollars from OpenPhil (link to grant), 50% more than this post cites Redwood as receiving.
This post casts OP having seats on the Board of Redwood as a negative. I think that, in fact, having board seats at an organization you fund is pretty normal and considered responsible; the lack of this by VCs was a noted failure after the FTX collapse.
2) Field Experience
The post says:
This does not strike me as true—modern ML Research is an extremely new field, many research scientists in it did not start out with PhDs in ML.
3) Publishing is Relative to Productivity
I think it plausible that Redwood publishes a normal amount relative to their research productivity. This post seems to agree with that. I think them publishing more, absent them doing more research, would be bad, as it would lead to them publishing lower quality research.
My impression is also that Redwood’s published papers have stood out for being unusually thorough and informative about their research among ML papers.
Regarding 3) Publishing is relative to productivity, we are not entirely sure what you mean, but can try to clarify our point a little more.
We think it’s plausible that Redwood’s total volume of publicly available output is appropriate relative to the quantity of high-quality research they have produced. We have heard from some Redwood staff that there are important insights that have not been made publicly available outside of Redwood, but to some extent this is true of all labs, and it’s difficult for us to judge without further information whether these insights would be worth staff time to write up.
The main area we are confident in suggesting Redwood change is making their output more legible to the broader ML research community. Many of their research projects, including what Redwood considers their most notable project to date—causal scrubbing—are only available as Alignment Forum blog posts. We believe there is significant value in writing them up more rigorously and following a standard academic format, and releasing them as arXiv preprints. We would also suggest Redwood more frequently submit their results to peer-reviewed venues, as the feedback from peer review can be valuable for honing the communication of results, but acknowledge that it is possible to effectively disseminate findings without this: e.g. many of OpenAI and Anthropic’s highest-profile results were never published in a peer-reviewed venue.
Releasing arXiv preprints would have two benefits. First, it would make the work significantly more likely to be noticed, read and cited by the broader ML community. This makes it more likely that others build upon the work and point out deficiencies in it. Second, the more structured nature of an academic paper forces a more detailed exposition, making it easier for readers to judge, reproduce and build upon. If, for example, we compare Neel’s original grokking blog post to the grokking paper, it is clear the paper is significantly more detailed and rigorous. This level of rigor may not be worth the time for every project, but we would at least expect it for an organization’s flagship projects.
Many research scientist roles at AI research labs (e.g. DeepMind and Google Brain[1]) expect researchers to have PhDs in ML; this would be a minimum of 5 years doing relevant research.
Not all labs have a strict requirement on ML PhDs. Many people at OpenAI and Anthropic don’t have PhDs in ML either, but often have PhDs in related fields like Maths or Physics. There are a decent number of people at OpenAI without PhDs (Anthropic is relatively stricter on this than OpenAI). Labs like MIRI don’t require this, but they are doing more conceptual research, and relatively little, if any, ML research (to the best of our knowledge; they are private by default).
Note that while we think for-profit AI labs are not the right reference class for comparing funding, we do think that all AI labs (academic, non-profit or for-profit) are the correct reference class when considering credentials for research scientists.
Fwiw, my read is that a lot of “must have an ML PhD” requirements are gatekeeping nonsense. I think you learn useful skills doing a PhD in ML, and I think you learn some skills doing a non-ML PhD (but much less that’s relevant, though physics PhDs are probably notably more relevant than maths). But also that eg academia can be pretty terrible for teaching you skills like ML engineering and software engineering, lots of work in academia is pretty irrelevant in the world of the bitter lesson, and lots of PhDs have terrible mentorship.
I care about people having skills, but think that a PhD is only an OK proxy for them, and would broadly respect the skills of someone who worked at one of the top AI labs for four years straight out of undergrad notably more than someone straight out of a PhD program.
I particularly think that in interpretability, lots of standard ML experience isn’t that helpful, and can actively teach bad research taste and focus on pretty unhelpful problems.
(I do think that Redwood should prioritise “hiring people with ML experience” more, fwiw, though I hold this opinion much more strongly around their adversarial training work than their interp work)
I completely agree.
I’ve worked in ML engineering and research for over 5 years at two companies, I have a PhD (though not in ML), and I’ve interviewed many candidates for ML engineering roles.
If I’m reviewing a resume and I see someone has just graduated from a PhD program (and does not have other job experience), my first thoughts are
This person might have domain experience that would be valuable in the role, but that’s not a given.
This person probably knows how to do lit searches, ingest information from an academic paper at a glance, etc., which are definitely valuable skills.
This person’s coding experience might consist only of low-quality, write-once-read-never rush jobs written to produce data/figures for a paper and discarded once the paper is complete.
More generally, this person might or might not adapt well to a non-academic environment.
I’ve never interviewed a candidate with 4 years at OpenAI on their resume, but if I had, my very first thoughts would involve things like
OpenAI might be the most accomplished AI capabilities lab in the world.
I’m interviewing for an engineering role, and OpenAI is specifically famous for moving capabilities forward through feats of software engineering (by pushing the frontier of huge distributed training runs) as opposed to just having novel ideas.
Anthropic’s success at training huge models, and doing extensive novel research on them, is an indication of what former OpenAI engineers can achieve outside of OpenAI in a short amount of time.
OpenAI is not a huge organization, so I can trust that most people there are contributing a lot, i.e. I can generalize from the above points to this person’s level of ability.
I dunno, I might be overrating OpenAI here?
But I think the comment in the post at least requires some elaboration, beyond just saying “many places have a PhD requirement.” That’s an easy way to filter candidates, but it doesn’t mean people in the field literally think that PhD work is fundamentally superior to (and non-fungible with) all other forms of job experience.
I agree re PhD skillsets (though I think that some fraction of people gain a lot of high-value skills during a PhD, esp re research taste and agenda setting).
I think you’re way overrating OpenAI though—in particular, Anthropic’s early employees/founders include more than half of the GPT-3 first authors!! I think the company has become much more oriented around massive distributed LLM training runs in the last few years though, so maybe your inference that people would gain those skills is more reasonable now.
Hi Fay, Thank you for engaging with the post. We appreciate you taking the time to check the claims we make.
Regarding OP’s investment in OpenAI: you are correct that OpenAI received a larger amount of money. We didn’t include this because, since the grant in 2017, OpenAI has transitioned to a capped for-profit. I (the author of this particular comment) was actually not aware that OpenAI had at one point been a research non-profit. I will be updating the original post to add this information in; we appreciate you flagging it.
In general, we disagree that the correct reference class for evaluating Redwood’s funding is for-profit alignment labs like OpenAI, Anthropic or DeepMind because they have significantly more funding from (primarily non-EA) investors, and have different core objectives and goals. We think the correct reference class for Redwood is other TAIS labs (academic and research nonprofit) such as CHAI, CAIS, FAR AI and so on. I will add some clarification to the original post with more context.
(We will discuss the point on OP having board seats at Redwood in a separate comment)
Update: This has now been edited in the original post.
I’m surprised no one has commented about this yet; this seems incredibly problematic. Some things I’d like to know:
Who exactly are the people involved?
Did the timeline of these relationships line up with the timeline of the funding decisions?
Were the OP grant makers in the right position for their relationships to affect OP’s decision to fund Redwood?
As someone who started working at Redwood Research expecting an awesome experience and unfortunately had an awful one instead, mainly because of the above, I wasn’t aware multiple others had similar experiences as well, though it doesn’t come as a surprise. Knowing my case was not an isolated one provides me with some solace, and will certainly contribute to healing the work-related trauma I’ve gained from the experience.
My condolences go to others who were unfairly treated there. I wish nothing but the best to RR leadership, and I hope they become better at interacting with employees and the broader community.
Here’s a note the moderators are starting to add on all anonymous content like this:
This was posted by an anonymous account. It’s pretty easy to create an anonymous account and post things like this without corroboration. This doesn’t mean that they’re always false, or that the things stated here are false, but I’d recommend that people use their best judgment and wait for serious evidence and/or corroboration before seriously updating on information shared here.
The moderation team doesn’t want to remove all content like this—it is in fact important to air issues like this sometimes, but it’s also important that we don’t naively accept everything posted by anonymous users.
Edit: moved the notice to the top of the comment, to make it clearer we’re not singling out this particular comment
Totally unrelated to the purpose of the post, but is this for real? $50,000 seems absurdly low, especially since the Bay Area has a high cost of living.
Academic salaries are crazy low (which is one of my many reasons for not wanting to do a PhD lol)
Hi Jakub, these are standard rates for EECS PhD students (PhD students in other disciplines get paid less). Here are a couple as an example:
Berkeley EECS PhD students are paid $45K per year at the PhD level (from personal acquaintances in the Berkeley EECS program).
MIT EECS PhD students are paid ~$49.2K per year at the PhD level. (source)
Ah ok, I thought “academic researcher” referred to professors/lecturers/postdocs, not PhD students.
(We missed this submission, apologies to the poster for not sharing this in a more timely fashion).
A male Constellation member (current or former Redwood staff) & MLAB / REMIX program participant writes:
One thing I think was missed: the spending culture seemed a little over the top. There were some unused servers racking up $10k+ bills that weren’t wound down with any urgency.
Thanks for this post. I’ve learned things about the AI safety community that I didn’t realize before. I wonder if much of the value of external criticism isn’t in changing the behavior of those being criticized, but rather in explicitly stating and making into common knowledge negative factors that by default are not talked about publicly as much. (Both for future projects to do things differently, and for people today to update about how to relate to the entities involved.)
Thanks for giving me some insight into an org I had previously only known by name!
Some clarifying questions:
What do you count as software engineering experience? The linked LinkedIn profile looks like he has > 10 years of experience in the field.
Can you confirm that Redwood really fired them as opposed to them quitting? (The first is unusual in my experience; the second very common.) You mention employees quitting in various places but because they’re anonymous, I can’t tell whether that refers to the same people. Thanks!
Hi Dawn!
Our critique on lack of senior ML staff is focused specifically on lack of machine learning expertise (as opposed to general TAIS work). We are counting substantive software engineering experience such as his work at PayPal and TripleByte.
On the topic of general TAIS experience, I think Buck has at most 7 years of experience, as he joined MIRI in 2017 (it is our understanding that a decent portion of his time at MIRI was spent recruiting). That being said, years of experience is not the only measure of experience: Jacob Steinhardt comments above that he believes Buck is “a stronger researcher than most people with ML PhDs. He is weaker at empirical ML than this baseline, but very strong conceptually in ways that translate well to machine learning.”
To our knowledge, their more experienced ML research staff were let go. We refer to different employees quitting at later stages. In an earlier draft we had named a few of them, but decided to remove the names due to anonymity concerns.
Thanks for clarifying!
I’m still so confused about the second point, but you probably also don’t know the details of what happened there.
None of the citations critique MIRI as far as I can tell. What critiques of MIRI did you have in mind?
We will edit this section to make it more clear, but the MIRI critique is the MIRI hyperlink—Paul Christiano’s critique of Eliezer.
Update: this has now been edited in the original post.
My mind comes back to points like this very often:
It’s an (or perhaps yet another) example of how the Bay Area/SE/entrepreneurial mindset is sometimes almost diametrically opposed to certain mindsets coming from academia, and how this community is trying to balance them or get the best of both worlds (which isn’t a stupid thing to try to do per se; it just seems very tricky sometimes). In the spirit of the former you kinda want to move fast (and break things), but the latter wants you to remember the virtues of deliberately taking the time to demonstrate how thorough and methodical you are being (and partly so that you don’t, say, squander your resources by running a foreseeably dud experiment).
I feel like it was only a year or so ago that the standard critique of the AI safety community was that they were too abstract, too theoretical, that they lacked hands-on experience, lacked contact with empirical reality, etc...
I’m quite confused by this. Could you explain what you intend by it? It seems that the claim here is that spending a month to learn something is better than spending a day to learn something, which strikes me as very odd. Is there an implication here that working in a laboratory gives a person a better/deeper understanding, and is therefore better than the lighter/more superficial understanding from a library?
Hi Joseph, that quote is meant to be facetious. The scientist who originally said it was trying to convey the opposite to his students: that researching before experimenting can save them time.
Thanks for the effort you have invested to research and write up the constructive feedback. How much time did you roughly spend on this?
The author commented elsewhere that it took at least 50 hours.
Yep that’s right. This is probably an underestimate, but we would need to spend some time figuring it out. We’ve spent at least 10 hours replying to cc
If a permanent ban went into effect today on training ML models on anything larger than a single consumer-grade GPU card, e.g., an Nvidia RTX 40 series card, the work of MIRI researcher Scott Garrabrant would not be affected at all. How much of Redwood’s research would stop?
This needs more specificity. Obviously for Garrabrant’s work to have any effect, it will need to influence the design and deployment of an AI eventually; it’s just that his research approach is probably decades away from when it can be profitably applied to an actual deployed AI. On the other hand, any AI researcher can remain productive if denied the use of a GPU cluster for a week: for example, he or she can use the week to tidy up his or her office and do related “housekeeping” tasks. I guess what I want to know is: if there is a ban on GPU clusters, how long (weeks? months? years?) can the median researcher at Redwood remain productive without abandoning most of his or her work up to now? And is there any researcher at Redwood doing work that is a lot more robust against such a ban than the median researcher at Redwood?
If you (the team that wrote this post) had the power to decide which organizations will get shut down (with immediate effect) would Redwood be one of the orgs you shut down? Assume that you had enough power that if you chose to, you could shut down all meaningful research on AI and that you could be as selective as you like about which organizations and parts (e.g., academic departments) of organizations to shut down.
Thanks in advance.
Hey, I just wanted to thank you for writing this!
I’m looking forward to reading future posts in the series; actually, I think it would be great to have series like this one for each major cause area.