the quantity and quality of output is underwhelming given the amount of money and staff time invested.
Of Redwood’s published research, we were impressed by Redwood’s Interpretability in the Wild paper, but would consider it to be no more impressive than Progress Measures for Grokking via Mechanistic Interpretability, executed primarily by two independent researchers, or Latent Knowledge in Language Models Without Supervision, performed by two PhD students.[4] These examples are cherry-picked to be amongst the best of academia and independent research, but we believe this is a valid comparison because we also picked what we consider the best of Redwood’s research, and Redwood’s funding is very high relative to other labs.
I’m missing a lot of context here, but my impression is that this argument doesn’t go through, or at least is missing some steps:
1. We think that the best Redwood research is of similar quality to work by [Neel Nanda, Tom Lieberum and others, mentored by Jacob Steinhardt].
2. Work by those others doesn’t cost $20M.
3. Therefore the work by Redwood shouldn’t cost $20M.
Instead, the argument which would go through would be:
1. Open Philanthropy spent $20M on Redwood Research.
2. That $20M produced [such and such research].
3. This is how you could have spent $20M to produce [better research].
4. Therefore, Open Philanthropy shouldn’t have spent $20M on Redwood Research, but instead on [alternatives] (or spent $20M on [alternatives] and on Redwood Research, if the value of Redwood Research is still above the bar).
But you haven’t shown step 3, the tradeoff against the counterfactual. It seems likely that producing good AI safety research depends on somewhat idiosyncratic, non-monetary factors. Sometimes you will find a talented independent researcher or a PhD student who will produce quality research for relatively small amounts of money; sometimes you will spend $20M to get an outcome of similar quality. I could see that being the case if the bottleneck isn’t money, which seems plausible.
Also note that building an institution is potentially much more scalable than funding one-off independent researchers.
As I said, I’m missing lots of context (e.g., I haven’t read Redwood’s research, and it seems within the normal range of possibility that it wouldn’t be worth $20M), but I thought I’d give my two cents.
Neel Nanda, Tom Lieberum and others, mentored by Jacob Steinhardt
To clarify, in my personal case: I did the grokking work as an independent research project, and Jacob only became involved after I had done the core research; his mentorship was specifically about the process of distillation and writing up the results. (To be clear, his mentorship here was high-value! But I think the paper benefited less from it than is implied by the reference class of having him as the final author.)
Re your point about “building an institution” and step 3: We think the majority of our expected value comes from futures in which we produce more research value per dollar than in the past.
(Also, just wanted to note again that $20M isn’t the right number to use here, since around one-third of that funding is for running Constellation, as mentioned in the post.)
Thanks for mentioning the $20M point Nate—I’ve edited the post to make this a little more clear and would suggest people use $14M as the number instead.
Meta note: We believe this response is the 80/20 in terms of quality vs time investment. We think it’s likely we could improve the comment with more work, but wanted to share our views earlier rather than later.
One thing we didn’t spell out very explicitly in this post was the distinction between 1) how effectively we believe Redwood spent their resources and 2) whether we think OP should have funded them (and at what amount). As this post is focused on Redwood, I’ll focus more on 1) and comment briefly on 2), but note that we plan to expand on this further in a follow-up post. We will add a paragraph that disambiguates between these two points more clearly.
Argument 1): We think Redwood could produce at least the same quality and quantity of research with fewer resources (~$4-8 million over 2 years)
The key reasons we think 1) are:
If they had more senior ML staff or advisors, they could have avoided some of the mistakes in their agenda that we see as avoidable. This wouldn’t necessarily come at a large monetary cost relative to their overall budget (around $200-300K for 1 FTE).
We estimate that as much as 25-30% of their spending went towards scaling up projects (e.g. REMIX) before they had a clear research agenda they were confident in. To be fair to Redwood, this premature scaling was more defensible prior to the FTX collapse, when the general belief was that there was a “funding overhang”. Nate also mentions in his comment that scaling was raised by both Holden and Ajeya (at OP), and that he now sees this as an error on their part.
Argument 2): OP should have spent less on Redwood, and 2a) there were other comparable funding opportunities
The key reasons we think 2) are:
There are other TAIS labs (academic and not) that we believe could absorb and spend considerably more funding than they currently receive. Example non-profits include CAIS and FAR AI, and underfunded safety-interested academic groups include David Krueger’s and Dylan Hadfield-Menell’s. Opportunities are more limited if focusing specifically on interpretability, but there are still a number of promising options. For example, Neel Nanda mentioned three academics who he considers do good interpretability work: OP has funded one of them (David Bau) but, as far as we know, not the other two (of course, they may not have room for more funding, or OP may have investigated and decided not to fund them for other reasons).
A key reason OP may not think some of these labs are worth funding on the margin is that they are substantially more bullish on certain safety research agendas than others. We have some concerns about how the OP LT team decides which agendas to support, but will explore this further in our Constellation post, so won’t comment in more depth at this point. Given that OP is one of the main funders of TAIS work, in a field which is very speculative and new, we think it should be open to a broader range of research agendas than it currently is.
We think that small, young organizations without a track record beyond founder reputation should in general be given smaller grants and build up a track record before trying to scale. We think it’s plausible that several of the issues we pointed out could have been mitigated by this funding structure.
There are other TAIS labs (academic and not) that we believe could absorb and spend considerably more funding than they currently receive.
My understanding is that, had Redwood not existed, OpenPhil would not have significantly increased their funding to these other places, and broadly has more money than they know what to do with (especially in the previous EA funding environment!). I don’t know whether those other places have applied for grants, or why they aren’t as funded as they could be, but this doesn’t seem that relevant to me. More broadly, there are a bunch of constraints on grantmakers, like time to evaluate a grant, having enough context to competently evaluate it, or external advisors with context whom they trust, etc. E.g., I’m a bit hesitant about funding interpretability academics who I think will go full steam ahead on capabilities (I think it’s often worth doing anyway, but it’s not obvious to me, and the one time I recommended a grant here it did consume quite a lot of my time to evaluate the nuances).
And grantmaking is just really not an efficient market; there are lots of good grants that don’t happen for dumb reasons.
Concretely, it’s plausible to me that taking the marginal $1 million given to Redwood and dividing it evenly among the other labs you mention would be good. But that doesn’t feel like the right counterfactual here.
To push back on this point, presumably even if grantmaker time is the binding resource and not money, Redwood also took up grantmaker time from OP (indeed I’d guess that OP’s grantmaker time on RR is much higher than for most other grants given the board member relationship). So I don’t think this really negates Omega’s argument—it is indeed relevant to ask how Redwood looks compared to grants that OP hasn’t made.
Personally, I am pretty glad Redwood exists and think their research so far is promising. But I am also pretty disappointed that OP hasn’t funded some academics that seem like slam dunks to me and think this reflects an anti-academia bias within OP (note they know I think this and disagree with me). Presumably this is more a discussion for the upcoming post on OP, though, and doesn’t say whether OP was overvaluing RR or undervaluing other grants (mostly the latter imo, though it seems plausible that OP should have been more critical about the marginal $1M to RR especially if overhiring was one of their issues).
I am also pretty disappointed that OP hasn’t funded some academics that seem like slam dunks to me and think this reflects an anti-academia bias within OP (note they know I think this and disagree with me).
My prior is that people who Jacob thinks are slam-dunks should basically always be getting funding, so I’m pretty surprised by this anecdote. (In general I also expect that there are a lot of complex details in cases like these, so it doesn’t seem implausible that it was the right call, but it seemed worth registering the surprise.)
I work at Open Philanthropy, and in the last few months I took on much of our technical AI safety grantmaking.
In November and December, Jacob sent me a list of academics he felt that someone at Open Phil should reach out to and solicit proposals from. I was interested in these opportunities, but at the time, I was full-time on processing grant proposals that came in through Open Philanthropy’s form for grantees affected by the FTX crash and wasn’t able to take them on.
This work tailed off in January, and since then I’ve focused on a few bigger grants, some writing projects, and thinking through how I should approach further grantmaking. I think I should have reached out earlier (e.g. in February) to at least a few of the people Jacob suggested. I didn’t make any explicit decision to reject someone that Jacob thought was a slam dunk because I disagreed with his assessment — rather, I was slower to reach out and talk to the people he thought I should fund than I could have been.
I plan to talk to several of the leads Jacob sent my way in Q2, and (while I would plan to think through the case for these grants myself to the extent I can) I expect to end up agreeing a lot with Jacob’s assessments.
With that said, Jacob and I do have more nebulous higher-level disagreements about things like how truth-tracking academic culture tends to be and how much academic research has contributed to AI alignment so far, and in some indirect way these disagreements probably contributed to me prioritizing these reach outs less highly than someone else might have.
This seems fair. I’m mainly pushing back on this as a criticism of Redwood, and on the focus on the “Redwood has been overfunded” narrative. I agree that they probably consumed a bunch of grantmaker time, and am sympathetic to the idea that OpenPhil is making a bunch of mistakes here.
I’m curious which academics you have in mind as slam dunks?
Thanks Nuno, I’m sharing this comment with the other contributors and will respond in depth soon. I think you’re right that we could be more explicit on 3).