Thanks for writing this post—it was useful to see the argument written out so I could see exactly where I agreed and disagreed. I think lots of people agree with this but I’ve never seen it written up clearly before.
I think I place substantial weight (30% or something) on you being roughly right about the relative contributions of EA safety and non-EA safety. But I think it’s more likely that the penalty on non-EA safety work is larger than you think.
I think the crux here is that I think AI alignment probably requires really focused attention, and research done by people who are trying to do something else will probably end up not being very helpful for some of the core problems.
It’s a little hard to evaluate the counterfactuals here, but I’d much rather have the contributions from EA safety than from non EA safety over the last ten years.
I think that it might be easier to assign a value to the discount factor by assessing the total contributions of EA safety and non-EA safety. I think that EA safety does something like 70% of the value-weighted work, which suggests a much bigger discount factor than 80%.
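To spell out the arithmetic behind that last claim (a minimal sketch; the headcount ratio below is a made-up placeholder, not a figure from your model):

```python
# Illustrative only: infer the per-person discount on non-EA safety work
# implied by a value-weighted share, under an assumed headcount ratio.

ea_headcount = 100        # placeholder
non_ea_headcount = 1000   # placeholder: assume ~10x as many non-EA safety researchers

ea_value_share = 0.7      # my guess: EA safety does ~70% of the value-weighted work

# Normalise an EA researcher's output to 1 and solve for the relative
# productivity v of a non-EA researcher:
#   ea_headcount / (ea_headcount + v * non_ea_headcount) = ea_value_share
v = ea_headcount * (1 - ea_value_share) / (ea_value_share * non_ea_headcount)

print(f"relative productivity of non-EA safety work: {v:.3f}")  # ~0.04
print(f"implied per-person discount: {1 - v:.0%}")              # ~96%, well above 80%
```

The result is sensitive to the assumed headcount ratio, but for any ratio above roughly 2:1 the implied discount already exceeds 80%.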
---
Assorted minor comments:
But this is only half of the ledger. One of the big advantages of academic work is the much better distribution of senior researchers: EA Safety seems bottlenecked on people able to guide and train juniors
Yes, but those senior researchers won’t necessarily have useful things to say about how to do safety research. (In fact, my impression is that most people doing safety research in academia have advisors who don’t have very smart thoughts on long term AI alignment.)
None of those parameters is obvious, but I make an attempt in the model (bottom-left corner).
I think the link is to the wrong model?
A cursory check of the model
In this section you count nine safety-relevant things done by academia over two decades, and then note that there were two things from within EA safety last year that seem more important. This doesn’t seem to mesh with your claim about their relative productivity.
Was going to write a longer comment but I basically agree with Buck’s take here.
It’s a little hard to evaluate the counterfactuals here, but I’d much rather have the contributions from EA safety than from non EA safety over the last ten years.
I wanted to endorse this in particular.
On the actual argument:
1. EA safety is small, even relative to a single academic subfield.
2. There is overlap between capabilities and short-term safety work.
3. There is overlap between short-term safety work and long-term safety work.
4. So AI safety is less neglected than the opening quotes imply.
5. Also, on present trends, there’s a good chance that academia will do more safety over time, eventually dwarfing the contribution of EA.
I agree with 1, 2, and 3 (though perhaps disagree with the magnitude of 2 and 3, e.g. you list a bunch of related areas and for most of them I’d be surprised if they mattered much for AGI alignment).
I agree 4 is literally true, but I’m not sure it necessarily matters, as this sort of thing can be said for ~any field (as Ben Todd notes). It would be weird to say that animal welfare is not neglected because of the huge field of academia studying animals, even though those fields are relevant to questions of e.g. sentience or farmed animal welfare.
I strongly agree with 5 (if we replace “academia” with “academia + industry”, it’s plausible to me academia never gets involved while industry does), and when I argue that “work will be done by non-EAs”, I’m talking about future work, not current work.
research done by people who are trying to do something else will probably end up not being very helpful for some of the core problems.
Yeah, it’d be good to break AGI control down more, to see if there are classes of problem where we should expect indirect work to be much less useful. But this particular model already has enough degrees of freedom to make me nervous.
I think that it might be easier to assign a value to the discount factor by assessing the total contributions of EA safety and non-EA safety.
That would be great! I used headcount because it’s relatively easy, but value weights are clearly better. Do you know any reviews of alignment contributions?
… This doesn’t seem to mesh with your claim about their relative productivity.
Yeah, I don’t claim to be systematic. The nine are just notable things I happened across, rather than an exhaustive list of academic contributions. Besides the weak evidence from the model, my optimism about there being many other academic contributions is based on my own shallow knowledge of AI: “if even I could come up with 9...”
Something like the Median insights collection, but for alignment, would be amazing; I just didn't have time.
those senior researchers won’t necessarily have useful things to say about how to do safety research
This might be another crux: “how much do general AI research skills transfer to alignment research?” (Tacitly I was assuming medium-high transfer.)
I think the link is to the wrong model?
No, that’s the one; I mean the 2x2 of factors which lead to ‘% work that is alignment relevant’. (Annoyingly, Guesstimate hides the dependencies by default; try View > Visible)
My intuition is also that the discount for academia solving core alignment problems should be (much?) higher than here. At the same time, I agree that some mainstream work (esp. foundations) does help current AI alignment research significantly. I would expect (and hope) more of this to still appear, but to be increasingly sparse relative to the amount of work in AI.
I think it would be useful to have a contribution model that can distinguish (at least) between a) improving the wider area (including e.g. fundamental models, general tools, best practices, learnability) and b) working on the problem itself. Distinguishing past contributions from expected future contributions (and, respectively, past from future discount factors) may also help.
Why: Having a well-developed field is a big help in solving any particular problem X adjacent to it, and it seems reasonable to assign part of the value of "X is solved" to work done on the field. However, field development alone is unlikely to solve a sufficiently hard X that is not among the field's foci; dedicated work on X is still needed. I imagine this applies to the field of ML/AI and long-termist AI alignment.
Model sketch: General work done on the field has diminishing returns towards the work remaining on the problem. As the field grows, it branches and its surface area grows accordingly, so progress in directions that are not foci slows. Extensive investment in the field would eventually solve any problem, but unfocused effort becomes increasingly inefficient. Main uncertainties: I am not sure how to model the areas of field focus and the faster progress in their vicinity, or how likely it is that some direction sufficiently close to AI alignment becomes a focus of the field.
Overall, this makes me expect that past work in AI and ML has contributed significantly towards AI alignment, but that the discount will grow in the future unless alignment (or something close to it) becomes a focus of the field. When thinking about policy implications for focusing research effort (with the goal of solving AI alignment), I would expect the returns to general academia to diminish much faster than the returns to EA safety research.
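To make the sketch slightly more concrete, here is a toy numeric version; every functional form and constant is an assumption chosen for illustration, not something estimated from the post:

```python
# Toy model: spillover from general field growth to an off-focus problem has
# diminishing returns (effort ** alpha) and is further diluted as the field
# branches and its surface area grows; dedicated work is taken as roughly linear.

def progress_from_general_work(field_effort, alpha=0.5, branching=0.01):
    surface_area = 1 + branching * field_effort   # crude proxy for branching
    return field_effort ** alpha / surface_area

def progress_from_focused_work(focused_effort, efficiency=1.0):
    return efficiency * focused_effort

for field_effort in (100, 1_000, 10_000):
    spillover = progress_from_general_work(field_effort)
    focused = progress_from_focused_work(0.01 * field_effort)  # 1% of the effort, but dedicated
    print(f"field effort {field_effort:>6}: spillover {spillover:6.2f} vs focused {focused:7.2f}")
```

Under these made-up numbers the spillover dominates while the field is small and then falls away, which is the shape behind "significant past contribution, but an increasing discount going forward".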
I think the crux here is that I think AI alignment probably requires really focused attention, and research done by people who are trying to do something else will probably end up not being very helpful for some of the core problems.
Considering the research necessary to “solve alignment for the AIs that will actually be built” as some nodes in the directed acyclic graph of scientific and engineering progress, another crux seems to me to be how effective it is to do that research with the input nodes available today to an org focused specifically on AI alignment:
My intuition there is that progress on fundamental, mathematically hard or even philosophical questions is likely to come serendipitously from people with academic freedom, who happen to have some relevant input nodes in their head. On the other hand, for an actual huge Manhattan-like engineering project to build GAI, making it safe might be a large sub-project itself—but only the engineers involved can understand what needs to be done to do so, just like the Wright brothers wouldn’t have much to say about making a modern jet plane safe.