Thanks for this, I hadn't thought much about the topic and agree it seems more neglected than it should be. But I am probably overall less bullish than you (as operationalised by e.g. how many people in the existential risk field should be making this a significant focus: I am perhaps closer to 5% than your 30% at present).
I liked your flowchart on "Inputs in the AI application pipeline", so using that framing:
Learning algorithms: I agree this is not very tractable for us[1] to work on.
Training data: This seems like a key thing for us to contribute, particularly at the post-training stage. By supposition, a large fraction of the most relevant work on AGI alignment, control, governance, and strategy has been done by "us". I could well imagine that it would be very useful to get project notes, meeting records, early drafts, etc., as well as the final report, to train a specialised AI system to become an automated alignment/governance etc. researcher.
But my guess is that just compiling this training data doesn't take that much time. All it takes is that, when the time comes, you convince a lot of the relevant people and orgs to share old Google Docs of notes/drafts/plans etc., paired with the final product.
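To make the "compiling this doesn't take much time" claim a bit more concrete, here is a minimal sketch of what pairing old drafts with final products could look like. Everything in it (the folder layout, the file naming, the JSONL fields) is a made-up illustration, not a pipeline anyone has actually built:

```python
import json
from pathlib import Path

# Purely illustrative: assumes a hypothetical folder layout where each project
# directory holds dated draft files plus the final report.
projects_root = Path("shared_org_docs")  # hypothetical export of old Google Docs

examples = []
for project_dir in sorted(projects_root.glob("*")):
    if not project_dir.is_dir():
        continue
    final = project_dir / "final_report.md"          # assumed naming convention
    drafts = sorted(project_dir.glob("draft_*.md"))  # assumed naming convention
    if not final.exists() or not drafts:
        continue
    # One fine-tuning example per project: latest draft in, final report out.
    examples.append({
        "prompt": drafts[-1].read_text(),
        "completion": final.read_text(),
        "metadata": {"project": project_dir.name, "n_drafts": len(drafts)},
    })

# JSONL is the format most fine-tuning pipelines accept.
with open("alignment_research_pairs.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

The point is just that the mechanical step is trivial; the hard part is getting people to share the documents in the first place.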
There will be a lot of infosec considerations here, so maybe each org will end up training their own AI based on their own internal data. I imagine this is what will happen for a lot of for-profit companies.
Making sure we don't delete old draft reports, meeting notes, and the like seems good here, but given that storing Google Docs is so cheap and culling files is time-expensive, I think by default almost everyone just keeps most of their (at least textual) digital corpus anyway. Maybe there is some small intervention to make this work better though?
Compute: It certainly seems great for more compute to be spent on automated safety work versus automated capabilities work. But this is mainly a matter of how much money each party has to pay for compute. So lobbying for governments to spend lots on safety compute, or regulations to get companies to spend more on safety compute, seems good, but this is a bit separate from/upstream of what you have in mind, I think; it is more just "get key people to care more about safety".
Post-training enhancements: We will be very useful for providing the RLHF feedback that tells a budding automated AI safety researcher how good each of its outputs is. Research taste is key here. This feels somewhat continuous with just "managing a fleet of AI research assistants".
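As a purely illustrative aside on what "providing the feedback" might look like in practice, here is a minimal sketch of a pairwise preference record of the sort typically used to train a reward model. The schema and example values are assumptions of mine, not any real system's format:

```python
import json
from dataclasses import dataclass, asdict

# Purely illustrative pairwise-preference record for reward-model training;
# the field names and example values are made up.
@dataclass
class PreferenceRecord:
    prompt: str       # research task given to the automated researcher
    output_a: str     # first candidate output from the model
    output_b: str     # second candidate output from the model
    preferred: str    # "a" or "b", as judged by a human researcher
    rationale: str    # short note capturing the judgement behind the call
    annotator: str    # pseudonymous ID of the human giving feedback

record = PreferenceRecord(
    prompt="Summarise the strongest objection to scalable oversight.",
    output_a="[model output A]",
    output_b="[model output B]",
    preferred="b",
    rationale="B engages with the actual objection rather than a strawman.",
    annotator="researcher_017",
)

# Append to a growing dataset for later reward-model training.
with open("preference_data.jsonl", "a") as f:
    f.write(json.dumps(asdict(record)) + "\n")
```

The rationale field is where the "research taste" would actually show up.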
UI and complementary technologies: I don't think we have a comparative advantage here; we can just outsource this to human or AI contractors to build nice apps for us, or use generic apps on the market and feed in our custom training data.
In terms of which applications to focus on, my guess is that epistemic tools and coordination-enabling tools will mostly be built by default (though of course, as you note, additional effort can still speed them up some). E.g. politicians, business leaders, and academics would all presumably love to have better predictions of which policies will be popular, which facts are true, which papers will replicate, etc. And negotiation tools might be quite valuable for e.g. negotiating corporate mergers and deals.
So my take is that probably a majority of the game here is in "automated AI safety/governance/strategy", because there will be less corporate incentive there, and it is also our comparative advantage to work on.
Overall, I agree that differential AI tool development could be very important, but I think the focus should mainly be on providing high-quality training data and RLHF feedback for automated AI safety research, which is somewhat narrower than what you describe.
I'm not sure how much we actually disagree though; I'd be interested in your thoughts!
[1] Throughout, I use "us" to refer broadly to EA/longtermist/existential security type folks.
UI and complementary technologies: I'm sort of confused about your claim about comparative advantage. Are you saying that there aren't people in this community whose comparative advantage might be designing UI? That would seem surprising.
More broadly, though:
I'm not sure how much "we can just outsource this" really cuts against the core of our argument (how to get something done is a question of tactics, and it could still be a strategic priority even if we just wanted to spend a lot of money on it).
I guess I feel, though, that you're saying this won't be a big bottleneck.
I think that may be true if you're considering automated alignment research in particular. But I'm not on board with that being the clear priority here.
Yes, I suppose I am trying to divide tasks/projects into two buckets based on whether they require high context, value-alignment, strategic thinking, and EA-ness. And I think my claim was/is that UI design is comparatively easy to outsource to someone without much of the relevant context and values, and therefore that the comparative advantage of the higher-context people is to do things that are harder to outsource to lower-context people. But I know ~nothing about UI design; maybe being higher context is actually super useful there.
Compute allocation: Mostly I think that "get people to care more" does count as the type of thing we were talking about. But I think that it's not just caring about safety, but also being aware ahead of time of the role that automated research may have to play in this, and of when it may be appropriate to hit the gas and allocate a lot of compute to particular areas.
Which applications to focus on: I agree that epistemic tools and coordination-enabling tools will eventually have markets and so will get built at some point absent intervention. But this doesn't feel like a very strong argument: the whole point is that we may care about accelerating applications even if it's not by a long period. And I don't think that these will obviously be among the most profitable applications people could make (especially if you can start specialising to the most high-leverage epistemic and coordination tools).
Also, we could make a similar argument that "automated safety" research won't get dropped, since it's so obviously in the interests of whoever's winning the race.
Training data: I agree that the stuff you're pointing to seems worthwhile. But I feel like you've latched onto a particular type of training data, and you're missing important categories, e.g.:
Epistemics stuff: there are lots of super smart people earnestly trying to figure out very hard questions, and I think that if you could access their thinking, there would be a lot there which would compare favourably to a lot of the data that would be collected from people in this community. It wouldn't be so targeted in terms of the questions it addressed (e.g. "AI strategy"), but learning good epistemics may be valuable and transfer over.
International negotiation, and high-stakes bargaining in general: potentially very important, but not something I think our community has any particular advantage at.
Synthetic data: a bunch of things may be unlocked more by working out how to enable "self-play" (or the appropriate analogue), rather than just collecting more data the hard way (rough sketch below).
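For what it's worth, here is a very rough, purely hypothetical sketch of the shape of such a "self-play"-style loop (generate candidates, have a critic score them, keep the best). The stub functions stand in for real models, and nothing here is a real API:

```python
import json
import random

# Stub functions stand in for real models; nothing here is a real API.
def propose(question: str) -> str:
    """Stub: a model drafts a candidate answer."""
    return f"Draft answer #{random.randint(1, 1000)} to: {question}"

def critique(question: str, answer: str) -> float:
    """Stub: a second pass (same or different model) scores the answer in [0, 1]."""
    return random.random()

questions = [
    "How might oversight scale with model capability?",
    "Which bargaining structures are incentive-compatible?",
]

synthetic = []
for q in questions:
    candidates = [propose(q) for _ in range(4)]         # generate several attempts
    scored = [(critique(q, a), a) for a in candidates]  # have the critic score them
    best_score, best_answer = max(scored)               # keep the top-rated attempt
    if best_score > 0.5:                                # filter on critic confidence
        synthetic.append({"prompt": q, "completion": best_answer})

# The surviving pairs become extra training data without new human collection.
with open("synthetic_examples.jsonl", "w") as f:
    for ex in synthetic:
        f.write(json.dumps(ex) + "\n")
```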
Yeah, I think I agree with all this; I suppose, since "we" have the AI policy/strategy training data anyway, that seems relatively low-effort and high-value to do, but yes, if we could somehow get access to the private notes of a bunch of international negotiators, that also seems very valuable! Perhaps actually asking top forecasters to record their working and meetings, to use as training data later, would be valuable, and I assume many people already do this by default (tagging @NunoSempere). Although of course having better forecasting AIs seems more dual-use than some of the other AI tools.