Thanks for this, I hadn't thought much about the topic and agree it seems more neglected than it should be. But I am probably overall less bullish than you (as operationalised by e.g. how many people in the existential risk field should be making this a significant focus: I am perhaps closer to 5% than your 30% at present).
I liked your flowchart on "Inputs in the AI application pipeline", so using that framing:
Learning algorithms: I agree this is not very tractable for us[1] to work on.
Training data: This seems like a key thing for us to contribute, particularly at the post-training stage. By supposition, a large fraction of the most relevant work on AGI alignment, control, governance, and strategy has been done by "us". I could well imagine that it would be very useful to get project notes, meeting records, early drafts, etc., as well as the final report, to train a specialised AI system to become an automated alignment/governance etc. researcher.
But my guess is that just compiling this training data doesn't take that much time. All it takes is that, when the time comes, you convince a lot of the relevant people and orgs to share old Google Docs of notes/drafts/plans etc., paired with the final product.
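To make the "compiling this doesn't take much time" claim a bit more concrete, here is a minimal sketch of what pairing old drafts with final products could look like. Everything in it (the folder layout, the file naming, the JSONL fields) is a made-up illustration, not a pipeline anyone has actually built:

```python
import json
from pathlib import Path

# Purely illustrative: assumes a hypothetical folder layout where each project
# directory holds dated draft files plus the final report.
projects_root = Path("shared_org_docs")  # hypothetical export of old Google Docs

examples = []
for project_dir in sorted(projects_root.glob("*")):
    if not project_dir.is_dir():
        continue
    final = project_dir / "final_report.md"          # assumed naming convention
    drafts = sorted(project_dir.glob("draft_*.md"))  # assumed naming convention
    if not final.exists() or not drafts:
        continue
    # One fine-tuning example per project: latest draft in, final report out.
    examples.append({
        "prompt": drafts[-1].read_text(),
        "completion": final.read_text(),
        "metadata": {"project": project_dir.name, "n_drafts": len(drafts)},
    })

# JSONL is the format most fine-tuning pipelines accept.
with open("alignment_research_pairs.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

The point is just that the mechanical step is trivial; the hard part is getting people to share the documents in the first place.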
There will be a lot of infosec considerations here, so maybe each org will end up training their own AI based on their own internal data. I imagine this is what will happen for a lot of for-profit companies.
Making sure we don't delete old draft reports, meeting notes, and the like seems good here, but given that storing Google Docs is so cheap and culling files is time-expensive, I think by default almost everyone just keeps most of their (at least textual) digital corpus anyway. Maybe there is some small intervention to make this work better though?
Compute: It certainly seems great for more compute to be spent on automated safety work versus automated capabilities work. But this is mainly a matter of how much money each party has to pay for compute. So lobbying for governments to spend lots on safety compute, or regulations to get companies to spend more on safety compute, seems good, but this is a bit separate from/upstream of what you have in mind, I think; it is more just "get key people to care more about safety".
Post-training enhancements: We will be very useful for providing the RLHF feedback that tells a budding automated AI safety researcher how good each of its outputs is. Research taste is key here. This feels somewhat continuous with just "managing a fleet of AI research assistants".
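As a purely illustrative aside on what "providing the feedback" might look like in practice, here is a minimal sketch of a pairwise preference record of the sort typically used to train a reward model. The schema and example values are assumptions of mine, not any real system's format:

```python
import json
from dataclasses import dataclass, asdict

# Purely illustrative pairwise-preference record for reward-model training;
# the field names and example values are made up.
@dataclass
class PreferenceRecord:
    prompt: str       # research task given to the automated researcher
    output_a: str     # first candidate output from the model
    output_b: str     # second candidate output from the model
    preferred: str    # "a" or "b", as judged by a human researcher
    rationale: str    # short note capturing the judgement behind the call
    annotator: str    # pseudonymous ID of the human giving feedback

record = PreferenceRecord(
    prompt="Summarise the strongest objection to scalable oversight.",
    output_a="[model output A]",
    output_b="[model output B]",
    preferred="b",
    rationale="B engages with the actual objection rather than a strawman.",
    annotator="researcher_017",
)

# Append to a growing dataset for later reward-model training.
with open("preference_data.jsonl", "a") as f:
    f.write(json.dumps(asdict(record)) + "\n")
```

The rationale field is where the "research taste" would actually show up.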
UI and complementary technologies: I don't think we have a comparative advantage here; we can just outsource this to human or AI contractors to build nice apps for us, or use generic apps on the market and feed in our custom training data.
In terms of which applications to focus on, my guess is that epistemic tools and coordination-enabling tools will mostly be built by default (though of course, as you note, additional effort can still speed them up some). E.g. politicians, business leaders, and academics would all presumably love to have better predictions of which policies will be popular, which facts are true, which papers will replicate, etc. And negotiation tools might be quite valuable for e.g. negotiating corporate mergers and deals.
So my take is that probably a majority of the game here is in "automated AI safety/governance/strategy", because there will be less corporate incentive there, and it is also our comparative advantage to work on.
Overall, I agree that differential AI tool development could be very important, but I think the focus should mainly be on providing high-quality training data and RLHF feedback for automated AI safety research, which is somewhat narrower than what you describe.
I'm not sure how much we actually disagree though; I'd be interested in your thoughts!
[1] Throughout, I use "us" to refer broadly to EA/longtermist/existential security type folks.
UI and complementary technologies: I'm sort of confused about your claim about comparative advantage. Are you saying that there aren't people in this community whose comparative advantage might be designing UI? That would seem surprising.
More broadly, though:
I'm not sure how much "we can just outsource this" really cuts against the core of our argument (how to get something done is a question of tactics, and it could still be a strategic priority even if we just wanted to spend a lot of money on it).
I guess I feel, though, that you're saying this won't be a big bottleneck.
I think that may be true if you're considering automated alignment research in particular. But I'm not on board with that being the clear priority here.
Yes, I suppose I am trying to divide tasks/projects into two buckets based on whether they require high context, value-alignment, strategic thinking, and EA-ness. And I think my claim was/is that UI design is comparatively easy to outsource to someone without much of the relevant context and values, and therefore that the comparative advantage of the higher-context people is to do things that are harder to outsource to lower-context people. But I know ~nothing about UI design; maybe being higher context is actually super useful there.
Compute allocation: Mostly I think that "get people to care more" does count as the type of thing we were talking about. But I think that it's not just caring about safety, but also being aware ahead of time of the role that automated research may have to play in this, and of when it may be appropriate to hit the gas and allocate a lot of compute to particular areas.
Which applications to focus on: I agree that epistemic tools and coordination-enabling tools will eventually have markets and so will get built at some point absent intervention. But this doesn't feel like a very strong argument: the whole point is that we may care about accelerating applications even if it's not by a long period. And I don't think that these will obviously be among the most profitable applications people could make (especially if you can start specialising to the most high-leverage epistemic and coordination tools).
Also, we could make a similar argument that "automated safety" research won't get dropped, since it's so obviously in the interests of whoever's winning the race.
Training data: I agree that the stuff you're pointing to seems worthwhile. But I feel like you've latched onto a particular type of training data, and you're missing important categories, e.g.:
Epistemics stuff: there are lots of super smart people earnestly trying to figure out very hard questions, and I think that if you could access their thinking, there would be a lot there which would compare favourably to a lot of the data that would be collected from people in this community. It wouldn't be so targeted in terms of the questions it addressed (e.g. "AI strategy"), but learning good epistemics may be valuable and transfer over.
International negotiation, and high-stakes bargaining in general: potentially very important, but not something I think our community has any particular advantage at.
Synthetic data: a bunch of things may be unlocked more by working out how to enable "self-play" (or the appropriate analogue), rather than just collecting more data the hard way (rough sketch below).
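For what it's worth, here is a very rough, purely hypothetical sketch of the shape of such a "self-play"-style loop (generate candidates, have a critic score them, keep the best). The stub functions stand in for real models, and nothing here is a real API:

```python
import json
import random

# Stub functions stand in for real models; nothing here is a real API.
def propose(question: str) -> str:
    """Stub: a model drafts a candidate answer."""
    return f"Draft answer #{random.randint(1, 1000)} to: {question}"

def critique(question: str, answer: str) -> float:
    """Stub: a second pass (same or different model) scores the answer in [0, 1]."""
    return random.random()

questions = [
    "How might oversight scale with model capability?",
    "Which bargaining structures are incentive-compatible?",
]

synthetic = []
for q in questions:
    candidates = [propose(q) for _ in range(4)]         # generate several attempts
    scored = [(critique(q, a), a) for a in candidates]  # have the critic score them
    best_score, best_answer = max(scored)               # keep the top-rated attempt
    if best_score > 0.5:                                # filter on critic confidence
        synthetic.append({"prompt": q, "completion": best_answer})

# The surviving pairs become extra training data without new human collection.
with open("synthetic_examples.jsonl", "w") as f:
    for ex in synthetic:
        f.write(json.dumps(ex) + "\n")
```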
Yeah, I think I agree with all this; I suppose, since "we" have the AI policy/strategy training data anyway, that seems relatively low-effort and high-value to do, but yes, if we could somehow get access to the private notes of a bunch of international negotiators, that also seems very valuable! Perhaps actually asking top forecasters to record their working and meetings, to use as training data later, would be valuable, and I assume many people already do this by default (tagging @NunoSempere). Although of course having better forecasting AIs seems more dual-use than some of the other AI tools.