AI Tools for Existential Security

Rapid AI progress is the greatest driver of existential risk in the world today. But — if handled correctly — it could also empower humanity to face these challenges.

Executive summary

1. Some AI applications will be powerful tools for navigating existential risks

Three clusters of applications are especially promising:

  • Epistemic applications to help us anticipate and plan for emerging challenges

    • e.g. high-quality AI assistants could prevent catastrophic decisions by helping us make sense of rapidly evolving situations

  • Coordination-enabling applications to help diverse groups work together towards shared goals

    • e.g. automated negotiation could help labs and nations to find and commit to mutually desirable alternatives to racing

  • Risk-targeted applications to address specific challenges

    • e.g. automating alignment research could make the difference between “it’s functionally impossible to bring alignment up to the requisite standard in time” and “this is just an issue of devoting enough compute to it”

2. We can accelerate these tools instead of waiting for them to emerge

  • While broad AI progress will drive the development of many applications, we have some flexibility in the timing of specific applications — and even small speed-ups could be crucial (e.g. by switching the order of risk-generating capabilities and risk-reducing ones)

  • We could use a variety of strategies to accelerate beneficial applications:

    • Data pipelines & scaffolding: by curating datasets or scaffolding for key capabilities, or laying the groundwork to automate this, we could enable those capabilities as soon as underlying AI progress supports them

    • Complementary tech & removing other barriers to adoption: by building out the UI or other complementary technology, and ensuring that people are eager to use the applications, we could enable applications to see use as soon as the underlying capabilities are there, rather than accepting delays to adoption

    • Shaping compute allocation: by building support among key decision-makers who might allocate compute, we could ensure that crucial applications are among the earliest to see large amounts of automated research

  • Accelerating beneficial applications can often be done unilaterally (in contrast to delaying dangerous capabilities, which may need consensus)

Implications

These opportunities seem undervalued in existential risk work. We think a lot more people should work on this — and the broader “differential AI development” space. Our recommendations:

  1. Shift towards accelerating important AI tools

    • e.g. curate datasets for automating alignment research; or build AI forecasting systems

  2. Plan for a world with abundant cognition

    • Some new approaches will come online, and some current work may be obsoleted

    • e.g. it could make sense to build tools that process rich information to provide bespoke infectious disease exposure advice in contact tracing apps

  3. Get ready to help with automation

    • e.g. build relevant expertise, or work towards institutional buy-in

Some AI applications will help navigate existential risks

Epistemic applications

People are more likely to handle novel challenges well if they can see them coming clearly, and have good ideas about what could be done about them.

Examples of promising epistemic applications[1]

| Applications | How they might help |
|---|---|
| AI forecasting tools | High quality forecasts, especially of novel technological developments and their strategic implications, could help us to anticipate and prepare for key challenges. Sufficiently trusted AI systems with strong general track records could help to align expectations between parties. |
| AI for collective epistemics | AI systems that do high-quality fact-checking (or evaluate other systems for how truthful or enlightening they are) could help people to stay oriented to what is reliable in the world, and avoid failures of coordination from misplaced trust. |
| AI for philosophy | By helping people to engage in moral reflection, or directly tackling hard philosophical questions, AI systems might help humanity to avoid subtle but catastrophic moral errors. Moreover, poor philosophical grounding could lead superintelligent AI systems to go off the rails in some ways. |

Coordination-enabling applications

Local incentives sometimes prevent groups from achieving outcomes that would benefit everyone. This may make navigating key challenges — for example, coordinating to go slow enough with AI development that we can be justifiably confident it is safe — extremely difficult. Some AI applications could help people to coordinate and avoid such failures.

Examples of promising coordination-enabling applications

| Applications | How they might help |
|---|---|
| Automated negotiation tools | Negotiation processes often fail to find the best mutually-desirable outcomes — especially when time is limited, there are many parties involved, or when it’s hard to exchange information openly. AI tools could relieve bandwidth issues, or permit the perfectly confidential processing of relevant private information. |
| Automated treaty verification and/or enforcement tools | In some cases, all decision-makers would be happy with a potential agreement if they could trust that everyone would follow it, but trust issues prevent the agreement. Verification systems can mitigate the issue, and AI systems can improve them (e.g. by improving monitoring systems, or by serving as arms inspectors who could be trusted not to leak sensitive information). Sufficiently robust AI systems could even be empowered to enforce certain treaty provisions. |
| AI tools for structured transparency | Technological progress may lead to a position where it is easy to construct extremely destructive weapons. AI monitoring could ensure, for instance, that people weren’t building weapons or help developers understand how people are using advanced AI models — without creating the privacy issues normally associated with surveillance. |

Better coordination tools also have the potential to cause harm. Notably, some tools could empower small cliques to gain and maintain power at the expense of the rest of society. And commitment tools in particular are potentially dangerous if they lead to races to extort opponents by credibly threatening harm, or if humanity “locks in” certain choices before we are really wise enough to choose correctly.[2]

Risk-targeted applications

Examples of promising risk-targeted applications

| Applications | How they might help |
|---|---|
| Automating research into AI safety, such as theoretical alignment, mechanistic interpretability, or AI control | If these areas are automated early enough relative to the automation of research into AI capabilities, safety techniques might keep up with increasingly complex systems. This could make the difference in whether we lose control of the world to misaligned power-seeking AI systems.[3] |
| AI tools for greatly improving information security | Strong information security could limit the proliferation of powerful AI models, which could facilitate coordinating not to race forwards as fast as possible. It could also reduce the risk of rogue models self-exfiltrating. |
| AI-enabled monitoring systems for pandemic pathogens | Screening systems could prevent malicious actors from synthesizing new pandemic-capable viruses. AI-assisted biosurveillance could detect transmission of threatening viruses early enough to contain them. |

Other applications?

Applications outside of these three categories might still meaningfully help. For instance, if food insecurity increases the risk of war, and war increases the risk of existential catastrophe, then AI applications that boost crop production might indirectly lower existential risk.

But we guess that the highest priority applications will fall into the categories listed above[4], each of which focuses on a crucial step for navigating looming risks and opportunities:

  • Identifying the challenges we’re facing and which strategies could help navigate them;

  • Coordinating on those strategies;

  • And actually implementing the strategies.

We can accelerate helpful AI tools

There’s meaningful room to accelerate some applications

To some extent, market forces will ensure that valuable AI applications are developed not too long after they become viable. It’s hard to imagine counterfactually moving a key application forward by decades.

But the market has gaps, and needs time to work. AI is a growth industry — lots of money and talent are flowing in because there are more worthwhile opportunities than are currently being pursued. So we should expect there to be some room to counterfactually accelerate any given application by shifting undersupplied capital and labour towards it.

In some cases this room might be only a few months or weeks. This is especially likely for the most obviously economically valuable applications, or those which are “in vogue”. Other applications may be less incentivized, harder to envision, or blocked by other constraints. It may be possible to accelerate these by many months or even years.

Moreover, minor differences in timing could be significant. Even if the speed-up we achieve is relatively small, or its effects persist only for a short period, it could still matter a lot.

This is because, at a time of rapid progress in AI:

  1. Small differences in time could represent major differences in capability level

    • What is “technologically feasible” given general AI capabilities may be shifting rapidly, so that keeping up with that frontier instead of lagging months behind it could mean a big difference in practice

  2. Small boosts could flip the ordering of key capabilities

    • Achieving risk-reducing capabilities before[5] the risk-generating capabilities they correspond to could have a big impact on outcomes

[Figure: curves illustrating that minor differences in timing could be significant, either boosting performance at a given time or flipping the ordering of relevant capabilities.]
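
To make the second point concrete, here is a deliberately toy sketch in Python, with made-up numbers: two capabilities improve over time, and a modest speed-up to the risk-reducing one changes which reaches its critical level first.

```python
# Toy numbers, purely for illustration (not estimates of anything real).

def arrival_time(threshold: float, start: float, rate: float) -> float:
    """Time at which a capability growing linearly from `start` at `rate` reaches `threshold`."""
    return start + threshold / rate

THRESHOLD = 2.0  # level at which each capability starts to matter (arbitrary units)

risk_generating = arrival_time(THRESHOLD, start=0.0, rate=1.0)
risk_reducing = arrival_time(THRESHOLD, start=0.5, rate=1.0)                # lags slightly behind
risk_reducing_boosted = arrival_time(THRESHOLD, start=0.5 - 0.6, rate=1.0)  # with a modest speed-up

print(f"risk-generating capability arrives at t = {risk_generating:.1f}")
print(f"risk-reducing capability arrives at t = {risk_reducing:.1f}  (too late)")
print(f"with a small speed-up it arrives at t = {risk_reducing_boosted:.1f}  (in time)")
```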

There are promising strategies for accelerating specific AI applications

We have promising strategies that focus on almost all[6] of the major inputs in the development of an AI application.

[Figure: the application pipeline, spanning learning algorithms, training data, compute, post-training enhancements, UI & complementary tech, and user demand.]

1. Invest in the data pipeline (including task-evaluation)

High-quality task-specific data is crucial for training AI models and improving their performance on specific tasks (e.g. via fine-tuning), and it’s hard to get high-quality data (or other training signals) in some areas. So it could be very useful to:

  • Curate specialized datasets

    • e.g. if we want to accelerate AI systems that can help people avoid big mistakes in their decision-making, maybe we should collect datasets of identified errors

    • e.g. if we want to speed up automation of research in a given field, maybe we should try to collect and share more intermediate research products (working notes, conversations) for future training, or build infrastructure to do this

  • Define robust task-evaluation schemes

    • Metrics (benchmarks or other ways to grade performance) tend to accelerate the development of systems that perform well on the associated tasks — principally because they can enable rapid, automated learning (as in the case of self-play for AlphaGo), but also because they can become targets for developers

    • So it may be high-leverage to develop evaluation schemes for performance on tasks we care about[7]

    • e.g. if we want to improve automated negotiation tools, we might invest in benchmarks for assessing the quality of tool performance
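
To make the negotiation example concrete, here is a minimal, hypothetical sketch in Python of what such an evaluation scheme could look like: benchmark cases pair possible deals with each party's private valuations, and a candidate system is scored on how much of the achievable joint value its proposals capture. The case format, scoring rule, and names are illustrative assumptions, not an existing benchmark.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class NegotiationCase:
    """One benchmark case: the deals on the table and each party's private value for them."""
    deals: List[str]
    valuations: Dict[str, Dict[str, float]]  # party -> deal -> value

def best_joint_deal(case: NegotiationCase) -> str:
    """The deal maximising total value across parties (the grading target)."""
    return max(case.deals, key=lambda deal: sum(v[deal] for v in case.valuations.values()))

def welfare_score(system: Callable[[NegotiationCase], str],
                  cases: List[NegotiationCase]) -> float:
    """Average fraction of achievable joint value that the system's proposals capture."""
    total = 0.0
    for case in cases:
        chosen = system(case)
        achieved = sum(v[chosen] for v in case.valuations.values())
        optimum = sum(v[best_joint_deal(case)] for v in case.valuations.values())
        total += achieved / optimum
    return total / len(cases)

# A trivial baseline that always proposes the first deal listed, for comparison.
cases = [
    NegotiationCase(
        deals=["A takes more", "split differently", "B takes more"],
        valuations={
            "A": {"A takes more": 8, "split differently": 6, "B takes more": 1},
            "B": {"A takes more": 2, "split differently": 6, "B takes more": 7},
        },
    )
]
first_deal_baseline = lambda case: case.deals[0]
print(f"baseline welfare score: {welfare_score(first_deal_baseline, cases):.2f}")
```

A deliberately task-specific metric like this also limits spillover to unrelated capabilities, which matters from a differential acceleration perspective.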

2. Work on scaffolding and post-training enhancements

Techniques like scaffolding can significantly boost pre-trained models’ performance on specific tasks. And even if the resulting improvement is destined to be made obsolete by the next generation of models, the investment could be worth it if the boost falls during a critical period or creates compounding benefits (e.g. via enabling faster production of high-quality task-relevant data).
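
For concreteness, here is a minimal sketch of one common scaffolding pattern, a draft/critique/revise loop. The `call_model` function is a placeholder for whichever model API is in use, and the prompts and loop structure are purely illustrative.

```python
def call_model(prompt: str) -> str:
    """Stand-in for a call to whichever language model API is in use."""
    raise NotImplementedError("wire this up to your model provider")

def draft_critique_revise(task: str, rounds: int = 2) -> str:
    """A simple scaffold: draft an answer, then repeatedly self-critique and revise.

    Even with a fixed underlying model, structuring work like this often improves
    performance on tasks where intermediate steps can be checked.
    """
    answer = call_model(f"Task: {task}\nWrite a first-pass answer.")
    for _ in range(rounds):
        critique = call_model(
            f"Task: {task}\nDraft:\n{answer}\n"
            "List concrete errors or omissions in the draft."
        )
        answer = call_model(
            f"Task: {task}\nDraft:\n{answer}\nCritique:\n{critique}\n"
            "Rewrite the draft, fixing the issues identified in the critique."
        )
    return answer
```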

3. Shape the allocation of compute

As R&D is automated, choices about where compute is spent will increasingly determine the rate of progress on different applications.[8] Indeed, in paradigms where performance scales with inference compute, this is true more broadly than just for R&D — larger compute investment may give better application performance. This means it could be very valuable to get AI company leadership, governments, or other influential actors on board with investing in key applications.

4. Address non-AI barriers

For some applications, the main bottleneck to adoption won’t be related to underlying AI technologies. Instead of focusing on AI systems, it might make sense to:

  • Improve user interfaces or build a more accessible version of an existing application

  • Increase demand, for instance by working on reputational issues for a given application (e.g. just building a trusted brand!), or ensuring that key institutions are prepared to adopt important applications quickly

  • Develop complementary technologies (e.g. some privacy-conscious actors might use certain AI applications only with sufficiently good privacy systems)

Different situations will call for different strategies. The best approach will be determined by:

  • The likeliest bottlenecks for a given application

  • The target timeline (some strategies take longer to pay off or help at different technological levels)

  • The levers that are available

The most effective implementation of one of these strategies won’t always be the most direct one. For instance, if high-quality data is the key bottleneck, setting up a prize for better benchmarks might be more valuable than directly collecting the data. But sometimes the best approach for accelerating an application further down the line will involve simply building or improving near-term versions of the application, to encourage more investment.

These methods can generally be pursued unilaterally. In contrast, delaying an application that you think is harmful might more frequently require building consensus. (We discuss this in more detail in an appendix.)

Implications for work on existential risk reduction

Five years ago, working on accelerating AI applications in a targeted way would have seemed like a stretch. Today, it seems like a realistic and viable option. For the systems of tomorrow, we suspect it will seem obvious — and we’ll wish that we’d started sooner.

The existential risk community has started recognizing this shift, but we don’t think it’s been properly priced in.

This is an important opportunity — as argued above, some AI applications will help navigate existential risks and can be meaningfully accelerated — and it seems more tractable than much other work. Moreover, as AI capabilities rise, AI systems will be responsible for increasing fractions of important work — likely at some point a clear majority. Shaping those systems to do more useful work seems like a valuable (and increasingly valuable) opportunity, and one we should begin preparing for now.

We think many people focused on existential risk reduction should move into this area. Compared to direct technical interventions, we think this will often be higher leverage because of the opportunity to help direct much larger quantities of cognitive labour, and because it is under-explored relative to its importance. Compared to more political interventions, it seems easier for many people to contribute productively in this area, since they can work in parallel rather than jostling for position around a small number of important levers.[9] By the time these applications are a big deal, we think it could easily make sense for more than half of the people focusing on existential risk to be working on related projects. And given how quickly capabilities seem to be advancing, and the benefits of being in a field early, we think a significant fraction — perhaps around 30% — of people in the existential risk field should be making this a focus today.

What might this mean, in practice?

1. Shift towards accelerating important AI tools

Speed up AI for current existential security projects

If you’re tackling an important problem[10], consider how future AI applications could transform your work. There might already be some benefits to using AI[11] — and using AI applications earlier than might seem immediately useful could help you to learn how to automate the work more quickly. You could also take direct steps to speed up automation in your area, by:

  • Identifying good tasks to automate — ones that are:

    • Worth scaling up (not rare, one-off tasks)

    • Self-contained, with clear inputs and outputs

    • Within reach

  • Redesigning your processes to be automation-friendly — using standard templates, clear documentation, etc.

  • Gathering and documenting examples of good work for training current or future AI systems
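
As one hypothetical way to act on the last point, you could record each completed task in a standard, machine-readable format so that good examples accumulate for later training or evaluation. The schema and field names below are assumptions for illustration, not an established standard.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path

@dataclass
class WorkedExample:
    """A self-contained record of one completed task, reusable as training or evaluation data."""
    task_description: str   # clear statement of the inputs and the goal
    inputs: dict            # everything the person doing the task had access to
    output: str             # the finished product
    quality_notes: str      # why this counts as good work (or what was wrong with it)

def log_example(example: WorkedExample, path: str = "worked_examples.jsonl") -> None:
    """Append the example to a JSONL file, one record per line."""
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(example)) + "\n")

log_example(WorkedExample(
    task_description="Summarise the key disagreements among a set of forecasts",
    inputs={"forecasts": ["..."]},
    output="The main disagreement concerns ...",
    quality_notes="Checked against the source forecasts; no unsupported claims.",
))
```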

Work on new AI projects for existential security

You might also accelerate important AI applications by:

  • Starting or joining new projects (e.g. startups) that are building those applications

  • Joining or partnering with existing institutions that hold key functions (e.g. democratic oversight), and helping them with automation

2. Plan for a world with abundant cognition

As AI automates more cognitive tasks, strategies that were once impractically labour-intensive may become viable. We should look for approaches that scale with more cognitive power, or use its abundance to bypass other bottlenecks.

Newly-viable strategies might include, e.g.:

  • In epistemic tools — automatically propagating updates from each new piece of information through a knowledge database, to highlight places where it might require a significant rethink of strategy (see the sketch after this list)

  • In coordination tools — exploring a large range of possible agreements, and examining what different potential coalitions might think of them, in order to identify the best directions to go in

  • In biosecurity — processing rich information on a bespoke person-by-person basis, to enable significantly better contact tracing apps

  • In AI safety — modeling and assessing the alignment impacts of different possible updates to model weights, and using these assessments to make smarter updates than blind gradient descent
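
As an illustrative sketch of the first idea in this list (propagating updates through a knowledge database): the toy graph below tracks which claims depend on which, and flags everything downstream of a new piece of information for review. The claims, graph structure, and flagging rule are all invented for illustration; a real system would use AI to build and assess these links.

```python
from collections import deque

# Toy knowledge base: each claim maps to the claims that depend on it.
dependents = {
    "lab X is ~12 months from capability Y": ["our timelines estimate"],
    "our timelines estimate": ["strategy: prioritise intervention A"],
    "strategy: prioritise intervention A": [],
}

def claims_to_review(updated_claim: str) -> list[str]:
    """Breadth-first walk of everything downstream of an updated claim."""
    to_review: list[str] = []
    queue, seen = deque([updated_claim]), {updated_claim}
    while queue:
        claim = queue.popleft()
        for downstream in dependents.get(claim, []):
            if downstream not in seen:
                seen.add(downstream)
                to_review.append(downstream)
                queue.append(downstream)
    return to_review

# New information arrives bearing on the first claim; flag everything downstream for rethinking.
print(claims_to_review("lab X is ~12 months from capability Y"))
# -> ['our timelines estimate', 'strategy: prioritise intervention A']
```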

The other side of this coin is that some current work is likely to soon be obsolete. When it’s a realistic option to just wait and have it done cheaply later, that could let us focus on other things in the short term.

3. Get ready to help with automation

Our readiness-to-automate isn’t a fixed variable. If automation is important — and getting more so — then helping to ensure that the ecosystem as a whole is prepared for it is a high priority.

This could include:

  • Positioning yourself in the job market, by joining relevant startups or existing institutions that will need to automate things well

  • Getting experience working with state-of-the-art AI tools, and using them to get real work done

  • Developing deep expertise about domains that may be particularly valuable to automate

  • Investing further in strategy and prioritization to help the field as a whole to orient

  • Creating infrastructure to help people coordinate and share knowledge about how best to automate high-value areas

  • Fostering understanding, buy-in, or community among relevant people

Further context

In appendices, we discuss:

  • Whether accelerating applications could be bad via speeding up AI progress in general

  • Dynamics that might make it harder to meaningfully counterfactually accelerate an application (and when meaningful acceleration still looks achievable)

  • How our proposal relates to existing concepts, such as def/acc, differential technological development, and differential AI development

  • Within “differential AI development,” the distinction between differential application development and differential paradigm development, and the relative advantages of each

  • Why we’ve focused on accelerating risk-reducing applications rather than slowing down risky applications

Acknowledgements

We’re grateful to Max Dalton, Will MacAskill, Raymond Douglas, Lukas Finnveden, Tom Davidson, Joe Carlsmith, Vishal Maini, Adam Bales, Andreas Stuhlmüller, Fin Moorhouse, Davidad, Rose Hadshar, Nate Thomas, Toby Ord, Ryan Greenblatt, Eric Drexler, and many others for comments on earlier drafts and conversations that led to this work. Owen’s work was supported by the Future of Life Foundation.

  1. ^

    See e.g. Lukas Finnveden’s post on AI for epistemics for further discussion of this area.

  2. ^

    There is a bit more discussion of potential downsides in section 5 of this paper: here.

  3. ^

    This strategy has been discussed in many places, e.g. here, here, here, and here.

  4. ^

    Not that there is anything definitive about this categorization; we’d encourage people to think about what’s crucial from a variety of different angles.

  5. ^

    And note it’s not just about the ordering of the capabilities, but about whether we have them in a timely fashion so that systems that need to be built on top of them actually get built.

  6. ^

    The main exceptions are learning algorithms and, in most cases, architectures, which are typically too general to differentially accelerate specific applications.

  7. ^

    In some cases, optimizing for a task metric may result in spillover capabilities on other tasks. The ideal metric from a differential acceleration perspective is one which has less of this property, although some spillover doesn’t preclude getting differential benefits at the targeted task.

  8. ^

    AI companies are already spending compute on things like generating datasets to train or fine-tune models with desired properties, and RL for improving performance in specific areas. As more of AI R&D is automated (and changing research priorities becomes as easy as shifting compute spending), key decision-makers will have more influence and finer-grained control over the direction of AI progress.

  9. ^

    This work may also be more promising than policy-oriented work if progress in AI capabilities outpaces governments’ ability to respond.

  10. ^

    Although you should also be conscious that “which problems are important” may be changing fairly rapidly!

  11. ^

    As we write this (in March 2025) we suspect that a lot of work is on the cusp of automation — not that there are obvious huge returns to automation, but that there are some, and they’re getting bigger over time.