I wrote a Twitter thread that summarizes this piece and has a lot of extra images (I probably went overboard, tbh).
I kinda wish I’d included the following image in the piece itself, so I figured I’d share it here:
I love to see stuff like this!
It has been a pleasure reading this, listening to your podcast episode, and trying to really think it through.
This reminds me of a few other things I have seen lately, like Superalignment, Joe Carlsmith’s recent “AI for AI Safety”, and the recent 80,000 Hours Podcast with Will MacAskill.
I really appreciate the “Tools for Existential Security” framing. Your example applications were on point and many of them brought up things I hadn’t even considered. I enjoy the idea of rapidly solving lots of coordination failures.
This sort of DAID approach feels like an interesting continuation of other ideas about differential acceleration and the vulnerable world hypothesis. Trying to get this right can feel like some combination of applied ethics and technology forecasting.
Probably one of the weirdest or most exciting applications you suggest is AI for philosophy. You put it under the “Epistemics” category. I usually think of epistemics as a sub-branch of philosophy, but I think I get what you mean. AI for this sort of thing remains exciting, but very abstract to me.
What a heady thing to think about; really exciting stuff! There is something very cosmic about the idea of using AI research and cognition for ethics, philosophy, and automated wisdom. (I have been meaning to read “Winners of the Essay competition on the Automation of Wisdom and Philosophy”). I strongly agree that since AI comes with many new philosophically difficult and ethically complex questions, it would be amazing if we could use AI to face these.
The section on how to accelerate helpful AI tools was nice too.
Appendix 4 was gold. The DPD framing is really complementary to the rest of the essay. I can totally appreciate the distinction you are making, but I also see DPD as bleeding into AI for Existential Safety a lot as well. Such mixed feelings. Like, for one thing, you certainly wouldn’t want to be deploying whack AI in your “save the world” cutting-edge AI startup.
And it seems like there is a good case for thinking about doing better pre-training and finding better paradigms if you are going to be thinking a lot about safer AI development and deployment anyway. Maybe I am missing something about the sheer economics of not wanting to actually do pre-training ever.
In any case, I thought your suggestions around aiming for interpretable, robust, safe paradigms were solid. Paradigm-shaping and application-shaping are both interesting.
***
I really appreciate that this proposal is talking about building stuff! And that it can be done ~unilaterally. I think that’s just an important vibe and an important type of project to have going.
I also appreciate that you said in the podcast that this was only one possible framing / clustering. Although you also say “we guess that the highest priority applications will fall into the categories listed above” which seems like a potentially strong claim.
I have also spent some time thinking about which forms of ~research / cognitive labor would be broadly good to accelerate for similar existential security reasons, and I kind of tried to retrospectively categorize some notes I had made using your framing. I had some ideas that were hard to categorize cleanly into epistemics, coordination, or direct risk targeting.
I included a few more ideas for areas where AI tools, marginal automated research, and cognitive abundance might be well applied. I was going for a similar vibe, so I’m sorry if I overlap a lot. I will try to only mention things you didn’t explicitly suggest:
Epistemics:
you mention benchmarking as a strategy for accelerating specific AI applications, but it also deserves mention as an epistemic tool; METR-style elicitation tests do too
I should say up front that I don’t know whether, due to the acceleration and iteration effects you mention, benchmarks like FrontierMath and lastexam.ai are race-dynamic-accelerating in a way that overshadows their epistemic usefulness; even METR’s task-horizon metric could be accused of this.
From a certain perspective, I would consider benchmarks and METR elicitation tests natural complements to mech interp and AI capabilities forecasting
this would include capability and threat assessment (hopefully we can actively iterate downwards on risk assessment scores)
broad economic and societal impact research
the effects of having AI more or less “do the economy” seem vast, and differentially accelerating understanding and strategy there seems like a non-trivial application relevant to the long-term future of humanity
wealth inequality and the looming threat of mass unemployment (at minimum, this seems important for instability and coordination reasons even if one were too utilitarian / longtermist to care for normal reasons)
I think it would be good to accelerate “Risk evaluation” in the sense that Joe Carlsmith defined really elegantly in “Paths and waystations in AI safety” [1]
building naturally from there, forecasting systems could be specifically applied to DAID and DPD; I know this is a little “ouroboros” to suggest but I think it works
Coordination-enabling:
movement building research and macro-strategy, AI-fueled activism, political coalition building, AI research into and tools for strengthening democracy
automated research into deliberative mini-publics, improved voting systems (e.g. ranked choice, liquid democracy, quadratic voting, anti-gerrymandering solutions), secure digital voting platforms, improved checks and balances (e.g. strong multi-stakeholder oversight, whistleblower protections, human rights), and non-censorship-oriented solutions to misinformation
Risk-targeting:
I know it is not the main thrust of “existential security”, but I think it is worth considering the potential for “abundant cognition” to be applied to welfare / sentience research (e.g. biological and AI). This seems really important from a lot of perspectives, for a lot of reasons:
AI safety might go worse if the AIs are “discontent”
we could lock in a future where most people are suffering terribly, which would not count as existential security
it seems worthwhile to know ASAP whether the AI workers are suffering, for normal “avoid committing moral catastrophes” reasons
we could unlock huge amounts of welfare or learn to avoid huge amounts of pain (cf. “hedonium” or the Far Out Initiative)
That said, I have not really considered the offense / defense balance here. We may discover how to simulate suffering much more cheaply than pleasure, or something horrendous like that. Or there might be info hazards. This space seems so high-stakes and hard to chart.
Some mix:
Certain forms of monitoring and openly researching other people’s actions seem like a mix of epistemics and coordination. For example, I had listed some stuff like AI for broadly OSINT-based investigative journalism, AI lab watch, legislator scorecards, and similar. These are, in a sense, information for the sake of coordination.
I know I included some moonshots. This all depends on what AI systems we are talking about and what they are actually helpful with, I guess. I would hate for EA to bet too hard on any of this stuff and accidentally flood the zone of key areas with LLM “slop” or whatever.
Also, to state the obvious, there may be some risk of correlated exposure if you pin too much of your existential security on the crucial aid of unreliable, untrustworthy AIs. Maybe HAL 9000 isn’t always the entity to trust with your most critical security.
Lots to think about here! Thanks!
[1] Joe Carlsmith: “Risk evaluation tracks the safety range and the capability frontier, and it forecasts where a given form of AI development/deployment will put them.
Paradigm examples include:
evals for dangerous capabilities and motivations;
forecasts about where a given sort of development/deployment will lead (e.g., via scaling laws, expert assessments, attempts to apply human and/or AI forecasting to relevant questions, etc);
general improvements to our scientific understanding of AI
structured safety cases and/or cost-benefit analyses that draw on this information.”
Thanks for this; I hadn’t thought much about the topic and agree it seems more neglected than it should be. But I am probably overall less bullish than you (as operationalised by, e.g., how many people in the existential risk field should be making this a significant focus: I am perhaps closer to 5% than your 30% at present).
I liked your flowchart on ‘Inputs in the AI application pipeline,’ so using that framing:
Learning algorithms: I agree this is not very tractable for us[1] to work on.
Training data: This seems like a key thing for us to contribute, particularly at the post-training stage. By supposition, a large fraction of the most relevant work on AGI alignment, control, governance, and strategy has been done by ‘us’. I could well imagine that it would be very useful to get project notes, meeting notes, early drafts, etc., as well as the final report, to train a specialised AI system to become an automated alignment/governance etc. researcher.
But my guess is that just compiling this training data doesn’t take that much time. All it takes is, when the time comes, convincing a lot of the relevant people and orgs to share old Google Docs of notes/drafts/plans etc., paired with the final product (see the sketch after this list for what that pairing might look like).
There will be a lot of infosec considerations here, so maybe each org will end up training their own AI based on their own internal data. I imagine this is what will happen for a lot of for-profit companies.
Making sure we don’t delete old draft reports and meeting notes and things seems good here, but given that storing Google Docs is so cheap and culling files is time-expensive, I think by default almost everyone just keeps most of their (at least textual) digital corpus anyway. Maybe there is some small intervention to make this work better though?
Compute: It certainly seems great for more compute to be spent on automated safety work versus automated capabilities work. But this is mainly a matter of how much money each party has to pay for compute. So lobbying for governments to spend lots on safety compute, or regulations to get companies to spend more on safety compute, seems good, but this is a bit separate/upstream from what you have in mind, I think; it is more just ‘get key people to care more about safety’.
Post-training enhancements: We will be very useful for providing RLHF to tell a budding automated AI safety researcher how good each of its outputs is. Research taste is key here. This feels somewhat continuous with just ‘managing a fleet of AI research assistants’.
UI and complementary technologies: I don’t think we have a comparative advantage here, and can just outsource this to human or AI contractors to build nice apps for us, or use generic apps on the market and just feed in our custom training data.
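To make the training-data point a bit more concrete, here is a minimal sketch of what pairing old drafts with final write-ups into a fine-tuning dataset might look like. The directory layout, file naming, and prompt/completion format are placeholder assumptions for illustration, not anyone’s actual pipeline.

```python
# Hypothetical sketch: pair exported draft documents with their final versions
# to build a small fine-tuning dataset. Paths and naming are assumptions.
import json
from pathlib import Path

DRAFTS_DIR = Path("exported_docs/drafts")  # assumed export of early drafts/notes
FINALS_DIR = Path("exported_docs/finals")  # assumed export of published reports


def build_pairs(drafts_dir: Path, finals_dir: Path) -> list[dict]:
    """Match drafts to finals by shared filename and emit prompt/completion records."""
    pairs = []
    for draft_path in sorted(drafts_dir.glob("*.txt")):
        final_path = finals_dir / draft_path.name
        if not final_path.exists():
            continue  # skip drafts with no published counterpart
        pairs.append({
            "prompt": draft_path.read_text(encoding="utf-8"),
            "completion": final_path.read_text(encoding="utf-8"),
        })
    return pairs


if __name__ == "__main__":
    records = build_pairs(DRAFTS_DIR, FINALS_DIR)
    with open("draft_to_final.jsonl", "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    print(f"Wrote {len(records)} draft-to-final pairs")
```

The real work, as noted above, is social (convincing people and orgs to share the documents) and infosec-related, not the script itself.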
In terms of which applications to focus on, my guess is epistemic tools and coordination-enabling tools will mostly be built by default (though of course, as you note, additional effort can still speed them up some). E.g., politicians, business leaders, and academics would all presumably love to have better predictions for which policies will be popular, what facts are true, which papers will replicate, etc. And negotiation tools might be quite valuable for e.g. negotiating corporate mergers and deals.
So my take is that probably a majority of the game here is in ‘automated AI safety/governance/strategy’ because there will be less corporate incentive here, and it is also our comparative advantage to work on.
Overall, I agree differential AI tool development could be very important, but think the focus is mainly on providing high-quality training data and RLHF for automated AI safety research, which is somewhat narrower than what you describe.
I’m not sure how much we actually disagree though, would be interested in your thoughts!
[1] Throughout, I use ‘us’ to refer broadly to EA/longtermist/existential security type folks.
UI and complementary technologies: I’m sort of confused about your claim about comparative advantage. Are you saying that there aren’t people in this community whose comparative advantage might be designing UI? That would seem surprising.
More broadly, though:
I’m not sure how much “we can just outsource this” really cuts against the core of our argument (how to get something done is a question of tactics, and it could still be a strategic priority even if we just wanted to spend a lot of money on it)
I guess I feel, though, that you’re saying this won’t be a big bottleneck
I think that that may be true if you’re considering automated alignment research in particular. But I’m not on board with that being the clear priority here
Yes, I suppose I am trying to divide tasks/projects up into two buckets based on whether they require high context, value-alignment, strategic thinking, and EA-ness. And I think my claim was/is that UI design is comparatively easy to outsource to someone without much of the relevant context and values. And therefore the comparative advantage of the higher-context people is to do things that are harder to outsource to lower-context people. But I know ~nothing about UI design; maybe being higher-context is actually super useful.
Compute allocation: mostly I think that “get people to care more” does count as the type of thing we were talking about. But I think that it’s not just caring about safety, but also being aware ahead of time of the role that automated research may have to play in this, and when it may be appropriate to hit the gas and allocate a lot of compute to particular areas.
Which applications to focus on: I agree that epistemic tools and coordination-enabling tools will eventually have markets and so will get built at some point absent intervention. But this doesn’t feel like a very strong argument—the whole point is that we may care about accelerating applications even if it’s not by a long period. And I don’t think that these will obviously be among the most profitable applications people could make (especially if you can start specializing to the most high-leverage epistemic and coordination tools).
Also, we could make a similar argument that “automated safety” research won’t get dropped, since it’s so obviously in the interests of whoever’s winning the race.
Training data: I agree that the stuff you’re pointing to seems worthwhile. But I feel like you’ve latched onto a particular type of training data, and you’re missing important categories, e.g.:
Epistemics stuff—there are lots of super smart people earnestly trying to figure out very hard questions, and I think that if you could access their thinking, there would be a lot there which would compare favourably to a lot of the data that would be collected from people in this community. It wouldn’t be so targeted in terms of the questions it addresses (e.g. “AI strategy”), but learning good epistemics may be valuable and transfer over
International negotiation, and high-stakes bargaining in general—potentially very important, but not something I think our community has any particular advantage at
Synthetic data—a bunch of things may be unlocked more by working out how to enable “self-play” (or the appropriate analogue), rather than just collecting more data the hard way
Yeah, I think I agree with all this; I suppose since ‘we’ have the AI policy/strategy training data anyway, that seems relatively low-effort and high-value to do, but yes, if we could somehow get access to the private notes of a bunch of international negotiators, that also seems very valuable! Perhaps actually asking top forecasters to record their working and meetings to use as training data later would be valuable, and I assume many people already do this by default (tagging @NunoSempere). Although of course having better forecasting AIs seems more dual-use than some of the other AI tools.
I just remembered another sub-category that seems important to me: AI-enabled very accurate lie detection. This could be useful for many things, but most of all for helping make credible commitments in high-stakes US-China ASI negotiations.