It has been a pleasure reading this, listening to your podcast episode, and trying to really think it through.
This reminds me of a few other things I have seen lately, like Superalignment, Joe Carlsmith's recent "AI for AI Safety", and the recent 80,000 Hours Podcast with Will MacAskill.
I really appreciate the "Tools for Existential Security" framing. Your example applications were on point and many of them brought up things I hadn't even considered. I enjoy the idea of rapidly solving lots of coordination failures.
This sort of DAID approach feels like an interesting continuation of other ideas about differential acceleration and the vulnerable world hypothesis. Trying to get this right can feel like some combination of applied ethics and technology forecasting.
Probably one of the weirdest or most exciting applications you suggest is AI for philosophy. You put it under the "Epistemics" category. I usually think of epistemics as a sub-branch of philosophy, but I think I get what you mean. AI for this sort of thing remains exciting, but very abstract to me.
What a heady thing to think about; really exciting stuff! There is something very cosmic about the idea of using AI research and cognition for ethics, philosophy, and automated wisdom. (I have been meaning to read "Winners of the Essay competition on the Automation of Wisdom and Philosophy".) I strongly agree that since AI comes with many new philosophically difficult and ethically complex questions, it would be amazing if we could use AI to face these.
The section on how to accelerate helpful AI tools was nice too.
Appendix 4 was gold. The DPD framing is really complementary to the rest of the essay. I can totally appreciate the distinction you are making, but I also see DPD as bleeding into AI for Existential Safety a lot as well. Such mixed feelings. Like, for one thing, you certainly wouldn't want to be deploying whack AI in your "save the world" cutting-edge AI startup.
And it seems like there is a good case for thinking about doing better pre-training and finding better paradigms if you are going to be thinking a lot about safer AI development and deployment anyway. Maybe I am missing something about the sheer economics of not wanting to actually do pre-training ever.
In any case, I thought your suggestions around aiming for interpretable, robust, safe paradigms were solid. Paradigm-shaping and application-shaping are both interesting.
***
I really appreciate that this proposal is talking about building stuff! And that it can be done ~unilaterally. I think that's just an important vibe and an important type of project to have going.
I also appreciate that you said in the podcast that this was only one possible framing / clustering. That said, you also say "we guess that the highest priority applications will fall into the categories listed above", which seems like a potentially strong claim.
I have also spent some time thinking about which forms of ~research / cognitive labor would be broadly good to accelerate for similar existential security reasons, and I tried to retrospectively categorize some notes I had made using your framing. Some of my ideas were hard to categorize cleanly into epistemics, coordination, or direct risk targeting.
I included a few more ideas for areas where AI tools, marginal automated research, and cognitive abundance might be well applied. I was going for a similar vibe, so I'm sorry if I overlap a lot. I will try to only mention things you didn't explicitly suggest:
Epistemics:
- you mention benchmarking as a strategy for accelerating specific AI applications, but it also deserves mention as an epistemic tool; METR-style elicitation tests do too (a toy sketch of a "task horizon" style metric follows this list)
  - I should say up front that I don't know whether, due to the acceleration and iteration effects you mention, benchmarks like FrontierMath and lastexam.ai are "race-dynamic-accelerating" in a way that overshadows their epistemic usefulness; even METR's task horizon metric could be accused of this
  - From a certain perspective, I would consider benchmarks and METR elicitation tests natural complements to mech interp and AI capabilities forecasting
  - this would include capabilities and threat assessment (hopefully we can actively iterate downwards on risk assessment scores)
- broad economics and societal impact research
  - the effects of having AI more or less "do the economy" seem vast, and differentially accelerating understanding and strategy there seems like a non-trivial application relevant to the long-term future of humanity
  - wealth inequality and the looming threat of mass unemployment (at minimum, this seems important for instability and coordination reasons, even if one were too utilitarian / longtermist to care for the normal reasons)
- I think it would be good to accelerate "risk evaluation" in the sense defined really elegantly by Joe Carlsmith in "Paths and waystations in AI safety"[1]
  - building naturally from there, forecasting systems could be specifically applied to DAID and DPD; I know this is a little "ouroboros" to suggest, but I think it works
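
To make the benchmarking / elicitation point a bit more concrete, here is a minimal, hypothetical sketch of a "task horizon" style summary statistic: the longest human-task duration at which an agent still succeeds at least half the time. This is not METR's actual methodology or code; the names and the naive bucketing are my own illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    human_minutes: float  # estimated time a skilled human needs for the task
    success: bool         # did the AI agent complete the task?


def task_horizon(results: list[TaskResult], threshold: float = 0.5) -> float:
    """Longest human-time bucket at which the agent's success rate stays >= threshold."""
    # Group outcomes into buckets by (rounded) human completion time.
    buckets: dict[int, list[bool]] = {}
    for r in results:
        buckets.setdefault(round(r.human_minutes), []).append(r.success)

    horizon = 0.0
    for minutes in sorted(buckets):
        outcomes = buckets[minutes]
        if sum(outcomes) / len(outcomes) >= threshold:
            horizon = float(minutes)
        else:
            break  # stop at the first duration where the agent falls below threshold
    return horizon


if __name__ == "__main__":
    demo = [
        TaskResult(5, True), TaskResult(5, True),
        TaskResult(30, True), TaskResult(30, False),
        TaskResult(120, False), TaskResult(120, False),
    ]
    print(task_horizon(demo))  # -> 30.0
```

METR's real metric fits a success curve rather than bucketing, but even a toy version conveys the epistemic appeal: a single trackable number summarizing how long a task an agent can reliably do.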
Coordination-enabling:
- movement-building research and macro-strategy, AI-fueled activism, political coalition building, AI research into and tools for strengthening democracy
  - automated research into deliberative mini-publics, improved voting systems (e.g., ranked choice, liquid democracy, quadratic voting, anti-gerrymandering solutions), secure digital voting platforms, improved checks and balances (e.g., strong multi-stakeholder oversight, whistleblower protections, human rights), and non-censorship-oriented solutions to misinformation (a toy ranked-choice tally is sketched after this list)
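
As one concrete illustration of the voting-system ideas above, here is a toy instant-runoff ("ranked choice") tally. It is only a sketch under simplifying assumptions (well-formed ballots, naive tie-breaking), not a proposal for a production voting system:

```python
from collections import Counter


def instant_runoff(ballots: list[list[str]]) -> str:
    """Return the instant-runoff winner given ballots as preference-ordered candidate lists."""
    candidates = {c for ballot in ballots for c in ballot}
    while True:
        # Count each ballot toward its highest-ranked still-surviving candidate.
        tally = Counter(
            next(c for c in ballot if c in candidates)
            for ballot in ballots
            if any(c in candidates for c in ballot)
        )
        total = sum(tally.values())
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > total or len(candidates) == 1:
            return leader
        # Eliminate the candidate with the fewest current first-choice votes (naive tie-break).
        candidates.remove(min(tally, key=tally.get))


if __name__ == "__main__":
    ballots = [["A", "B"], ["A", "C"], ["B", "C"], ["C", "B"], ["C", "B"]]
    print(instant_runoff(ballots))  # -> "C", after "B" is eliminated and its ballot transfers
```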
Risk-targeting:
- I know it is not the main thrust of "existential security", but I think it is worth considering the potential for applying "abundant cognition" to welfare / sentience research (e.g., bio and AI). This seems really important from a lot of perspectives, for a lot of reasons:
  - AI safety might be worse if the AIs are "discontent"
  - we could lock in a future where most people are suffering terribly, which would not count as existential security
  - it seems worthwhile to know ASAP whether the AI workers are suffering, for normal "avoid committing moral catastrophes" reasons
  - we could unlock huge amounts of welfare or learn to avoid huge amounts of pain (cf. "hedonium" or the Far Out Initiative)
- That said, I have not really considered the offense / defense balance here. We may discover how to simulate suffering much more cheaply than pleasure, or something horrendous like that. Or there might be info hazards. This space seems so high-stakes and hard to chart.
Some mix:
- Certain forms of monitoring and openly researching other people's actions seem like a mix of epistemics and coordination. For example, I had listed some things like AI for broadly OSINT-based investigative journalism, AI lab watch, legislator scorecards, and similar. These are kind of information for the sake of coordination.
I know I included some moonshots. This all depends on what AI systems we are talking about and what they are actually helpful with, I guess. I would hate for EA to bet too hard on any of this stuff and accidentally flood the zone of key areas with LLM "slop" or whatever.
Also, to state the obvious, there may be some risk of correlated exposure if you pin too much of your existential security on the crucial aid of unreliable, untrustworthy AIs. Maybe HAL 9000 isn't always the entity to trust with your most critical security.
Lots to think about here! I love to see stuff like this! Thanks!

[1] Joe Carlsmith: "Risk evaluation tracks the safety range and the capability frontier, and it forecasts where a given form of AI development/deployment will put them.
Paradigm examples include:
- evals for dangerous capabilities and motivations;
- forecasts about where a given sort of development/deployment will lead (e.g., via scaling laws, expert assessments, attempts to apply human and/or AI forecasting to relevant questions, etc.);
- general improvements to our scientific understanding of AI;
- structured safety cases and/or cost-benefit analyses that draw on this information."