Scrappy note on the AI safety landscape. Very incomplete, but probably a good way to get oriented to (a) some of the orgs in the space, and (b) how the space is carved up more generally.
(A) Technical
(i) A lot of the safety work happens in the scaling-based AGI companies (OpenAI, GDM, Anthropic, and possibly Meta, xAI, Mistral, and some Chinese players). Some of it is directly useful, some of it is indirectly useful (e.g. negative results, datasets, open-source models, position pieces etc.), and some is not useful and/or a distraction. It’s worth developing good assessment mechanisms/instincts about these.
(ii) A lot of safety work happens in collaboration with the AGI companies, but by individuals/organisations with some amount of independence and/or different incentives. Some examples: METR, Redwood, UK AISI, Epoch, Apollo. It’s worth understanding what they’re doing with AGI cos and what their theories of change are.
(iii) Orgs that don’t seem to work directly with AGI cos but are deeply technically engaging with frontier models and their relationship to catastrophic risk: places like Palisade, FAR AI, CAIS. These orgs maintain even more independence, and are able to do/say things that the previous tier maybe cannot. A recent cool example: CAIS found that models succeed on only 2.5% of remote-work tasks, in contrast to OpenAI’s GDPval findings, which suggest models have an almost 50% win rate against industry professionals on a suite of “economically valuable, real-world tasks”.
(iv) Orgs that are pursuing other* technical AI safety bets, different from the AGI cos: FAR AI, ARC, Timaeus, Simplex AI, AE Studio, LawZero, many independents, some academics at e.g. CHAI/Berkeley, MIT, Stanford, MILA, Vector Institute, Oxford, Cambridge, UCL and elsewhere. It’s worth understanding why they want to make these bets, including whether it’s their comparative advantage, an alignment with their incentives/grants, or whether they’re seeing things that others haven’t been able to see yet. (*Some of the above might be pursuing similar bets to AGI cos but with fewer resources or with increased independence etc.)
(v) Orgs pursuing non-software technical bets: e.g. FlexHEG, TamperSec
(B) Non-technical or less technical, but still aimed (or could be aimed) at directly** working the problem
(i) Orgs that do more policy-focussed/outreach/advocacy/other-non-technical things: e.g. MIRI, CAIS, RAND, CivAI, FLI, Safe AI forum, SaferAI, EU AI office, CLTR, GovAI, LawAI, CSET, CSER
(ii) AGI cos policy and governance teams, e.g. the RSP teams, the government engagement teams, and maybe even some influence and interaction with product teams and legal departments.
** “directly” here means something like “make a strong case to delay the development of AGI, giving us more time to technically solve the problem”, a first-order effect, rather than something like “fund someone who can make a case to delay...”, which is a higher-order effect
(C) Field-building/Talent development/Physical infrastructure
(i) Direct talent development: Constellation, Kairos, BlueDot, ARENA, MATS, LASR, Apart Research, Tarbell, etc. These orgs aim to increase the number of people going into the above categories, or to speed them up. They don’t usually (aim to) work directly on the problem, but sometimes incidentally do (e.g. via high-quality outputs from MATS). There can be a multiplier effect from working in such orgs.
(ii) Infra: Constellation, FAR AI, Mox, LISA
(iii) Incubators: e.g. Seldon Labs, Constellation, Catalyze, EF, Fifty-Fifty
(D) Moving money
(i) Non-profit/philanthropic donors: e.g. OpenPhil, SFF, EA Funds, LongView, Schmidt Futures
(ii) VCs: e.g. Halcyon, Fifty-Fifty
For added coverage:
(E) Others
(i) Multipolar scenarios: CLR, ACS Prague, FOCAL (CMU), CAIF
(ii) Digital consciousness type-things: CLR, Eleos, NYU Center for Mind, Ethics, and Policy
(iii) Post-AGI futures: Forethought, MIT FutureTech
(F) For-profits trying to translate AI safety work into some kind of business model, both to validate research and to be well situated should more regulation mandate evals, audits, certifications, etc.: e.g. Goodfire, Lakera, GraySwan, and possibly dozens more startups; big professional services firms would also be itching to get in on this when the regulations happen.
It is well worth investigating whether to work on any of these: the field is wide open and there are many approaches to pursue. “Defence in depth” (1, 2, 3) implies that there is work to be done across a lot of different attack surfaces, so it’s maybe not so central to identify a singular best thing to work on; it’s enough to find something that has a plausible theory of change and that seems neglected and/or is patching some hole in a huge array of defences. We need lots of people/orgs/resources to help find and patch the countless holes!
This is super helpful—do you feel like your overview even points at what potentially useful safety work is currently not covered by anyone?
“anyone” is a high bar! Maybe worth looking at what notable orgs might want to fund, as a way of spotting “useful safety work not covered by enough people”?
I notice you’re already thinking about this in some useful ways, nice. I’d love to see a clean picture of threat models overlaid with plans/orgs that aim to address them.
I think the field is changing too fast for any specific claim here to stay true in 6-12m.