AI governance and strategy: a list of research agendas and work that could be done

This document was written by Nathan Barnard and Erin Robertson.

We have compiled a list of research agendas in AI governance, and we’ve written some possible questions that people could work on. Each section contains an explanation of why the theme might be relevant for governance focussed on existential risk and longtermism, followed by a short description of past work. We propose some questions for each theme, but we prioritise clarity over completeness.

The content is focussed on the questions which seem most important to Nathan personally, particularly those which seem most useful on the margin. We have often drawn on other people’s lists, in an attempt to represent a more consensus view. Neither Erin nor Nathan has ever held an AI governance research or implementation position, and Nathan has been an independent researcher for less than a year.

A theme throughout these questions is that Nathan thinks it would be useful to have more high-quality empirical work. A good example is this paper on policy persistence and policy windows from Freitas-Groff, which has credible causal estimates of how persistent policy is, a question he thinks is really important for prioritising different interventions in AI governance.


This document is divided into topics, each of which includes:

  1. A brief discussion on theory of change: why might this work be useful?

  2. Examples of past work in this domain: who’s working on this?

  3. 2-4 questions that people could work on, each with some description.

Topics:

AI regulation and other standard tools

Compute governance

Corporate governance

International governance

Misuse

Evals

China

Information security

Strategy and forecasting

Post TAI/ASI/AGI governance

AI regulation and other standard tools

Theory of change

Historically, government regulation has been successful at reducing accident rates in other potentially dangerous industries—for instance air travel, nuclear power plants, finance and pharmaceuticals. It’s plausible that similar regulatory action could reduce risks from powerful AI systems.

Past work

A dominant paradigm right now is applying the standard tools of technology regulation—and other non-regulatory means of reducing harm from novel and established technologies—to AI. This paradigm seems particularly important right now because of the extensive interest—and action—from governments on AI. Specifically, the recent Biden Executive Order (EO) on AI instructs the executive branch[1] to take various regulatory and quasi-regulatory actions. The EU AI act has been passed, but there are now many open questions about how the act will be implemented, and the UK is, in various ways, creating its AI regulatory policy. Thus far there has been a lot of work looking at case studies of particular regulatory regimes, and work looking deeply into the mechanics of the US government and how this could matter for AI regulation.

Questions

  • Systematic statistical work estimating the effects of regulatory and quasi-regulatory interventions:

      • This paper looks at case studies of non-legally-binding standards from other industries but makes no attempt to statistically estimate the effect of such standards on accident rates—this could be very useful.

      • There have been many individual case studies of different regulatory regimes, but to my knowledge no statistical work trying to estimate the average effect on accident rates of, for instance, US federal regulation or the creation of a new regulatory agency.

      • It looks likely that the default way in which the US regulates AI is with NIST standards. It’s unclear how useful these standards are likely to be, and statistical work estimating the average reduction in accident rates from NIST standards would shed light on this question.

There are of course reasons why this statistical work hasn’t been done—it’s hard to get good data on these questions, and it will be difficult to credibly estimate the causal effects of these interventions because natural experiments are hard to come by. Nevertheless, we think it’s worth some people trying hard to make progress on these questions—without this kind of statistical work we should, in particular, be extremely uncertain about how useful standards that don’t have the force of law will be.
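
To illustrate the kind of statistical design that could be used, here is a minimal difference-in-differences sketch in Python. Everything in it is a placeholder: the data file, column names, and the 2016 adoption date are hypothetical, and any real analysis would stand or fall on data quality, the choice of comparison industries, and the parallel-trends assumption discussed above.

```python
# Minimal difference-in-differences sketch (hypothetical data and columns)
# for estimating the effect of a non-binding standard on accident rates.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: accident rates by industry and year, where some
# industries became covered by a standard introduced in 2016.
df = pd.read_csv("industry_accidents.csv")  # columns: industry, year, accident_rate, adopted_standard

df["post"] = (df["year"] >= 2016).astype(int)       # after the standard was introduced
df["treated"] = df["adopted_standard"].astype(int)  # industries covered by the standard

# Two-way fixed effects: the coefficient on treated:post is the
# difference-in-differences estimate. Identification rests on the (strong)
# assumption of parallel trends between covered and uncovered industries.
model = smf.ols("accident_rate ~ treated:post + C(industry) + C(year)", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["industry"]})

print("DiD estimate:", result.params["treated:post"])
print("Std. error:  ", result.bse["treated:post"])
```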

UK specific questions

  • Could the UK establish a new regulator for AI (similar to the Financial Conduct Authority or Environment Agency)? What structure should such an institution have? This question may be especially important because the UK civil service tends to hire generalists, in a way which could plausibly make UK AI policy substantially worse.

US specific questions

There are many, many questions on the specifics of US interventions and lots of work being done on them. In listing these questions I’ll try to avoid duplication of work that’s already been done or is in the process of being done.

  • Can the Department of Homeland Security (DHS) be improved, and is this tractable? Under the recent Biden EO, DHS took on an important role in AI regulation, and we expect this to continue given that responsibility for WMD and cybersecurity sits within DHS. DHS has also, since its creation in 2001, been rated as one of the worst-performing parts of the US government, and it’s very unclear whether this is likely to change, how much it matters for AI, and whether there are tractable interventions to improve DHS.

  • What should we expect the risk tolerance of the civilian officials in the Department of Defense (DoD) to be in the case where the DoD plays a large role in AI development? My strong impression from the history of nuclear weapons in the US is that uniformed officials are willing to take large risks in exchange for relatively small increases in the chance that the US would win a war, but it’s not clear that this is the case for the civilians in the DoD.

  • Liability law is a tool used to try to internalise the costs of harm caused by firms, and has been proposed as a tool to reduce harms from AI. We haven’t found good empirical work estimating the effect of changes in liability law on accident rates in industries plausibly similar to AI—such as oil—where there are large but rare accidents. We don’t expect this work to be done by law professors, but it does require knowledge of both law and statistics, particularly causal inference.

EU specific questions

We are not particularly plugged in to what’s happening in the EU, so take these questions with a big pinch of salt. This website has very useful summaries of the various parts of the EU AI act.

  • What opportunities for lobbying by tech firms will there be in the EU standard-setting process? For instance, will they be consulted by default during stakeholder engagement?

  • When are the key times for civil society bodies to engage with the EU standard setting process? Are there specific dates when AI safety orgs should have documents ready?

  • How far will the EU defer to international standards settings bodies on AI standards?

  • How will the AI board function and who is likely to be appointed to it? Are there other similar boards within the EU?

  • Which are the most important national standards bodies for EU standard setting?

  • In what ways is the EU AI act good or bad for reducing x-risk?

Compute governance

Theory of change

Much compute governance work hinges on the assumption that access to compute (often GPUs) is crucial for development of frontier AI systems. Policymakers may be able to influence the pace and direction of development by controlling or tracking these resources. A good example of this is US export controls on chips to China.

We expect that the role of chips may shift in the coming years, and they may be less clearly a bottleneck to development; the bottleneck may shift to algorithmic progress or to the financial constraints of firms. We expect that work imagining possible failure modes of compute governance, and possible alternative constraints on progress, will be helpful, since work of this kind is neglected. It’s worth noting that training compute costs have only recently reached levels that put frontier models out of reach of semi-large firms, yet even before this there were only a small number of leading labs. This suggests that something other than compute is the main constraint on labs’ ability to produce frontier models, which this paper lends some support to. It’s therefore plausible that theories of change relying on controlling who has access to leading-node AI chips might not be very effective.

Past work

Some examples of past work include:

  • Putting “switch off” capability into chips which can stop dangerous situations before they get out of hand, investigated here.

  • Tracking chips, so that state actors are better able to enforce standards, which they might impose on models trained using more than a specified number of floating point operations (FLOP), investigated here. (A back-of-the-envelope sketch of how such a compute threshold might be checked follows this list.)
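
As context for the kind of threshold mentioned above, here is a back-of-the-envelope sketch in Python of checking whether a hypothetical training run crosses an illustrative compute threshold, using the common approximation that training compute ≈ 6 × parameters × training tokens. The threshold value and the example model are assumptions chosen for illustration, not claims about any particular regulation or system.

```python
# Back-of-the-envelope check of whether a training run crosses a compute
# threshold, using the standard approximation FLOP ≈ 6 * parameters * tokens.
# Both the threshold and the example run below are illustrative assumptions.

ILLUSTRATIVE_THRESHOLD_FLOP = 1e26  # in the rough ballpark of thresholds discussed in recent policy

def training_flop(n_parameters: float, n_tokens: float) -> float:
    """Approximate total training compute for a dense transformer."""
    return 6 * n_parameters * n_tokens

# Hypothetical frontier-scale run: 500B parameters trained on 15T tokens.
flop = training_flop(5e11, 1.5e13)
print(f"Estimated training compute: {flop:.2e} FLOP")
print("Above illustrative threshold:", flop > ILLUSTRATIVE_THRESHOLD_FLOP)
```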

This recent paper on compute governance has an appendix with broad research directions at the end, and we encourage interested readers to draw from it for research ideas focused on advancing the compute governance agenda.

Questions

These questions are quite idiosyncratic and focused on examining the failure modes of compute governance.

  • How feasible would it be for China to implement export controls on chips to the US?

  • Currently the US has, in collaboration with the Netherlands and Japan (both very important countries in the semiconductor supply chain), imposed aggressive export controls on sending both advanced-node chips and the inputs needed to make advanced-node chips to China. China controls a large share of the supply chain for some of the raw materials needed for chip production. It’s currently unclear how feasible it would be for China to impose extremely aggressive export controls on these raw materials, and the degree to which this could be used by China to negotiate for less strict export controls by the US and allies.

  • What is the likely impact of the US export controls on chips to China? See if it’s possible to work out how Chinese companies have reacted to the controls, and whether the implementation has affected progress.

  • Explore other possible bottlenecks to the development of frontier models, possibly focussing on projected financial constraints, projected algorithmic constraints, or data. Make some systematic survey; one useful method would be to quickly get answers from experts on relevant questions and see where people’s intuitions lie.

  • Algorithmic efficiency: it may be possible to map relevant algorithmic gains over the past few years, and to make projections or observations from this data (see the sketch after this list).

  • Supply chain: map the supply chain for chips using trade data, industry reports, and academic literature, and attempt to assess the strategic implications of this information.
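
For the algorithmic efficiency item above, here is a minimal sketch of one possible starting point: fitting an exponential trend to a hypothetical dataset of how much compute was needed to reach a fixed benchmark score in different years, in order to estimate a doubling time for algorithmic efficiency. The file and column names are assumptions, and real estimates would require careful matching of models and benchmarks.

```python
# Minimal sketch: estimate the doubling time of algorithmic efficiency from a
# hypothetical dataset of the compute needed to reach a fixed benchmark score.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("compute_to_reach_benchmark.csv")  # columns: year, flop_to_target_score

# Efficiency relative to the earliest year: higher means less compute needed.
baseline = df.loc[df["year"].idxmin(), "flop_to_target_score"]
df["efficiency"] = baseline / df["flop_to_target_score"]

# Fit log2(efficiency) against year; the slope is doublings per year.
X = sm.add_constant(df["year"])
fit = sm.OLS(np.log2(df["efficiency"]), X).fit()
doublings_per_year = fit.params["year"]

print(f"Estimated doubling time of algorithmic efficiency: {1 / doublings_per_year:.2f} years")
```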

Corporate governance

Corporate governance refers to the policies, procedures, and structures by which an organisation is controlled, aiming to align the organisation’s objectives with various stakeholders including shareholders, employees, and society.

Corporate governance allows individuals and leadership within a company to be held accountable for their decisions and the effects they have.

Theory of change

There are a few reasons why corporate governance can be valuable alongside good regulation:

  • Frontier AI organisations have a greater understanding of their own operations and more resources than regulators.

  • Senior leadership of firms have limited information on everything that goes on in the firm. Therefore, strong communication channels and systems of oversight are needed to effectively manage risks.

  • The good intentions of individuals in a company don’t reliably translate into good outcomes for the organisation as a whole, so there is a need for well-designed processes that allow those good intentions to be acted on.

Past work

This research agenda from Matt Wearden is a great place to look for questions. The questions we mention here are focussed on trying to understand the track record of corporate governance.

  • This paper from Schuett et al looks at the design of AI ethics boards. There are lots of detailed questions about the design of ethics boards—financing, legal structure, etc.—which seem important. I’d be interested, though, in case studies of cases where ethics boards have plausibly counterfactually prevented harm.

  • Are there cases where it’s plausible that shareholders would sue AI firms that attempted to make safety-enhancing decisions at the expense of profitability, or where the threat of suit, or even a generic sense of responsibility to shareholders, would meaningfully change the decisions that firms make around safety?

Responsible scaling policies (RSPs) are probably the most important corporate governance and risk management tool being used at AI labs. Responsible scaling policies are the practices that firms commit to undertake to ensure that they can appropriately manage the risks associated with larger models.

I (Nathan) personally think that RSPs are useful for three reasons:

  • A way for firms to prepare for the risks from scaling up models

  • A way of reconciling people with diverse views on AI safety, where RSPs represent an agreement between factions to take costly, safety-enhancing actions conditional on models displaying certain dangerous capabilities

  • A way for firms to tie their own hands so that they take costly actions to prevent risks from models (this is the most speculative of the three reasons)

Some questions that might be important around RSPs are:

  • How good is the theory of change for RSPs in preventing risks from misaligned power-seeking?

  • What would a defence-in-depth approach to RSPs look like?

  • Do RSPs help with coordination between firms?

  • How likely is it that RSPs become the basis for law?

Industry-led standards bodies are common across sectors; finance, for instance, has quite a lot of these self-regulatory bodies (SRBs). We aren’t sure whether these are effective at actually lowering accident rates in the relevant industries, and if so what the effect size is, particularly in comparison to regulation. It could be really useful for someone to do a case study of a plausible case in which an SRB reduced accidents in an industry, and of what the mechanism for this was.

International governance

Theory of change

It seems like there are two worlds where international governance is important.

  1. Firms are able to go to jurisdictions with laxer AI regulation, leading to a race to the bottom between jurisdictions.

  2. States become interested in developing powerful AI systems (along the lines of the space race) and this leads to dangerous racing.

In both of these cases international agreements could improve coordination between jurisdictions and states to prevent competition that’s harmful to AI safety.

Past work

Most of the academic work on international governance has been aimed at the first theory of change.

Trager et al propose an international governance regime for civilian AI built around verifying that jurisdictions enforce agreed rules, with a structure similar to international agreements on aeroplane safety. Baker looks at nuclear arms control verification agreements as a model for an AI treaty. There has also been some excitement about a CERN for AI, as this EA forum post explains, but little more formal work investigating the idea.

There has also been some work on racing between states, for instance this paper from Stafford et al.

Questions

Nathan is sceptical that international agreements on AI will matter very much, and most of these questions are about investigating whether international agreements could solve important problems, and could feasibly be strong enough to do so.

  • What are the distinguishing features of the Montreal Protocol on CFCs that made it so effective?

  • Similarly, why has the nuclear non-proliferation treaty been effective, in the sense that there hasn’t been much nuclear proliferation? What was the counterfactual impact of the treaty over merely the US and USSR extending their nuclear umbrellas to a large fraction of the states that could plausibly have got nuclear weapons? It seems like the treaty itself hasn’t mattered very much, since states which have wanted nuclear weapons—like Pakistan, India and Israel—have simply not ratified it.

  • How effective has the Paris Climate accord been, in the light of a lack of enforcement provisions?

  • How effective have the Basel agreements on international finance been? Are there ways to make the application of such treaties quicker, given the length of time it took to both agree and put into practice the various Basel accords?

  • Under what circumstances do we see regulatory arbitrage causing a race to the bottom? Nathan’s own research suggests that we don’t see this when central banks engage in stress testing, but it’s commonly believed that we have seen it in the way jurisdictions set corporate tax rates. What determines this, and what should we expect to see with AI?

Misuse

Nathan isn’t convinced that communities worried about existential risks from AI should focus much on misuse risks, for two reasons:

  • National security establishments seem to have woken up to the risk of misuse from AI, particularly on the bio and cybersecurity side, and he expects them to be able to handle these risks effectively

  • He is more generally sceptical of the x-risk case from misuse.

We would be particularly interested in empirical work that tries to clarify how likely x-risk from misuse is. Some work in this vein that is extremely useful is this report from the Forecasting Research Institute on how likely superforecasters think various forms of x-risk are, this EA forum post that looks at the base rates of terrorist attacks, and this report from RAND on how useful LLMs are for bioweapon production.

Some theoretical work that has really influenced Nathan’s thinking here is this paper from Aschenbrenner modelling how x-risk changes with economic growth. The core insight of the paper is that, even if economic growth initially increases x-risk due to new technologies, as societies get richer they become more willing to spend money on safety-enhancing technologies, which can be used to force down x-risk.

Questions

Some empirical work that we think would be helpful here:

  • Changes in rates of terrorist attacks with changes in the availability of the internet and the quality of Google search.

  • A high quality literature review on how rates of terrorist attacks and fatalities change with GDP—are richer countries able to spend money to reduce the risks of terrorist attacks?

  • The most commonly cited example of an omnicidal terrorist group is Aum Shinrikyo. However, I’ve also heard reports that, rather than wanting to kill everyone, they wanted to start a war between Japan and the US that would cause enormous casualties, with Japan rising from the ashes of the war.
    This might explain why, to my knowledge, they only attempted attacks with sarin and anthrax, neither of which spreads from person to person.
    If there are in fact no examples of omnicidal terrorist groups, this would be a big update on the risks from AI misuse. David Thorstad has a good post on this already, but we think more work would still be useful.

  • Trying to understand why terrorists don’t seem to use cyberattacks very often.
    The model in which advances in AI capabilities lead to large harms from misuse predicts lots of terrorist attacks using malware, but Nathan’s understanding is that this isn’t what we see. It would be useful to know why, and what evidence this provides on the misuse question.

Evals

Theory of change

Evals are a tool for assessing whether AI systems pose threats by trying to elicit potentially dangerous capabilities and misalignment from AI systems. This is a new field and there are many technical questions to tackle in it. The interested reader is encouraged to read this post on developing a science of evals from Apollo, a new organisation focused on evals.

The governance question for evals is how they fit into a broader governance strategy. See this paper from Apollo and this introduction to METR’s work. Evals also play a central part in the UK government’s AI regulation strategy; see box 5 of the government’s recent white paper for questions it has, many of which relate to evals.

Questions

Some particular questions we are interested in are:

  • How can we ensure that evals are not gamed, in a broadly similar way to the Volkswagen emissions tests (I think this would make a great case study), or the risk ratings given by credit rating agencies prior to the financial crisis?

  • How should information from evals be made public? The information design literature could be informative here, e.g. this paper on the release of stress-testing information.

  • How important is having in-house government capacity for doing evals? How could this be built?

  • How likely is it that evals lead to a false sense of security? The winner’s curse could be relevant here.

  • How should evals firms and governments act when they suspect that firms are trying to cheat their evals?

China

Theory of change

China questions are some of the most crucial strategic questions on AI. There seem to be two big ways in which China questions matter:

  1. How likely is it that a Chinese lab develops TAI that causes an existential catastrophe? Does this mean that we should be more reluctant to adopt measures that slow down AI in the US and allies?

  2. How likely is there to be an AI arms race between the US and China?

There are three sub-questions to the first question that I’m really interested in:

  1. How likely is it that a Chinese lab is able to develop TAI before a US lab?

  2. What alignment measures are Chinese labs likely to adopt?

  3. How long should we expect it to take Chinese labs to catch up to US labs once a US lab has developed TAI?

Crucial context here is the export controls adopted by the Biden administration in 2022 and updated in 2023, which aim to maximise the distance between leading-node production in the US and allies and leading-node production in China, combined with the narrower aim of specifically restricting the technology that the Chinese military has access to.

Past work

There’s lots of great work on the export controls, the Chinese AI sector, and Chinese semiconductor manufacturing capabilities. Interested readers are encouraged to take part in the forecasting tournament on Chinese semiconductor manufacturing capabilities.

Questions

Information security

Theory of change

It seems like there are three theories of change for why infosec could matter a lot:

  1. If firms are confident that they won’t have important technical information stolen, they’ll race less against other firms

  2. Preventing non-state actors from gaining access to model weights might be really important for preventing misuse

  3. Preventing China from gaining access to model weights and/or other technical information might be important for maintaining an AI lead for the US and allies

All of these theories of change seem plausible, but we haven’t seen any work that has really tried to test them using case studies or historical data, and it would be interesting to see this sort of work.

There’s some interesting work to be done on non-infosec ways of deterring cyberattacks. It may also turn out that AI makes cyberattacks technically very easy to conduct, so that the way to deter them is with very aggressive reprisals against groups found to be conducting cyberattacks, combined with an international extradition treaty for cybercriminals.

Past work

Questions

All of these questions will be social science questions rather than technical questions—this is not at all meant to imply that technical infosec questions aren’t important, just that we are completely unqualified to write interesting technical infosec questions.

  • Are there any examples of firms taking fewer safety precautions out of fear that they could be hacked?

  • Are there examples of high-value targets that we’re confident haven’t been hacked, e.g. US nuclear command and control? This is a good list of high-stakes breaches (it hasn’t been updated since 2020; some more recent ones are listed here). This podcast episode has some relevant information both for this question and for infosec for AI more generally.

  • How effective are national security services at deterring cyberattacks?

  • Are there examples of security services aiding private firms in their cyber defence and how effective has that been? How likely is it that this would be extended to AI firms?

  • My understanding is that state actors—particularly the US and the UK—have much, much better offensive cyber capabilities than other actors. Why is this the case? Do we expect it to stay this way as AI gets more capable? We don’t have a very good theory of change for this question, but it seems like an important and surprising facet of the infosec landscape.

Strategy and forecasting

Theory of change

Anticipating the speed at which developments will occur, and understanding the levers, is likely very helpful for informing high-level decision making.

Past work

There’s a risk with strategy and forecasting that it’s easy to be vague or use unscientific methodology, which is why recent commentary has suggested it’s not a good theme for junior researchers to work on. There’s some merit to this view, and we’d encourage junior researchers to try especially hard to seek out empirical or otherwise solid methodology if they’d like to make progress on this theme.

Epoch is an AI forecasting organisation which focuses on compute. Their work is excellent because they focus on empirical results or on extending standard economic theory. Other strategy work with solid theoretical grounding includes Tom Davidson’s takeoff speed report, Halperin et al’s work on using interest rates to forecast AI, and Cotra’s bio anchors report.

Lots of the strategy work thus far—on AI timelines and AI takeoff speeds—is compute centric. A core assumption of much of this work is that AI progress can be converted into a common currency of compute: that if you throw enough compute at today’s data and algorithms, you can get TAI.

Recently there’s been quite a lot of focus on the economic and scientific impact of LLMs; for instance, see this post and this post from Open Philanthropy calling for this kind of work.

Questions

  • Actually trying to get to the bottom of why essentially all economists are so sceptical of explosive growth from AI. Some great work on this topic is this paper from Erdil and Besiroglu, and this debate between Clancy and Besiroglu. Unlike the other questions raised in this post, this one is quite crowded, but seems extremely valuable.

  • Progress from algorithmic vs data vs compute improvements on scientific and game-playing tasks (e.g. chess), similar to this Epoch paper looking at this question in vision models.

  • Time series modelling of AI investment (see the sketch after this list for one possible starting point).

  • Human capital as a bottleneck on AI progress. This paper from CSET is highly relevant here and suggests that AI researchers at least think that the main bottleneck to progress is human capital. Nathan also thinks that human capital is an underexplored governance lever.

  • Economic history of integration of general purpose technologies into the economy, particularly looking at how long they took to (counterfactually) increase the rate of scientific progress.
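
For the time series item above, here is a minimal sketch of one possible starting point: fitting a simple ARIMA model to hypothetical annual AI investment figures and producing a short forecast with uncertainty intervals. The data file, column name, and model order are assumptions; serious work would compare specifications and think hard about structural breaks.

```python
# Minimal sketch: fit a simple time series model to hypothetical annual AI
# investment data and forecast a few years ahead with uncertainty intervals.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

df = pd.read_csv("ai_investment.csv", index_col="year")  # column: investment_usd_bn

# Model log investment so the forecast respects roughly exponential growth;
# the ARIMA(1, 1, 0) order is an arbitrary starting point, not a recommendation.
log_inv = np.log(df["investment_usd_bn"])
fit = ARIMA(log_inv, order=(1, 1, 0)).fit()

forecast = fit.get_forecast(steps=5)
print(np.exp(forecast.predicted_mean))       # point forecasts, back in $bn
print(np.exp(forecast.conf_int(alpha=0.1)))  # 90% intervals
```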

Post TAI/ASI/AGI governance

Theory of change

Lots of people think that transformative AI is coming in the next few decades. Some define this in terms of “AGI”: an AI that can do everything a human can do, but better. Others define it in terms of “TAI”: AI which significantly changes the economic growth rate, such that global GDP grows X% each year or scientific developments occur X% quicker. These changes may be abrupt, and may completely change the world in ways we can’t predict. Some work has been done to anticipate these changes and to avert the worst outcomes. It’s becoming increasingly possible to do useful work under this theme, as some specific avenues for productive work have emerged. The hope is that anticipating the changes and the worst outcomes will help us have the appropriate mechanisms in place when things start getting weird.

Past work

This paper by Shulman and Bostrom on issues with digital minds is excellent, and this paper from O’Keefe et al looks specifically at the question of mechanisms to share the windfall from TAI.

A big challenge when trying to work on these kinds of questions is finding projects that are well-scoped, empirical or based in something with a well established theory like law or economics, while still plausibly being useful.

Questions

In light of this, here are some post TAI governance questions that could fulfil these criteria:

  • The two best resources for a list of questions in this area are this post by Holden Karnofsky and this post by Lukas Finnveden.

  • Digital minds: How might legal protections evolve for digital beings? In the UK, the recent sentience act gave some protections to octopuses based on a recognition of their sentience. This case study may prove especially relevant when imagining how legislation around digital beings may evolve.

  • Misinformation as a threat to democracy: Are there examples of misinformation causing important breakdowns in democratic functioning? What are the examples of this, and why did it happen in those cases? Why, for instance, were the Protocols of the Elders of Zion influential in promoting antisemitism, and how important were they in fact? Why is WhatsApp misinformation a big deal in India but not in rich democracies? How big a deal has misinformation actually been in democracies? This 80K podcast has lots of great stuff on these questions.

  • Review the democratic backsliding literature to try to identify whether it tells us anything about how likely TAI is to cause democratic backsliding, particularly in the US.

  1. ^

    Excluding independent agencies which the President doesn’t have direct control over