AI Governance: Opportunity and Theory of Impact
AI governance concerns how humanity can best navigate the transition to a world with advanced AI systems.[1] It relates to how decisions are made about AI,[2] and what institutions and arrangements would help those decisions to be made well.
I believe advances in AI are likely to be among the most impactful global developments in the coming decades, and that AI governance will become among the most important global issue areas. AI governance is a new field and is relatively neglected. I’ll explain here how I think about this as a cause area and my perspective on how best to pursue positive impact in this space. The value of investing in this field can be appreciated whether one is primarily concerned with contemporary policy challenges or long-term risks and opportunities (“longtermism”); this piece is primarily aimed at a longtermist perspective. Differing from some other longtermist work on AI, I emphasize the importance of also preparing for more conventional scenarios of AI development.
Contemporary Policy Challenges
AI systems are increasingly being deployed in important domains: for many kinds of surveillance; by authoritarian governments to shape online discourse; for autonomous weapons systems; for cyber tools and autonomous cyber capabilities; to aid or make consequential decisions about employment, loans, and criminal sentencing; in advertising; in education and testing; in self-driving cars and navigation; in social media. Society and policymakers are scrambling to catch up, to adapt, and to create norms and policies to guide these new areas. We see this scramble in contemporary international tax law, competition/antitrust policy, innovation policy, and national-security-motivated controls on trade and investment.
To understand and advise contemporary policymaking, one needs to develop expertise in specific policy areas (such as antitrust/competition policy or international security) as well as in the relevant technical aspects of AI. It is also important to build a community working jointly across these policy areas, since the underlying phenomena interact: they are often driven by similar technical developments, involve similar tradeoffs, and benefit from similar insights. For example, AI-relevant antitrust/competition policy is shaping and being shaped by great power rivalry, and both fields benefit from an understanding of AI’s character and trajectory.
Long-term Risks and Opportunities
Longtermists are especially concerned with the long-term risks and opportunities from AI, and particularly existential risks, which are risks of extinction or other destruction of humanity’s long-term potential (Ord 2020, 37).
Superintelligence Perspective
Many longtermists come to the field of AI Governance from what we can call the superintelligence perspective, which typically focuses on the challenge of having an AI agent with cognitive capabilities vastly superior to those of humans. Given how important intelligence is—to the solving of our global problems, to the production and allocation of wealth, and to military power—this perspective makes clear that superintelligent AI would pose profound opportunities and risks. In particular, superintelligent AI could pose a threat to human control and existence that dwarfs our other natural and anthropogenic risks (for a weighing of these risks, see Toby Ord’s The Precipice).[3] Accordingly, this perspective highlights the imperative that AI be safe and aligned with human preferences/values. The field of AI Safety is in part motivated and organized to address this challenge. The superintelligence perspective is well developed in Nick Bostrom’s Superintelligence, Eliezer Yudkowsky’s writings (eg), Max Tegmark’s Life 3.0, and Stuart Russell’s Human Compatible. The superintelligence perspective is most illuminating under scenarios involving fast takeoff, such as via an intelligence explosion.
Problems of building safe superintelligence are made all the more difficult if the researchers, labs, companies, and countries developing advanced AI perceive themselves to be in an intense winner-take-all race with each other, since then each developer will face a strong incentive to “cut corners” so as to accelerate their development and deployment; this is part of the problem of managing AI competition. A subsequent governance problem concerns how the developer should institutionalize control over and share the bounty from its superintelligence; we could call this the problem of constitution design (for superintelligence), since the solution amounts to a constitution over superintelligence.
Work on these problems interacts. Sometimes they are substitutes: progress on managing AI competition can lower the burden on AI safety, and vice versa. Sometimes they are complements: greater insight into the strategic risks from AI competition could help us focus our safety work, and technical advances in, say, AI verification mechanisms could facilitate global coordination (see Toward Trustworthy AI). It is imperative that we work on all promising strands, and that these fields be in conversation with each other.
Ecology and GPT Perspectives
The superintelligence perspective illuminates a sufficient condition for existential risk from AI. That condition is not necessary, however, and the perspective is often criticized for making overly strong assumptions about the character of advanced AI systems. Other perspectives illuminate other risks and considerations. One we might call the AI ecology perspective: instead of imagining just one or several superintelligent agents vastly superior to all other agents, we can imagine a diverse, global ecology of AI systems. Some may be like agents, but others may be more like complex services, systems, or corporations. These systems, individually or in collaboration with humans, could give rise to cognitive capabilities in strategically important tasks that exceed what humans are otherwise capable of. Hanson’s Age of Em describes one such world, where biological humans have been economically displaced by evolved machine agents who exist in a Malthusian state; there was no discrete event when superintelligence took over. Drexler’s Comprehensive AI Services offers an ecological/services perspective on the future of AI, arguing that we are more likely to see many superhuman but narrow AI services (which would be easier to build safely) than an integrated, agential, general superintelligence.
Another, broadly mainstream, perspective regards AI as a general purpose technology (GPT), in some ways analogous to other GPTs like steam-power, electricity, or computers (the GPT perspective). Here we need not emphasize only agent-like AI or powerful AI systems, but instead can examine the many ways even mundane AI could transform fundamental parameters in our social, military, economic, and political systems, from developments in sensor technology, digitally mediated behavior, and robotics. AI and associated technologies could dramatically reduce the labor share of value and increase inequality, reduce the costs of surveillance and repression by authorities, make global market structure more oligopolistic, alter the logic of the production of wealth, shift military power, and undermine nuclear stability. Of the three, this perspective is closest to that expressed by most economists and policy analysts.
These perspectives are not mutually exclusive. For example, even if we are most concerned about risks from the superintelligence perspective, the GPT perspective may be valuable for anticipating and shaping the policy, economic, and geopolitical landscape in which superintelligence would emerge.
Misuse Risks, Accident Risks, Structural Risks
Many analyses of AI risks, including most of those adopting a superintelligence perspective, understand risk primarily through the lenses of misuse or accidents. Misuse occurs when a person uses AI in an unethical manner, with the clearest cases involving malicious intent. Accidents involve unintended harms from an AI system, which in principle the developers of the system could have foreseen or prevented. Both of these kinds of risk place responsibility on a person or group who could have averted the risk through better motivation, caution, or technical competence. These lenses typically locate the opportunity for safety interventions causally proximate to the harm: right before the system is deployed or used, there is an opportunity for someone to avert the disaster through better motivation or insight.
By contrast, the ecology and especially GPT perspectives illuminate a broader lens of structural risks. When we think about the risks arising from the combustion engine—such as urban sprawl, blitzkrieg offensive warfare, strategic bombers, and climate change—we see that it is hard to fault any one individual or group for negligence or malign intent. It is harder to see a single agent whose behavior we could change to avert the harm, or a causally proximate opportunity to intervene. Rather, we see that technology can produce social harms, or fail to have its benefits realized, because of a host of structural dynamics. The impacts from technology may be diffuse, uncertain, delayed, and hard to contract over. Existing institutions are often not suited to managing disruption and renegotiating arrangements. To govern AI well, we need the lenses of misuse risks and accident risks, but also the lens of structural risks.
The more we see risks from the superintelligence perspective, in which a machine agent may achieve decisive strategic advantage, especially when it emerges from rapid self-improvement beginning sometime soon, the more it makes sense to concentrate our attention on the cutting edge of AI and AI safety. From this perspective, the priority is to focus on those groups who are most likely to incubate superintelligence, and help them to have the best culture, organization, safety expertise, insights, and infrastructure for the process to go well.
By contrast, the more we see risks from the ecology perspective, and especially the GPT and structural risk perspectives, the more we need to understand the AI safety and governance problems in a broad way. While these perspectives may still see a comparably high level of risk, that risk is distributed over a broader space of scenarios, and the opportunities for reducing it are similarly broadly distributed. These perspectives regard it as more likely that existing social systems will be critical in shaping outcomes, important phenomena to understand, and possible vehicles for positive impact. They also see a greater need for collaboration among a larger set of areas within AI safety and governance, as well as with experts from the broader space of social science and policymaking.
People who are foremost concerned about existential risks often prioritize the superintelligence perspective, probably because it most directly describes novel, concrete, and causally proximate ways that humans could lose all power (and potentially go extinct). However, the ecology and GPT perspectives are also important for understanding existential risks. In addition to illuminating other existential risks, these perspectives can illuminate existential risk factors[4], which are factors that indirectly affect existential risk. A risk factor can be as important to focus on as a more proximate cause: when trying to prevent cancer, investing in policies to reduce smoking can be more impactful than investing in chemotherapy.
Concrete Pathways to Existential Risk
What are some examples of concrete pathways to existential risk, or existential risk factors, that are better illuminated from the ecology and GPT perspectives?
Nuclear Instability
Relatively mundane changes in sensor technology, cyberweapons, and autonomous weapons could increase the risk of nuclear war (SIPRI 2020). To understand this requires understanding nuclear deterrence, nuclear command and control, first strike vulnerability and how it could change with AI processing of satellite imagery, undersea sensors, social network analytics, cyber surveillance and weapons, and risks of “flash” escalation of autonomous systems.
Power Transitions, Uncertainty, and Turbulence
Technology can change key parameters undergirding geopolitical bargains. Technology can lead to power transitions, which induce commitment problems that can lead to war (Powell 1999; Allison 2017). Technology can shift the offense-defense balance, which can make war more tempting or amplify fear of being attacked, destabilizing international order (Jervis 1978; Garfinkel and Dafoe 2019). Technology can lead to a general turbulence—between countries, firms, and social groups—which can lead to a breakdown in social bargains, disruption in relationships, gambits to seize advantage, and decline in trust. All of this can increase the risk of a systemic war, and otherwise enfeeble humanity’s ability to act collectively to address global risks.
Inequality, Labor Displacement, Authoritarianism
The world could become much more unequal, undemocratic, and inhospitable to human labor through processes catalyzed by advanced AI. These processes include global winner-take-all markets, technological displacement of labor, and authoritarian surveillance and control. At the limit, AI could catalyze (global) robust totalitarianism. Such processes could lead to a permanent lock-in of bad values, and could amplify other existential risks by reducing the competence of government.
Epistemic Security[5]
Arguably social media has undermined the ability of political communities to work together, making them more polarized and untethered from a foundation of agreed facts. Hostile foreign states have sought to exploit the vulnerability of mass political deliberation in democracies. While not yet possible, the spectre of mass manipulation through psychological profiling as advertised by Cambridge Analytica hovers on the horizon. A decline in the ability of the world’s advanced democracies to deliberate competently would lower the chances that these countries could competently shape the development of advanced AI.
Value Erosion through Competition
A high-stakes race (for advanced AI) can dramatically worsen outcomes by making all parties more willing to cut corners on safety. This risk can be generalized. Just as a safety-performance tradeoff, in the presence of intense competition, pushes decision-makers to cut corners on safety, so can a tradeoff between any human value and competitive performance incentivize decision-makers to sacrifice that value. Contemporary examples of values being eroded by global economic competition could include non-monopolistic markets, privacy, and relative equality. In the long run, competitive dynamics could lead to the proliferation of forms of life (countries, companies, autonomous AIs) that lock in bad values. I refer to this as value erosion; Nick Bostrom discusses this in The Future of Human Evolution (2004); Paul Christiano has referred to the rise of “greedy patterns”; and Hanson’s Age of Em scenario involves the loss of most value that is not adapted to ongoing AI market competition[6].
Prioritization and Theory of Impact
The optimal allocation of investments (in research, policy influence, and field building) will depend on our beliefs about the nature of the problem. Given the value I see in each of the superintelligence, ecology, and GPT perspectives, and our great uncertainty about what dynamics will be most critical in the future, I believe we need a broad and diverse portfolio. To offer a metaphor, as a community concerned about long-term risks from advanced AI, I think we want to build a metropolis—a hub with dense connections to the broader communities of computer science, social science, and policymaking—rather than an isolated island.
A diverse portfolio still requires prioritization: we don’t want to blindly fund and work on every problem in social science and computer science! If anything, our prioritization problem has become harder and more important. Whereas our problem is cognitively easier if we have a strong prior assigning zero weight to most areas, many more questions arise if we by default assign some weight to most areas. We thus must continue to examine and deliberate over the details of how the field of AI governance should grow.
Within any given topic area, what should our research activities look like so as to have the most positive impact? To answer this, we can adopt a simple two-stage asset-decision model of research impact. At some point in the causal chain, impactful decisions will be made, be they by AI researchers, activists, public intellectuals, CEOs, generals, diplomats, or heads of state. We want our research activities to provide assets that will help those decisions to be made well. These assets can include: technical solutions; strategic insights; shared perception of risks; a more cooperative worldview; well-motivated and competent advisors; and credibility, authority, and connections for those experts. There are different perspectives on which of these assets, and what breadth of assets, are worth investing in.
On the narrow end of these perspectives is what I’ll call the product model of research, which holds that the value of funding research lies primarily in answering specific important questions. The product model is optimally suited for applied research with a well-defined problem. For example, support for COVID-19 vaccine research fits the product model, since it is largely driven by the foreseeable final value of research producing a usable vaccine. The product model is fairly widely held; it is perpetuated in part by researchers, who have grant incentives to tell a compelling, concrete narrative about the value of their intended research, and whose career incentives are weighted heavily towards their research products.
I believe the product model substantially underestimates the value of research in AI safety and, especially, AI governance; I estimate that the majority (perhaps ~80%) of the value of AI governance research comes from assets other than the narrow research product[7]. These other assets include (1) bringing diverse expertise to bear on AI governance issues; (2) otherwise improving, as a byproduct of research, AI governance researchers’ competence on relevant issues; (3) bestowing intellectual authority and prestige on individuals who have thoughtful perspectives on long-term risks from AI; (4) growing the field by expanding the researcher network, improving access to relevant talent pools and career pipelines, and building absorptive capacity for junior talent; and (5) screening, training, credentialing, and placing junior researchers. Let’s call this broader perspective the field-building model of research, since the majority of the value from supporting research comes from the ways it grows the field of people who care about long-term AI governance issues, and improves insight, expertise, connections, and authority within that field.[8]
Ironically, though, to achieve this it may still be best for most people to focus on producing good research products. The reason is similar to that for government funding of basic research: while fellowships and grants are given primarily on the merits of the research, the policy justification typically rests on the byproduct national benefits that it produces, such as nationally available expertise, talent networks, spinoff businesses, educational and career opportunities, and national absorptive capacity for cutting edge science. I will reflect briefly on these channels of impact for AI governance, though much more could be said about this.
Consider the potential problem of international control of AI, which I regard as one of the most important subproblems in AI governance. In the future we may find ourselves in a world where intense competition in AI R&D, particularly in the military domain, poses substantial global risks. The space of such scenarios is vast, varying by the role and strength of governments, the nature of the risks posed by AI R&D and the perception of those risks, the control points in AI R&D, the likely trajectory of future developments in AI, and other features of geopolitics and the global landscape. I predict that any attempt to write a plan, draft a blueprint, or otherwise solve the problem, many years in advance, is almost guaranteed to fail. But that doesn’t mean that the act of trying to formulate a plan—to anticipate possible complications and think through possible solutions—won’t provide insight and preparation. Eisenhower’s maxim resonates: “plans are useless, but planning is indispensable.” To put it concretely: I believe I have learned a great deal about this problem through research on various topics, background reading, thinking, and conversations. While it is not easy for me to distill this large set of lessons into written form, I am able to mobilize and build on the most important of these lessons for any particular situation that may arise. In sum, I think there is a lot of useful work that can be done in advance, but most of the work involves us building our competence, capacity, and credibility, so that when the time comes, we are in position and ready to formulate a plan.
Consider by analogy the problem of international control of nuclear weapons. H.G. Wells imagined, in 1913, the possibility of atomic bombs, and he sketched their risks and geopolitical implications. In so doing, he helped others, like Leo Szilard, anticipate (and act on) some key features of a world with nuclear weapons, such as the necessity of global control to avoid a catastrophically dangerous arms race. But in 1945-1946, actual efforts to achieve international control depended on many specific factors: agreements, misunderstandings, and conflict between the US and Soviet Union; bargains, bluffs, and brinkmanship over everything from Eastern Europe to the bomb; espionage; technical details around the control and construction of atomic weapons; allied agreements, interests, and actions; shifting opinion amongst the US public and global elites; institutional details of the UN and UNSC; business interest in nuclear energy; and the many idiosyncrasies of decision-makers such as Truman, Stalin, Groves, and Baruch. Illustrating the critical role of individuals, and their beliefs and values, the most serious plan for international control—the Acheson-Lilienthal Report—wouldn’t have been produced without the technical brilliance of people like Bush and Oppenheimer, was almost scuttled by Groves[9], and was ultimately distorted and poorly advocated by Baruch. Thus, even if we give ourselves the hindsight benefit of knowing the technical details of the technology, which even contemporaneous decision-makers didn’t have, we see that to be able to positively intervene we would do well to have experts on hand across a wide range of global issues; those experts should be ready to adapt their insights to the specific contours of the diplomatic problem that needs to be solved; and, lastly, those experts need to have trusted access to those who have power “in the room”.
I regard our problem as similar, but requiring an even more diversified portfolio of adaptable expertise given our greater uncertainty about technical and geopolitical parameters. Investments we make today should increase our competence in relevant domains, our capacity to grow and engage effectively, and the intellectual credibility and policy influence of competent experts.
Thanks to many at the Future of Humanity Institute and Centre for the Governance of AI for conversations about this. For specific input, I am grateful to Markus Anderljung, Asya Bergal, Natalie Cargill, Owen Cotton-Barratt, Ben Garfinkel, Habiba Islam, Alex Lintz, Luke Muehlhauser, and Toby Ord.
1. “Advanced AI” gestures towards systems substantially more capable (and dangerous) than existing (2020) systems, without necessarily invoking generality or the other specific capabilities implied by concepts such as “Artificial General Intelligence” (“AGI”). The definition of AI governance is from www.fhi.ox.ac.uk/govaiagenda.
2. AI can be defined here simply as machines capable of sophisticated information processing.
3. Toby Ord estimates a 1 in 10 chance of existential catastrophe from unaligned artificial intelligence within the next 100 years, compared to 1 in 10,000 from all natural risks, 1 in 1,000 from nuclear war, 1 in 1,000 from climate change, 1 in 1,000 from environmental damage not mediated by climate change, 1 in 10,000 from ‘naturally’ arising pandemics, 1 in 30 from engineered pandemics, 1 in 50 from other foreseen anthropogenic risks, and 1 in 30 from unforeseen anthropogenic risks. Risk from unaligned artificial intelligence thus comprises a substantial portion of Ord’s estimate of 1 in 6 for total existential risk over the next 100 years.
4. In Toby Ord’s terminology.
5. I believe Shahar Avin coined this term.
6. Though Hanson tends not to emphasize this aspect of the scenario.
7. For AI safety, I would estimate there is more value in the research product, but still less than 50%.
8. The product model becomes more appropriate as particular governance problems come more into focus, become more urgent, and demand a written solution. The limiting case, for example, would be the drafting of the constitution for an important new institution. Even in such a constitution-formation scenario, however, the tacit knowledge of the involved experts continues to play a critical role.
9. Groves also played a major role in promoting among U.S. decision-makers the erroneous belief that the U.S. would retain its nuclear monopoly for a long time, an impact made possible by his monopoly on information about nuclear weapons and global nuclear supplies.
Just wanted to say that I found this to be a really helpful guide to GovAI’s research direction and your general views on how the field of AI governance should grow. It would be great to see more high-level explainers like this for other research topics in EA. Thanks Allan!
I summarized this in AN #118, along with a summary of this related podcast and some of my own thoughts about how this compares to more classical intent alignment risks.
I found this section very confusing. It is definitely not the case that competition always pushes towards cutting corners on safety. In many cases consumers prize safety and are willing to pay a premium for it, for example with automobiles or baby-related products, where companies actively advertise on this basis. In some cases it does reduce safety—typically when consumers think the increase in safety is too marginal to be worth the cost. And I’m not sure that ‘non-monopolistic markets’ count as a contemporary example of a value? Nor do I think privacy is really a good example here: privacy has largely been undermined by 1) governments, who are not directly subject to strong competitive pressures and 2) consumers voluntarily giving up privacy in return for better/cheaper services.
The conclusion is, in my view, correct—competitive pressures should be expected to degrade safety for AI. But this is ultimately due to externalities: an unsafe AI imposes costs (in expectation) on third parties. If externalities were not present, as they are not (to a significant degree) in your examples, competition would not cause an undesirable reduction in safety. If AGI only threatened extinction for AGI developers and their customers, AI safety would not be a big problem.
Yes, I was compressing a lot of ideas in a small space!
We have conceptual slippage here. By safety I’m referring to the safety of the developer (and maybe the world), not of a consumer who decides which product to buy. As you note, if safety is a critical component of winning a competition, then safety will be incentivized by the competition.
It is generally the case that it is hard to separate out (self) safety from competitive performance; we might best conceptualize “safety” as the residual of safety (the neglected part) after conditioning on performance (ie we focus on the parts of safety that are not an input to performance).
Increased competitive pressure will (1) induce some actors to drop out of the competition, since they can’t keep up, and (2) lead those who stay in to invest more, in equilibrium, in winning the competition, and correspondingly less in other goods. This happens whenever there is some tradeoff, or fungibility, between anything of value and winning the competition. You can think of “competitive pressure” as the slope of the function mapping one’s investment in winning to one’s probability of winning (holding the other’s investment fixed). If the contest is a lottery, the slope is flat. If one is so far ahead or behind that effort doesn’t make a difference, the slope is flat. If we are neck and neck, and investment maps into performance, then the marginal incentive to invest can get high.
You are right that in the presence of a safety externality this race to the bottom can get much worse. I think of Russell’s setting “the remaining unconstrained variables to extreme values”. In the Cold War, nuclear brinkmanship yielded advantage, and so world leaders gambled with nuclear annihilation.
But this dynamic can still happen even if the risk is only to oneself. An individual can rationally choose to accept a greater risk of harm in exchange for keeping up in a now more intense contest. The “war of attrition” game (a close relative of the “all-pay auction”) exemplifies this: each player bears their own costs of competing whether or not they win, and yet in symmetric equilibria actors still burn resources competing for the prize (more so the more valuable the prize).
The externality in a race is that anything I do to increase my chance of winning reduces your chance of winning, and so “investment in winning” will be oversupplied relative to the socially optimal level.
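To make the slope intuition concrete, here is a minimal illustrative sketch (a toy model with arbitrary functional forms and parameter values, not drawn from the papers linked below). Two symmetric actors each split a unit budget between competitive performance and safety; the race is decided by a Tullock-style contest whose decisiveness parameter r stands in for competitive pressure; and safety has some private value whether or not one wins. As r rises, equilibrium investment shifts from safety to performance.

```python
# Toy, symmetric two-player race (illustrative only; parameter values are arbitrary).
# Each actor splits a unit budget between competitive performance x and safety 1 - x.
# Win probability is a Tullock contest, p = x1**r / (x1**r + x2**r); the decisiveness
# parameter r plays the role of "competitive pressure" (the slope of the win-probability
# function in one's own investment). Winning is worth V; one's own safety is worth B
# regardless of who wins.

V = 1.0   # value of winning the race
B = 0.4   # private value of one unit of safety investment

def payoff(x_own: float, x_rival: float, r: float) -> float:
    """Expected payoff: win probability times the prize, plus the value of residual safety."""
    if x_own == 0 and x_rival == 0:
        p_win = 0.5  # coin flip if neither invests in performance
    else:
        p_win = x_own**r / (x_own**r + x_rival**r)
    return p_win * V + (1.0 - x_own) * B

def symmetric_equilibrium(r: float) -> float:
    """Symmetric equilibrium performance share from the first-order condition
    r*V/(4*x) = B, capped at the unit budget."""
    return min(1.0, r * V / (4.0 * B))

def is_best_response(x_star: float, r: float, grid: int = 400) -> bool:
    """Brute-force check: no unilateral deviation on a grid beats x_star against itself."""
    best_deviation = max(payoff(i / grid, x_star, r) for i in range(grid + 1))
    return payoff(x_star, x_star, r) >= best_deviation - 1e-9

for r in (0.25, 0.5, 1.0, 2.0):
    x = symmetric_equilibrium(r)
    print(f"pressure r={r:<4}  performance share={x:.2f}  safety share={1 - x:.2f}  "
          f"equilibrium check passed: {is_best_response(x, r)}")
```

Running it, the safety share falls from roughly 0.84 of the budget at r=0.25 to zero at r=2. Note that nothing in this toy model involves an externality on third parties; the erosion comes purely from something the actors value trading off against winning.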
Overall, thinking clearly through the strategic dynamics of models like these can be very complex, and one can often generate counter-intuitive results. Robert Trager, formerly myself, and others have a lot of work exploring the various strategic wrinkles. I’m not sure what the best link is; here are some places to start that I’m most familiar with: https://forum.effectivealtruism.org/posts/c73nsggC2GQE5wBjq/announcing-the-spt-model-web-app-for-ai-governance
Eoghan Stafford, Robert Trager and Allan Dafoe, Safety Not Guaranteed: International Strategic Dynamics of Risky Technology Races, Working paper, July 2022. https://drive.google.com/file/d/1zYLALn3u8AhuXA4zheWmrD7VHmI6zTvD/view?usp=sharing
Good points. To play devil’s advocate, it did take many decades for auto manufacturers to include seat belts by default.
It seems safety policy sometimes meets resistance from production/profit motives, despite clear evidence of the cause of death.
My understanding is a lot of that is just that consumers didn’t want them. From the first source I found on this:
This is not surprising to me given that, even after the installation of seatbelts became mandatory, it was decades until most Americans actually used them. Competition encouraged manufacturers to ‘cut corners’ on safety in this instance precisely because that was what consumers wanted them to do.
Ah yeah thanks for informing me.
This continues to be one of the most clearly written explanations of a speculative or longtermist intervention that I have ever read.
Thanks for this, I found this really useful! Will be referring back to it quite a bit I imagine.
I would say researchers working on AI governance at the Centre for the Study of Existential Risk and the Leverhulme Centre for the Future of Intelligence, University of Cambridge (where I work) would agree with a lot of your framing of the risks, pathways, and theory of impact.
Personally, I find it helpful to think about our strategy under four main points (which I think has a lot in common with the ‘field-building model’):
1. Understand—study and better understand risks and impacts.
2. Solutions—develop ideas for solutions, interventions, strategies and policies in collaboration with policy-makers and technologists.
3. Impact—implement those strategies through extensive engagement.
4. Field-build—foster a global community of academics, technologists and policy-makers working on these issues.
On AI governance we (Regulatory Institute) published a two-part report on artificial intelligence, which provides an overview of considerations for the regulation of AI as well as a comparative legal analysis of what various jurisdictions have done to regulate AI. For those interested the links are below:
(1) covering the regulatory landscape (http://www.howtoregulate.org/artificial_intelligence/#more-322), and
(2) an outline of future AI regulation (http://www.howtoregulate.org/aipart2/#more-327).
The effectiveness of current regulations of AI is a burgeoning area of regulatory research, particularly in the banking and trade sectors, where such regulations are mature. Governments are also turning their minds to the safe use of AI in the deployment of government services and decision-making, through various guidelines or, in some cases, following the failed deployment of automated decision-making systems (e.g. Australia’s Robodebt). We are now looking at how best to regulate the use of AI-based automated decision-making systems in government, particularly when the end-user has no choice of service or is a vulnerable member of the population.
But what of the role of government in identifying, assessing, and mitigating the risks of AI in research, for example? We also published a four-part report about how research and technology risks could be covered by regulation; see below for links:
(1) Regulating Research and Technology Risks: Part I – Research Risks: http://www.howtoregulate.org/regulating-research-technology-risks-part-i-research-risks/#more-248
(2) Regulating Research and Technology Risks: Part II – Technology Risks: http://www.howtoregulate.org/technology-risks/
(3) Research and Technology Risks: Part III – Risk Classification: http://www.howtoregulate.org/classification-research-technology-risks/#more-296
(4) Research and Technology Risks: Part IV – A Prototype Regulation: http://www.howtoregulate.org/prototype-regulation-research-technology/
A quick note on epistemic security: we’ve just published a report exploring some of these ideas (previously discussed with GovAI) in partnership with the Alan Turing Institute and the UK’s Defence Science Technology Laboratory, and building on a previous series of workshops (which Eric Drexler among others participated in). For those interested, it’s available below.
https://www.turing.ac.uk/research/publications/tackling-threats-informed-decision-making-democratic-societies
“Access to reliable information is crucial to the ability of a democratic society to coordinate effective collective action when responding to a crisis, like a global pandemic, or complex challenge like climate change. Through a series of workshops we developed and analysed a set of hypothetical yet plausible crisis scenarios to explore how technologically exacerbated external threats and internal vulnerabilities to a society’s epistemic security – its ability to reliably avert threats to the processes by which reliable information is produced, distributed, and assessed within the society – can be mitigated in order to facilitate timely decision-making and collective action in democratic societies.
Overall we observed that preserving a democratic society’s epistemic security is a complex effort that sits at the interface of many knowledge domains, theoretical perspectives, value systems, and institutional responsibilities, and we developed a series of recommendations to highlight areas where additional research and resources will likely have a significant impact on improving epistemic security in democratic societies”
If I understand correctly, you view AI Governance as addressing how to deal with many different kinds of AI problems (misuse, accident or structural risks) that can occur via many different scenarios (superintelligence, ecology or GPT perspectives). I also think (though I’m less confident) that you think it involves using many different levers (policy, perhaps alternative institutions, perhaps education and outreach).
I was wondering if you could say a few words on why (or if!) this is a helpful portion of problem-assumption-lever space to carve into a category. For example, I feel more confused when I try and fit (a) people navigating Manhattan projects for superintelligent AI and (b) people ensuring an equality-protecting base of policy for GPT AI into the same box than when I try and think about them separately.
I’ll drop in my 2c.
AI governance is a fairly nascent field. As the field grows and we build up our understanding of it, people will likely specialise in sub-parts of the problem. But for now, I think there’s benefit to having this broad category, for a few reasons:
There’s a decent overlap in expertise needed to address these questions. By thinking about the first, I’ll probably build up knowledge and intuitions that will be applicable to the second. For example, I might want to think about how previous powerful technologies such as nuclear weapons came to be developed and deployed.
I don’t think we currently know what problems within AI governance are most pressing. Once we do, it seems prudent to specialise more.
This doesn’t mean you shouldn’t think of problems of type a and b separately. You probably should.
This doesn’t yet seem obvious to me. Take the nuclear weapons example. Obviously in the Manhattan project case, that’s the analogy that’s being gestured at. But a structural risk of inequality doesn’t seem to be that well-informed by a study of nuclear weapons. If we have a CAIS world with structural risks, it seems to me that the broad development of AI and its interactions across many companies is pretty different from the discrete technology of nuclear bombs.
I want to note that I imagine this is a somewhat annoying criticism to respond to. If you claim that there are generally connections between the elements of the field, and I point at pairs and demand you explain their connection, it seems like I’m set up to demand large amounts of explanatory labor from you. I don’t plan to do that, just wanted to acknowledge it.
It definitely seems true that if I want to specifically figure out what to do with scenario a), studying how AI might affect structural inequality shouldn’t be my first port of call. But it’s not clear to me that this means we shouldn’t have the two problems under the same umbrella term. In my mind, it mainly means we ought to start defining sub-fields with time.
Thanks for the response!
It makes sense not to specialize early, but I’m still confused about what the category is. For example, the closest thing to a definition in this post (btw, not a criticism if a definition is missing in this post. Perhaps it’s aimed at people with more context than me) seems to be:
To me, that seems to be synonymous with the AI risk problem in its entirety. A first guess at what might be meant by AI governance is “all the non-technical stuff that we need to sort out regarding AI risk”. Wonder if that’s close to the mark?
A great first guess! It’s basically my favourite definition, though negative definitions probably aren’t all that satisfactory either.
We can make it more precise by saying (I’m not sure what the origin of this one is, it might be Jade Leung or Allan Dafoe):
AI governance has a descriptive part, focusing on the context and institutions that shape the incentives and behaviours of developers and users of AI, and a normative part, asking how should we navigate a transition to a world of advanced artificial intelligence?
It’s not quite the definition we want, but it’s a bit closer.
OK, thanks! The negative definition makes sense to me. I remain unconvinced that there is a positive definition that hits the same bundle of work, but I can see why we would want a handle for the non-technical work of AI risk mitigation (even before we know what the correct categories are within that).