The right to protection from catastrophic AI risk

Linkpost to my blog, Ordinary Events.

Epistemic status: Highly speculative—a work in progress. I’m interested in feedback on the proposed right, and on how implementation could happen.

The White House seeks an AI Bill of Rights.[1] What philosophical principles should guide such rights? Can we incorporate existential risk concerns into such a document? How do we implement these rights?

Background

“Powerful technologies should be required to respect our democratic values and abide by the central tenet that everyone should be treated fairly.”—excerpt from AI Bill of Rights proposal (emphasis added)

This idea seems trivially obvious. Technology development should allow all humans to live better, happier lives. Democracies are better at making humans happy than non-democracies. On top of that, new technologies usually bring economic growth and solutions to persistent problems, and AI is no exception.

As the technology develops, however, we should take care to limit its negative effects. AI systems already feed into existing patterns of bias and systemic inequality. One example, as cited in the proposal, is that “mortgage approval algorithms to determine credit worthiness can readily infer that certain home zip codes are correlated with race and poverty, extending decades of housing discrimination into the digital age.” AI developers need to improve their products to uphold fairness and democratic values.

Further, treating everyone fairly requires us to think carefully about the negative externalities that AI development imposes on future generations. Future generations matter and should be included in our moral circle. The worst possible outcome for future humans is to never exist, or to exist in a world of untold suffering. As we seek to develop transformative AIs that might be much more powerful and intelligent than humans, we would do well to keep the rights of future humans in mind, lest we consign them to a terrible fate.

The right to protection from catastrophic AI risk

Our country should clarify the rights and freedoms we expect data-driven technologies to respect. What exactly those are will require discussion, but here are some possibilities: your right to know when and how AI is influencing a decision that affects your civil rights and civil liberties; your freedom from being subjected to AI that hasn’t been carefully audited to ensure that it’s accurate, unbiased, and has been trained on sufficiently representative data sets; your freedom from pervasive or discriminatory surveillance and monitoring in your home, community, and workplace; and your right to meaningful recourse if the use of an algorithm harms you. - excerpt from AI Bill of Rights proposal

The proposed rights enumerated above are all important in the near-term future of artificial intelligence. However, none of them addresses the most fundamental threat posed by transformative AI systems: catastrophic or existential risk to the future of humanity.

What might such a right look like? Here is my attempt at a phrasing:

We[2] have the right to a long-term future of human flourishing. To guarantee this, we must avoid the risk of catastrophic outcomes associated with the development of transformative AI (TAI). Among other things, this right guarantees that, as we near the technological frontier for TAI:

1) A consensus of leading AI safety researchers, coordinating with U.S. regulators, will:

   a) develop a robust understanding of the AI alignment problem;

   b) develop a model for a TAI system that is aligned with human values;

   c) define the technical outlines of robust failsafe modes[3] for all such TAI systems.

2) TAI developers will ensure that the systems they build align with the model(s) described in 1).

3) Systems developed in 2) will, before application or commercial use, undergo thorough auditing by both outside AI researchers and U.S. regulators to verify alignment as described in 1).

This right attempts to both address the philosophical question of AI’s impact on the long-term future and set up a framework for how to increase the likelihood of such a future.

This right does NOT:

  • Define terms. I defer to the research community’s working definitions of some important terms, including “transformative AI,” “the AI alignment problem,” and “technological frontier.” I posit that other terms will take on sharper meaning over time as applied to TAI, including “failsafe modes,” what it means to be “near” the technological frontier, the particular process of a TAI “audit,” and what a “consensus” of leading AI researchers looks like.

  • Look at existential risks beyond TAI. Narrow AI systems, including those involved in military or nuclear applications, would likely be covered by other aspects of the Bill of Rights. Bioweapons and pandemic prevention are also important, but outside the scope.

  • Necessitate any particular solution to the alignment problem. This draft is agnostic to technical questions of the architecture of a human-compatible AI system.

  • Meet the legal standard for a “right,” as narrowly defined by the U.S. Constitution. As of now, I suspect this right is unlikely to be guaranteed by any Supreme Court interpretation of constitutional rights. Rather, framing it as a “right” is aspirational: a way of signaling values and guiding policy development within the executive branch. I am hardly an expert in this area and would welcome an informed legal take.

  • Seriously grapple with the incentive alignment of the relevant stakeholders. I suspect more work would be necessary to flesh out a regulatory framework that passes game-theoretic muster. For instance, we wouldn’t want the U.S. body overseeing TAI development to behave like the Nuclear Regulatory Commission and slow AI development out of safety concerns not warranted by the facts. Nor would we want to give AI developers perverse incentives to evade regulation, which could damage the industry or render enforcement of this right toothless.

This right requires that, before transformative AI is deployed, we develop a high-fidelity understanding of what constitutes AI alignment and a TAI system aligned with human values. This is a high bar! It’s possible that we will never get there; it’s possible that TAI technology will arrive faster than our solution to the alignment problem. In the former case, this right would imply that we should not develop TAI at all; in the latter, that we should delay the deployment of TAI until we have solved alignment. At this stage, it’s far from clear that either approach is possible given current regulatory/enforcement tools.

Implementation/Regulation

Of course, enumerating the rights is just a first step. What might we do to protect them? Possibilities include the federal government refusing to buy software or technology products that fail to respect these rights, requiring federal contractors to use technologies that adhere to this “bill of rights,” or adopting new laws and regulations to fill gaps. - excerpt from AI Bill of Rights proposal

Given the enormous wealth that TAI systems could confer on their developers, I am skeptical that the possibilities named above would incentivize TAI developers to comply with the proposed right. The federal government and federal contractors make up a small share of most markets for AI systems, so I doubt these enforcement mechanisms would deter even many current AI developers from building technology without regard for this Bill of Rights.

What are some relevant considerations for “new laws and regulations”? We require an enforcement mechanism that is either aligned with, or powerful enough to outweigh, the enormous incentives TAI developers will have to build such systems. We require a mechanism that accounts for “bad actors” who might seek to develop TAI systems outside of regulatory bounds. And since we may get only one bite at the apple (if we assume a fast takeoff), this mechanism must be sensitive enough to filter out every permutation of misaligned TAI and specific enough to identify aligned TAI.
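
The “one bite at the apple” point can be made concrete with a little arithmetic. Below is a minimal sketch in Python, with entirely hypothetical numbers: if audits of independently developed systems each miss misalignment at some small rate, the chance that at least one misaligned system slips through compounds quickly, which is why a one-shot regime demands extraordinary sensitivity.

```python
# Toy model of audit sensitivity; all numbers are hypothetical.
# If each misaligned TAI system independently slips past its audit with
# probability fn_rate, then across n such systems the chance that at
# least one gets through is 1 - (1 - fn_rate)**n.

def prob_any_misaligned_passes(fn_rate: float, n: int) -> float:
    """P(at least one misaligned system passes audit)."""
    return 1 - (1 - fn_rate) ** n

for fn_rate in (0.01, 0.001):
    for n in (1, 10, 50):
        p = prob_any_misaligned_passes(fn_rate, n)
        print(f"per-audit miss rate {fn_rate:.1%}, {n:2d} systems -> "
              f"P(one slips through) = {p:.2%}")
```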

A federal agency tasked with coordinating U.S. policy on AI alignment seems like a good start. Since this field is highly complex and likely to grow more so, it seems helpful to create such an office; staff it with smart, conscientious, experienced people; and task them with identifying the laws and regulations that will support the U.S. AI industry in upholding the AI Bill of Rights. Crucially, as a federal agency, it could then implement those laws and regulations.[4]

Currently, AI policy is developed across many U.S. agencies and departments: State, Defense, Commerce, Treasury, and more. In 2019, the American AI Initiative was created by executive order and tasked with coordinating AI policy throughout the government. Crucially, its mission is to accelerate rather than regulate AI development. Lynne Parker, head of the initiative, writes that the executive order “reduces barriers to the safe development, testing, deployment, and adoption of AI technologies by providing guidance for the governance of AI consistent with our Nation’s values and by driving the development of appropriate AI technical standards.” While this sounds distressing to those concerned about catastrophic risks from TAI, the initiative is limited in scope to “narrow” applications of AI. So far, there appears to be no unified U.S. policy toward the development of “general” or transformative AI—which might be yet more distressing.

Most informed observers put transformative AI timelines within the 21st century, but not within the next decade. This leaves time (but not much time!) to develop a robust regulatory ecosystem within the U.S. government, in tandem with a deepening understanding of AI alignment. Perhaps the best thing we could do in the near future is to create a federal agency and staff it with the right people, so that, when future research yields more precise answers about the boundaries of AI safety and catastrophic risks, the government is poised to act swiftly.[5]

[1] Precious few details are offered as to the practical implications of such a Bill of Rights. The authors say, “In the coming months, the White House Office of Science and Technology Policy (which we lead) will be developing such a bill of rights, working with partners and experts across the federal government, in academia, civil society, the private sector, and communities all over the country.” But they say little about how the Bill would be implemented. My guess is that it would serve as policy guidance, akin to a policy memo, for various executive branch departments, but that it wouldn’t be codified in legislation through Congress or interpreted as a legal/constitutional right.

[2] Left intentionally vague. I think everyone could read their preferred understanding into “we,” whether they are concerned with “we” current Americans, “we” current and future Americans, or “we” current and future humans.

[3] By failsafe modes, I mean avoiding a Dr. Strangelove scenario or a paperclip maximizer. I am not sure it is possible to specify a shutdown process that the TAI could not corrupt in pursuit of its own goals, but I will leave that to the technical people! (I’m not optimistic that it can be done, but this is meant to be “ideal governance.”)

[4] Federal agencies are hard to create, as they require an act of Congress. Understandably so! One executive action that might help is repurposing the existing federal initiative, as described here, toward a more safety-oriented posture.

[5] Arguably, the best time to create such a federal agency would be in the 5-10 years before the advent of TAI. Not that we will see it coming with precise certainty—maybe we’ll get a fast takeoff! But an agency with three or four decades to develop might invite a fair amount of mission creep, or have time to invent lots of counterproductive regulations.