Certified Safe: A Schematic for Approval Regulation of Frontier AI

This is a short summary of a longer report I authored as my research project for the University of Chicago Existential Risk Lab summer fellowship. In a sentence, it is an exploration of and proposal for comprehensive domestic approval regulation of frontier AI. The language and structure are targeted at Washington, but EAs will get a lot out of reading this summary and, if interested, sections of the full report: https://arxiv.org/pdf/2408.06210

The frontier of general artificial intelligence (AI) capabilities is progressing rapidly and in potentially dangerous directions. Future AI models may be used to create bioweapons, transform militaries, or spread misinformation at unprecedented scale; they may also become powerful and uncontrollable—a catastrophic combination.

Various structural features amplify these risks of large-scale harm from AI: intense competition leads developers to underinvest in safety measures; highly concentrated rewards incentivize developers to pursue production despite possible highly distributed harms; corporate governance structures face pressure from profit motives; and the incentives to pursue model theft, even for state actors, increase by the day. The great challenges of this technology, and the desire to harness its incredible potential, require an earnest and timely pursuit of comprehensive, rather than patchwork, regulation of frontier (i.e., the most generally capable) AI.

Approval regulation is emerging as particularly promising amongst potential regulatory regimes. An approval regulation scheme is one in which a firm cannot legally market, or in some cases develop, a product without explicit approval from a regulator on the basis of experiments performed upon the product that demonstrate its safety. For a product to be sold, it must be certified safe. The two most effective existing applications of approval regulation—the US Federal Aviation Administration’s process to certify new plane types and the US Food and Drug Administration’s process to certify drugs and medical devices—have been remarkably successful in high-risk industries. For example, American commercial aviation has seen only two fatalities since 2010, despite operating over ten million flights per year.

Many, including AI firm CEOs, researchers, and policymakers, have publicly supported approval regulation of frontier AI. It seems a particularly appealing choice in this industry for a few reasons. First, approval gates (i.e., decision barriers) map naturally onto the phases of AI development. Second, the number of models to regulate is small, since only the most expensive models are relevant to large-scale risk and thus subject to this process; regulation need not be burdensome for small models. Third, approval regulation tackles the industry’s information asymmetry issues. Finally, this regime allows for scrutiny throughout development and deployment, which is required to defend against increasingly capable theft attempts.

This proposal details a schematic, based on the FAA’s type certification process for aircraft, for the approval regulation of frontier AI. In this “model certification process,” any model planned for training on more computation than some threshold initiates a certification project. Before training a covered model, the regulator must grant Training Authorization to the applicant firm, which specifies model details, a training timeline, estimates of capability progression, and verification of agreed-upon safety and security measures. During monitored training, a Certification Basis (CB) and certification plan are jointly composed, in which the regulator specifies line item requirements for legal deployment and the applicant plans a set of experiments to meet each one. After training, these experiments are performed, verified, and checked for compliance with requirements. If the CB is satisfied, a model deployment card and instructions for continued safety are issued, specifying legal deployment conditions and required post-deployment safety and security measures.

[Figure: A high-level overview of the entire model certification process. Events above the center timeline are activities primarily performed by the applicant; those below are primarily performed by the regulator (“FAIA”). Each approval gate is linked to one or two key documents. Approval gates divide the timeline into Ideation, Before Training, Training, Compliance Showing and Finding, and Post-Deployment phases.]
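
To make the sequence of approval gates concrete, here is a minimal, purely illustrative Python sketch of a certification project moving through the phases described above. The class, field, and method names are my own shorthand, not terminology fixed by the report.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Phase(Enum):
    """Phases of the model certification process described above."""
    IDEATION = auto()
    BEFORE_TRAINING = auto()   # ends when Training Authorization is granted
    TRAINING = auto()          # monitored; Certification Basis drafted here
    COMPLIANCE = auto()        # compliance showing (applicant) and finding (regulator)
    POST_DEPLOYMENT = auto()   # governed by the model deployment card


@dataclass
class CertificationProject:
    """Tracks a single covered model through the approval gates."""
    model_name: str
    planned_training_flop: float
    phase: Phase = Phase.IDEATION
    certification_basis: dict[str, bool] = field(default_factory=dict)  # requirement -> shown?

    def grant_training_authorization(self) -> None:
        # Gate: the regulator approves model details, timeline, capability
        # estimates, and safety/security measures before training begins.
        self.phase = Phase.TRAINING

    def add_cb_requirement(self, requirement: str) -> None:
        # During monitored training, line-item requirements are added to
        # the Certification Basis, each to be met by a planned experiment.
        self.certification_basis[requirement] = False

    def show_compliance(self, requirement: str) -> None:
        # After training, the applicant performs the agreed experiment and
        # the regulator verifies and finds compliance for this requirement.
        self.certification_basis[requirement] = True

    def ready_for_deployment_card(self) -> bool:
        # Deployment is legal only once every CB requirement is satisfied.
        return bool(self.certification_basis) and all(self.certification_basis.values())
```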

The implementation of such a process faces five major challenges:

  1. Firms may deploy certified models under illegal circumstances or deploy uncertified models in difficult-to-detect environments (e.g., internally).

  2. It may be difficult to construct a demonstrable, specific, and comprehensive list of certification requirements that guarantee deployment readiness.

  3. Testing and evaluation methods are not yet reliable enough to demonstrate these requirements with confidence.

  4. Filtering out safe models with a computational use threshold may become unreliable over time.

  5. Implementation must strike the correct balance of minimizing regulatory overhead (to stimulate innovation) without compromising effective regulatory scrutiny.

Yet, further work on these challenges may clear the way for approval regulation that secures AI development from theft, confidently establishes model safety, and sustains a competitive and peerless domestic AI industry. To achieve this, stakeholders should urgently:

  • Improve evaluation techniques. Government agencies, frontier AI firms, and academic research should tackle the challenge of effective evaluation and broader testing of AI models.

  • Specify deployment readiness conditions. Policymakers and other relevant stakeholders should enumerate the conditions required for a model to be safely and securely deployed.

  • Bolster compute use tracking and detection. The executive branch should enforce compute reporting requirements and work with AI chip designers to create and test on-chip hardware for tracking and identification (a rough compute-estimation sketch follows).
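
To illustrate how a compute use threshold might be applied in practice, here is a rough Python sketch using the common ~6 × parameters × tokens approximation of training FLOP for dense models. The threshold value and function names are placeholders of my own, not figures from the report.

```python
# Rough estimate of a planned training run's compute, using the common
# ~6 * parameters * tokens FLOP heuristic for dense transformer training.
# The threshold below is a placeholder for illustration, not a figure
# taken from the report.
COVERED_MODEL_THRESHOLD_FLOP = 1e26  # hypothetical regulatory threshold


def estimated_training_flop(n_parameters: float, n_training_tokens: float) -> float:
    """Approximate total training compute in FLOP."""
    return 6.0 * n_parameters * n_training_tokens


def is_covered_model(n_parameters: float, n_training_tokens: float) -> bool:
    """Would this planned run initiate a certification project?"""
    return estimated_training_flop(n_parameters, n_training_tokens) >= COVERED_MODEL_THRESHOLD_FLOP


# Example: a 400B-parameter model trained on 15T tokens (~3.6e25 FLOP).
print(is_covered_model(4e11, 1.5e13))  # False under this placeholder threshold
```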

Less urgently, but nevertheless before approval regulation is used, stakeholders should:

  • Minimize potential regulatory overhead. Government agencies and frontier AI firms should research causes of regulatory overhead in approval regulation and methods for reducing them in any application to AI.

  • Establish whistleblower protections. Federal action should provide enhanced whistleblower protections and rewards to those working in frontier AI firms.

  • Determine best model filtering practices. Government agencies and independent or academic organizations should direct research towards determining the adequacy of and alternatives to a compute use filter.

  • Establish information and personnel security. Any regulator should implement reliable information and personnel security to protect dangerous or proprietary information, which may be incredibly valuable but must be shared to enable adequate regulatory scrutiny.

In addition, the analysis in this report yields the following general recommendations, designed to improve the effectiveness of any comprehensive governance regime for frontier AI:

  • Regulate throughout development and deployment. Any set of regulations that hopes to ensure safe and secure AI should apply at least from training through post-deployment.

  • Consider approval gating in any regime. Any regulatory approach should consider the use of a regulatory gate to promote transparency, compliance, extra safety and security testing, etc.

  • Consider checkpoint capability estimation. Any training oversight regime should consider estimating model capabilities at training checkpoints to determine whether developing capabilities are expected and manageable (a minimal sketch follows this list).
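
As a toy illustration of checkpoint capability estimation, the sketch below compares benchmark scores observed at training checkpoints against the capability projections filed before training (as a Training Authorization would require) and flags unexpectedly large jumps. The function, scores, and tolerance are hypothetical.

```python
def flag_unexpected_capabilities(
    observed_scores: list[float],
    projected_scores: list[float],
    tolerance: float = 0.05,
) -> list[int]:
    """Return checkpoint indices where the observed benchmark score exceeds
    the pre-registered projection by more than `tolerance` (absolute)."""
    return [
        i
        for i, (obs, proj) in enumerate(zip(observed_scores, projected_scores))
        if obs - proj > tolerance
    ]


# Example: projections filed before training vs. scores measured at
# successive training checkpoints.
projected = [0.30, 0.38, 0.45, 0.52]
observed = [0.31, 0.37, 0.53, 0.60]
print(flag_unexpected_capabilities(observed, projected))  # -> [2, 3]
```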

The full report can be found at https://arxiv.org/pdf/2408.06210. Feedback is welcome.
