AISN #38: Supreme Court Decision Could Limit Federal Ability to Regulate AI
Plus, “Circuit Breakers” for AI systems, and updates on China’s AI industry

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.


Supreme Court Decision Could Limit Federal Ability to Regulate AI

In a recent decision, Loper Bright Enterprises v. Raimondo, the Supreme Court overruled the 1984 precedent Chevron v. Natural Resources Defense Council. In this story, we discuss the decision’s implications for regulating AI.

Chevron allowed agencies to flexibly apply expertise when regulating. The “Chevron doctrine” had required courts to defer to a federal agency’s interpretation of a statute when the statute was ambiguous and the agency’s interpretation was reasonable. Its elimination curtails federal agencies’ ability to regulate, including, as this article from LawAI explains, their ability to regulate AI.

The Chevron doctrine expanded federal agencies’ ability to regulate in at least two ways. First, agencies could draw on their technical expertise to interpret ambiguous statutes rather than rely on lawmakers or courts to provide clarity. Second, they could more easily apply existing statutes to emerging areas of regulation.

The loss of Chevron will be particularly impactful for AI regulation. AI is a technical and rapidly changing industry. More than in most areas of regulation, then, federal agencies will need technical expertise and flexibility to regulate AI effectively. The loss of Chevron makes the efforts of AI regulators vulnerable to lengthy and uncertain challenges in the courts.

Lawmakers should create AI-specific legislation that explicitly grants agencies regulatory discretion. Federal agencies will no longer be able to easily apply existing legislation to emerging domains. Instead, the end of Chevron puts the onus on US lawmakers to develop new, AI-specific legislation that empowers regulators. California’s SB 1047 is an example of such AI-specific legislation at the state level.

Lawmakers should also explicitly grant federal agencies broad discretion to interpret key parts of such legislation—for example, the definition of a “frontier model.”

“Circuit Breakers” for AI Systems

LLMs and other AI systems are vulnerable to adversarial attacks. For example, users can often get LLMs to generate harmful outputs through jailbreaking. While several approaches have been proposed to defend against these kinds of attacks, they fail to generalize across the wide range of vulnerabilities. In this story, we explain a new approach.

“Circuit breakers” can prevent harmful output by intervening in a model’s internal representations. A new paper introduces “circuit breakers” as a method for defending AI systems from adversarial attack. Rather than focusing solely on harmful outputs, “circuit breakers” interrupt harmful processes within AI systems by observing and intervening in their internal representations.

One such technique presented in the paper is “Representation Rerouting.” This technique redirects internal representations related to harmful processes towards incoherent or refusal states, effectively “short-circuiting” the generation of harmful content. The technique is versatile, as it can be applied to both LLMs and AI agents across various modalities.

This method requires the creation of two datasets: a “circuit breaker” set which contains actions or responses that are prohibited, and a “retain” set which includes actions or responses that are allowed. By training the model with these datasets, researchers can fine-tune the AI’s internal processes to recognize and halt harmful outputs.
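To make the two-dataset idea concrete, here is a minimal sketch of what a training objective along these lines could look like. It is an illustration under stated assumptions, not the paper's implementation: the function names, the use of cosine similarity for the “circuit breaker” set, the use of Euclidean distance for the “retain” set, and the weights are all hypothetical choices made for this example. Representations are plain lists of numbers standing in for a model's internal activations.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerouting_loss(tuned_rep, original_rep):
    """Assumed loss on "circuit breaker" examples.

    Penalizes the fine-tuned model's representation for still pointing in
    the same direction as the original model's representation of a
    prohibited response. The loss is zero once the two are orthogonal or
    opposed, i.e. once the harmful representation has been "rerouted".
    """
    return max(0.0, cosine_similarity(tuned_rep, original_rep))


def retain_loss(tuned_rep, original_rep):
    """Assumed loss on "retain" examples.

    Euclidean distance between the fine-tuned and original representations,
    so that behavior on allowed responses stays close to the original model.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(tuned_rep, original_rep)))


def combined_loss(cb_pairs, retain_pairs, cb_weight=1.0, retain_weight=1.0):
    """Weighted sum of the mean loss over each dataset.

    Each argument is a list of (tuned_rep, original_rep) pairs. The weights
    are hypothetical hyperparameters for this sketch.
    """
    cb_term = sum(rerouting_loss(t, o) for t, o in cb_pairs) / len(cb_pairs)
    retain_term = sum(retain_loss(t, o) for t, o in retain_pairs) / len(retain_pairs)
    return cb_weight * cb_term + retain_weight * retain_term


if __name__ == "__main__":
    # A prohibited example whose representation has not moved yet (loss 1.0),
    # and an allowed example whose representation is unchanged (loss 0.0).
    cb_pairs = [([1.0, 0.0], [1.0, 0.0])]
    retain_pairs = [([0.5, 0.5], [0.5, 0.5])]
    print(combined_loss(cb_pairs, retain_pairs))
```

In this sketch, training would push the first term toward zero by moving representations of prohibited content away from where the original model put them, while the second term anchors the representations of allowed content in place.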

“Circuit breakers” yield promising results, although more work is needed. The paper’s approach achieved a significant reduction in harmful outputs: 87% for Llama-3 (8B). It also reduced harmful actions in AI agents by 83-84%.

At the same time, the approach had minimal impact on model capabilities (less than 1% decrease in overall performance). It was also effective in multimodal settings, including resistance to image-based hijacking.

However, while the initial results are encouraging, it’s important to be aware that no single approach is likely to be a perfect, permanent solution. The code is available here.

Updates on China’s AI Industry

In this story, we cover three recent developments that affect the outlook for China’s domestic AI industry: (1) bankruptcies among Chinese semiconductor companies, (2) a CCP-backed investment fund, and (3) new US restrictions on AI investment in China.

China’s semiconductor industry faces low investor confidence. A $2.5 billion Chinese semiconductor company, Shanghai Wusheng Semiconductor, recently went bankrupt. This is but one high-profile case in a wave of increasing financial instability in the industry: in 2023, 10,900 Chinese semiconductor-related companies closed down, nearly double the number in 2022.

Surging closures have undermined investor confidence in Chinese semiconductors. Since early 2023, 23 Chinese semiconductor companies have withdrawn their IPO applications, indicating investors’ growing caution toward the sector.

Low investor confidence will likely hurt China’s push for self-sufficiency in semiconductors and its AI competitiveness. However, some argue that this “failure phase” is in line with China’s past economic policy and does not yet signal trouble for the industry.

China’s government continues to invest heavily in the semiconductor industry. In May, China launched the third phase of its government-backed fund for the semiconductor industry—known as the “Big Fund”, or the China Integrated Circuit Industry Investment Fund—raising a total of $47.5 billion.

This move has already had a positive effect on the market: the CES CN Semiconductor Index, which measures the performance of semiconductor stocks on China’s A-share market, rose by over 3%, marking its biggest one-day gain in more than a month. (The index has since fallen again, but remains above pre-investment levels.)

The US proposes restrictions on AI and tech investments in China. The Treasury Department recently issued draft rules that would ban, or require reporting of, AI and technology investments in China that could threaten national security. The draft rules would ban transactions in AI systems for certain uses and over certain compute thresholds, and require notification of transactions related to AI development and semiconductors.

The proposal follows through on Biden’s Executive Order last August, which ordered regulations for US foreign investments in sensitive technologies such as semiconductors, quantum computing, and AI. The rules are expected to be implemented by the end of the year.

Overall, it is unclear whether the new US restrictions and existing low investor confidence will outweigh China’s strong government subsidies.

See also: CAIS website, CAIS X account, our ML Safety benchmark competition, our new course, and our feedback form. The Center for AI Safety is also hiring a project manager.

Subscribe here to receive future versions.