The Pentagon Is Invoking Wartime Law Over AI Nobody Can Govern

Comprehension Decay Went Live This Week

This is a follow-up to my earlier post on comprehension decay, which laid out the finding that current AI systems score near-perfect on comprehensibility but near-zero on reversibility. They explain themselves fluently while providing no mechanisms for contestation, audit, or correction. I called this pattern “comprehension decay” and argued it represents a structural gap between readability and governability. Three things happened this week that turned that argument from theory into breaking news.

On Tuesday, Holden Karnofsky published his detailed account of why Anthropic rewrote its Responsible Scaling Policy, and Anthropic released the new RSP, dropping its binding commitment to pause development if safety thresholds weren’t met. The same day, Bloomberg reported that Defense Secretary Pete Hegseth had threatened Anthropic with the Defense Production Act, a Korean War-era industrial compulsion law, unless the company drops its AI usage restrictions for military applications by Friday. The Pentagon wants Claude deployed in classified operations and kill chains with no usage-policy constraints.

I want to walk through why these three events, taken together, are the clearest real-world demonstration so far of the governance gap I described in my earlier post. And I want to suggest that the proposed solutions, Karnofsky’s flexible Roadmaps on one side and Hegseth’s brute-force ultimatum on the other, both fail for the same underlying reason: there’s no governance infrastructure that can actually evaluate whether a given AI deployment is legitimate.

To be clear about where I’m coming from: I’m not advocating for stopping AI development or for any particular political outcome in the Pentagon standoff. I’m advocating for building evaluation infrastructure, concrete tools that can tell regulators, deployers, and the public whether a given AI system in a given context meets the minimum requirements of legitimate governance. That infrastructure doesn’t exist. This week showed why it needs to.

What Karnofsky said and what it means

Karnofsky’s post is worth reading in full. It’s unusually honest. He takes personal responsibility for the RSP changes, describes a year-long push to make them, and explains in detail why Anthropic’s original binding commitments, the promise to pause development if capability thresholds were crossed without adequate safeguards, weren’t working.

His core argument: the old RSP created “distorting pressures” on risk assessment. Anthropic knew that if it declared a model to cross certain capability thresholds, the RSP would require a pause or slowdown that would be “extremely damaging to the company” while having “little discernible public benefit.” The result was “an enormous amount of pressure to declare our systems to lack relevant capabilities, to declare our risk mitigations to be on track to be strong enough.” Karnofsky doesn’t claim Anthropic actually made bad calls under this pressure, but he’s clear that the pressure existed and that it distorted the company’s epistemic environment.

This is comprehension decay applied to governance policy itself.

The old RSP appeared to be a robust accountability structure. Clear if-then commitments: if capabilities cross threshold X, then safeguards Y must be in place, or development pauses. It was readable. It was structured. Karnofsky himself acknowledges that “it’s been easy to get the impression that the RSP is ‘binding ourselves to the mast’ and committing to unilaterally pause AI development and deployment under some conditions, and Anthropic is responsible for that.”

But beneath that readable surface, the governance functions weren’t working. The capability assessments that would trigger commitments were subject to institutional pressure to come out the right way. The safeguard requirements for higher capability levels (ASL-4, ASL-5) were, by Karnofsky’s own account, not achievable on realistic timelines. The forcing function that was supposed to drive safety work was instead driving either Hail Mary research bets or quiet hope that the thresholds wouldn’t be crossed too soon.

Readable but ungovernable. Clear on the surface, inaccessible at depth. The pattern I observed in AI outputs was reproduced in the governance structure designed to oversee them.

What the Pentagon standoff reveals

Now put Karnofsky’s post next to Tuesday’s Bloomberg report.

Karnofsky’s proposed replacement for binding commitments is a system of voluntary Roadmaps, Risk Reports, and eventual external review. The idea is that companies publish transparent assessments of their risks and mitigation plans, and a “race to the top” dynamic, where companies compete on the quality of these visible artifacts, drives actual risk reduction. It’s flexible, iterative, and relies on good-faith engagement from companies that have “truly bought-in employees.”

That same day, the Defense Secretary threatened to invoke wartime industrial law to compel Anthropic to hand over its AI tools with no usage restrictions whatsoever.

I don’t think Karnofsky failed to anticipate a hostile political environment. He’s explicit that the current environment lacks political will for serious AI regulation. But his proposed framework assumes that the relevant pressure on companies will come from comparative transparency, that publishing good Risk Reports creates reputational incentives. The Pentagon standoff shows that the relevant pressure can also come from raw coercion, and that voluntary frameworks have no mechanism to resist it.

A Pentagon official told Bloomberg that regardless of Anthropic’s usage policy or audit capabilities, the company “would never be privy to all the details of how its AI was deployed in classified and real-time operations.” That’s the reversibility problem at its most extreme. Once Claude is inside classified systems, the company that built it can’t trace how it’s being used, can’t contest decisions made with it, can’t audit outcomes, and can’t verify that its own usage policies are being followed.

Anthropic’s CEO Dario Amodei is actually making the interpretive symmetry argument himself, though not in those terms. He told a New York Times podcast this month that “the constitutional protections in our military structures depend on the idea that there are humans who would, we hope, disobey illegal orders. With fully autonomous weapons, we don’t necessarily have those protections.” That’s the core of my framework: governance requires that humans can understand and contest what a system does. Strip that away and authority becomes domination regardless of the system’s capabilities.

The gap both sides are missing

Here’s what strikes me about this week’s events. Both Karnofsky and Hegseth are arguing past a problem neither has the tools to address.

Karnofsky’s framework assumes that honest risk assessment plus transparent planning will produce adequate governance. But his own account of the old RSP shows that risk assessment gets distorted under institutional pressure, and the entire history of comprehension decay research shows that AI systems resist the kind of auditability that would make Risk Reports trustworthy at the technical level.

Hegseth’s position assumes that if you strip away company-imposed guardrails and let the military use AI tools freely, the existing chain of command provides sufficient governance. But the Bloomberg piece itself quotes a legal scholar observing that companies like Anthropic are “forced to assert their own AI usage policy” precisely because Congress has failed to stipulate how the Pentagon should think about AI in weapons systems. There’s no regulatory framework. There’s no evaluation infrastructure. There’s just a billionaire and a defense secretary negotiating the terms of military AI deployment in private, under a deadline of this Friday, today.

What’s missing in both cases is the thing I’ve been trying to build: concrete, empirical tools for evaluating whether a specific AI deployment meets minimum governance requirements. Not aspirational commitments. Not voluntary transparency. Not brute-force legal threats. Actual measurement infrastructure, the kind that tells you, for a given system in a given deployment context, whether decisions can be traced, contested, audited, and justified.

The Symmetrian Index is a prototype of this. It won’t resolve the Pentagon standoff. But consider what would change if something like it existed at institutional scale. Regulators wouldn’t need to choose between trusting company self-assessment and threatening wartime law. Deployers wouldn’t be guessing about whether their oversight structures actually function. The conversation would shift from “does this company promise to be safe?” to “does this deployment meet measurable governance standards?”
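To make the idea concrete, here is a minimal sketch of what a deployment-level governance check could look like in code. Everything in it is illustrative: the criterion names simply mirror the four requirements above (traced, contested, audited, justified), and the scores and pass threshold are invented for the example. This is not the Symmetrian Index methodology, just a toy model of the shape such infrastructure might take.

```python
from dataclasses import dataclass

# Hypothetical criteria mirroring the four governance requirements
# named above. Names and thresholds are illustrative only.
CRITERIA = ("traceable", "contestable", "auditable", "justifiable")

@dataclass
class DeploymentAssessment:
    name: str
    scores: dict  # criterion -> score in [0.0, 1.0]

    def meets_minimum(self, floor: float = 0.5) -> bool:
        """Pass only if EVERY criterion clears the floor.

        A high average can't compensate for one ungovernable
        dimension: fluent self-explanation doesn't offset a
        missing audit trail.
        """
        return all(self.scores.get(c, 0.0) >= floor for c in CRITERIA)

    def weakest_link(self) -> str:
        """Name the criterion with the lowest score."""
        return min(CRITERIA, key=lambda c: self.scores.get(c, 0.0))

# Example: a deployment that explains itself well (high
# justifiability) but offers almost no tracing or auditing --
# the comprehension-decay profile.
chat_deploy = DeploymentAssessment(
    "customer-chat",
    {"traceable": 0.2, "contestable": 0.4,
     "auditable": 0.1, "justifiable": 0.9},
)
print(chat_deploy.meets_minimum())  # False: readable but not governable
print(chat_deploy.weakest_link())   # auditable
```

The design choice that matters is the `all(...)` rather than an average: a governance floor, not a governance score, which is exactly the shift from "does this company promise to be safe?" to "does this deployment meet measurable standards?"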

The timeline

Consider what happened in eighteen days:

February 9: The New Yorker publishes a major investigation confirming that Anthropic, the lab most invested in understanding its own model, can’t trace Claude’s reasoning well enough for governance purposes. On the same day, Anthropic’s safeguards research lead resigns, warning about constant internal pressure to compromise values.

February 24: Karnofsky publishes his case for abandoning binding safety commitments. Anthropic releases RSP v3, dropping its commitment to pause development if safety thresholds aren’t met.

February 24: The Pentagon threatens to use the Defense Production Act to force Anthropic to hand over AI tools with no governance constraints for military use, with a Friday deadline.

February 27 (today): Hegseth’s deadline.

That’s the trajectory of comprehension decay moving from a research finding to a live governance crisis. From “we can’t fully trace the reasoning” to “the safeguards lead is leaving” to “we’re dropping binding commitments” to “the government is invoking wartime law to strip remaining guardrails from lethal AI deployments.”

Each step follows from the last. Each was predictable from the structural gap between readability and governability.

What I’m working on and where I need help

My earlier post laid out the Symmetrian Index and the comprehension decay finding. This week’s events make the case for building this into real governance infrastructure more urgent than I expected.

I’m working on the framework at Aluna Labs. What I need now is:

Validation: Replicating comprehension decay across more models, larger samples, multiple annotators. The pattern held for GPT-4 and Claude. Does it hold for Gemini, Llama, Mistral? Does it hold under adversarial conditions? With domain-specific prompts?

Regulatory translation: The Pentagon standoff happened because Congress hasn’t produced standards for AI in weapons systems. Karnofsky’s RSP revision happened because Anthropic’s self-imposed standards couldn’t survive institutional pressure. Both point to the same need: governance frameworks that don’t depend on company goodwill or political leverage. I’m developing policy language that operationalizes governability requirements. If you work in AI policy, I want to talk to you!

Deployment case studies: Organizations willing to run their AI systems through the Symmetrian Index and publish results. The framework needs testing in real high-stakes contexts, not just research prompts.

Technical solutions: Can reversibility actually improve through architectural changes, or is it fundamental to how transformers work? I need collaborators from the interpretability and alignment research communities.

The Symmetrian Index methodology and scoring rubrics are available on request. Reach out to me.

The past three weeks have been clarifying. The governance gap I’ve been measuring in AI outputs is now reproducing itself in the institutions built to oversee those outputs. Voluntary commitments erode under commercial pressure. Voluntary commitments erode under political pressure. And when there’s no measurement infrastructure to ground the conversation, the result is what we saw this week: a billionaire and a defense secretary arguing about the rules for military AI with nothing between them but leverage.

We need something between them other than leverage. That’s what I’m trying to build.