Great article! The point about measurement enabling governance is exactly right: we can’t regulate what we can’t measure. I’ve been working on a related gap.
Specifically, I’ve been trying to measure not just AI behavior, but whether AI systems are actually governable. By governable, I mean: can institutions trace how a decision was made, contest the reasoning, and audit it after the fact?
I ran evaluations on GPT-4 and Claude Opus and found a pattern I’ve started calling “comprehension decay.” Both models score nearly perfectly on comprehensibility (clear, readable outputs), but both score very low on reversibility (almost no mechanisms to trace reasoning, contest decisions, or audit what happened).
The outputs are readable. The reasoning is unverifiable.
This matters for the measurement-enables-governance thesis. Even if we build great behavioral benchmarks (sycophancy, deception, etc.), we face a deeper problem: when you ask any LLM why it said something, the answer is just more generated text. There’s no trace of what actually happened inside the model. You can’t verify it, contest it, or audit against it.
So I’d add something to the framework: we need measurement of governability itself, not just behavior. Can decisions be traced to computations? Can those traces be challenged? Are there audit hooks?
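To make “audit hooks” a bit more concrete, here’s a minimal sketch of what even a basic one could look like. This is one possible design I’m assuming for illustration, not an existing tool, and every name in it (`AuditLog`, `audited_call`, `generate_fn`) is hypothetical: each model call gets appended to a hash-chained log, so the record itself can be verified (or shown to have been tampered with) after the fact.

```python
import hashlib
import json
import time
from typing import Callable

# Hypothetical sketch of an "audit hook": every model call is appended to a
# hash-chained log so the trail can be verified, and therefore contested, later.
class AuditLog:
    def __init__(self):
        self.entries = []
        self._prev_hash = "0" * 64

    def record(self, prompt: str, output: str, model: str) -> dict:
        entry = {
            "timestamp": time.time(),
            "model": model,
            "prompt": prompt,
            "output": output,
            "prev_hash": self._prev_hash,  # chain each entry to the previous one
        }
        entry_hash = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        entry["hash"] = entry_hash
        self._prev_hash = entry_hash
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks verification."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev_hash"] != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True


def audited_call(generate_fn: Callable[[str], str], prompt: str,
                 model: str, log: AuditLog) -> str:
    output = generate_fn(prompt)       # whatever model backend you use
    log.record(prompt, output, model)  # the decision now leaves an auditable trace
    return output
```

A real version would also need model/version identifiers, sampling parameters, and ideally something closer to the internal computation. The point is just that “auditable” has to mean more than “the output was saved somewhere”: the log itself has to be verifiable by someone other than the operator.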
Without this, one of the biggest risks is building infrastructure that creates the appearance of accountability without the substance: fluent explanations that don’t connect to anything you can actually govern.
I posted about this in more detail last week if you want to check it out!
I completely agree; that’s the core issue I’m trying to untangle. Commitments erode under pressure; we saw it play out this week, going downhill fast, and given the stakes it won’t be the first or last time. But for me, the deeper issue is that even when commitments hold (in a perfect world), we don’t have tools to verify whether they’re actually being met.
Look at Karnofsky’s post. He’s remarkably blunt about how Anthropic’s own RSP created pressure to declare systems below capability thresholds to avoid triggering pause requirements. The commitments existed on paper (good). The institutional incentives worked against honest evaluation (bad). And if they couldn’t even verify whether their own commitments were being met honestly, that tells you the verification tools don’t exist yet (failed).
On leverage, I think the answer isn’t promises but actual infrastructure that makes it costly to lie. Take how financial markets regulate themselves, for example: they don’t rely on companies’ goodwill, they rely on auditing standards and independent verification. AI governance has none of that. The Symmetrian Index is a first step toward building that auditing layer.