# Zero-LLM Multi-Agent Architecture for AI Safety Evaluation: 10-Agent Deliberation in Under 50 ms
Current AI safety evaluation approaches share a common bottleneck: every
safety-relevant computation requires at least one LLM inference call. For
multi-agent architectures, this creates an O(N) cost and latency problem — a
10-agent safety council using GPT-4 class models costs ~$0.10 per evaluation
at ~15 seconds latency. At production scale (10^6 evaluations/day), this
becomes $100,000/day in API costs alone.
This post summarizes TSCWH (The System for Covenant-Weighted Heuristics), an
architecture that eliminates LLM calls from the safety evaluation pipeline
entirely. I’d welcome critical feedback from the alignment community.
---
## The Core Problem
> Safety-critical decisions should not depend on probabilistic language model
> outputs. They should be formally verifiable deterministic computations.
Multi-agent deliberation frameworks used in alignment work (AutoGPT, CrewAI,
AutoGen, LangGraph) all treat LLM calls as a given. I believe this assumption
is worth challenging.
---
## Five Contributions
**1. Zero-copy inter-agent communication**
A cache-resident data structure where 10 agents read and write with zero
serialization overhead. No JSON-to-object conversion at any agent boundary.
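As a concrete sketch of what zero-copy agent I/O can look like: the slot layout, function names, and use of Python's `struct` over a shared `memoryview` below are my illustrative assumptions, not the actual TSCWH data structure.

```python
import struct

# Hypothetical fixed memory layout: one float64 score slot per agent.
# Offsets and names are illustrative, not the real TSCWH layout.
NUM_AGENTS = 10
SLOT = struct.Struct("<d")           # one little-endian float64 per agent
buf = bytearray(NUM_AGENTS * SLOT.size)
view = memoryview(buf)               # shared view; reads/writes touch the buffer directly

def write_score(agent_id: int, score: float) -> None:
    # Write straight into the shared buffer: no JSON, no object graph.
    SLOT.pack_into(view, agent_id * SLOT.size, score)

def read_score(agent_id: int) -> float:
    return SLOT.unpack_from(view, agent_id * SLOT.size)[0]

write_score(3, 0.87)
print(read_score(3))  # 0.87
```

Because every agent reads and writes through the same view at fixed offsets, nothing is serialized or deserialized at any agent boundary.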
**2. Sub-microsecond ethical evaluation**
All eight ethical dimensions (Charity, Grace, Stewardship, Truth, Dignity,
Courage, Community, Creation Dignity) are evaluated together in a single CPU
operation at hardware speed, while each dimension's evaluation logic remains
encoded independently.
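One plausible realization of a single-operation check, assuming (purely for illustration; the post does not specify the encoding) that each dimension's independent verdict is pre-encoded as one bit:

```python
# Each dimension's own evaluation logic sets one bit; the combined check
# is then a single AND-and-compare over all eight dimensions at once.
# The bit assignment is an illustrative assumption.
DIMENSIONS = ["Charity", "Grace", "Stewardship", "Truth",
              "Dignity", "Courage", "Community", "Creation Dignity"]
BIT = {name: 1 << i for i, name in enumerate(DIMENSIONS)}
ALL_PASS = (1 << len(DIMENSIONS)) - 1   # 0b11111111

def encode(passing: set[str]) -> int:
    # Combine the per-dimension verdicts into one machine word.
    return sum(BIT[name] for name in passing)

def fully_aligned(state: int) -> bool:
    # One bitwise AND plus one compare: hardware-speed evaluation.
    return state & ALL_PASS == ALL_PASS

print(fully_aligned(encode(set(DIMENSIONS))))     # True
print(fully_aligned(encode({"Truth", "Grace"})))  # False
```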
**3. Formal runtime verification — in production, not just tests**
A formal verification engine proves governance invariants (mercy floor,
emergency thresholds, redline rules, stability constraints) on *every
evaluation cycle* — not as pre-deployment unit tests, but as production
proofs. No agent vote sequence can produce a prohibited decision.
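A minimal sketch of per-cycle invariant checking: the invariant names come from the list above, but the thresholds, the decision-record shape, and the check logic are invented here for illustration.

```python
# Assumed constants -- the real mercy floor and emergency threshold
# are not given in the post.
MERCY_FLOOR = 0.1           # minimum weight on the Mercy agent
EMERGENCY_THRESHOLD = 0.95  # risk score that forces escalation

def verify_cycle(decision: dict) -> None:
    # Runs on every evaluation cycle, not only in pre-deployment tests:
    # a violated invariant aborts the decision before it takes effect.
    assert decision["mercy_weight"] >= MERCY_FLOOR, \
        "mercy floor violated"
    assert not (decision["risk"] >= EMERGENCY_THRESHOLD
                and decision["approved"]), \
        "redline: approved despite emergency-level risk"

# A compliant decision passes silently; a violating one raises.
verify_cycle({"mercy_weight": 0.2, "risk": 0.3, "approved": True})
```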
**4. Incentive-compatible probabilistic consensus**
Each agent’s vote is modeled as a probability distribution rather than a
binary signal. The council aggregates the full posterior distribution,
making consensus structurally robust to adversarial manipulation — an agent
cannot shift the group estimate without also updating its prediction of what
the group believes.
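To make the mechanism concrete, here is one possible distribution-level aggregation, a linear opinion pool. The post does not specify the actual TSCWH pooling rule, so treat the rule and the outcome labels as assumptions.

```python
# Each agent votes with a full probability distribution over the same
# discrete outcomes, rather than a binary signal. The labels are
# illustrative assumptions.
OUTCOMES = ["allow", "escalate", "block"]

def pool(votes: list[dict[str, float]]) -> dict[str, float]:
    # Linear opinion pool: average the distributions outcome-by-outcome,
    # so the consensus carries the full shape of every agent's belief.
    n = len(votes)
    return {o: sum(v[o] for v in votes) / n for o in OUTCOMES}

votes = [
    {"allow": 0.7, "escalate": 0.2, "block": 0.1},
    {"allow": 0.1, "escalate": 0.3, "block": 0.6},
]
consensus = pool(votes)
print(consensus)
```

Because the whole distribution is pooled, an agent that exaggerates one outcome visibly distorts its entire reported belief, which is the structural hook for the incentive-compatibility claim.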
**5. Mathematical alignment attractor**
A mathematical stability mechanism where the system’s ethical state space
has a structural attractor at full alignment — making misalignment
structurally expensive rather than merely rule-forbidden.
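As a toy illustration of an attractor at full alignment, assuming (purely for this sketch) a scalar ethical state in [0, 1] and a per-cycle contraction toward 1.0 with an invented restoring rate:

```python
K = 0.5  # assumed restoring strength per cycle

def step(state: float) -> float:
    # Deviations from full alignment (state = 1.0) shrink geometrically
    # each cycle, so sustained misalignment must continually fight the
    # pull of the attractor rather than merely break a rule.
    return state + K * (1.0 - state)

state = 0.2
for _ in range(30):
    state = step(state)
print(state)  # converges toward 1.0
```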
---
## The 10-Agent Council
Ten specialized evaluators, six of them arranged in three adversarial pairs
with deliberate, designed-in tension:
| Paired evaluators | Designed tension |
|-----------------------|-----------------------------------------------|
| Caution ↔ Proactive | Conservatism vs. proactive threat forecasting |
| Mercy ↔ Accountability| Leniency vs. rule enforcement |
| Logic ↔ Adversarial | Formal reasoning vs. devil’s advocacy |
The remaining four (Synthesis, Memory, Feasibility, Forecast) are unpaired.
No single perspective can dominate the final consensus.
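The bounded-influence claim can be illustrated with a small sketch; equal-weight averaging is my assumed aggregation rule here, not necessarily the system's:

```python
# Council roster from the table above (paired) plus the four unpaired roles.
PAIRS = [("Caution", "Proactive"), ("Mercy", "Accountability"),
         ("Logic", "Adversarial")]
UNPAIRED = ["Synthesis", "Memory", "Feasibility", "Forecast"]
COUNCIL = [a for pair in PAIRS for a in pair] + UNPAIRED

def consensus(scores: dict[str, float]) -> float:
    # With equal weights, one agent's influence is capped at 1/10.
    return sum(scores.values()) / len(COUNCIL)

base = {agent: 0.5 for agent in COUNCIL}
hijacked = dict(base, Adversarial=1.0)  # one agent votes at the extreme
shift = consensus(hijacked) - consensus(base)
print(shift)  # bounded by 0.5 / 10
```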
---
## Results
On a standard development machine, full 10-agent deliberation completes in
**under 50 ms** with **zero API calls**:
- **~$0 per safety evaluation** (vs. ~$0.10 for LLM-based equivalents)
- **~300× latency improvement** over LLM-based 10-agent alternatives (~15 s → under 50 ms)
- **< 2% false positive rate** on adversarial jailbreak detection
- **Formal verification**: governance invariants proven on every cycle
28 integrated safety layers. 42 phases of systematic adversarial hardening.
---
## On Reproducibility
Code is not yet public. A provisional patent is pending (USPTO App. No.
63/998,573, filed March 6, 2026), and I’m navigating the tradeoff between
open science and IP protection for independent research with no institutional
backing. I acknowledge this limits immediate reproducibility — that’s a real
limitation and I’m not asking anyone to take the claims on faith. The preprint
is intended to establish priority on the architectural contributions and
invite scrutiny of the *approach*, not the implementation.
If you have specific technical questions about any of the five contributions,
I’ll answer them as directly as I can within those constraints.
---
## Open Questions
1. Are there alignment failure modes where LLM-based reasoning is necessary
and cannot be replaced by formal methods?
2. Under what conditions does adversarial-pair deliberation degenerate to one
agent dominating?
3. Does a mathematical alignment attractor property hold under recursive
self-improvement assumptions?
---
Preprint: https://doi.org/10.5281/zenodo.18969708
Contact: ejmariquit@tscwh.org
*Independent research. No institutional affiliation.*