Hallucination Transparency Report

How BenchSlap prevents language-model-generated legal misinformation

Data as of April 2026

Verification Corpus

Utah appellate opinions (1996-present): 28,541

Utah rules and statutes indexed: 22,895

Authorities cryptographically pinned (SHA-256): 35,397

Case dispositions pre-extracted: 8,968

Overruled/superseded cases flagged: 129

Treatment graph edges: 238

AEGIS PRIME Verification Gates

Layer 1 (citation existence check): Active

Layer 2 (holding text containment, SHA-256 + shingles): Active

Layer 3 (structural fact verification: disposition, panel, binding weight): Active

Verification time per citation: < 5 ms

Gate bypass possible: No

What Gets Blocked

Fabricated case citation: HARD BLOCK

Wrong disposition (e.g., "affirmed" when reversed): HARD BLOCK

Dissent quoted as majority opinion: HARD BLOCK

Overruled case cited as current law: HARD BLOCK

Dicta presented as binding holding: HARD BLOCK

Holding not found in opinion text: Soft warning

How It Works

Every citation passes through three independent verification layers before reaching the user.

Layer 1 — Does the citation exist? Checked against our local database of 28,541 Utah opinions (794,036 total across 101 courts), then against utcourts.gov, CourtListener, and Caselaw Access Project. If the case doesn't exist, the draft is blocked.
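As a rough illustration of the Layer 1 lookup order, the sketch below checks a local index first and only then consults fallback sources. It is not BenchSlap's code: the names `normalize`, `citation_exists`, and `local_index` are hypothetical, the fallbacks are plain callables standing in for the external sources, and the citation "2099 UT 1" is made up.

```python
from __future__ import annotations

from typing import Callable, Iterable

# A fallback source is any callable that answers: does this citation exist?
# (Hypothetical stand-in for an external lookup; no network calls are made here.)
Source = Callable[[str], bool]


def normalize(citation: str) -> str:
    """Collapse whitespace and case so trivially different spellings match."""
    return " ".join(citation.split()).lower()


def citation_exists(
    citation: str,
    local_index: set[str],
    fallbacks: Iterable[Source] = (),
) -> bool:
    """Check the local index first, then each fallback source in order."""
    key = normalize(citation)
    if key in local_index:
        return True
    # any() stops at the first fallback that confirms the citation.
    return any(source(key) for source in fallbacks)


# Usage with a made-up citation and a stub fallback.
local_index = {normalize("2099 UT 1")}
print(citation_exists("2099  ut 1", local_index))   # found locally
print(citation_exists("2099 UT 2", local_index))    # not found anywhere
```

A citation that no source confirms returns `False`, which is the condition under which the draft would be blocked.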

Layer 2 — Does the holding match? The engine's claimed holding is checked for textual containment against the actual stored opinion text. The authority is cryptographically pinned with SHA-256 — if anyone modifies the stored text, the hash mismatch is detected.
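A minimal sketch of how hash pinning and shingle-based containment can work together, using Python's standard `hashlib`. The function names, the 5-word shingle size, and the 0.8 threshold are assumptions chosen for illustration; the report does not state the actual parameters.

```python
from __future__ import annotations

import hashlib


def pin(text: str) -> str:
    """Return the SHA-256 hex digest used to pin a stored opinion."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shingles(text: str, k: int) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word windows in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(claim: str, opinion: str, k: int = 5) -> float:
    """Fraction of the claim's shingles that also appear in the opinion."""
    n_words = len(claim.split())
    if n_words == 0:
        return 0.0
    k = min(k, n_words)  # short claims fall back to a smaller window
    claim_shingles = shingles(claim, k)
    return len(claim_shingles & shingles(opinion, k)) / len(claim_shingles)


def verify_holding(claim: str, opinion: str, pinned_hash: str,
                   threshold: float = 0.8, k: int = 5) -> bool:
    """Reject tampered text first, then test textual containment."""
    if pin(opinion) != pinned_hash:
        raise ValueError("stored opinion text does not match its pinned hash")
    return containment(claim, opinion, k) >= threshold


# Usage with invented opinion text.
opinion = ("we hold that the district court erred in granting "
           "summary judgment and we reverse")
pinned = pin(opinion)
print(verify_holding("the district court erred in granting summary judgment",
                     opinion, pinned))
```

Any change to the stored text, even a single character, produces a different digest, so the hash comparison fails before the containment test runs.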

Layer 3 — Do the structural facts match? Pre-extracted facts (disposition, panel composition, binding weight, treatment history) are compared against the engine's claims. "The court affirmed" is checked against a stored disposition enum. This is a database lookup, not a model judgment. It takes less than 5 milliseconds. It is deterministic.
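The disposition check can be pictured as an enum comparison against a stored record. The sketch below assumes a hypothetical `Disposition` enum and an in-memory dictionary in place of the real database; the enum members and the citation "2099 UT 1" are illustrative, not taken from BenchSlap's schema.

```python
from __future__ import annotations

from enum import Enum


class Disposition(Enum):
    # Illustrative members; the real enum may differ.
    AFFIRMED = "affirmed"
    REVERSED = "reversed"
    VACATED = "vacated"
    REMANDED = "remanded"
    DISMISSED = "dismissed"


def disposition_matches(citation: str, claimed: str,
                        stored: dict[str, Disposition]) -> bool:
    """Compare a claimed disposition against the pre-extracted record."""
    try:
        claimed_enum = Disposition(claimed.strip().lower())
    except ValueError:
        # A disposition word that is not in the enum can never match.
        return False
    # A citation with no stored record yields None, which also fails.
    return stored.get(citation) is claimed_enum


# Usage with a made-up record: the stored disposition is "reversed".
stored = {"2099 UT 1": Disposition.REVERSED}
print(disposition_matches("2099 UT 1", "affirmed", stored))  # mismatch
print(disposition_matches("2099 UT 1", "Reversed", stored))  # match
```

Because the comparison is a dictionary lookup and an identity test, the same inputs always give the same answer, with no model in the loop.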

The gates cannot be bypassed. There is no skip button. There is no admin override. There is no environment variable that turns verification off. 77 automated tests guard the enforcement permanently.
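One way to make a gate hard to bypass is to give it no bypass inputs at all, then keep a regression test that proves it. The sketch below is an assumption about how such a gate and test might look: `Verdict`, `CheckResults`, `gate`, and the environment variable name `SKIP_VERIFICATION` are all hypothetical.

```python
import os
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    SOFT_WARNING = "soft warning"
    HARD_BLOCK = "hard block"


@dataclass(frozen=True)
class CheckResults:
    exists: bool          # Layer 1 outcome
    holding_found: bool   # Layer 2 outcome
    facts_match: bool     # Layer 3 outcome


def gate(results: CheckResults) -> Verdict:
    """Decide from the check results alone: no skip flag, no override
    argument, and no environment lookup inside this function."""
    if not results.exists or not results.facts_match:
        return Verdict.HARD_BLOCK
    if not results.holding_found:
        return Verdict.SOFT_WARNING
    return Verdict.PASS


def test_env_var_cannot_disable_gate() -> None:
    """Regression test: setting a skip-style variable changes nothing."""
    os.environ["SKIP_VERIFICATION"] = "1"  # hypothetical variable name
    try:
        fabricated = CheckResults(exists=False, holding_found=True,
                                  facts_match=True)
        assert gate(fabricated) is Verdict.HARD_BLOCK
    finally:
        os.environ.pop("SKIP_VERIFICATION", None)


test_env_var_cannot_disable_gate()
```

The severity mapping follows the table above: a missing citation or a structural mismatch is a hard block, while a holding not found in the opinion text produces a soft warning.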

Comparison

ChatGPT hallucination rate (Stanford 2023): GPT-3.5 69% / GPT-4 36%

BenchSlap hallucination rate (Utah corpus): Structurally impossible

How: Closed corpus + deterministic verification

See it for yourself.

Try the Live Demo

No account required. Ask any legal question.