How AI Hallucinates Citations — and How To Stop It

By Richard L. Sanders, attorney, Utah Bar No. 15728. I built BenchSlap because, after the fourth time I caught my AI tool inventing a case, I stopped believing the verification was the AI's problem to solve.

In May 2023, a Manhattan federal judge ordered two attorneys to show cause why they should not be sanctioned. They had filed a personal-injury opposition brief on behalf of one Roberto Mata, and Avianca Airlines' counsel had done what any opposing counsel does: pulled the cited cases. Six of them weren't there. Not "couldn't be located on Westlaw." Not "different docket number than reported." Not "decided but unpublished." Did not exist.

The attorneys, Steven Schwartz and Peter LoDuca, had asked ChatGPT to find supporting authority for a tolling argument. ChatGPT produced names: Varghese v. China Southern Airlines, Shaboon v. EgyptAir, Petersen v. Iran Air. It produced citations: 925 F.3d 1339, 2013 IL App (1st) 111279, 905 F.2d 906. It produced quoted holdings, paragraph-numbered pinpoint cites, plausible-sounding procedural postures. Every single one was a confabulation. The case names sounded real because the AI had seen ten thousand real airline-tort case names; the citations looked real because the AI had seen the format of ten million real reporter citations; the holdings sounded real because the AI is, fundamentally, a confidence-shaped pattern-completion engine that does not know what is real and what isn't.

I was a solo practitioner watching the Mata v. Avianca opinion drop in real time. My first thought was the obvious one — that's not me, I would never. My second thought, after I sat with it for an hour, was: I would have. The brief format was right. The fact pattern was right. The cited holdings aligned with my argument. If I were on deadline and the AI returned that block of authority, would I really have pulled every citation? Every one? After the fourth one checked out?

The honest answer was no. And the dishonest answer is what now gets attorneys sanctioned in at least eight federal districts and a growing list of state courts.

What the rules now require

The American Bar Association published Formal Opinion 512 in July 2024, addressing generative AI in legal practice directly. The opinion is short, surprisingly readable, and severe in its conclusion. The duty of competence under Model Rule 1.1 — the rule mirrored in every state's professional conduct code, including Utah's URPC 1.1 — requires that lawyers using generative AI understand its limitations and verify the accuracy of its work product before relying on it.

Rule 3.3 (Candor Toward the Tribunal) is the harder one. A lawyer shall not knowingly make a false statement of fact or law. The cases applying this rule after Mata have not been gentle on the "I didn't know" defense. Several have applied a constructive-knowledge standard: if a reasonable attorney would have caught the fabrication through standard verification, the failure to verify is itself a violation. As one court put it bluntly: citing a case is a representation that the case exists.

Rule 8.4 (Misconduct), Rule 1.3 (Diligence), and your state's equivalents complete the matrix. The practical effect is that AI-assisted legal work without independent verification is not just bad lawyering — it's a sanctionable ethics violation that the bench is now actively looking for.

Why this keeps happening

The temptation is to attribute these incidents to lazy attorneys, and that's not wrong. But it's incomplete. The deeper problem is that the AI tools marketed to lawyers are built on an architecture that structurally invites the failure mode. Understanding the architecture is the prerequisite to understanding why a verification layer bolted on after the fact will never fully solve it.

Every large language model marketed for legal research today follows the same basic shape:

1. The model is trained on a vast corpus of text scraped from the internet, including some quantity of legal text.
2. At query time, the model is given the user's question and asked to generate a response.
3. The model produces tokens one at a time, each one chosen because it is statistically likely to follow the prior tokens.
4. Citations emerge from this token-by-token process. The model is not looking up a citation; it is generating a citation that resembles real citations the model has seen.

This is what hallucination is, mechanically. The model isn't lying; it has no concept of true or false. It is producing the citation-shaped continuation that the surrounding text most plausibly invites. Sometimes that produces a real citation. Sometimes it produces a citation that sounds exactly like a real citation but is the joint output of a thousand half-remembered training-data fragments. The model has no internal mechanism to distinguish between the two.
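The gap between "citation-shaped" and "exists" can be made concrete with a toy. The sketch below is not a language model and not anything from BenchSlap; it is a deliberately crude stand-in that recombines the fields of three invented placeholder citations (none is a real case) and then applies a format check. Every name in it is hypothetical.

```python
import itertools
import re

# Invented placeholder citations standing in for "training data"; none is a real case.
KNOWN = [
    ("Alder", "Birch Air", 101, "F.3d", 11),
    ("Cedar", "Dogwood Lines", 202, "F.2d", 22),
    ("Elm", "Fir Airways", 303, "F.3d", 33),
]

# A format-only check: "<Party> v. <Party>, <volume> F.2d|F.3d <page>".
CITE_SHAPE = re.compile(r"^[A-Z][a-z]+ v\. [A-Z][A-Za-z ]+, \d+ F\.[23]d \d+$")


def render(plaintiff, defendant, volume, reporter, page):
    """Format one citation from its five fields."""
    return f"{plaintiff} v. {defendant}, {volume} {reporter} {page}"


def recombinations(known):
    """Every citation obtainable by mixing fields across the known citations."""
    # De-duplicate each field (preserving order), then take the full cross product.
    columns = [list(dict.fromkeys(column)) for column in zip(*known)]
    return [render(*fields) for fields in itertools.product(*columns)]


real = {render(*fields) for fields in KNOWN}
candidates = recombinations(KNOWN)
well_formed = [c for c in candidates if CITE_SHAPE.match(c)]
fabricated = [c for c in well_formed if c not in real]

print(len(candidates), len(well_formed), len(fabricated))
```

Run as written, this prints 162 162 159: all 162 recombinations pass the format check, and only 3 of them correspond to an entry that was ever in the list. A format check cannot tell the two groups apart; only a lookup can.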

The standard response is retrieval-augmented generation: bolt on a retrieval system that pulls real authority from a database, then ask the model to write the brief using that retrieved authority. This helps, but it does not solve the problem. The model is still emitting citations token by token; it can still drift mid-sentence; it can still mis-attribute a quote to the wrong opinion in the same paragraph; it can still describe a real case's holding exactly backwards. Varghese v. China Southern Airlines was the first class of failure: a case the model invented outright. The next class of failure is what happens when the model cites a real case and gets the disposition exactly wrong.

The fundamental architectural problem is that verification happens after generation. By the time a second model is asked "did this first model hallucinate?", the damage is partway done. The verifying model often shares failure modes with the generating model. Even when verification catches a fabrication, the user has already seen the confident output and decided whether to file. Verification as an afterthought is verification that occasionally fails.

What a different architecture looks like

I started writing BenchSlap when I realized the verification had to happen before the citation ever reached me. Not as a check; as a constraint. The system has to be designed so that emitting a citation that doesn't exist is structurally impossible — not flagged, not warned, not "low confidence," but impossible at the kernel level.

The architecture I built has a name now: AEGIS. It has two halves.

The first half is content-hash pinning. Every authority in the corpus — opinions, statutes, court rules, all of it — gets a SHA-256 hash of its canonical text at ingest time, plus a 5-gram shingle signature for fuzzy matching. The corpus today holds 3,222,931 opinions and 1,931,971 rules, 100% pinned. When the AI emits a citation that purports to come from this corpus, the verification gate recomputes the hash from the row stored in the database. If the hash doesn't match, the row was tampered with and the filing hard-blocks. If the AI's quoted text doesn't appear in the row's content (verified by substring match or 5-gram shingle containment ≥ 0.7), the gate flags the citation as unverified.
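A minimal sketch of the two checks described above, hash pinning and quote containment, might look like the following. It is illustrative only and is not the code in BenchSlap's verification kernel. The canonicalization rule (lowercase, collapsed whitespace), the choice of word-level rather than character-level 5-grams, and every function name are assumptions; only SHA-256, the 5-gram shingles, and the 0.7 threshold come from the description above.

```python
import hashlib


def canonical(text):
    # Assumed canonical form: lowercase with runs of whitespace collapsed.
    return " ".join(text.lower().split())


def pin(text):
    """SHA-256 hex digest of the canonical text, computed at ingest time."""
    return hashlib.sha256(canonical(text).encode("utf-8")).hexdigest()


def shingles(text, n=5):
    """Set of word-level n-grams (assumption: words, not characters)."""
    words = canonical(text).split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(quote, source, n=5):
    """Fraction of the quote's shingles that also occur in the source."""
    quote_shingles = shingles(quote, n)
    if not quote_shingles:
        return 0.0
    return len(quote_shingles & shingles(source, n)) / len(quote_shingles)


def check_quote(row_text, stored_hash, quote, threshold=0.7):
    # 1. Recompute the hash from the stored row; a mismatch means the row changed.
    if pin(row_text) != stored_hash:
        return "HARD_BLOCK"
    # 2. Exact substring match on canonical text.
    if canonical(quote) in canonical(row_text):
        return "VERIFIED"
    # 3. Fuzzy match: shingle containment at or above the threshold.
    if containment(quote, row_text) >= threshold:
        return "VERIFIED"
    return "UNVERIFIED"
```

In this sketch a quote that differs from the source by one trailing word still passes on containment, while a quote with no five-word run in common with the source comes back UNVERIFIED.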

The second half is structural verification — what I call AEGIS PRIME. Every Utah opinion (and we're working outward) is pre-extracted at ingest time for its disposition (AFFIRMED, REVERSED, VACATED, etc.), its panel vote (which judges signed which opinion), its holding's binding weight (BINDING, PERSUASIVE, DICTA), and its treatment graph (which later cases overruled or distinguished it). When the AI emits a claim like "the court affirmed the conviction in Smith v. Smith", the gate doesn't ask another AI to verify. It does a single PostgreSQL SELECT on the disposition column. If the stored disposition is REVERSED, the claim hard-blocks. No second-LLM call. No verifier-shares-the-same-failure-mode problem. Deterministic set membership.
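The disposition check reduces to one parameterized query against a pre-extracted column. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL so that it runs without a server. The table name, the case_name column, and the result labels are hypothetical; only the idea of a stored disposition column compared against the claim comes from the description above.

```python
import sqlite3


def build_demo_db():
    """In-memory stand-in for the opinions table (schema is hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE opinions ("
        " id INTEGER PRIMARY KEY,"
        " case_name TEXT UNIQUE NOT NULL,"
        " disposition TEXT NOT NULL)"
    )
    # Placeholder row matching the article's "Smith v. Smith" illustration.
    conn.execute(
        "INSERT INTO opinions (case_name, disposition) VALUES (?, ?)",
        ("Smith v. Smith", "REVERSED"),
    )
    return conn


def check_disposition(conn, case_name, claimed):
    """Compare a claimed disposition with the stored one using a single SELECT."""
    row = conn.execute(
        "SELECT disposition FROM opinions WHERE case_name = ?",
        (case_name,),
    ).fetchone()
    if row is None:
        return "BLOCK_NOT_IN_CORPUS"
    if row[0] == claimed.upper():
        return "PASS"
    return "BLOCK_DISPOSITION_MISMATCH"
```

Calling check_disposition(build_demo_db(), "Smith v. Smith", "affirmed") returns BLOCK_DISPOSITION_MISMATCH in this sketch, because the stored value is REVERSED. No model is consulted at any point.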

The architectural commitment that enables this is mundane: the corpus is closed, the facts are pre-extracted, and the verification path is a database lookup rather than a model call. A vendor whose architecture is generate-then-verify cannot retrofit this without rebuilding from scratch. The corpus has to be under their control. The facts have to be pre-extracted on ingest. The verification gate has to be wired before the user sees the output. These are not features that bolt on; they are commitments that have to be present at the foundation.

What this looks like in practice

If I draft an opposition motion using BenchSlap and the AI returns:

"As the Tenth Circuit observed in Varghese v. China Southern Airlines, 925 F.3d 1339, 1342 (10th Cir. 2019), 'the doctrine of equitable tolling …'"

The closed-corpus verification gate fires: Varghese v. China Southern Airlines is not in the opinion library. The output is rewritten before it reaches my screen, the citation is removed, and the AI is forced to find authority that does exist or to acknowledge that it doesn't have what it claimed.
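As a rough sketch of that gate, assume the corpus can be queried as a set of "volume reporter page" strings. The regular expression below covers only a few reporter formats, the corpus entry is an invented placeholder, and the function name and replacement text are hypothetical; a production gate would also have to handle the retry step, which is omitted here.

```python
import re

# Matches a handful of reporter formats; real coverage would need to be far broader.
CITE = re.compile(r"\b\d{1,4} (?:U\.S\.|F\.(?:2d|3d|4th)|P\.(?:2d|3d)) \d{1,5}\b")

# Placeholder closed corpus keyed by "volume reporter page"; the entry is invented.
CORPUS = {"100 P.3d 200"}


def gate(draft, corpus):
    """Strip every reporter citation that is not a member of the corpus."""
    removed = []

    def replace(match):
        cite = match.group(0)
        if cite in corpus:
            return cite  # deterministic set membership: keep it
        removed.append(cite)
        return "[citation removed: not in corpus]"

    return CITE.sub(replace, draft), removed


draft = (
    "See Varghese v. China Southern Airlines, 925 F.3d 1339, 1342 "
    "(10th Cir. 2019); see also Doe v. Roe, 100 P.3d 200."
)
clean, removed = gate(draft, CORPUS)
```

Here removed ends up holding the single string "925 F.3d 1339", and the placeholder citation that is in the set passes through untouched.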

If the AI returns a real case but gets the disposition backwards — "In Smith v. Smith, the Utah Supreme Court affirmed the conviction" when in fact the court reversed — AEGIS PRIME's disposition check fires. The stored disposition is REVERSED. The output is rewritten to say "reversed." The user sees the correct fact.

If the AI cites a case that has been overruled, the treatment graph fires. superseded_by_opinion_id is non-null. The brief gets a banner telling me the cited case is no longer good law and the overruling case is the one to cite.
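The treatment-graph check can be pictured as following superseded_by_opinion_id pointers until one is null. In the sketch below, only that field name comes from the text above; the Opinion class, the in-memory dictionary, the banner wording, and the placeholder case names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Opinion:
    id: int
    case_name: str
    superseded_by_opinion_id: Optional[int] = None


def current_authority(opinions, opinion_id):
    """Follow superseded_by pointers until reaching an opinion that still stands."""
    seen = set()
    current = opinions[opinion_id]
    while current.superseded_by_opinion_id is not None:
        if current.id in seen:
            # Defensive guard: malformed data should fail loudly, not loop forever.
            raise ValueError("cycle in treatment graph")
        seen.add(current.id)
        current = opinions[current.superseded_by_opinion_id]
    return current


def banner(opinions, opinion_id):
    """Return a warning string if the cited opinion was superseded, else None."""
    cited = opinions[opinion_id]
    latest = current_authority(opinions, opinion_id)
    if latest.id == cited.id:
        return None
    return f"{cited.case_name} is no longer good law; cite {latest.case_name} instead."
```

Following the chain to its end, rather than stopping at the first hop, matters when the overruling case has itself been overruled.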

None of this requires me to remember to verify. None of it requires me to ask a second AI. None of it requires me to trust the model. The architecture enforces it.

The bench's question, answered

The reason this matters operationally is the question I now expect every attorney to be asked when they file an AI-assisted brief: "Counsel, what verification did you perform on the citations in this filing?"

The honest answer for most generate-then-verify pipelines is some variation of "I trusted the tool." That answer is no longer survivable in front of an informed judge. The honest answer using BenchSlap is concrete:

Every citation was checked against a 3.2-million-opinion closed corpus before the brief was finalized.
Every authority in the corpus is hash-pinned with a SHA-256 of its canonical text; the hash was recomputed on the verify pass; the mismatched-hash hard-block did not fire.
Every cited disposition was checked against the pre-extracted structured fact (disposition / panel / binding weight / treatment graph). The mismatched-disposition hard-block did not fire.
The verification record is available as an HMAC-signed JSON certificate from /api/verify-certificate, downloadable by the attorney at filing time and independently re-verifiable by the court.

That answer is reviewable. The court can independently re-verify the citations using the public endpoint at /api/demo/cite-check. The corpus is open-source and reproducible. The verification kernel (lib/authority-hash.js, lib/verification-pipeline.js) is auditable. There is a paper trail that a competent attorney can produce on cross-examination.
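For readers unfamiliar with HMAC-signed records, a generic sketch of signing and checking a JSON certificate follows. It is not the format returned by /api/verify-certificate, which is not specified here; the field names, the key handling, and the canonical JSON encoding are all assumptions.

```python
import hashlib
import hmac
import json


def _payload(record):
    # Assumed canonical encoding: sorted keys and compact separators, so the
    # same record always serializes to the same bytes.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_certificate(record, key):
    """Attach an HMAC-SHA256 tag to a verification record."""
    tag = hmac.new(key, _payload(record), hashlib.sha256).hexdigest()
    return {"record": record, "hmac_sha256": tag}


def verify_certificate(certificate, key):
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, _payload(certificate["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, certificate["hmac_sha256"])
```

Because HMAC is a keyed construction, checking the tag requires the same secret that produced it; in this sketch that check is something the key holder runs, for example behind a verification endpoint.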

What the closed-corpus moat actually protects

I get asked, often, why I built this as a solo. The reason is that the failure mode the platform exists to prevent is the exact one that ends careers. Once an attorney has been sanctioned for a fabricated citation, that sanction is reported, published, indexed, and citable for the rest of their career. The bar disciplinary record follows them. The opposing counsel in their next deposition will know. The judge in their next motion hearing will know. The local bar association will know.

And the alternative (the $500-a-month Westlaw subscription that solos can't afford, the $5,000-to-$20,000-per-year subscriptions that small firms can barely afford) isn't actually safer if the underlying problem is "the attorney used AI to draft." A bad citation that came out of Westlaw via an associate is still a bad citation when the brief gets filed. The verification has to live in the workflow, not in the price tag.

So the work I'm doing is for the solo attorney who can't afford to be sanctioned for a hallucination, and for the pro se litigant who never had the option of paying $500 a month for Westlaw and who needs defensible legal output to fight an eviction or a custody motion against an opposing counsel with a research budget. The architecture has to make hallucination structurally impossible, because the user can't afford to verify it after the fact.

AEGIS is the visible layer. Here's what sits underneath.

The picture above — closed corpus, hash pinning, structural verification — is the layer most easily described. It is not the whole architecture. AEGIS is one of five overlapping defenses, each of which would, on its own, eventually let something through, and each of which catches what the others miss. The companion whitepaper covers all five in detail with file-and-line citations and 323 of 323 tests passing; if you want the inside-the-machine view, start there.

The short version: (1) the retrieval path is deterministic, not RAG — verification is a database lookup, not a model call, so the model cannot invent a citation that the database doesn't have; (2) the database itself refuses to store malformed authorities, with twelve CHECK constraints and an idempotent hygiene trigger at the storage boundary; (3) every argument decomposes into a (Law × Fact × Logic) tuple that resolves against a thirty-state truth table whose default is CRITICAL_FAIL — the system can only emit a verified argument by matching one of the non-BLACK states, never by falling through; (4) AEGIS does content-hash pinning and structural verification, as described above; (5) the post-stream gate's final disposition is assume hallucination unless deterministic proof grants VERIFIED, and any exception anywhere in the chain becomes a hard block, not a pass-through. Most software defaults to PASS when the verifier is unavailable. For citation verification, the inverse is the only defensible default. Asymmetric stakes demand asymmetric defaults.
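The fail-closed default in points (3) and (5) can be sketched in a few lines. The table below has one hypothetical entry rather than thirty states, and the state labels, function names, and Verdict enum are invented for illustration; the only things taken from the text are the default of CRITICAL_FAIL and the rule that any exception becomes a block.

```python
from enum import Enum


class Verdict(Enum):
    VERIFIED = "VERIFIED"
    CRITICAL_FAIL = "CRITICAL_FAIL"


# Hypothetical table: only tuples listed explicitly can ever yield VERIFIED.
TRUTH_TABLE = {
    ("LAW_OK", "FACT_OK", "LOGIC_OK"): Verdict.VERIFIED,
}


def resolve(law, fact, logic):
    """Look up a (law, fact, logic) tuple; anything unlisted is a failure."""
    return TRUTH_TABLE.get((law, fact, logic), Verdict.CRITICAL_FAIL)


def fail_closed_gate(check, *args):
    """Run a check and treat every outcome except an explicit VERIFIED as a block."""
    try:
        verdict = check(*args)
    except Exception:
        # Verifier crashed or was unavailable: block rather than pass through.
        return Verdict.CRITICAL_FAIL
    return verdict if verdict is Verdict.VERIFIED else Verdict.CRITICAL_FAIL
```

The design choice is that there is no code path that returns VERIFIED by omission: an unknown tuple, a None result, and a raised exception all land on the same blocking verdict.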

The full architecture, with each claim cited to source file and line number and a one-line reproducer for the test suite: benchslap.pro/journal/defense-in-depth.

What's next

I'm going to keep building. The corpus is at 3.2 million opinions and growing daily. The structural-verification layer is extending to every state's appellate dispositions. The closed-corpus verification gate fires on every Drafter, Auditor, and Strategist output. The verification endpoint at benchslap.pro/verify is free, no signup, no card — paste any citation and find out whether it exists.

If you've been burned by a fabricated citation, or you've been close enough to a colleague who has, you know what this product exists to prevent. If you haven't, the next reported sanctions opinion is coming, and you should ask yourself what your answer will be when the bench asks what verification you performed.

Try it free

Paste any citation and find out instantly whether it's real, fabricated, or overruled. No signup. No card. No data retention. The verification record is yours.

Verify a citation → Solo-attorney plan · $59/mo
