Verify-Before-Cite Resolution Gate
also known as Citation Resolution Gate, Authority Existence Check
After generation, resolve every cited authority against an external ground-truth registry and strip or block any citation that does not exist before the answer reaches the reader.
Context
A research, legal, or medical assistant produces answers that cite external authorities — case names, docket numbers, statutes, papers, digital object identifiers — and those citations carry weight the reader will act on. The system may already retrieve documents, but the model can still write a citation that was never retrieved, paraphrase a real case under a wrong number, or invent a plausible authority outright. The authorities being cited live in external indexes the model does not control, such as Westlaw, LexisNexis, CourtListener, or PubMed.
Problem
Language models are fluent at producing authoritative-looking references that do not resolve to anything real, and the fabrication looks correct until a reader checks it. In regulated domains a single non-existent citation that ships can trigger sanctions, retractions, or lasting loss of trust. Binding a citation to a retrieved chunk is not enough, because the cited authority may sit outside the retrieval set entirely, and the model's own confidence in a citation is not a reliable signal of whether the authority exists. The system needs an external, deterministic existence check that runs on the finished output rather than trusting the text as written.
Forces
- A closed retrieval registry vouches only for what it contains, so a citation can resolve inside it and still name an authority that does not exist in the wider world the registry does not cover.
- Resolving against an external authority index is a deterministic lookup, but the index has rate limits, latency, and coverage gaps that a citation gate must tolerate.
- Stripping a non-resolving citation protects the reader but can leave a claim unsupported, forcing a choice between a weaker answer and a blocked one.
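One way to honour the second force is to keep "the index says this does not exist" separate from "the index could not answer". The sketch below is a minimal illustration, not a prescribed implementation: `Status`, `IndexUnavailable`, and `check` are hypothetical names, and `lookup` stands in for whatever client wraps the external index. It is assumed to return a record, return `None` for a definite miss, or raise `IndexUnavailable` on a timeout, rate limit, or outage.

```python
import time
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    RESOLVED = "resolved"          # the index returned a record
    NOT_FOUND = "not_found"        # the index answered: no such authority
    UNVERIFIABLE = "unverifiable"  # the index never answered


class IndexUnavailable(Exception):
    """Hypothetical error for timeouts, rate limiting, or outages."""


def check(
    key: str,
    lookup: Callable[[str], Optional[dict]],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Status:
    """Look up one citation key, retrying transient failures with backoff."""
    for attempt in range(attempts):
        try:
            record = lookup(key)
        except IndexUnavailable:
            # Back off before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
            continue
        return Status.RESOLVED if record is not None else Status.NOT_FOUND
    # Every attempt failed: the citation is unchecked, not disproved.
    return Status.UNVERIFIABLE
```

Keeping `UNVERIFIABLE` as its own outcome lets the gate treat an outage differently from a fabrication, for example by holding the answer for a later retry instead of stripping a citation that may be real.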
Example
A legal-research assistant drafts a memo that cites four cases and two statutes. Before the memo renders, a resolution gate parses out each citation and queries CourtListener and a statute index. Three cases resolve and are annotated with links; the fourth — a real-sounding name with a docket number that matches no court record — fails to resolve, so the gate strips it and flags the supporting claim as unsourced. One statute resolves but to a repealed version, and the gate marks it for review. The reader never sees an authority that does not exist.
Solution
Therefore:
Run a deterministic post-generation stage between the model and the reader. First parse the structured citations out of the answer: case names and docket numbers, statute identifiers, paper titles, digital object identifiers. For each, query an external authority index (a legal database such as Westlaw, LexisNexis, or CourtListener, or a medical index such as PubMed) and require an exact match on the load-bearing fields: the docket number, the jurisdiction, the title verbatim. A citation that resolves to a current record is kept and annotated with that record. A citation that fails to resolve (fabricated, wrong jurisdiction, non-existent docket), or that resolves only to a repealed version, is stripped from the output, flagged for review, or, in the strictest setting, blocks delivery of the whole answer until a human or a regeneration pass repairs it. The gate is deterministic and runs on every output, so an invented authority cannot slip past it on the strength of confident prose.
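The stage described above might be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the `[[cite: title | docket | court]]` markup is a hypothetical format the model would be prompted to emit, `FAKE_INDEX` is an in-memory stand-in for an external authority index, and `resolve` and `gate` are illustrative names. A real deployment would replace the dictionary lookup with a client for the chosen index.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical markup the model is asked to emit: [[cite: title | docket | court]]
CITE_RE = re.compile(
    r"\[\[cite:\s*(?P<title>[^|\]]+?)\s*\|\s*(?P<docket>[^|\]]+?)"
    r"\s*\|\s*(?P<court>[^|\]]+?)\s*\]\]"
)


@dataclass(frozen=True)
class Citation:
    title: str
    docket: str
    court: str


Lookup = Callable[[str], Optional[dict]]


def resolve(c: Citation, lookup: Lookup) -> Optional[dict]:
    """Return the index record only if every load-bearing field matches exactly."""
    record = lookup(c.docket)
    if record is None:
        return None
    if record["title"] != c.title or record["court"] != c.court:
        return None
    return record


def gate(answer: str, lookup: Lookup) -> tuple[str, list[Citation]]:
    """Annotate resolving citations, strip the rest, and report what was stripped."""
    failed: list[Citation] = []

    def replace(m: re.Match) -> str:
        c = Citation(m["title"], m["docket"], m["court"])
        record = resolve(c, lookup)
        if record is None:
            failed.append(c)
            return "[citation removed: unresolved]"
        return f"{c.title}, No. {c.docket} ({c.court}) <{record['url']}>"

    return CITE_RE.sub(replace, answer), failed


# Stand-in for an external index, keyed by docket number (illustrative data only).
FAKE_INDEX = {
    "19-1234": {
        "title": "Smith v. Jones",
        "court": "9th Cir.",
        "url": "https://example.org/19-1234",
    }
}

draft = (
    "See [[cite: Smith v. Jones | 19-1234 | 9th Cir.]] "
    "and [[cite: Doe v. Roe | 99-0001 | 2d Cir.]]."
)
clean, failed = gate(draft, FAKE_INDEX.get)
```

Here the first citation is rewritten with its resolved link and the second is replaced by a removal marker, while `failed` carries the stripped citation so the supporting claim can be flagged as unsourced.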
What this pattern forbids. A citation may not appear in the output until it resolves against the external authority index; non-resolving citations are stripped, flagged, or block delivery, and a citation that cannot be checked against any registry must not be presented as verified.
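The three responses named above (strip, flag, block) can be read as a delivery policy applied once the checks are done. The sketch below assumes the checks have already produced `(raw_citation_text, resolved)` pairs. `Policy`, `Decision`, and `apply_policy` are hypothetical names, and the marker strings are placeholders. In every mode an unresolved citation stays out of the text the reader sees.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Policy(Enum):
    STRIP = "strip"  # remove the citation silently
    FLAG = "flag"    # remove it, mark the claim unsourced, queue for review
    BLOCK = "block"  # withhold the whole answer


@dataclass
class Decision:
    text: Optional[str]                      # None means delivery is blocked
    review: list[str] = field(default_factory=list)


def apply_policy(answer: str, checks: list[tuple[str, bool]], policy: Policy) -> Decision:
    unresolved = [raw for raw, ok in checks if not ok]
    if not unresolved:
        return Decision(text=answer)
    if policy is Policy.BLOCK:
        return Decision(text=None, review=unresolved)
    marker = (
        "[citation removed]"
        if policy is Policy.STRIP
        else "[unsourced: citation under review]"
    )
    for raw in unresolved:
        answer = answer.replace(raw, marker)
    review = unresolved if policy is Policy.FLAG else []
    return Decision(text=answer, review=review)
```

Which policy fits depends on the third force: stripping keeps the answer flowing at the cost of an unsupported claim, while blocking trades availability for the guarantee that nothing unverified ships.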
And the patterns that stand alongside it, or against it —
- alternative-to Hallucinated Citations ✕: an anti-pattern that lets the model emit citations as free text and trusts them.
- complements Citation Attribution ★★: Track and surface, alongside a RAG-grounded answer, which retrieved chunks supported which claims, so the binding between answer span and source survives all the way to the user.
- complements Canonical-Entity Grounding ★: Require the agent to resolve every business identifier it uses (SKU, account, supplier, customer) through an authoritative lookup against the system of record, rather than emitting the identifier from the model's parametric memory.
- complements Deterministic-LLM Sandwich ★: Bracket every LLM call with deterministic checks on both sides.