XII · Streaming & UX · Mature ★★

Citation Streaming

also known as Inline Citations, Source-Anchored Output

Stream citations alongside generated text so the UI can render source links in place as content appears.

This pattern helps complete certain larger patterns —

  • specialises · Streaming Typed Events ★★ · Push partial results to the client as typed events as they become available, rather than waiting for the full response.

Context

A team is building a retrieval-augmented generation (RAG) agent, in which the model answers from a set of documents pulled in at query time, and the user needs to see which source each claim came from. The answer streams to the user token by token so the interface feels responsive. The team has to decide when and how citations should appear alongside the streaming text.

Problem

Two obvious choices both fail. Generating the answer first and the citation list afterwards hides every source until the streaming finishes, which defeats the responsiveness the streaming was meant to deliver and trains users to wait for the end before they trust anything. Asking the model to weave citation markers into its prose and hoping it does so consistently is unreliable: marker formats drift, citations attach to the wrong span, and a free-form text channel cannot tell the user-interface code which characters are a citation and which are prose.

Forces

  • Citation events must align with the generated tokens they support.
  • Source spans need ids that stay stable across the stream and the final output.
  • The UI needs to render citations mid-stream without flickering.
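
One way to satisfy the stable-id force is to derive each id from the source span itself rather than from its position in a retrieval result. The sketch below is illustrative only: the function name `source_span_id`, the `src-` prefix, and the choice of document id plus character offsets as the identity of a span are all assumptions, not part of the pattern.

```python
import hashlib


def source_span_id(doc_id: str, start: int, end: int) -> str:
    """Derive a deterministic id for a span of a source document.

    Hypothetical scheme: the same (doc_id, start, end) triple always
    yields the same id, so the UI can reuse an already-rendered source
    link instead of redrawing it when the id reappears.
    """
    key = f"{doc_id}:{start}:{end}".encode("utf-8")
    return "src-" + hashlib.sha256(key).hexdigest()[:12]
```

Because the id depends only on the span, re-running retrieval or reordering the retrieved chunks does not change it, which is what lets the client treat a repeated citation as the same link.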

Example

A medical-information agent answers 'what are the side effects of metformin?' As the answer streams to the user, each clinical claim arrives with a citation pointing back to the exact paragraph in the prescribing-information PDF. The user can click any sentence to verify the source — they don't have to trust the model alone.

Solution

Therefore:

Define a streaming event vocabulary that includes citation events linked to source ids. The model is prompted to emit citation markers; the host extracts them from the raw token stream and forwards them as typed events alongside text deltas, so the client never has to parse markers out of prose. The UI renders sources progressively as the events arrive. The final output includes a citation map.
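
The host-side extraction step can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the marker syntax `[[cite:<source-id>]]`, the event classes `TextDelta`, `Citation`, and `Final`, and the shape of the citation map (source id to character offsets) are all hypothetical choices made for the example.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Union


@dataclass
class TextDelta:
    text: str  # prose only; markers have been stripped


@dataclass
class Citation:
    source_id: str
    offset: int  # number of prose characters emitted before this citation


@dataclass
class Final:
    citation_map: Dict[str, List[int]]  # source id -> offsets where it was cited


Event = Union[TextDelta, Citation, Final]

# Assumed marker format; a real prompt may define a different one.
OPEN = "[[cite:"
MARKER = re.compile(r"\[\[cite:([A-Za-z0-9_-]+)\]\]")


def _held_tail(buffer: str) -> int:
    """Length of the buffer tail that might still become a marker."""
    start = buffer.rfind(OPEN)
    if start != -1:
        return len(buffer) - start  # opened but not yet closed
    for n in range(min(len(OPEN) - 1, len(buffer)), 0, -1):
        if buffer.endswith(OPEN[:n]):
            return n  # partial opening such as "[[ci"
    return 0


def extract_events(chunks: Iterable[str]) -> Iterator[Event]:
    """Turn raw model chunks into typed text and citation events."""
    buffer = ""
    emitted = 0
    citation_map: Dict[str, List[int]] = {}
    for chunk in chunks:
        buffer += chunk
        while True:
            match = MARKER.search(buffer)
            if match is None:
                break
            before = buffer[: match.start()]
            if before:
                yield TextDelta(before)
                emitted += len(before)
            source_id = match.group(1)
            yield Citation(source_id, emitted)
            citation_map.setdefault(source_id, []).append(emitted)
            buffer = buffer[match.end():]
        # Release everything except a tail that could be a split marker.
        keep = _held_tail(buffer)
        safe = buffer[: len(buffer) - keep]
        if safe:
            yield TextDelta(safe)
            emitted += len(safe)
        buffer = buffer[len(buffer) - keep:]
    if buffer:
        # Stream ended with a malformed or unfinished marker: pass it on as prose.
        yield TextDelta(buffer)
    yield Final(citation_map)
```

Holding back a possible partial marker is what keeps half-written markers from flashing in the UI when a marker is split across two chunks. In this sketch a malformed marker is held until the stream ends; a production host would probably cap how long it waits.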

What this pattern forbids. Source claims in the output that do not reference a citation event with a valid source id.
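
The "valid source id" half of this rule is mechanically checkable. The sketch below assumes the citation map is a mapping from source id to character offsets and that the host knows the ids of the sources retrieved for this answer; the names `unknown_source_ids`, `enforce_valid_citations`, and `UnknownSourceError` are hypothetical. It does not detect claims that carry no citation at all, which needs a different kind of check.

```python
from typing import Iterable, List, Mapping


class UnknownSourceError(ValueError):
    """Raised when the output cites a source id that was never retrieved."""


def unknown_source_ids(
    citation_map: Mapping[str, List[int]],
    retrieved_ids: Iterable[str],
) -> List[str]:
    """Return the cited ids that match no retrieved source, sorted."""
    known = set(retrieved_ids)
    return sorted(sid for sid in citation_map if sid not in known)


def enforce_valid_citations(
    citation_map: Mapping[str, List[int]],
    retrieved_ids: Iterable[str],
) -> None:
    """Reject an answer whose citations point at unknown sources."""
    unknown = unknown_source_ids(citation_map, retrieved_ids)
    if unknown:
        raise UnknownSourceError(
            f"citations reference unknown source ids: {unknown}"
        )
```

Running the same membership test on each citation event as it arrives, rather than only on the final map, would let the host drop an invalid citation before the UI ever renders it.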

And the patterns that stand alongside it, or against it —

  • complements · Naive RAG ★★ · Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
  • alternative-to · Hallucinated Citations · Anti-pattern: let the model emit citations as free text and trust them.
  • alternative-to · Attention-Manipulation Explainability · Surface which input tokens caused a given output by perturbing attention across all transformer layers and measuring the resulting change in output probability, producing a per-token relevance map alongside the model's response.
  • complements · Citation Attribution ★★ · Track and surface, alongside a RAG-grounded answer, which retrieved chunks supported which claims, so the binding between answer span and source survives all the way to the user.
