Lineage Tracking

also known as Data Lineage, Artefact Provenance

Track which prompt version, model version, and data sources produced each agent output.

This pattern helps complete certain larger patterns —

used-by Sovereign Inference Stack ★ - Run the entire agent stack (model weights, inference, tool layer, vector stores, logs) inside a jurisdictional and operational boundary the operator controls, so no request, prompt, or output crosses into a third-party API.

Context

A team runs an agent whose outputs may be referenced weeks or months after they were produced — an underwriting decision, a generated contract clause, a research summary cited in another document. Over that time the prompts evolve, the model is upgraded, the tool set changes, and the retrieval index is rebuilt. When a customer or auditor surfaces a specific past output and asks how it was produced, the team needs to be able to answer precisely.

Problem

Without recording which prompt template, which model version, which tool versions, and which retrieved documents produced each output, the team cannot reconstruct what happened six weeks ago. Disputes become unanswerable and rollbacks become guesswork, because there is no record of which combination of ingredients was live at that time. The team is forced to choose between manual reconstruction from incomplete clues and accepting that the system effectively forgets why it said what it said.

Forces

Lineage metadata adds storage cost to every output.
The lineage schema itself evolves, so older records must stay readable as fields change.
Lineage records can contain PII, because prompts carry user data.
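
Two of these forces can be eased in the record format itself. The sketch below is one hypothetical approach, not part of the pattern's definition: each record carries a `schema_version` so older records can be upgraded on read, and the rendered prompt is stored only as a keyed digest so user data does not land in the lineage store. The field names, the version numbers, the `upgrade` helper, and the assumed difference between schema versions 1 and 2 are all illustrative assumptions.

```python
import hashlib
import hmac

CURRENT_SCHEMA_VERSION = 2  # hypothetical version number for this sketch


def prompt_digest(rendered_prompt: str, key: bytes) -> str:
    """Keyed SHA-256 digest of the rendered prompt; the raw text is not stored."""
    return hmac.new(key, rendered_prompt.encode("utf-8"), hashlib.sha256).hexdigest()


def upgrade(record: dict) -> dict:
    """Return a copy of an older lineage record upgraded to the current schema."""
    upgraded = dict(record)
    version = upgraded.get("schema_version", 1)
    if version == 1:
        # Assumed change for illustration: version 1 had no tool_versions field.
        upgraded.setdefault("tool_versions", {})
        version = 2
    upgraded["schema_version"] = version
    return upgraded


key = b"example-key-not-for-production"  # placeholder; a real key would be a managed secret
old_record = {
    "schema_version": 1,
    "prompt_digest": prompt_digest("Summarise the claim for Jane Doe", key),
    "model_id": "model-a",
}
new_record = upgrade(old_record)
```

A keyed digest is used here rather than a plain hash because short, guessable prompts could otherwise be recovered by trial hashing. The digest still supports equality checks ("was this the same prompt?") without exposing the text.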

Example

A customer files a complaint about an agent answer they got six weeks ago. The team has no record of which prompt template was live then, which model version answered, or which retrieved documents were used; reproducing the answer is impossible. They add lineage tracking: every output is tagged with prompt-template hash, model id and version, tool versions, retrieved-document ids, and the decision-log id, all stored in a queryable lineage table. The next disputed answer is fully reconstructed in minutes and traced to a since-rolled-back prompt change.
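
The tagging step in this example might look like the following minimal sketch. The `Lineage` fields follow the list above; hashing the template text with SHA-256, the `tag_output` helper, and every sample value are assumptions made for illustration.

```python
import hashlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Lineage:
    prompt_template_hash: str
    model_id: str
    model_version: str
    tool_versions: dict       # tool name -> version string
    retrieved_doc_ids: tuple  # ids of the documents used for this answer
    decision_log_id: str


def template_hash(template_text: str) -> str:
    """Hash the unrendered template, so the same template always maps to the same id."""
    return hashlib.sha256(template_text.encode("utf-8")).hexdigest()


def tag_output(output_id: str, text: str, lineage: Lineage) -> dict:
    """Attach the lineage record to an output at the moment it is produced."""
    return {"id": output_id, "text": text, "lineage": asdict(lineage)}


lineage = Lineage(
    prompt_template_hash=template_hash("Answer the customer using: {context}"),
    model_id="example-model",   # hypothetical model id
    model_version="2024-01",    # hypothetical version label
    tool_versions={"search": "1.4.0"},
    retrieved_doc_ids=("doc-17", "doc-42"),
    decision_log_id="dec-0001",
)
tagged = tag_output("out-0001", "Your claim was approved.", lineage)
```

Hashing the template rather than the rendered prompt keeps the tag stable across requests, which is what lets a disputed answer be traced to a specific prompt change.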

Diagram

classDiagram
    class Output {
        +id
        +text
    }
    class Lineage {
        +prompt_template_hash
        +model_id_version
        +tool_versions
        +retrieved_doc_ids
        +decision_log_id
    }
    class LineageStore {
        +queryable
    }
    Output --> Lineage : tagged_with
    Lineage --> LineageStore : stored_in
    LineageStore ..> Output : joinable

Solution

Therefore:

Tag every agent output with its prompt template hash, model id and version, tool versions, retrieved-document ids, and decision-log id. Store the tags in a queryable lineage store, and make that store joinable to the output store.
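
One way to make lineage both queryable and joinable is to key the lineage table on the output id. The sketch below uses an in-memory SQLite database purely for illustration; the table layout, the JSON encoding of list-valued fields, and the `explain` and `outputs_for_prompt` helpers are assumptions, not something the pattern prescribes.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE outputs (
        id   TEXT PRIMARY KEY,
        text TEXT NOT NULL
    );
    CREATE TABLE lineage (
        output_id            TEXT PRIMARY KEY REFERENCES outputs(id),
        prompt_template_hash TEXT NOT NULL,
        model_id             TEXT NOT NULL,
        model_version        TEXT NOT NULL,
        tool_versions        TEXT NOT NULL,  -- JSON object
        retrieved_doc_ids    TEXT NOT NULL,  -- JSON array
        decision_log_id      TEXT NOT NULL
    );
    """
)

LINEAGE_COLUMNS = (
    "prompt_template_hash", "model_id", "model_version",
    "tool_versions", "retrieved_doc_ids", "decision_log_id",
)

# Sample rows; all values are made up.
conn.execute("INSERT INTO outputs VALUES (?, ?)", ("out-0001", "Your claim was approved."))
conn.execute(
    "INSERT INTO lineage VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("out-0001", "ab12", "example-model", "2024-01",
     json.dumps({"search": "1.4.0"}), json.dumps(["doc-17", "doc-42"]), "dec-0001"),
)


def explain(output_id: str):
    """Join an output to its lineage record; returns None if either is missing."""
    columns = ", ".join("l." + name for name in LINEAGE_COLUMNS)
    row = conn.execute(
        "SELECT o.text, " + columns +
        " FROM outputs AS o JOIN lineage AS l ON l.output_id = o.id WHERE o.id = ?",
        (output_id,),
    ).fetchone()
    if row is None:
        return None
    result = dict(zip(("text",) + LINEAGE_COLUMNS, row))
    result["tool_versions"] = json.loads(result["tool_versions"])
    result["retrieved_doc_ids"] = json.loads(result["retrieved_doc_ids"])
    return result


def outputs_for_prompt(prompt_template_hash: str) -> list:
    """Reverse lookup: every output produced by a given prompt template."""
    rows = conn.execute(
        "SELECT output_id FROM lineage WHERE prompt_template_hash = ? ORDER BY output_id",
        (prompt_template_hash,),
    ).fetchall()
    return [row[0] for row in rows]
```

The forward lookup (`explain`) answers "how was this output produced?", while the reverse lookup (`outputs_for_prompt`) answers "which outputs did this prompt change affect?", which is the question a rollback needs answered.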

What this pattern forbids. Outputs without lineage tags are not promoted to production storage.
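
This rule can be enforced as a gate in front of production storage. The following is a minimal sketch under stated assumptions: outputs are plain dictionaries with a `lineage` entry, production storage is stood in for by a list, and `promote`, `missing_lineage_fields`, and `MissingLineageError` are hypothetical names.

```python
REQUIRED_LINEAGE_FIELDS = (
    "prompt_template_hash", "model_id", "model_version",
    "tool_versions", "retrieved_doc_ids", "decision_log_id",
)


class MissingLineageError(ValueError):
    """Raised when an output is promoted without complete lineage tags."""


def missing_lineage_fields(output: dict) -> list:
    """List required lineage fields that are absent or None.

    Empty containers are allowed: an output may legitimately have used
    no tools or retrieved no documents.
    """
    lineage = output.get("lineage") or {}
    return [name for name in REQUIRED_LINEAGE_FIELDS if lineage.get(name) is None]


def promote(output: dict, production_store: list) -> None:
    """Append the output to production storage only if its lineage is complete."""
    missing = missing_lineage_fields(output)
    if missing:
        raise MissingLineageError(
            "output %r lacks lineage fields: %s" % (output.get("id"), ", ".join(missing))
        )
    production_store.append(output)


production_store = []
tagged = {
    "id": "out-0001",
    "text": "Your claim was approved.",
    "lineage": {
        "prompt_template_hash": "ab12",
        "model_id": "example-model",
        "model_version": "2024-01",
        "tool_versions": {},
        "retrieved_doc_ids": [],
        "decision_log_id": "dec-0001",
    },
}
untagged = {"id": "out-0002", "text": "No lineage attached."}

promote(tagged, production_store)
try:
    promote(untagged, production_store)
    rejected = False
except MissingLineageError:
    rejected = True
```

Failing loudly at promotion time, rather than backfilling later, keeps the lineage store complete by construction.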

And the patterns that stand alongside it, or against it —

complements Provenance Ledger ★★ - Log every agent decision and state change with enough metadata to explain or reverse it later.
complements Cost Observability ★★ - Surface per-request, per-user, and per-feature cost and token consumption to operators in near-real-time.
complements Replay / Time-Travel ★★ - Re-run a past agent trace from any step with modified inputs/prompts/tools to debug or branch.
alternative-to Black-Box Opaqueness ✕ - Anti-pattern: ship an agent without traces, decision logs, or provenance, then debug from user reports.
alternative-to Hidden Mode Switching ✕ - Anti-pattern: silently swap the underlying model between requests without disclosing the change to users or operators.
composes-with Prompt Versioning ★★ - Treat prompts as immutable, hashed, semver'd artefacts in a registry; deploy and roll back like code.
complements Attention-Manipulation Explainability · - Surface which input tokens caused a given output by perturbing attention across all transformer layers and measuring the resulting change in output probability, producing a per-token relevance map alongside the model's response.
complements Postmortem Pattern Mining ★ - Mine a corpus of thousands of written postmortems through a staged model pipeline that summarises, classifies, analyses, and aggregates so that recurring incident causes surface as one short report.

References

NIST AI Risk Management Framework (spec)

Provenance

Source: patterns/lineage-tracking.md on GitHub · commit 4fa1213
Added to catalog: 2026-04-30
Last updated: 2026-05-21
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.