Prompt Versioning
also known as Prompt-as-Artifact, Prompt Registry, Versioned Prompts
Treat prompts as immutable, hashed, semver'd artefacts in a registry; deploy and roll back like code.
Context
A team runs an agent where the system prompt and task prompts are major levers on quality. Multiple engineers edit those prompts, sometimes inline in code, sometimes through a prompt-management tool. The team needs to know exactly which prompt text was live at any given time, to be able to roll back a bad prompt cleanly, and to tie evaluation results to the specific prompt being scored.
Problem
When prompts live as plain strings inside the application code, a wording change becomes a code change: rolling back the prompt requires reverting a deployment, comparing two prompt versions side by side requires diffing branches, and there is no clean way to say which prompt produced last week's outputs. Evaluation runs cannot be tied back to specific prompt text once that text has been edited in place. The team is forced to choose between treating every prompt edit as a full code release or losing the ability to audit and revert prompts precisely.
Forces
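- A registry adds infrastructure that the team must run and maintain.
- Prompt versioning must integrate with the eval harness so that results can be tied to versions.
- Signed prompts give provenance; editable prompts give fast iteration. The two pull against each other.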
Example
A team rolls a small wording change into a prompt at 14:00 and by 16:00 the agent's behaviour has shifted in ways nobody predicted. There is no clean rollback short of redeploying the entire service from a prior commit. They adopt prompt-versioning: prompts live in a registry as immutable, hashed, semver-tagged artefacts; code references them by name plus version; deployments pin a specific version; rollback is a one-line config change. Eval-harness metrics tie to prompt versions. The next bad-prompt incident is reverted in under a minute.
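The "one-line config change" in this example can be sketched as follows. This is a minimal illustration, not a description of any particular tool: the `REGISTRY` and `PINS` dictionaries, the `resolve` helper, and the prompt name `triage-system` are all hypothetical stand-ins for a real registry and deployment config.

```python
# Hypothetical registry contents: (name, version) -> prompt text.
REGISTRY = {
    ("triage-system", "2.3.0"): "Classify the ticket and explain your reasoning.",
    ("triage-system", "2.3.1"): "Classify the ticket. Be brief.",
}

# Hypothetical deployment config: the only place a version is chosen.
PINS = {"triage-system": "2.3.1"}


def resolve(name: str, pins: dict[str, str]) -> str:
    """Return the prompt text for the version pinned in the deployment config."""
    return REGISTRY[(name, pins[name])]


live = resolve("triage-system", PINS)

# Rollback: re-pin the previous version. No application code changes.
PINS["triage-system"] = "2.3.0"
rolled_back = resolve("triage-system", PINS)
```

Because application code only ever calls `resolve`, the bad wording introduced in `2.3.1` is withdrawn by editing the pin, and the text of both versions stays available for comparison.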
Solution
Therefore:
Prompts live in a registry as immutable, hashed, version-tagged artefacts. Code references each prompt by name plus version (semver). Deployments pin specific versions, and rollback means re-pinning an earlier version. The eval harness ties metric outcomes to prompt versions. Prompts can optionally be signed for provenance.
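A registry with these properties might look roughly like the sketch below. It is an in-memory illustration under stated assumptions: the `PromptArtifact` and `PromptRegistry` names, the choice of SHA-256 for the content hash, and the dictionary storage are all hypothetical, and signing is left out.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptArtifact:
    """One immutable prompt version (hypothetical shape, not a real library)."""
    name: str
    version: str  # semver string such as "1.2.0"
    text: str
    sha256: str   # content hash of `text`


class PromptRegistry:
    """In-memory stand-in for a registry service; the storage is an assumption."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], PromptArtifact] = {}

    def publish(self, name: str, version: str, text: str) -> PromptArtifact:
        key = (name, version)
        if key in self._store:
            # Immutability: a published version can never be overwritten.
            raise ValueError(f"{name}@{version} already exists; publish a new version")
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        artifact = PromptArtifact(name, version, text, digest)
        self._store[key] = artifact
        return artifact

    def get(self, name: str, version: str) -> PromptArtifact:
        # Lookups always take an explicit version; there is no "latest".
        return self._store[(name, version)]


registry = PromptRegistry()
v1 = registry.publish("support-system", "1.0.0", "You are a helpful support agent.")
v2 = registry.publish("support-system", "1.1.0", "You are a concise support agent.")
```

Recording `v1.sha256` or `v2.sha256` next to each eval result is one way to tie a metric to the exact text that was scored, since the hash changes whenever the text does.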
What this pattern forbids. Production calls reference pinned prompt versions only; ad-hoc inline prompts are forbidden.
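One way to enforce this rule is a guard that accepts only pinned references and rejects everything else, including raw prompt text and floating tags. The sketch below assumes a hypothetical reference format, `name@MAJOR.MINOR.PATCH`; the pattern itself does not prescribe one, and `parse_pinned_ref` is an illustrative helper.

```python
import re

# Hypothetical reference format: lowercase name, "@", then a three-part semver.
_PINNED = re.compile(r"(?P<name>[a-z0-9][a-z0-9_-]*)@(?P<version>\d+\.\d+\.\d+)")


def parse_pinned_ref(ref: str) -> tuple[str, str]:
    """Split a pinned reference into (name, version), or raise ValueError."""
    match = _PINNED.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a pinned prompt reference: {ref!r}")
    return match["name"], match["version"]
```

A production call site would pass `"triage-system@2.3.0"` through this guard before fetching the prompt, so that `"triage-system@latest"` or an inline string fails fast instead of reaching the model.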
The smaller patterns that complete this one:
- uses Eval as Contract ★★: Treat the eval suite as the contract the agent must satisfy; releases ship only if evals pass.
- generalises Own Your Prompts (12-Factor Agents) ★: Every prompt in a production agent is versioned, tested, and owned by the team in the application repo, never inherited as a framework default.
And the patterns that stand alongside it, or against it:
- composes-with Lineage Tracking ★★: Track which prompt version, model version, and data sources produced each agent output.
- complements Shadow Canary ★★: Run a candidate agent version in shadow alongside the champion, comparing outputs without affecting users.
- complements Prompt/Response Optimiser ★★: At runtime, transform user inputs and model outputs into standardised, template-aligned prompts and responses against predefined constraints, so the agent and its downstream consumers see consistent shapes.
- complements Agentic Context Engineering Playbook ·: Treat the agent's system prompt and long-lived memory as a structured, item-addressable playbook that evolves through small delta updates from a Generator/Reflector/Curator loop, so accumulated tactics resist the context collapse that monolithic rewrites cause.
- complements Prompt Variant Evaluation ★★: Author multiple variants of the same prompt node, run them as a batch against a shared dataset, and let an automated evaluation flow score them so the winning variant is selected by measurement.