Replay Divergence

also known as Replay-Time Output Drift, Non-Deterministic Event Replay

Anti-pattern: treat an append-only event log whose consumers are LLMs as deterministically replayable, so replaying it under a changed model or prompt reconstructs different downstream events than the original run.

Context

A system records agent activity as an append-only event log and treats replay as a first-class capability — to recover state after a crash, to re-derive an audit trail, to branch a past run for debugging, or to reprocess history under an upgraded model. Event sourcing's contract is that replaying the log reconstructs the same state, and the team relies on that determinism. Some consumers of the log are LLM calls.

Problem

An LLM call is not a pure function of its inputs: the same event replayed under a newer model version, a changed prompt template, or even the same model with nominally identical sampling settings can emit a different downstream event than the first run produced. When the replayed output feeds the next step, the divergence compounds: a tool is called with arguments the original never generated, a branch is taken that never happened, and the reconstructed state no longer matches what actually occurred. Nothing errors, because each replayed call is individually well-formed, so the log silently stops being a faithful record. Recovery then restores a state the system was never in, an audit replay yields a different decision than the customer received, and a debugging branch diverges from the very trace it was meant to reproduce.

Forces

Event sourcing and durable execution assume replay is deterministic, but an LLM consumer breaks that assumption the moment the model or prompt changes.
Replaying to re-derive under a new model is sometimes the goal, so journaling the original output defeats that purpose and cannot be the only answer.
Each replayed call is individually valid, so the divergence raises no error and surfaces only as corrupted downstream state.
Pinning the model and every sampling input keeps replay faithful but freezes the system on an old model and grows the journal without bound.

Example

A support-automation team event-sources every agent run so they can replay logs to recover state after a crash. After upgrading the underlying model, an outage forces a replay of the day's log, and where the old model had routed a refund to manual review the new model approves it outright. The replay reconstructs a state in which refunds were issued that never were, each replayed step looks valid, and nobody notices until the books fail to reconcile.

Diagram

flowchart TD
    L[Append-only event log with LLM consumers] --> R[Replay under changed model or prompt]
    R --> E[Different downstream events than the original run]
    E --> S[Reconstructed state != what actually happened]
    L -.journal outputs + version-stamp.-> F[Faithful recovery; changed-model replay diffed, not trusted]

Solution

Therefore:

Separate the two reasons to replay and handle each explicitly. For faithful recovery and audit, record each non-deterministic step's output on first execution and replay the recorded value instead of re-invoking the model, and stamp every event with the model version and prompt hash that produced it. For deliberate re-derivation under a new model, treat the replay as a fresh run rather than a reconstruction: diff its events against the original, surface every divergence, and gate any changed decision behind review. Measure how reproducible the agent actually is and require the strictest determinism tier for events that drive regulated or irreversible actions. Never let a replay whose model or prompt has changed overwrite recovered state as if it were the original.
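The faithful-recovery half of this can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `Journal`, `JournalEntry`, the `event_id` key, and the `invoke` callable are hypothetical names, the journal is an in-memory dict standing in for durable storage, and the model is a stub function rather than a real client.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class JournalEntry:
    output: str          # what the model returned on first execution
    model_version: str   # version stamp of the model that produced it
    prompt_hash: str     # SHA-256 of the exact prompt text that was sent


class Journal:
    """In-memory stand-in for a durable journal keyed by event id (assumption)."""

    def __init__(self) -> None:
        self._entries: Dict[str, JournalEntry] = {}

    def call(self, event_id: str, prompt: str, model_version: str,
             invoke: Callable[[str], str]) -> JournalEntry:
        # Replay path: an output was already recorded, so return it
        # without re-invoking the model.
        if event_id in self._entries:
            return self._entries[event_id]
        # First execution: invoke once, stamp the result, and record it.
        entry = JournalEntry(
            output=invoke(prompt),
            model_version=model_version,
            prompt_hash=hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        )
        self._entries[event_id] = entry
        return entry


journal = Journal()
first = journal.call("evt-1", "Refund $40?", "model-v1", lambda p: "manual_review")
# After an upgrade, recovery replays the same event; the new model is never asked.
replayed = journal.call("evt-1", "Refund $40?", "model-v2", lambda p: "approve")
```

Here `replayed` carries the originally recorded decision and the `model-v1` stamp, so recovery reconstructs what actually happened even though a different model is now deployed.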

What this pattern forbids. An LLM-consumed event log must not be assumed to replay deterministically. Replay for recovery must use journaled outputs rather than re-invoking the model, and a replay whose model or prompt has changed must not overwrite reconstructed state as if it were the original run.
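For the re-derivation case, the diff-and-gate step might look like the following sketch. The `Event` and `Divergence` shapes, the `step` index, and the `approved_steps` set are illustrative assumptions; a real system would diff whatever event schema it already stores.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Event:
    step: int            # position in the run (hypothetical field)
    decision: str
    model_version: str


@dataclass(frozen=True)
class Divergence:
    step: int
    original: Optional[str]    # None if the step exists only in the new run
    rederived: Optional[str]   # None if the step exists only in the original


def diff_runs(original: List[Event], rederived: List[Event]) -> List[Divergence]:
    """Return every step whose decision differs between the two runs."""
    old: Dict[int, str] = {e.step: e.decision for e in original}
    new: Dict[int, str] = {e.step: e.decision for e in rederived}
    return [
        Divergence(step, old.get(step), new.get(step))
        for step in sorted(set(old) | set(new))
        if old.get(step) != new.get(step)
    ]


def unreviewed(divergences: List[Divergence],
               approved_steps: Set[int]) -> List[Divergence]:
    """Changed decisions that no reviewer has signed off on yet."""
    return [d for d in divergences if d.step not in approved_steps]


original = [Event(1, "classify:refund", "model-v1"),
            Event(2, "manual_review", "model-v1")]
rederived = [Event(1, "classify:refund", "model-v2"),
             Event(2, "approve", "model-v2")]

divergences = diff_runs(original, rederived)
blocked = unreviewed(divergences, approved_steps=set())
```

In this toy run `blocked` holds the single step where the new model approved a refund that the original run had sent to manual review. The re-derived run is kept as a separate result and nothing from it is written over the original state while `blocked` is non-empty.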

The patterns that counter or replace it:

complements: Journaled LLM Call (★): Record the output of every non-deterministic step on first execution and replay that recorded value during crash-recovery instead of re-invoking the model.
complements: Determinism-Tiered Replay Gate (·): Classify an agent into a reproducibility tier by re-running identical inputs, require the strictest decision-determinism tier for regulated decisions, and gate deployment and validation-sample size on the measured tier.
complements: Replay / Time-Travel (★★): Re-run a past agent trace from any step with modified inputs/prompts/tools to debug or branch.
complements: Confident Inconsistency (✕): Anti-pattern: in a regulated workflow the same query produces materially different outputs at different times, each looking correct and passing review, so the variance stays invisible unless outputs are deliberately re-run and compared across time.
complements: Stochastic-Deterministic Boundary (SDB) (★): Formalize the seam between an LLM proposal and a system action as a four-part contract (proposer, verifier, commit step, reject signal) so the contract itself, not the agent's good intent, gates side-effects.
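Measuring reproducibility by re-running identical inputs, as the Determinism-Tiered Replay Gate entry describes, could be sketched as below. The three tier names, the `(decision, text)` return shape, and the rule that only the strictest tier may drive irreversible actions are assumptions made for illustration; the catalog text does not define the tiers.

```python
from itertools import cycle
from typing import Callable, Tuple

# Hypothetical tiers, weakest to strictest.
TIERS = ("unstable", "decision_stable", "byte_stable")


def measure_tier(run: Callable[[str], Tuple[str, str]],
                 prompt: str, trials: int = 5) -> str:
    """Re-run one identical input and classify how reproducible the agent is."""
    if trials < 2:
        raise ValueError("need at least two trials to compare outputs")
    results = [run(prompt) for _ in range(trials)]
    decisions = {decision for decision, _ in results}
    texts = {text for _, text in results}
    if len(decisions) > 1:
        return "unstable"          # the decision itself varies between runs
    if len(texts) > 1:
        return "decision_stable"   # same decision, different wording
    return "byte_stable"           # every run produced identical output


def allowed_for_irreversible(tier: str) -> bool:
    # Assumption: only the strictest tier may drive irreversible actions.
    return tier == TIERS[-1]


# Stub agent: always reaches the same decision but alternates its phrasing.
_phrasing = cycle(["Escalating to a human.", "Sending this to manual review."])


def stub_agent(prompt: str) -> Tuple[str, str]:
    return ("manual_review", next(_phrasing))


tier = measure_tier(stub_agent, "Refund $40?")
```

With this stub, `tier` comes out as `decision_stable`, so `allowed_for_irreversible(tier)` is false and the agent would stay behind journaled replay or human review for irreversible actions.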

References

A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents (paper)

Provenance

Source: patterns/replay-divergence.md on GitHub · commit e18c94c
Added to catalog: 2026-06-19
Last updated: 2026-06-19