Decision Log

also known as Reasoning Trace, Thought Trace

Persist the agent's reasoning trace alongside its actions so post-hoc review can explain why.

This pattern helps complete certain larger patterns —

used-by Replay / Time-Travel ★★ — Re-run a past agent trace from any step with modified inputs/prompts/tools to debug or branch.
used-by Agent-as-a-Judge ★ — Evaluate an agent's full trajectory (steps, tool calls, intermediate states) by another agent rather than scoring only the final output.
used-by Sampled Prompt Trace Eval ★ — Capture full prompt/response/metadata traces from production into a monitoring dataset, but only run LLM-judge evaluation on a random sample so monitoring cost stays bounded as traffic grows.

Context

A team runs an agent that makes consequential choices in production, for example a trading agent that opens positions or a support agent that takes refund actions. When something goes wrong days or weeks later, an engineer, auditor, or compliance reviewer wants to understand not only which action the agent took but the reasoning the agent considered at the time. The team already keeps a log of actions taken; what is missing is the thinking that produced each action.

Problem

An action-only log can tell the reviewer that the agent shorted a position at 14:32, but not which signals it weighed or which alternatives it rejected. Debugging a wrong action degenerates into guessing what the model might have been thinking, and user-facing explanations become impossible to provide truthfully. The team is forced to choose between piecing the reasoning back together from incomplete clues or accepting that some agent decisions are simply unexplainable after the fact.

Forces

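Reasoning traces are large, so persisting one per action adds storage and retrieval cost.
Reasoning may contain sensitive content that needs redaction before it is stored.
Trace fidelity trades off against cost: should the log keep the full chain-of-thought, only the key decisions, or a summary?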
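
One way to picture how these forces interact is a small policy step applied to a trace before it is written. The sketch below is illustrative only: `Granularity`, `prepare_trace`, `redact`, the step shape, and the two regexes are hypothetical names and rules, standing in for whatever a team actually needs.

```python
import re
from enum import Enum


class Granularity(Enum):
    FULL = "full"            # keep the entire chain-of-thought
    KEY_DECISIONS = "key"    # keep only steps flagged as decisions
    SUMMARY = "summary"      # keep a single summary line


# Hypothetical redaction rules; a real deployment would define its own.
REDACTIONS = [
    (re.compile(r"\b\d{13,16}\b"), "[REDACTED_CARD]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED_EMAIL]"),
]


def redact(text: str) -> str:
    """Apply every redaction rule in order."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


def prepare_trace(steps: list[dict], granularity: Granularity, summary: str = "") -> list[str]:
    """Reduce a trace to the chosen granularity, then redact what remains.

    Each step is assumed to look like {"text": str, "is_decision": bool}.
    """
    if granularity is Granularity.FULL:
        kept = [s["text"] for s in steps]
    elif granularity is Granularity.KEY_DECISIONS:
        kept = [s["text"] for s in steps if s.get("is_decision")]
    else:
        kept = [summary]
    return [redact(t) for t in kept]
```

Reducing before redacting keeps the redaction pass cheap at coarser granularities, and storing the chosen granularity with the entry tells a later reviewer how much was deliberately left out.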

Example

A trading agent decided to short a position at 14:32. By 16:00, the trade had lost money. The decision log shows that at 14:32 the agent considered three signals (RSI was low, volume spiked, news sentiment was negative), weighted them, and chose to short. The human reviewer can now ask 'was the weighting wrong?' instead of 'what was the agent thinking?'
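
As a rough illustration, the entry for this trade could be stored as a structured record rather than free text. Everything beyond the three signals, the 14:32 time, and the short action is an assumption made for the sketch: the class names, the request id, the calendar date, the numeric weights, and the rejected alternatives are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Signal:
    name: str
    observation: str
    weight: float  # hypothetical weight the agent assigned to this signal


@dataclass(frozen=True)
class DecisionEntry:
    request_id: str
    decided_at: datetime
    action: str
    signals: tuple[Signal, ...]
    rejected_alternatives: tuple[str, ...] = ()

    def dominant_signal(self) -> Signal:
        """Return the signal that carried the most weight in the decision."""
        return max(self.signals, key=lambda s: s.weight)


entry = DecisionEntry(
    request_id="req-001",                       # placeholder id
    decided_at=datetime(2024, 1, 15, 14, 32),   # date is a placeholder; only 14:32 comes from the example
    action="short",
    signals=(
        Signal("RSI", "low", 0.2),              # weights are invented for illustration
        Signal("volume", "spiked", 0.3),
        Signal("news sentiment", "negative", 0.5),
    ),
    rejected_alternatives=("hold", "long"),     # invented for illustration
)

# The reviewer can inspect the weighting directly instead of guessing at it.
print(entry.dominant_signal().name)  # prints "news sentiment"
```

With weights recorded per signal, 'was the weighting wrong?' becomes a question about stored data.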

Diagram

flowchart TD
    A[Agent action] --> P[(Provenance ledger)]
    A --> R[Reasoning trace]
    R --> L[(Decision log<br/>indexed by request id + time)]
    P -->|link| L
    Rev[Post-hoc reviewer] -->|query| L

Solution

Therefore:

Persist reasoning at a chosen granularity (full trace, key decisions, or summary). Link each action in the provenance ledger to its trace. Index the decision log by request id and time for retrieval.

What this pattern forbids. Action records cannot be written without a corresponding decision-log entry.
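
A minimal sketch of the solution, assuming an SQLite store, might look like the following. The table names, column names, and helper functions are hypothetical. The `actions` table stands in for the provenance ledger, and a NOT NULL foreign key is one possible way to enforce the rule that no action record exists without a decision-log entry.

```python
import sqlite3

SCHEMA = """
CREATE TABLE decision_log (
    id          INTEGER PRIMARY KEY,
    request_id  TEXT NOT NULL,
    created_at  TEXT NOT NULL,  -- ISO 8601 timestamp
    granularity TEXT NOT NULL CHECK (granularity IN ('full', 'key', 'summary')),
    trace       TEXT NOT NULL
);
CREATE INDEX idx_decision_request_time ON decision_log (request_id, created_at);

CREATE TABLE actions (
    id          INTEGER PRIMARY KEY,
    action      TEXT NOT NULL,
    decision_id INTEGER NOT NULL REFERENCES decision_log (id)
);
"""


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
    conn.executescript(SCHEMA)
    return conn


def record(conn, request_id, created_at, granularity, trace, action) -> int:
    """Write the trace and its linked action in a single transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO decision_log (request_id, created_at, granularity, trace) "
            "VALUES (?, ?, ?, ?)",
            (request_id, created_at, granularity, trace),
        )
        decision_id = cur.lastrowid
        conn.execute(
            "INSERT INTO actions (action, decision_id) VALUES (?, ?)",
            (action, decision_id),
        )
    return decision_id


def traces_for(conn, request_id):
    """Return (time, action, trace) rows for one request, oldest first."""
    rows = conn.execute(
        "SELECT d.created_at, a.action, d.trace "
        "FROM decision_log d JOIN actions a ON a.decision_id = d.id "
        "WHERE d.request_id = ? ORDER BY d.created_at",
        (request_id,),
    )
    return rows.fetchall()
```

In this sketch, inserting an action whose `decision_id` is missing or points at no entry raises `sqlite3.IntegrityError`, so the prohibition is enforced by the store rather than by convention.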

The smaller patterns that complete this one —

generalises Provenance Ledger ★★ — Log every agent decision and state change with enough metadata to explain or reverse it later.
uses Append-Only Thought Stream ★ — Make the agent's thought log append-only so the agent cannot rewrite its own history.

And the patterns that stand alongside it, or against it —

alternative-to Black-Box Opaqueness ✕ — Anti-pattern: ship an agent without traces, decision logs, or provenance, then debug from user reports.
complements Attention-Manipulation Explainability · — Surface which input tokens caused a given output by perturbing attention across all transformer layers and measuring the resulting change in output probability, producing a per-token relevance map alongside the model's response.
complements Self-Archaeology · — Synthesize the agent's past thought history into time-layered trajectory notes so it can articulate how its understanding evolved without recomputing the narrative each time.
complements Memo-As-Source Confusion ✕ — Anti-pattern: the agent cites its own past memos as ground truth instead of re-verifying them against the artifacts they describe, accumulating false confidence in stale summaries.
complements Interrupt-Resumable Thought · — Preserve multi-step reasoning across interrupts by supporting paused-and-resumed thought frames so a new message handles cleanly without clobbering in-flight work.
complements Intra-Agent Memo Scheduling ★ — Let an agent drop a note for its own future self at a specified time so present decisions can hand off context to a later run without external infrastructure.
complements Echo Recognition · — Recognize human message repetition as emphasis or a re-ask rather than as an independent input, so the agent does not produce a near-duplicate reply when the human repeats themselves.
alternative-to Errors Swept Under the Rug ✕ — Anti-pattern: scrub failed actions, stack traces, and error observations from the agent's own context so the trace looks clean, leaving the model with no evidence of what did not work.
complements Typed Refusal Codes ★ — Define a single source of truth for machine-readable refusal codes across all guard surfaces, so refusals can be triaged mechanically rather than by string-grepping ad-hoc human-readable messages.
complements Commitment Tracking · — Extract stated intents from each agent turn into a structured ledger with open / followed-through / expired status, making the gap between promise and follow-through visible and auditable.
alternative-to Agentic Skill Atrophy ✕ — Anti-pattern: let agents take over routine architectural and debugging decisions in code until developers no longer form the implicit knowledge that lets them review the agent's output or recover when it fails.
alternative-to Agentic Debt ✕ — Anti-pattern: deploy agents on top of an unconsolidated data foundation, weak governance, or missing MLOps infrastructure, so every subsequent capability — observability, retraining, compliance retrofit — pays compounding interest on the skipped foundational work.
complements Rigor Relocation ★ — Relocate verification rigor from the model loop to surrounding scaffolding (evals, judges, decision logs, policy gates) so failures are caught by the wrapper rather than the agent.
complements Synchronous Execution-Plan Confirmation ★ — Agent synchronously emits its full execution plan for user confirmation before any side-effect step, and provides asynchronous operation recordings for post-hoc review.
complements Policy-Gated Agent Action (KRITIS) ★ — Each agent action passes through a policy gate (NIS2, EU AI Act, BSI rules) and is tagged with Run ID + Model Digest + Policy Hash for WORM-audit reconstruction.
complements Decision Context Maps ★ — Before any consequential decision, require the agent to gather a declared set of contextual inputs (resource availability, schedules, downstream dependencies) into a 'context map' the decision must cite.
complements Agent Middleware Chain ★ — Wrap every model call, tool call, and memory access in a composable pre/execute/post interceptor pipeline so cross-cutting concerns attach without touching agent or orchestrator code.
complements Multi-Principal Welfare Aggregation · — When an agent serves multiple humans with conflicting preferences, declare the aggregation rule explicitly rather than letting it be implicit in the prompt or fine-tune.
complements Decision Token ★ — Mint a self-contained record at the moment a consequential action executes, bundling the rule that fired, the exact data read, the conclusion reached, and the authorizing identity.
complements Re-Proposing Rejected Decisions ✕ — Anti-pattern: a stateless agent sees the code but not the decision history, so it keeps proposing options already considered and rejected, forcing reviewers to relitigate settled choices turn after turn.
complements Postmortem Pattern Mining ★ — Mine a corpus of thousands of written postmortems through a staged model pipeline that summarises, classifies, analyses, and aggregates so that recurring incident causes surface as one short report.
