Reasoning Trace Carry-Forward
also known as Reasoning Content Episode, CoT Carry Across Tool Calls, Episode-Bound Reasoning
For reasoning models that emit a separate reasoning trace, preserve that trace in context across the same logical task episode (across tool-call/result turns) but drop it at user-turn boundaries.
This pattern helps complete certain larger patterns —
- specialisesShort-Term Thread Memory★★— Carry the relevant slice of conversation context across turns within a session.
Context
A team is using a reasoning-capable model (for example one of the OpenAI o-series, Claude with extended thinking, or DeepSeek-R1) that returns the model's chain-of-thought in a separate reasoning_content field, distinct from the user-visible content. The agent runs in a tool-use loop with multi-turn history: the model reasons, calls a tool, sees the result, reasons again, possibly answers, and then a new user message starts the next turn.
Problem
Two failure modes pull in opposite directions. If the reasoning trace is dropped between a tool call and its result, the model loses the thread of why it called the tool in the first place, and the next reasoning step starts from a degraded context. If the reasoning trace is instead preserved across user-turn boundaries, conversation history bloats with stale reasoning from earlier tasks and the next user message inherits irrelevant prior thinking that pollutes its own reasoning. Neither 'always carry forward' nor 'always drop' is correct; the team needs a rule keyed to where in the loop the trace appears.
Forces
- Reasoning trace is the bridge between tool-call intent and post-tool-result interpretation.
- Reasoning trace is private intermediate state, not conversational record.
- Tokens are expensive; preserving traces forever costs money.
- Stale reasoning leaks bias into the next task.
Example
An agent built on a reasoning model debugs flaky CI by calling a log-fetch tool. Without trace carry-forward, the model emits its hidden reasoning, calls the tool, then on the result turn the reasoning is dropped and it forgets why it asked for those logs and re-derives from scratch, sometimes incorrectly. The team scopes an episode from one user turn to the next and preserves reasoning_content across all intervening tool calls, dropping it only at the next user turn. Tool-result interpretations stop drifting and token usage stays bounded.
Diagram
Solution
Therefore:
Define an episode as: from one user turn to the next user turn (inclusive of all intervening tool calls and tool results). Within an episode, preserve assistant reasoning_content as part of the context concatenation across all turns. At the next user turn boundary, drop reasoning_content from prior episodes (the API silently ignores it when passed across boundaries). The user-visible content remains in history; only the reasoning trace is episode-scoped.
What this pattern forbids. Internal reasoning content may not cross user-task boundaries; only user-visible content persists in conversation history.
The smaller patterns that complete this one —
- usesContext Window Packing★★— Choose what fits in the context window each turn given a fixed token budget.
And the patterns that stand alongside it, or against it —
- complementsExtended Thinking★★— Spend a configurable budget of internal reasoning tokens before producing a user-visible answer.
- complementsPrompt Caching★★— Order prompts so the unchanging prefix can be cached by the provider, cutting per-call cost and latency.
Neighbourhood
Click any neighbour to follow the language. Scroll to zoom, drag to pan.