Short-Term Thread Memory
also known as Conversation State, Per-Thread State, Working Memory
Carry the relevant slice of conversation context across turns within a session.
This pattern helps complete certain larger patterns —
- used-by Agent Resumption ★★: Persist agent execution state so a long-running run survives restarts, deploys, or user disconnects.
- used-by Interrupt-Resumable Thought ·: Preserve multi-step reasoning across interrupts by supporting paused-and-resumed thought frames so a new message handles cleanly without clobbering in-flight work.
- used-by Echo Recognition ·: Recognize human message repetition as emphasis or a re-ask rather than as an independent input, so the agent does not produce a near-duplicate reply when the human repeats themselves.
- used-by Augmented LLM ★★: Build the foundational agent block as an LLM augmented with retrieval, tools, and memory that the model actively chooses to use, rather than a bare-model call.
Context
A multi-turn agent needs continuity across recent turns — what screen the user is currently on, what the active plan looks like, what tools have been called and what they returned — but it does not need this information forever. The next few turns will use it; the next conversation almost certainly will not.
Problem
Replaying the entire conversation history on every turn quickly becomes expensive and pollutes the context with stale facts that no longer matter. Throwing history away between turns, on the other hand, breaks continuity: the agent forgets what it was just doing, the user has to restate their goal, and tool results disappear before the agent has a chance to use them. The team needs a recent slice of state that survives from turn to turn within a session and is bounded by something other than 'everything that has ever been said'.
Forces
- TTL choice (minutes? hours? days?) trades freshness for cost.
- What to keep vs. summarise is a quality-vs-cost tension.
- Multi-device sessions complicate where state lives.
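The keep-versus-summarise tension can be made concrete with a small sketch. It assumes a fixed count of recent messages is kept verbatim and everything older is folded into a single summary string. The names `trim_history`, `keep_last`, and `naive_summary` are hypothetical, and the placeholder summariser stands in for whatever summarisation step a team actually uses.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


def trim_history(
    messages: List[Message],
    keep_last: int,
    summarise: Callable[[List[Message]], str],
) -> Tuple[str, List[Message]]:
    """Return (summary of older turns, recent turns kept verbatim)."""
    if keep_last <= 0:
        raise ValueError("keep_last must be positive")
    # Everything before the last `keep_last` messages is a candidate for summarising.
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary = summarise(older) if older else ""
    return summary, recent


def naive_summary(older: List[Message]) -> str:
    # Placeholder only (an assumption): a real system would likely call a model here.
    return f"{len(older)} earlier messages omitted"
```

Raising `keep_last` buys fidelity at a higher token cost; lowering it pushes more of the conversation through the lossy summary.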
Example
A chat assistant replays the entire conversation each turn, and by message thirty the prompt is bloated with stale facts and the cost per turn has tripled. The team defines a typed thread state (recent messages, current screen, active plan, agent step), persists it with a 24-hour TTL, and reloads only that state on the next turn. Token cost per turn flatlines; the assistant still feels continuous within a session and resets cleanly when the TTL expires.
Solution
Therefore:
Define a typed state object per thread (messages, current screen, active plan, agent step). Persist it with a TTL (commonly 24 hours). Reload it on the next turn; once the TTL has passed, discard it and start from a fresh state.
What this pattern forbids. The agent cannot rely on facts older than the TTL window without re-fetching them.
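A minimal sketch of the solution, under stated assumptions: the state fields mirror the four named above, while `ThreadState`, `ThreadStore`, and the in-memory dict are hypothetical stand-ins. A production system would more plausibly use a key-value store with native expiry. The clock is injectable so that expiry can be exercised without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

TTL_SECONDS = 24 * 60 * 60  # the "commonly 24h" window from the text


@dataclass
class ThreadState:
    """Typed per-thread state; field names are illustrative."""
    messages: List[Dict[str, str]] = field(default_factory=list)
    current_screen: Optional[str] = None
    active_plan: List[str] = field(default_factory=list)
    agent_step: int = 0


class ThreadStore:
    """Hypothetical in-memory store keyed by thread id."""

    def __init__(
        self,
        ttl_seconds: float = TTL_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._rows: Dict[str, Tuple[float, ThreadState]] = {}

    def load(self, thread_id: str) -> ThreadState:
        """Reload state for the next turn, or reset if the TTL has passed."""
        row = self._rows.get(thread_id)
        if row is not None:
            expires_at, state = row
            if self._clock() < expires_at:
                return state
            del self._rows[thread_id]  # expired: drop the stale state
        return ThreadState()

    def save(self, thread_id: str, state: ThreadState) -> None:
        """Persist state and restart the TTL window from now."""
        self._rows[thread_id] = (self._clock() + self._ttl, state)
```

In this sketch every `save` restarts the TTL window, so an active thread stays alive and an idle one expires. Measuring the TTL from thread creation instead is an equally plausible design choice that the pattern text does not settle.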
The smaller patterns that complete this one —
- generalises Reasoning Trace Carry-Forward ★: For reasoning models that emit a separate reasoning trace, preserve that trace in context across the same logical task episode (across tool-call/result turns) but drop it at user-turn boundaries.
And the patterns that stand alongside it, or against it —
- complements Episodic Summaries ★★: Compress past episodes into summaries that preserve gist while shedding token cost.
- complements Session Isolation ★★: Keep one user's session state and memory unreachable from another user's agent.
- complements Cross-Session Memory ★★: Persist user-specific facts, preferences, and prior context across all sessions, threads, and devices.
- complements Scratchpad ★★: Give the agent a writable scratch space for intermediate notes that informs later turns but does not pollute the response.
- complements Co-Located Memory Surfacing ·: Surface relevant persistent memories proactively when the human mentions a concrete entity the agent has prior knowledge of, so the human does not bear the burden of remembering to ask.
- complements Three Layers of Agentic AI Memory ★: Architect agent memory as three integrated concentric layers (Short-Term Memory as the outer layer, Long-Term Memory in the middle, Feedback Loops at the core) that operate together as a unit rather than as separable optional components.