V · Memory · Mature ★★

Episodic Summaries

also known as Compaction, Conversation Summarisation, Chunk Summaries, Reduce Token Cost, Shrink Context, Cuts Token Use, Too Many Tokens Reduction

Compress past episodes into summaries that preserve gist while shedding token cost.

This pattern helps complete certain larger patterns —

  • used-by · Five-Tier Memory Cascade · Stage agent memory across sensory, working, short-term, episodic, and long-term tiers with explicit promotion and decay between them.
  • used-by · Context Window Packing ★★ · Choose what fits in the context window each turn given a fixed token budget.
  • used-by · Episodic Memory ★★ · Record past events as time-stamped first-person experiences the agent can recall later, separately from extracted facts (semantic) and learned how-to (procedural).

Context

A long-running agent has accumulated more conversation history, tool results, and intermediate reasoning than fits in the model's context window. Replaying the raw history on every turn is impossible because of size, and even when it would fit, it is wasteful to re-read all of it for what is usually a small follow-up step.

Problem

Without some form of compaction, the agent has only two options, both bad. Either the context grows without bound until it overflows the window, at which point the call fails or the most recent state is silently dropped; or a sliding-window strategy truncates the oldest content, so important early facts (the original task, an early decision the agent made, a constraint the user stated up front) fall off the back even though the agent still needs them. The team needs a way to summarise older history into compact episodes that retain the load-bearing facts while shedding the verbatim noise.
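The sliding-window failure mode can be made concrete with a small sketch. Everything here is illustrative: `sliding_window` and `rough_tokens` are hypothetical helpers, the word count is only a crude stand-in for a real tokenizer, and the messages are invented for the demonstration.

```python
def rough_tokens(text):
    # Assumption: whitespace word count approximates token count.
    return len(text.split())


def sliding_window(messages, budget, count_tokens):
    """Keep the newest messages that fit within `budget` tokens; drop older ones."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break                           # everything older is discarded
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order


history = [
    "user: renewal must stay under the agreed annual cap",  # early constraint
    "agent: noted, I will keep the renewal under the cap",
    "tool: fetched usage report with many rows of data",
    "agent: usage is up this quarter across all seats",
    "user: draft the renewal proposal now please",
]

window = sliding_window(history, budget=20, count_tokens=rough_tokens)

# The constraint stated up front is no longer visible to the model.
constraint_survived = any("cap" in msg for msg in window)
```

With this budget only the last two messages fit, so the up-front constraint is gone by the time the agent is asked to draft the proposal. That is the gap the pattern is meant to close.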

Forces

  • Token savings vs summary fidelity loss.
  • Compaction LLM cost vs context-window relief.
  • Single source of truth vs raw-archive availability.

Example

A long-running customer-success agent has accumulated forty-five conversation episodes with one account over six months. The full history blows the context window; a sliding window drops the early conversation where the customer's renewal terms were set. The team uses Episodic Summaries: each closed episode is compressed into a few sentences capturing what happened, what was decided, and any open threads, and the summaries replace the raw transcripts in the prompt. Token cost stays bounded and the renewal-terms decision survives.
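One possible shape for such a summary record is sketched below. The `EpisodeSummary` class, its field names, and the sample text are assumptions made for illustration; the pattern itself only asks that a summary capture what happened, what was decided, and any open threads.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EpisodeSummary:
    """Compact stand-in for one closed episode (hypothetical structure)."""
    episode_id: str
    what_happened: str
    decisions: List[str] = field(default_factory=list)
    open_threads: List[str] = field(default_factory=list)

    def render(self) -> str:
        # A few short lines per episode instead of the full transcript.
        lines = [f"[{self.episode_id}] {self.what_happened}"]
        lines += [f"  decided: {d}" for d in self.decisions]
        lines += [f"  open: {t}" for t in self.open_threads]
        return "\n".join(lines)


def build_history_block(summaries: List[EpisodeSummary]) -> str:
    """Join rendered summaries into the text that replaces raw transcripts in the prompt."""
    return "\n".join(s.render() for s in summaries)


# Illustrative data only; in practice an LLM call would fill these fields.
summaries = [
    EpisodeSummary(
        episode_id="episode-01",
        what_happened="Kick-off call; customer and agent agreed renewal terms.",
        decisions=["Renewal terms set for the account."],
    ),
    EpisodeSummary(
        episode_id="episode-02",
        what_happened="Customer asked about adding seats.",
        open_threads=["Seat pricing still to be confirmed."],
    ),
]

history_block = build_history_block(summaries)
```

Keeping decisions and open threads as separate fields, rather than one free-text blob, makes it easier to check that the load-bearing facts survived the compression.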

Solution

Therefore:

On a schedule (or at thresholds), summarise blocks of recent thoughts/conversation into compact representations. Store summaries in a higher tier; archive originals. Reads consult summaries first, originals on demand.

What this pattern forbids. Past events older than the compaction horizon enter the prompt only as summaries, never as raw transcript; the archived originals are fetched only on demand.
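A minimal sketch of the threshold-triggered variant follows. It assumes a token threshold rather than a schedule, and the `CompactingMemory` class, its parameters, and `stub_summarise` are all hypothetical names; the summariser is a placeholder for whatever LLM call a real system would make.

```python
from typing import Callable, Dict, List, Tuple


class CompactingMemory:
    """Summaries in a higher tier, originals archived, recent messages kept raw."""

    def __init__(self, summarise: Callable[[List[str]], str],
                 count_tokens: Callable[[str], int],
                 threshold: int, keep_recent: int):
        self.summarise = summarise        # assumption: wraps an LLM call
        self.count_tokens = count_tokens
        self.threshold = threshold        # compact once live tokens exceed this
        self.keep_recent = keep_recent    # newest messages that stay verbatim
        self.live: List[str] = []
        self.summaries: List[Tuple[int, str]] = []   # (block_id, summary text)
        self.archive: Dict[int, List[str]] = {}      # block_id -> raw originals

    def append(self, message: str) -> None:
        self.live.append(message)
        if sum(self.count_tokens(m) for m in self.live) > self.threshold:
            self._compact()

    def _compact(self) -> None:
        cut = len(self.live) - self.keep_recent
        if cut <= 0:
            return                        # nothing older than the horizon yet
        block, self.live = self.live[:cut], self.live[cut:]
        block_id = len(self.archive)
        self.archive[block_id] = block    # originals are archived, not deleted
        self.summaries.append((block_id, self.summarise(block)))

    def context(self) -> List[str]:
        """Prompt view: summaries first, then the recent raw messages."""
        return [f"[summary {i}] {s}" for i, s in self.summaries] + self.live

    def raw(self, block_id: int) -> List[str]:
        """On-demand read of the archived originals behind one summary."""
        return self.archive[block_id]


def stub_summarise(block: List[str]) -> str:
    # Placeholder for an LLM summarisation call.
    return f"{len(block)} earlier messages, starting with: {block[0]}"


memory = CompactingMemory(stub_summarise, lambda m: len(m.split()),
                          threshold=10, keep_recent=2)
for i in range(5):
    memory.append(f"turn {i} some text")

prompt_lines = memory.context()
first_block = memory.raw(0)
```

The `keep_recent` parameter plays the role of the compaction horizon: anything older than it is visible through `context()` only as a summary, while `raw()` remains available for the rare read that needs the verbatim originals.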

And the patterns that stand alongside it, or against it —

  • complements · Reflexion · Have the agent write linguistic lessons from past failures and consult them in future episodes.
  • complements · Short-Term Thread Memory ★★ · Carry the relevant slice of conversation context across turns within a session.
  • complements · Self-Archaeology · Synthesize the agent's past thought history into time-layered trajectory notes so it can articulate how its understanding evolved without recomputing the narrative each time.
  • complements · Salience Attention Mechanism · Score every candidate memory item with a weighted salience function so each tick attends to a small, relevant top-k subset rather than re-reading all memory.
  • complements · Dream Consolidation Cycle · Run a deeper, slower reflection pass distinct from per-tick reflection (reading hours of recent thoughts, promoting themes, releasing affective residue, and clearing working memory) so the agent does not accumulate residue indefinitely.
  • alternative-to · Cluster-Capped Insight Store · Cap the number of insights per stem-token cluster and archive the oldest variants by mtime so the long-term store keeps the active research edge instead of accumulating near-duplicates.
  • complements · Sleep-Time Compute · During idle or downtime, run the model offline against the user's standing context to pre-compute dense summaries and likely future answers, so test-time latency and cost drop when the user actually asks.
  • complements · Procedural Memory · Maintain a third agent memory type alongside episodic (past events) and semantic (facts): procedural memory captures *learned how-to*, meaning reusable skills, workflows, and self-rewritten system instructions that map situations directly to actions.
  • complements · Agentic Memory · Expose memory management as first-class tool actions (ADD, UPDATE, DELETE, RETRIEVE, SUMMARY, FILTER) the LLM chooses at every step, trained end-to-end so short-term and long-term memory live under one learned policy.
  • complements · Context Window Dumb-Zone Cap · Hold context-window utilization below a working threshold (~40%) to keep the model out of the 'dumb zone' where it begins ignoring earlier instructions and hallucinating.
  • complements · Information Chunking for Agent Memory ★★ · Structure inputs into digestible topical segments (chunks) before feeding to short-term memory rather than throwing the full input at the model; reduces overload and increases accuracy (~40% improvement observed in customer-service deployment).