Memory

Context Compaction

When the context window nears its limit, replace the older conversation span with a model-written digest that preserves decisions, commitments, and active constraints while discarding noise, so the agent keeps running without losing the thread.

Problem

A fixed context window caps how much history an agent can carry, but a long task generates more history than fits. Truncating the oldest turns blindly drops the decisions and commitments the agent still depends on; keeping everything overflows the window or inflates cost and latency on every subsequent call. The agent needs to shed token volume without shedding the conclusions that volume produced.

Solution

Track context-window utilisation. When it crosses a threshold (for example 80% of the window), run a compaction pass: feed the older span of the conversation to the model with an instruction to produce a dense digest that preserves goals, decisions, open commitments, and any constraints the agent must still honour, while discarding raw tool output, superseded plans, and dead-end reasoning. Replace that span in the working context with the digest, keep the most recent turns verbatim so local continuity survives, and resume. Pin content that must never be compacted away — the original task statement and hard constraints — outside the compactable region. Anthropic ships this as automatic compaction in Claude Code and the Agent SDK; the Chinese context-engineering literature names it 压实 (compaction).
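The compaction pass can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Anthropic's implementation. Everything in it is hypothetical: the `Message` type, the word-count `count_tokens` stand-in for a real tokenizer, the `summarise` callable that would wrap a model call, the `SUMMARY_INSTRUCTION` prompt, and the defaults `threshold=0.8` and `keep_recent=4`. Pinned messages sit outside the compactable region, the older unpinned span is replaced by one digest, and the most recent turns are kept verbatim.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    role: str             # "user", "assistant", "tool", or "summary"
    content: str
    pinned: bool = False  # pinned messages (task statement, hard constraints) are never compacted


def count_tokens(text: str) -> int:
    # Hypothetical stand-in: counts whitespace-separated words, not real model tokens.
    return len(text.split())


def context_tokens(messages: list[Message]) -> int:
    return sum(count_tokens(m.content) for m in messages)


# Hypothetical compaction instruction sent to the summarising model.
SUMMARY_INSTRUCTION = (
    "Summarise the conversation below into a dense digest. Preserve goals, "
    "decisions, open commitments, and constraints still in force. Discard raw "
    "tool output, superseded plans, and dead-end reasoning."
)


def maybe_compact(
    messages: list[Message],
    window: int,
    summarise: Callable[[str], str],  # hypothetical wrapper around a model call
    threshold: float = 0.8,
    keep_recent: int = 4,
) -> list[Message]:
    """Compact the older span if utilisation crosses the threshold; otherwise return messages unchanged."""
    if context_tokens(messages) < threshold * window:
        return messages
    pinned = [m for m in messages if m.pinned]
    unpinned = [m for m in messages if not m.pinned]
    if len(unpinned) <= keep_recent:
        return messages  # nothing older than the verbatim tail to compact
    split = len(unpinned) - keep_recent
    older, recent = unpinned[:split], unpinned[split:]
    transcript = "\n".join(f"{m.role}: {m.content}" for m in older)
    digest = summarise(f"{SUMMARY_INSTRUCTION}\n\n{transcript}")
    # Pinned content goes first, then the digest, then the recent turns verbatim.
    return pinned + [Message("summary", digest)] + recent


# Usage with a fake model in place of a real summarisation call.
history = [Message("user", "Task: refactor the billing module. Never change the public API.", pinned=True)]
history += [Message("tool", "log line " * 50) for _ in range(5)]
history.append(Message("assistant", "Decided to split invoice.py."))
fake_model = lambda prompt: "Digest: splitting invoice.py; public API frozen."
compacted = maybe_compact(history, window=600, summarise=fake_model, keep_recent=2)
print([m.role for m in compacted])  # ['user', 'summary', 'tool', 'assistant']
```

One simplification worth noting: the sketch moves all pinned messages to the front, so a constraint pinned mid-conversation loses its original position. A previous digest is itself unpinned, so a later pass folds it into the next digest rather than stacking summaries.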

When to use

  • The agent runs long enough that history approaches the context-window limit.
  • Older turns are dominated by raw tool output and superseded reasoning.
  • The task must continue past the point where the window would otherwise overflow.
  • You can identify decisions and constraints worth preserving in a digest.
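Of the criteria above, the first two can be checked mechanically; the last two remain design judgments. A hypothetical heuristic for those first two might look like the Python sketch below. The `Turn` type, the assumption that token counts are already measured, and the thresholds `threshold=0.8`, `keep_recent=4`, and `min_tool_share=0.5` are illustrative choices, not values from any particular product.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    role: str    # e.g. "user", "assistant", "tool"
    tokens: int  # token count, assumed already measured by a real tokenizer


def compaction_worthwhile(
    turns: list[Turn],
    window: int,
    threshold: float = 0.8,       # hypothetical utilisation trigger
    keep_recent: int = 4,         # hypothetical size of the verbatim tail
    min_tool_share: float = 0.5,  # hypothetical "dominated by tool output" cut-off
) -> bool:
    """Heuristic: history is near the limit and the older span is mostly tool output."""
    total = sum(t.tokens for t in turns)
    if total < threshold * window:
        return False  # history is not yet approaching the window limit
    older = turns[: max(len(turns) - keep_recent, 0)]
    older_tokens = sum(t.tokens for t in older)
    if older_tokens == 0:
        return False  # nothing outside the verbatim tail to compact
    tool_tokens = sum(t.tokens for t in older if t.role == "tool")
    return tool_tokens / older_tokens >= min_tool_share


turns = [Turn("user", 50)] + [Turn("tool", 200)] * 4 + [Turn("assistant", 30)] * 4
print(compaction_worthwhile(turns, window=1000))  # True
print(compaction_worthwhile(turns, window=2000))  # False
```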

