Tool-Result Eviction

also known as Tool Clearing, Observation Pruning, Tool-Output Eviction

Once a tool's raw output has been consumed, replace it in the live context window with a short marker of what was done, reclaiming tokens while keeping the record that the call happened.

Context

A tool-using agent calls tools such as search, file reads, API queries, or code execution, and each returns a bulky payload: a page of JSON, a file's full contents, a stack trace. The agent reads the payload, extracts what it needs, and acts. Turns later, that raw payload is still sitting in the context window, consuming tokens and attention even though only its conclusion still matters.

Problem

Raw tool outputs are the largest and most disposable thing in an agent's context. Keeping every observation verbatim crowds the window, raises cost, and buries the signal the agent actually reasoned over; but deleting a tool turn entirely loses the record that the call was made and what it concluded, which the agent may need to avoid repeating work or to justify its actions.

Forces

Raw observations dominate token usage but are mostly dead weight once consumed.
Deleting an observation outright erases the trace that the call happened at all.
What is 'consumed' is not always obvious — a result may be needed again later.
Replacement markers must carry enough to prevent the agent re-issuing the same call.
Eviction policy competes with caching: one discards, the other retains for replay.
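The requirement that markers prevent re-issued calls can be made concrete by giving each marker a canonical call key. This is a minimal sketch, assuming a hypothetical `Marker` record that stores the tool name, its arguments, and the one-line conclusion; `Marker`, `call_key`, and `find_prior_call` are illustrative names, not any framework's API.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


def call_key(tool: str, args: dict) -> str:
    # Canonical JSON (sorted keys) so argument order does not change the key.
    return tool + ":" + json.dumps(args, sort_keys=True)


@dataclass
class Marker:
    """Hypothetical marker left in the window after a payload is evicted."""
    tool: str
    args: dict
    conclusion: str

    def render(self) -> str:
        # The line the model sees in place of the raw output.
        return f"[evicted] {self.tool}({json.dumps(self.args, sort_keys=True)}): {self.conclusion}"


def find_prior_call(markers: list[Marker], tool: str, args: dict) -> Marker | None:
    """Return the marker of an identical earlier call, or None if the call is new."""
    key = call_key(tool, args)
    for m in markers:
        if call_key(m.tool, m.args) == key:
            return m
    return None


if __name__ == "__main__":
    markers = [Marker("read_file", {"path": "config.yaml"}, "3 services defined")]
    prior = find_prior_call(markers, "read_file", {"path": "config.yaml"})
    if prior is not None:
        # Answer from the marker instead of re-running the tool.
        print(prior.render())  # [evicted] read_file({"path": "config.yaml"}): 3 services defined
```

Because the key is built from the tool name plus normalized arguments, a marker that keeps only a free-text summary would not be enough to catch a repeat call; the arguments have to survive eviction too.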

Example

A research agent runs ten web fetches, each returning a full page of markdown. After pulling the one figure it needs from each, the runtime replaces every fetched page in the window with a line like 'fetched acme.com/pricing: enterprise tier is $499/mo' and offloads the full pages to a blob store. The window stays lean across the next twenty reasoning steps, and when the agent later needs a page in full it restores it by call id instead of re-fetching.
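The offload-and-restore loop in this example can be sketched in a few lines. This is a minimal sketch, not a particular runtime's API: `Observation`, `PayloadStore`, `evict`, `render`, and `restore` are hypothetical names, and the "blob store" is stood in for by an in-memory dict keyed by call id.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Observation:
    call_id: str
    summary: str            # e.g. "fetched acme.com/pricing"
    payload: str | None     # raw tool output; None once evicted
    conclusion: str = ""    # one-line takeaway recorded at eviction time


class PayloadStore:
    """Hypothetical stand-in for a blob store: an in-memory dict keyed by call id."""

    def __init__(self) -> None:
        self._blobs: dict[str, str] = {}

    def put(self, call_id: str, payload: str) -> None:
        self._blobs[call_id] = payload

    def get(self, call_id: str) -> str:
        return self._blobs[call_id]


def evict(obs: Observation, conclusion: str, store: PayloadStore) -> None:
    """Offload the raw payload and keep only the call and its conclusion."""
    if obs.payload is not None:
        store.put(obs.call_id, obs.payload)
        obs.payload = None
    obs.conclusion = conclusion


def render(obs: Observation) -> str:
    """What the model sees in the window for this observation."""
    if obs.payload is None:
        return f"{obs.summary}: {obs.conclusion} [call_id={obs.call_id}]"
    return f"{obs.summary}\n{obs.payload}"


def restore(obs: Observation, store: PayloadStore) -> None:
    """Bring the full payload back into the window without re-running the tool."""
    obs.payload = store.get(obs.call_id)
```

For instance, evicting `Observation("c1", "fetched acme.com/pricing", page_markdown)` with the conclusion `"enterprise tier is $499/mo"` leaves `fetched acme.com/pricing: enterprise tier is $499/mo [call_id=c1]` in the window. Keeping the call id in the marker is what lets the agent ask for the page back by id rather than fetching the URL again.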

Diagram

```mermaid
flowchart TD
    TC[Tool call] --> OBS[Raw observation in window]
    OBS --> EX[Agent extracts needed values]
    EX --> Q{Consumed?}
    Q -->|no| OBS
    Q -->|yes| EV[Evictor]
    EV --> MK[Replace payload with marker:<br/>call + conclusion]
    EV -.offload.-> ST[(External payload store)]
    ST -.restore by id.-> OBS
```

Solution

Therefore:

Treat tool observations as evictable. When a tool result has been consumed — its needed values extracted into the agent's reasoning or into external memory — replace the raw payload in the working context with a short marker that records the call, its target, and the one-line conclusion ('read config.yaml: 3 services defined', 'searched docs: no rate-limit setting found'). Keep the marker so the agent does not re-issue the call; offload the full payload to external storage if it might be needed verbatim again. Apply eviction lazily (oldest-consumed first) or eagerly (immediately after extraction) depending on how tight the window is. Manus and the Chinese context-engineering literature describe this as tool clearing.
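The two eviction schedules can be sketched as follows. This assumes a crude four-characters-per-token estimate in place of a real tokenizer and a plain dict as the external payload store; `Obs`, `evict_eager`, and `evict_lazy` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


def approx_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; use the model's tokenizer in practice.
    return max(1, len(text) // 4)


@dataclass
class Obs:
    call_id: str
    payload: str | None             # raw tool output; None once evicted
    marker: str = ""                # call + conclusion, e.g. "read config.yaml: 3 services defined"
    consumed_at: int | None = None  # turn when needed values were extracted; None = not consumed

    def visible(self) -> str:
        # What occupies the window: the marker after eviction, the raw payload before.
        return self.marker if self.payload is None else self.payload


def window_tokens(history: list[Obs]) -> int:
    return sum(approx_tokens(o.visible()) for o in history)


def _evictable(o: Obs) -> bool:
    # Only consumed observations that still hold their payload are candidates.
    return o.payload is not None and o.consumed_at is not None


def _evict(o: Obs, store: dict[str, str]) -> None:
    assert o.payload is not None
    store[o.call_id] = o.payload  # offload the verbatim body in case it is needed again
    o.payload = None              # the marker stays in the window; only the body leaves


def evict_eager(history: list[Obs], store: dict[str, str]) -> None:
    """Eager: evict every consumed payload as soon as extraction has happened."""
    for o in history:
        if _evictable(o):
            _evict(o, store)


def evict_lazy(history: list[Obs], store: dict[str, str], budget: int) -> None:
    """Lazy: evict oldest-consumed first, stopping once the window fits the budget."""
    candidates = sorted((o for o in history if _evictable(o)), key=lambda o: o.consumed_at)
    for o in candidates:
        if window_tokens(history) <= budget:
            break
        _evict(o, store)
```

Eager eviction keeps the window smallest but drops bodies the agent might still want to glance at; lazy eviction keeps recent material verbatim and pays the eviction cost only when the budget is actually exceeded. Unconsumed observations are never touched by either schedule.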

What this pattern forbids. The agent must not retain raw tool payloads in the live window after they have been consumed; a consumed observation has to be replaced with a marker that preserves the call and its conclusion. Eviction must not delete the record that a call happened, only its bulky body.
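Both prohibitions are mechanical enough to check against the live window. The sketch below uses a hypothetical `Turn` record and a `violations` helper (neither comes from the source) to report any consumed observation whose raw payload is still present, any consumed observation left without a marker, and any issued call whose record has vanished from the window entirely.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Turn:
    call_id: str
    consumed: bool
    payload: str | None  # raw body; must be None once consumed
    marker: str | None   # call + conclusion; must be present once consumed


def violations(window: list[Turn], issued_call_ids: list[str]) -> list[str]:
    """List every way the live window breaks the pattern's prohibitions."""
    problems: list[str] = []
    present = {t.call_id for t in window}
    for t in window:
        if t.consumed and t.payload is not None:
            problems.append(f"{t.call_id}: consumed payload still in window")
        if t.consumed and not t.marker:
            problems.append(f"{t.call_id}: consumed without a marker")
    for cid in issued_call_ids:
        # Eviction may remove the body, never the record that the call happened.
        if cid not in present:
            problems.append(f"{cid}: call record deleted outright")
    return problems
```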

And the patterns that stand alongside it, or against it:

Complements Tool Result Caching: Cache the result of expensive deterministic tool calls keyed by their arguments so repeat calls within a session return immediately.
Complements Context Compaction: When the context window nears its limit, replace the older conversation span with a model-written digest that preserves decisions, commitments, and active constraints while discarding noise, so the agent keeps running without losing the thread.
Complements Context Window Packing: Choose what fits in the context window each turn given a fixed token budget.
Complements Filesystem as Context: Use the filesystem as the agent's externalized working memory, writing plans, notes, and large tool outputs to files, dropping them out of the live window, and re-reading on demand.
Alternative to Adaptive Memory Decay: Give each long-term memory item a retention score that decays over time through a function modulated by relevance, access frequency, and recency, so unreinforced items fade or fuse while items that are used persist.