Context Folding
also known as Sub-Trajectory Folding
Let the agent branch into a temporary sub-context for a subtask and fold it back into a short summary on completion, so a long-horizon task stays within a small active window.
Context
An agent works a task that spans hundreds of steps, such as a repository-wide refactor, a multi-document research sweep, or a long tool chain. Every step appends its tool calls and observations to the running context, and the window fills long before the task is finished. Truncating reactively, or summarising the whole window once the limit is near, either drops detail the agent still needs or forces it to spend further steps and tokens re-reading that detail.
Problem
A single linear context cannot hold a hundred-step trajectory, yet most of the intermediate detail produced while exploring one subtask stops mattering once that subtask returns its result. Keeping every step wastes the window and slows the model, while discarding steps blindly loses the thread. The agent needs a way to spend a large working context on a subtask and then reclaim almost all of it, retaining only the outcome.
Forces
- Aggressive compression buys a long horizon under a fixed token budget, but folding away detail risks discarding something a later subtask needs.
- A subtask's exploration is high-value while it runs and near-worthless once it returns a result.
- Reliable folding is a learned skill: prompting an agent to self-summarise is brittle, while training the fold end-to-end (for example with FoldGRPO) is costly.
Example
A coding agent is asked to migrate a 40-file module to a new API. For each file it opens a branch, reads the file, runs the tests, and fixes the call sites — dozens of steps per file. When a file is done it returns a one-line summary such as 'migrated payments.py, 3 call sites updated, tests pass' and folds away the rest, so by the fortieth file the main context still fits comfortably instead of overflowing.
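The saving in this example can be made concrete with some back-of-the-envelope arithmetic. The figures below (30 steps per file, 500 tokens per step, 20 tokens per summary) are illustrative assumptions, not measurements from any real run; only the 40-file count comes from the example.

```python
# Illustrative arithmetic only: every constant except FILES is an assumption.
FILES = 40
STEPS_PER_FILE = 30     # assumed stand-in for "dozens of steps per file"
TOKENS_PER_STEP = 500   # assumed average size of one tool call plus observation
SUMMARY_TOKENS = 20     # assumed size of a one-line outcome summary

# Linear context: every step of every file stays in the window.
linear_total = FILES * STEPS_PER_FILE * TOKENS_PER_STEP

# Folded context: the peak occurs while the last file's branch is open,
# when the parent holds 39 summaries plus one full working branch.
branch_peak = STEPS_PER_FILE * TOKENS_PER_STEP
folded_peak = (FILES - 1) * SUMMARY_TOKENS + branch_peak

print(linear_total)  # 600000
print(folded_peak)   # 15780
```

Under these assumptions the peak active context is set almost entirely by the size of one branch, not by the number of files, which is why the fortieth file costs about the same as the first.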
Solution
Therefore:
Expose two control actions to the agent. The branch action opens a new sub-context seeded with just the subtask goal; the return action closes it and writes back only a short outcome summary to the parent trajectory. The agent reasons and calls tools freely inside the branch, and when it returns, the intermediate steps are folded away so the parent sees one compact result. The decision of when to branch and what to keep is learned during training (FoldGRPO assigns credit through the fold) rather than hard-coded by the harness, so the agent folds where it pays off.
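The mechanics of the two actions can be sketched as a small piece of harness state. This is a minimal illustration, not the FoldGRPO implementation: the class name `FoldingContext`, its method names, and the restriction to one open branch at a time are all assumptions made for the sketch, and the learned policy that decides when to branch is left out entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FoldingContext:
    """Hypothetical harness state: a main trajectory plus at most one open branch."""
    main: List[str] = field(default_factory=list)
    branch: Optional[List[str]] = None  # None means no branch is open

    def active(self) -> List[str]:
        """What the model would see: the main trajectory plus any open branch."""
        return self.main + (self.branch or [])

    def append(self, step: str) -> None:
        """Record a tool call or observation in whichever context is active."""
        target = self.branch if self.branch is not None else self.main
        target.append(step)

    def branch_open(self, goal: str) -> None:
        """Open a sub-context seeded with just the subtask goal."""
        if self.branch is not None:
            raise RuntimeError("a branch is already open (assumed: no nesting)")
        self.branch = [f"[branch] goal: {goal}"]

    def branch_return(self, summary: str) -> int:
        """Close the branch, keep only the summary, and report how many entries were folded."""
        if self.branch is None:
            raise RuntimeError("no branch is open")
        folded = len(self.branch)
        self.branch = None  # intermediate steps are discarded, not archived
        self.main.append(f"[return] {summary}")
        return folded


ctx = FoldingContext(main=["task: migrate module to new API"])
ctx.branch_open("migrate payments.py")
ctx.append("read payments.py")
ctx.append("run tests: 2 failures")
ctx.append("fix 3 call sites")
ctx.append("run tests: pass")
folded = ctx.branch_return("migrated payments.py, 3 call sites updated, tests pass")

print(folded)             # 5 (the goal line plus four steps)
print(len(ctx.active()))  # 2 (the task line plus one summary)
```

In this sketch the harness only enforces the bookkeeping; choosing when to call `branch_open` and what to put in the summary remains the agent's job.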
What this pattern forbids. Folded sub-trajectories are no longer visible to the parent context; once a branch returns, the agent cannot read its intermediate steps, only the retained summary.
And the patterns that stand alongside it, or against it —
- alternative-to Context Compaction★: When the context window nears its limit, replace the older conversation span with a model-written digest that preserves decisions, commitments, and active constraints while discarding noise, so the agent keeps running without losing the thread.
- complements Agent-as-Tool Embedding★: Wrap a sub-agent (with its own loop, prompt, and tool palette) behind a single function-shaped tool signature, so the parent agent calls it like any other tool and never sees the sub-agent's internal turns.
- complements Subagent Isolation★: Run subagents in isolated workspaces so their writes do not collide and parallelism is safe.
- alternative-to MemGPT-Style Paging★: Treat the LLM context window as RAM and external storage as disk, with the model issuing tool calls to page memory in and out.