Information Chunking for Agent Memory
also known as STM Chunking, Topical Segmentation for Memory
Structure inputs into digestible topical segments (chunks) before feeding to short-term memory rather than throwing the full input at the model; reduces overload and increases accuracy (~40% improvement observed in customer-service deployment).
Context
An agent is given a long input — multi-turn conversation history, large document, multi-source context. The default is to dump it all into the model's context window and hope. STM is overwhelmed; attention diffuses across irrelevant content; response quality degrades.
Problem
Unchunked inputs into STM trigger the context-window-dumb-zone and lost-in-the-middle effects: degradation that starts well before the nominal context limit. The model can't prioritize, attention mechanisms get confused, retrieval quality drops.
Forces
- Chunking is an upstream preprocessing investment.
- Chunk boundaries require domain understanding — bad boundaries cut meaning in half.
- Per-domain chunking heuristics need design and maintenance.
Example
A customer-service agent gets a 20-page case history before each call. Pre-chunking: full history into context, agent gives generic responses, 65% accuracy. Post-chunking: history split into topical segments (billing issues, technical complaints, account changes, escalations) with metadata; agent loads only relevant chunks for the current call topic. Accuracy climbs to 91%.
Diagram
Solution
Therefore:
Before feeding context into STM, run a chunker: split the input into topic-coherent, size-bounded segments. Tag each chunk with topic / source metadata so retrieval can prioritize. Feed only relevant chunks at decision time. Bornet's measured impact: 40% accuracy improvement in a customer-service deployment. Pair with context-window-packing, episodic-summaries, context-window-dumb-zone, contextual-retrieval.
What this pattern forbids. No raw long input enters STM directly; all long inputs pass through the chunker first.
And the patterns that stand alongside it, or against it —
- complementsContext Window Packing★★— Choose what fits in the context window each turn given a fixed token budget.
- complementsEpisodic Summaries★★— Compress past episodes into summaries that preserve gist while shedding token cost.
- complementsContext Window Dumb-Zone Cap★— Hold context-window utilization below a working threshold (~40%) to keep the model out of the 'dumb zone' where it begins ignoring earlier instructions and hallucinating.
- complementsContextual Retrieval★— Prepend a short LLM-generated description to each chunk before embedding so the chunk carries its situating context.
- complementsLost in the Middle (Positional Bias)✕— LLM accuracy on retrieving information from long contexts drops sharply when relevant content sits in the middle of the prompt rather than at the start or end.
- complementsLandmark Attention·— Long-context attention mechanism placing sparse landmark tokens across very long inputs so the model jumps directly to relevant sections via landmark lookup rather than scanning linearly.
Neighbourhood
Click any neighbour to follow the language. Scroll to zoom, drag to pan.