V · Memory · Emerging

MemGPT-Style Paging

also known as Virtual Context, Memory Paging, OS-Style Memory

Treat the LLM context window as RAM and external storage as disk, with the model issuing tool calls to page memory in and out.

Context

A long-running agent's conversation or document state grows past the model's context window. The team needs to keep the agent useful over interactions that may span thousands of turns, or over documents that are larger than any window the provider offers.

Problem

A fixed context window forces a hard choice between losing state and stuffing the window with irrelevant content. Naive truncation drops whatever happens to sit at the boundary, which may be exactly the information the next turn needs. Stuffing the window with potentially relevant content from the past inflates cost and dilutes the model's attention on the pieces that actually matter. Neither option scales, and both degrade quality. The team needs a paging discipline, like the way an operating system pages between main memory and disk, in which the model itself decides what to load and what to swap out as the task evolves.

Forces

  • The paging tools themselves consume context space, through both their definitions and their results.
  • The eviction policy (LRU, LFU, or salience-based) affects answer quality.
  • Each page fault costs a tool round trip, which adds to user-visible latency.
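
To make the eviction-policy force concrete, here is a minimal sketch of an LRU working set. It is only one of the candidate policies named above, and the class and method names (LRUWorkingSet, page_in, touch) are hypothetical, not taken from any MemGPT implementation. Capacity is counted in items for simplicity; a real agent would more likely budget in tokens.

```python
from collections import OrderedDict


class LRUWorkingSet:
    """Hypothetical working set that evicts the least recently used item."""

    def __init__(self, capacity):
        self.capacity = capacity          # assumed budget, counted in items
        self._items = OrderedDict()       # oldest entry first, newest last

    def touch(self, key):
        """Read an item and mark it as most recently used."""
        if key not in self._items:
            return None                   # a "page fault": caller must page it in
        self._items.move_to_end(key)
        return self._items[key]

    def page_in(self, key, text):
        """Load an item and return the (key, text) pairs evicted to make room."""
        self._items[key] = text
        self._items.move_to_end(key)
        evicted = []
        while len(self._items) > self.capacity:
            evicted.append(self._items.popitem(last=False))  # drop the oldest
        return evicted

    def keys(self):
        return list(self._items)
```

Swapping in LFU or a salience score would change only which entry page_in chooses to drop, which is why the policy can be tuned separately from the paging tools.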

Example

A long-running personal assistant that tracks a user's projects across six months hits the context window limit in every conversation and starts dropping older but still relevant context. The team adopts memgpt-paging: a small main context holds the system prompt and the active turn; recall and archival tiers live in external storage; the model uses search_archival and read_recall tool calls to page in what it needs. The agent now treats the window as RAM that it explicitly manages instead of as a hard ceiling.

Solution

Therefore:

Use two memory tiers. Main context: system prompt, working set, recent messages. External context: recall (raw history) and archival (vector store). The model is given three paging tools: read_recall, write_archival, and search_archival. Paging happens at the agent's discretion; the model treats main context as RAM and external context as disk.
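
The two tiers and three tools can be sketched roughly as follows. This is an illustration under stated assumptions, not the MemGPT implementation: the PagedMemory class, its message-count budget, and the tool signatures are hypothetical, and plain word overlap stands in for the embedding search a real vector store would perform.

```python
from dataclasses import dataclass, field


@dataclass
class PagedMemory:
    """Hypothetical two-tier memory: bounded main context plus external stores."""
    system_prompt: str
    max_main_messages: int = 4                           # assumed budget, in messages
    main: list[str] = field(default_factory=list)        # recent messages ("RAM")
    recall: list[str] = field(default_factory=list)      # raw history ("disk")
    archival: list[str] = field(default_factory=list)    # stand-in for a vector store

    def append(self, message: str) -> None:
        """Record a message; the oldest leaves main context when it is full."""
        self.recall.append(message)                      # recall keeps everything
        self.main.append(message)
        while len(self.main) > self.max_main_messages:
            self.main.pop(0)                             # gone from main, still in recall

    def build_prompt(self) -> str:
        """Only the main context is ever placed in front of the model."""
        return "\n".join([self.system_prompt, *self.main])

    # Paging tools the model may call (signatures are assumptions).
    def read_recall(self, start: int, count: int) -> list[str]:
        return self.recall[start:start + count]

    def write_archival(self, text: str) -> int:
        self.archival.append(text)
        return len(self.archival) - 1                    # index of the new entry

    def search_archival(self, query: str, top_k: int = 3) -> list[str]:
        # Word overlap is a placeholder for embedding similarity.
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(t.lower().split())), t) for t in self.archival]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored if score > 0][:top_k]
```

In use, the agent loop would call append for every turn, send build_prompt() to the model, and route any read_recall, write_archival, or search_archival tool call back to the matching method.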

What this pattern forbids. Memory beyond the working set is accessible only via paging tool calls; the agent cannot directly read external state.
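
One way to enforce this restriction is to hide the external store behind a gateway that exposes only named paging tools. The sketch below assumes a hypothetical PagingGateway class and build_gateway helper, and uses substring matching as a placeholder for real archival search.

```python
from typing import Callable


class PagingGateway:
    """Hypothetical gateway: the agent reaches external memory only through it."""

    def __init__(self, tools: dict[str, Callable[..., object]]):
        self._tools = dict(tools)         # allowlist of paging tools

    def call(self, name: str, **arguments):
        if name not in self._tools:
            # Anything outside the allowlist is refused, including raw reads.
            raise PermissionError(f"unknown paging tool: {name}")
        return self._tools[name](**arguments)


def build_gateway() -> PagingGateway:
    archival: list[str] = []              # external state lives only in this closure

    def write_archival(text: str) -> int:
        archival.append(text)
        return len(archival) - 1

    def search_archival(query: str) -> list[str]:
        # Substring match stands in for a vector search.
        return [t for t in archival if query.lower() in t.lower()]

    return PagingGateway({
        "write_archival": write_archival,
        "search_archival": search_archival,
    })
```

Because the archival list is never handed to the agent loop, the only way its contents can reach the prompt is through a tool result.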

The smaller patterns that complete this one —

  • uses Vector Memory ★★: Store memories as embeddings in a vector index and retrieve the most semantically similar items at query time.
  • uses Tool Use ★★: Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.

And the patterns that stand alongside it, or against it —

  • alternative-to Five-Tier Memory Cascade ·: Stage agent memory across sensory, working, short-term, episodic, and long-term tiers with explicit promotion and decay between them.
  • alternative-to Cross-Session Memory ★★: Persist user-specific facts, preferences, and prior context across all sessions, threads, and devices.
  • alternative-to Context Window Packing ★★: Choose what fits in the context window each turn given a fixed token budget.
  • alternative-to Agentic Memory: Expose memory management as first-class tool actions (ADD, UPDATE, DELETE, RETRIEVE, SUMMARY, FILTER) the LLM chooses at every step, trained end-to-end so short-term and long-term memory live under one learned policy.
  • complements Context Window Dumb-Zone Cap: Hold context-window utilization below a working threshold (~40%) to keep the model out of the 'dumb zone' where it begins ignoring earlier instructions and hallucinating.
  • complements Landmark Attention ·: Long-context attention mechanism placing sparse landmark tokens across very long inputs so the model jumps directly to relevant sections via landmark lookup rather than scanning linearly.
