Memory

Agentic Memory

Expose memory management as first-class tool actions (ADD, UPDATE, DELETE, RETRIEVE, SUMMARY, FILTER) the LLM chooses at every step, trained end-to-end so short-term and long-term memory live under one learned policy.

Problem

When memory management lives in auxiliary controllers (summarisers, evictors, retrievers) tuned by hand, the agent's policy and its memory policy are optimised separately and cannot co-adapt. The agent cannot decide 'I should remember this exchange in detail because it will matter in three turns' or 'this fact is now stale, delete it' — those decisions belong to heuristics it cannot see. End-to-end optimisation across the agent loop and the memory loop is impossible because the memory loop is not differentiable, not callable, and not part of the agent's action space.

Solution

Define six memory operations as first-class tools available to the agent at every step:

  • ADD: write a new memory item with metadata.
  • UPDATE: modify an existing item.
  • DELETE: remove obsolete items.
  • RETRIEVE: run a semantic search over long-term memory and inject the results into context.
  • SUMMARY: compress a dialogue span.
  • FILTER: narrow short-term memory by criteria.

Train the agent end-to-end via reinforcement learning with a step-wise objective that credits memory operations against eventual task reward. Published work uses a step-wise GRPO variant to handle the sparse and discontinuous reward signal from memory actions. Short-term and long-term memory share one learned policy rather than separate controllers.
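
As a rough illustration of what "memory operations as tools" could look like, the Python sketch below wraps the six operations in one dispatcher that an agent loop might call with a tool name and arguments. Everything here is an assumption made for the example: the class and method names (AgentMemory, MemoryItem, dispatch) are hypothetical, RETRIEVE uses plain word overlap as a stand-in for semantic search, and SUMMARY truncates turns instead of calling a summarisation model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class MemoryItem:
    item_id: int
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


class AgentMemory:
    """Toy store exposing the six operations as callable tools (hypothetical API)."""

    def __init__(self) -> None:
        self.long_term: Dict[int, MemoryItem] = {}
        self.short_term: List[str] = []  # dialogue turns currently in context
        self._next_id = 0

    def add(self, text: str, metadata: Optional[Dict[str, str]] = None) -> int:
        item_id = self._next_id
        self._next_id += 1
        self.long_term[item_id] = MemoryItem(item_id, text, dict(metadata or {}))
        return item_id

    def update(self, item_id: int, text: str) -> bool:
        if item_id not in self.long_term:
            return False
        self.long_term[item_id].text = text
        return True

    def delete(self, item_id: int) -> bool:
        return self.long_term.pop(item_id, None) is not None

    def retrieve(self, query: str, k: int = 3) -> List[str]:
        # Assumption: word overlap stands in for real semantic search.
        words = set(query.lower().split())
        scored = [
            (len(words & set(m.text.lower().split())), m.item_id, m.text)
            for m in self.long_term.values()
        ]
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
        results = [text for _, _, text in hits[:k]]
        # Inject the retrieved items into short-term context.
        self.short_term.extend(f"[memory] {t}" for t in results)
        return results

    def summary(self, start: int, end: int) -> str:
        # Assumption: truncation stands in for an LLM summariser.
        span = self.short_term[start:end]
        digest = "[summary] " + " | ".join(t[:40] for t in span)
        self.short_term[start:end] = [digest]
        return digest

    def filter(self, keep: Callable[[str], bool]) -> int:
        # Keep only the turns matching the criterion; return how many were dropped.
        before = len(self.short_term)
        self.short_term = [t for t in self.short_term if keep(t)]
        return before - len(self.short_term)

    def dispatch(self, action: str, **kwargs):
        # One entry point, so every memory operation is an ordinary tool call.
        tools = {
            "ADD": self.add,
            "UPDATE": self.update,
            "DELETE": self.delete,
            "RETRIEVE": self.retrieve,
            "SUMMARY": self.summary,
            "FILTER": self.filter,
        }
        return tools[action](**kwargs)
```

In this sketch the policy would emit something like dispatch("ADD", text=..., metadata=...) alongside its task actions, so memory decisions land in the same trajectory that the RL objective scores.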

When to use

  • The agent runs over long horizons (days, weeks) and hand-tuned memory heuristics have plateaued.
  • Task reward is well-defined and an RL training loop is available.
  • Memory decisions are task-dependent in ways generic policies miss.
  • The team can afford end-to-end training and the larger action-space exploration.
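
To make the reward requirement concrete, here is a minimal sketch of one way a terminal task reward could be credited to every step of a rollout, including its memory actions. It uses GRPO-style group normalisation and copies each rollout's advantage to all of its steps. This is an assumption for illustration, not the published step-wise variant, whose details are not given here; the function name and parameters are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def stepwise_group_advantages(
    rewards: List[float],
    steps_per_rollout: List[int],
    eps: float = 1e-8,
) -> List[List[float]]:
    """Normalise terminal rewards within a group, then broadcast to every step.

    rewards[i] is the final task reward of rollout i; steps_per_rollout[i] is
    how many actions (task or memory) that rollout took.
    """
    if len(rewards) != len(steps_per_rollout):
        raise ValueError("rewards and steps_per_rollout must have equal length")
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    advantages = []
    for reward, n_steps in zip(rewards, steps_per_rollout):
        adv = (reward - mu) / (sigma + eps)
        # Every step in the rollout, memory actions included, gets the same credit.
        advantages.append([adv] * n_steps)
    return advantages
```

If every rollout in a group earns the same reward, all advantages are zero and the group contributes no learning signal, which is one reason a well-defined, discriminating task reward matters for this pattern.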
