Agentic Memory
also known as Memory Operations as Tools, AgeMem, Unified STM-LTM Tool Interface, Agent Memory (智能体记忆)
Expose memory management as first-class tool actions (ADD, UPDATE, DELETE, RETRIEVE, SUMMARY, FILTER) the LLM chooses at every step, trained end-to-end so short-term and long-term memory live under one learned policy.
Context
A long-running agent accumulates conversation history, intermediate results, and learned facts that exceed any context window. Standard practice splits this into short-term memory (the live context) and long-term memory (an external store) managed by separate controllers: a summariser decides what gets compressed, a retrieval policy decides what gets pulled back, an eviction heuristic decides what gets dropped. Each controller is hand-tuned and the agent's actual reasoning has no visibility into or control over them.
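For contrast, the hand-tuned arrangement described above can be sketched as follows. This is a minimal illustration, not any particular system: the class name `HeuristicMemory`, the `summarise_every` and `budget` parameters, and the placeholder summary string are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class HeuristicMemory:
    """Hypothetical hand-tuned baseline: fixed rules the agent's policy never sees."""
    summarise_every: int = 20  # assumed rule: compress every N turns
    budget: int = 30           # assumed cap on live context items
    context: list[str] = field(default_factory=list)  # short-term memory
    store: list[str] = field(default_factory=list)    # long-term memory
    turns: int = 0

    def on_turn(self, message: str) -> None:
        self.turns += 1
        self.context.append(message)
        if self.turns % self.summarise_every == 0:
            # Summariser: move the oldest half of the window to the store
            # and leave a placeholder line standing in for a real summary.
            half = len(self.context) // 2
            old, self.context = self.context[:half], self.context[half:]
            self.store.extend(old)
            self.context.insert(0, f"[summary of {len(old)} turns]")
        while len(self.context) > self.budget:
            # Evictor: drop the oldest item regardless of its importance.
            self.context.pop(0)
```

Nothing in `on_turn` consults the agent: the message content never influences what is compressed or dropped, which is the separation the Problem section goes on to criticise.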
Problem
When memory management lives in auxiliary controllers (summarisers, evictors, retrievers) tuned by hand, the agent's policy and its memory policy are optimised separately and cannot co-adapt. The agent cannot decide 'I should remember this exchange in detail because it will matter in three turns' or 'this fact is now stale, delete it' — those decisions belong to heuristics it cannot see. End-to-end optimisation across the agent loop and the memory loop is impossible because the memory loop is not differentiable, not callable, and not part of the agent's action space.
Forces
- Memory decisions are task-dependent; what to keep depends on what the agent is doing.
- Hand-tuned heuristics (summarise every N turns, evict when over budget) are local optima.
- End-to-end training requires memory operations to be part of the agent's action space.
- Sparse and discontinuous reward from memory operations makes naive RL unstable.
Example
A customer-support agent runs across multi-day cases. Under heuristic controllers, a summariser collapses the oldest window every 20 turns, and a retrieval policy pulls related items on every turn whether or not they are needed. Under Agentic Memory, the agent learns during training that the customer name and order id should be ADDed to long-term memory immediately, that intermediate troubleshooting steps can stay in short-term memory and be FILTERed out once the issue is resolved, and that on resumption it should RETRIEVE on the order id rather than the case id. Memory operations appear as named tool calls in the trace; the same RL signal that taught task resolution also taught memory hygiene.
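A trace of the kind described above might look like the following sketch. The record layout (`step`, `tool`, `args`), the non-memory `reply` tool, and every value shown are made up for illustration; only the six memory tool names come from the pattern itself.

```python
MEMORY_TOOLS = {"ADD", "UPDATE", "DELETE", "RETRIEVE", "SUMMARY", "FILTER"}

# Illustrative trace; every value below is invented for the example.
trace = [
    {"step": 1, "tool": "ADD", "args": {"content": "customer: A. Example", "tags": ["customer"]}},
    {"step": 2, "tool": "ADD", "args": {"content": "order id: ORD-0001", "tags": ["order"]}},
    {"step": 3, "tool": "reply", "args": {"text": "Please restart the router."}},
    {"step": 4, "tool": "FILTER", "args": {"drop_tag": "troubleshooting"}},
    # The case resumes on a later day.
    {"step": 5, "tool": "RETRIEVE", "args": {"query": "ORD-0001"}},
]


def memory_ops(trace):
    """Return only the steps where the agent chose a memory tool."""
    return [call for call in trace if call["tool"] in MEMORY_TOOLS]


for call in memory_ops(trace):
    print(call["step"], call["tool"], call["args"])
```

Because memory actions sit in the same trace as ordinary tool calls, they can be filtered, audited, and rewarded with the same machinery.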
Solution
Therefore:
Define six memory operations as first-class tools available to the agent at every step:
- ADD: write a new memory item with metadata.
- UPDATE: modify an existing item.
- DELETE: remove obsolete items.
- RETRIEVE: run a semantic search over long-term memory and inject the results into context.
- SUMMARY: compress a dialogue span.
- FILTER: narrow short-term memory by criteria.
Train the agent end-to-end via reinforcement learning with a step-wise objective that credits memory operations against eventual task reward. Published work uses a step-wise GRPO variant to handle the sparse and discontinuous reward signal from memory actions. Short-term and long-term memory share one learned policy rather than separate controllers.
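To make the step-wise credit idea concrete, here is a rough sketch of one plausible reading: compute the standard GRPO group-relative advantage per rollout, then assign that same advantage to every step in the rollout, memory actions included. The broadcast rule and the function names are assumptions made for illustration; the published variant may differ in its details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style normalisation: score each rollout against its own sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def stepwise_advantages(rollouts, rewards):
    """Assumed credit rule: every step inherits its rollout's advantage.

    rollouts: one list of step labels per sampled trajectory.
    rewards:  one terminal task reward per trajectory.
    """
    advantages = group_relative_advantages(rewards)
    return [[(step, adv) for step in steps]
            for steps, adv in zip(rollouts, advantages)]


# Two sampled rollouts for the same task; only the first one succeeded.
rollouts = [["ADD", "RETRIEVE", "answer"], ["answer"]]
rewards = [1.0, 0.0]
credited = stepwise_advantages(rollouts, rewards)
```

Under this rule the ADD and RETRIEVE calls in the successful rollout receive a positive learning signal even though they produced no reward of their own, which is one way to cope with the sparsity named in the Forces.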
What this pattern forbids. Memory state may only be modified through the named tool actions (ADD/UPDATE/DELETE/RETRIEVE/SUMMARY/FILTER); auxiliary heuristic controllers cannot mutate memory out-of-band, so every memory change is attributable to a single LLM action in the trace.
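The six operations and the constraint above can be sketched together as a store with a single entry point. Everything here is hypothetical: the class `AgenticMemory`, the `call` method, the argument names, and the word-overlap ranking that stands in for real semantic search. The SUMMARY text is passed in as an argument because, in this sketch, the model is assumed to write it.

```python
from dataclasses import dataclass, field

OPS = {"ADD", "UPDATE", "DELETE", "RETRIEVE", "SUMMARY", "FILTER"}


@dataclass
class AgenticMemory:
    long_term: dict[int, str] = field(default_factory=dict)  # item id -> text
    short_term: list[str] = field(default_factory=list)      # live context
    trace: list[tuple[int, str, dict]] = field(default_factory=list)
    _next_id: int = 0

    def call(self, step: int, op: str, **args):
        """Only entry point: record the attempted action, then dispatch it."""
        if op not in OPS:
            raise ValueError(f"not a memory tool: {op}")
        self.trace.append((step, op, args))
        return getattr(self, f"_{op.lower()}")(**args)

    def _add(self, content: str) -> int:
        item_id, self._next_id = self._next_id, self._next_id + 1
        self.long_term[item_id] = content
        return item_id

    def _update(self, item_id: int, content: str) -> None:
        if item_id not in self.long_term:
            raise KeyError(item_id)
        self.long_term[item_id] = content

    def _delete(self, item_id: int) -> None:
        del self.long_term[item_id]

    def _retrieve(self, query: str, k: int = 3) -> list[str]:
        # Word overlap is a stand-in for embedding similarity.
        words = set(query.lower().split())
        scored = [(len(words & set(text.lower().split())), text)
                  for text in self.long_term.values()]
        ranked = sorted(scored, key=lambda pair: -pair[0])
        hits = [text for score, text in ranked if score > 0][:k]
        self.short_term.extend(hits)  # inject results into context
        return hits

    def _summary(self, start: int, end: int, text: str) -> None:
        # Replace the span short_term[start:end] with one summary line.
        self.short_term[start:end] = [text]

    def _filter(self, drop_if_contains: str) -> None:
        self.short_term = [m for m in self.short_term
                           if drop_if_contains not in m]


mem = AgenticMemory()
mem.call(1, "ADD", content="order id ORD-0001")
mem.call(2, "ADD", content="customer prefers email")
mem.call(3, "UPDATE", item_id=1, content="customer prefers phone")
hits = mem.call(4, "RETRIEVE", query="order id")
mem.call(5, "DELETE", item_id=1)
```

Since the private handlers are reached only through `call`, each change to `long_term` or `short_term` has a matching `(step, op, args)` entry in `trace`. Python does not enforce the underscore convention, so a real system would need a stronger boundary, such as keeping the store behind a separate process or API.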
And the patterns that stand alongside it, or against it —
- alternative-to MemGPT-Style Paging (★): Treat the LLM context window as RAM and external storage as disk, with the model issuing tool calls to page memory in and out.
- composes-with Semantic Memory (★): Maintain a dedicated store of what the agent holds to be true about the user and the world, separate from event records (episodic) and learned how-to (procedural).
- composes-with Episodic Memory (★★): Record past events as time-stamped first-person experiences the agent can recall later, separately from extracted facts (semantic) and learned how-to (procedural).
- composes-with Vector Memory (★★): Store memories as embeddings in a vector index and retrieve the most semantically similar items at query time.
- complements Episodic Summaries (★★): Compress past episodes into summaries that preserve gist while shedding token cost.
- complements Test-Time Memorization (Titans) (·): A memory module that learns at inference time by incorporating recent inputs into its parameters during the session rather than relying solely on pre-trained weights.