Three Layers of Agentic AI Memory
also known as STM+LTM+Feedback Onion, Concentric Memory Architecture
Architect agent memory as three integrated concentric layers — Short-Term Memory (outer), Long-Term Memory (middle), Feedback Loops (core) — operating together as a unit rather than as separable optional components.
Context
A team is building or operating an agent that needs to remember across sessions. The default is to treat the short-term context window, the long-term retrieval store, and feedback-driven improvement as three independent concerns. In practice they interact, often in ways that surface only at scale.
Problem
Treating the three memory concerns as independent leads to silos: the STM forgets what LTM stored; the LTM never gets refined by feedback; feedback loops don't update either memory cleanly. Bornet's onion model insists they're one architecture, not three add-ons.
Forces
- Three layers means three components to maintain.
- Each layer uses different storage technology (in-memory cache, vector DB, workflow store).
- Boundary semantics between layers (when does STM promote to LTM?) require explicit design.
Example
Consider a customer-service agent at a logistics firm. STM holds the current conversation. LTM persists customer history, past tickets, and learned workflows across sessions. Feedback Loops ingest CSAT scores, agent corrections, and ticket-reopen rates, and use them to refine both layers: STM gets better at attending to high-priority customers, and LTM gets better at classifying past resolutions. Six months in, error rates have dropped by 50%, according to Bornet's case data.
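The feedback signals in this example could be folded into a single refinement score along the following lines. This is a minimal sketch, not Bornet's method: the `TicketOutcome` type, the `reward` function, and the weights are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TicketOutcome:
    """One closed ticket, as seen by the feedback layer (hypothetical shape)."""
    csat: int         # 1..5 survey score (explicit signal)
    corrected: bool   # a human agent corrected the reply (explicit signal)
    reopened: bool    # the ticket was reopened later (implicit signal)


def reward(outcome: TicketOutcome) -> float:
    """Map a ticket outcome to a score in [-1, 1].

    The 0.5 penalties are illustrative assumptions, not tuned values.
    """
    score = (outcome.csat - 3) / 2   # rescale 1..5 to -1..1
    if outcome.corrected:
        score -= 0.5
    if outcome.reopened:
        score -= 0.5
    return max(-1.0, min(1.0, score))  # clamp to the stated range
```

A score like this can then drive refinement of both memory layers, for example by reweighting stored resolutions or by adjusting what gets promoted out of the session.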
Solution
Therefore:
Three coordinated layers:
- STM: bounded session context, attention mechanisms, token management.
- LTM: persistent, structured, indexed (typically vector or graph).
- Feedback Loops: ingest explicit (corrections, ratings) and implicit (engagement, errors) signals to refine both STM and LTM over time.
Define promotion rules (when STM content gets written to LTM) and refinement triggers. Pair with short-term-memory, episodic-memory, semantic-memory, procedural-memory, memory-type-storage-specialization, agentic-memory.
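One way to picture the three layers working as a unit is the sketch below. It rests on several assumptions: `LayeredMemory`, `MemoryItem`, the importance score, and the threshold-based promotion rule are hypothetical names and choices, and a plain list with keyword matching stands in for the vector or graph index a real LTM would use.

```python
from collections import deque


class MemoryItem:
    def __init__(self, text, importance):
        self.text = text
        self.importance = importance  # 0.0..1.0, supplied by the caller (assumed)
        self.weight = 1.0             # adjusted by feedback, used for ranking


class LayeredMemory:
    def __init__(self, stm_capacity=4, promotion_threshold=0.6):
        self.stm = deque(maxlen=stm_capacity)  # bounded session context
        self.ltm = []                          # stand-in for a persistent index
        self.promotion_threshold = promotion_threshold

    def observe(self, text, importance):
        """Add to STM; promote to LTM when the (assumed) rule fires."""
        item = MemoryItem(text, importance)
        self.stm.append(item)  # oldest item is evicted when STM is full
        if importance >= self.promotion_threshold:
            self.ltm.append(item)
        return item

    def context(self):
        """STM contents, highest-weighted first (a crude form of attention)."""
        ranked = sorted(self.stm, key=lambda m: m.weight, reverse=True)
        return [m.text for m in ranked]

    def recall(self, keyword, k=3):
        """Keyword lookup in LTM, ranked by feedback-adjusted weight."""
        hits = [m for m in self.ltm if keyword.lower() in m.text.lower()]
        return sorted(hits, key=lambda m: m.weight, reverse=True)[:k]

    def feedback(self, item, reward):
        """Refinement trigger: reward in [-1, 1] updates both layers."""
        item.weight = max(0.0, item.weight + 0.5 * reward)
        # Positive feedback loosens the promotion rule, negative tightens it.
        shifted = self.promotion_threshold - 0.05 * reward
        self.promotion_threshold = min(0.95, max(0.05, shifted))


if __name__ == "__main__":
    mem = LayeredMemory(stm_capacity=2)
    note = mem.observe("Customer prefers email updates", importance=0.9)
    mem.feedback(note, reward=1.0)
    print(mem.recall("email")[0].text)
```

Because STM and LTM share the same item objects, a single feedback call changes both the ordering of the session context and the ranking of long-term recall, which is the coupling the pattern asks for. A production design would likely replace the scalar threshold with richer promotion rules.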
What this pattern forbids. Omitting or disconnecting a layer: all three layers must be present and connected, and an agent missing any layer is not considered fully memory-enabled.
And the patterns that stand alongside it, or against it:
- complements Short-Term Thread Memory ★★: Carry the relevant slice of conversation context across turns within a session.
- complements Episodic Memory ★★: Record past events as time-stamped first-person experiences the agent can recall later, separately from extracted facts (semantic) and learned how-to (procedural).
- complements Semantic Memory ★: Maintain a dedicated store of what the agent holds to be true about the user and the world, separate from event records (episodic) and learned how-to (procedural).
- complements Procedural Memory ★: Maintain a third agent memory type alongside episodic (past events) and semantic (facts). Procedural memory captures *learned how-to*: reusable skills, workflows, and self-rewritten system instructions that map situations directly to actions.
- complements Memory-Type Storage Specialization ★★: Use different storage technologies optimized per memory type, such as fast in-memory stores (Redis-class) for episodic, vector databases (Pinecone/Weaviate) for semantic, and relational or workflow engines for procedural, instead of one general store for everything.