Vector Memory

also known as Embedding-Indexed Memory, Vector Store Memory

Store memories as embeddings in a vector index and retrieve the most semantically similar items at query time.

This pattern helps complete certain larger patterns —

used-by MemGPT-Style Paging ★: Treat the LLM context window as RAM and external storage as disk, with the model issuing tool calls to page memory in and out.
specialises Naive RAG ★★: Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
used-by Self-Archaeology ·: Synthesize the agent's past thought history into time-layered trajectory notes so it can articulate how its understanding evolved without recomputing the narrative each time.
used-by Co-Located Memory Surfacing ·: Surface relevant persistent memories proactively when the human mentions a concrete entity the agent has prior knowledge of, so the human does not bear the burden of remembering to ask.
used-by Semantic Memory ★: Maintain a dedicated store of what the agent holds to be true about the user and the world, separate from event records (episodic) and learned how-to (procedural).
used-by Episodic Memory ★★: Record past events as time-stamped first-person experiences the agent can recall later, separately from extracted facts (semantic) and learned how-to (procedural).
used-by CDC-Driven Vector Sync ★★: Treat the source-of-truth document store as the only writer; keep the vector index in sync by emitting change-data-capture events onto a queue that the feature pipeline consumes.
used-by Streaming Feature Pipeline ★: Process raw documents into RAG features as a continuous stream rather than a batch job, with typed models pinning each stage.
used-by FTI LLM Pipeline Split ★★: Decompose an LLM/RAG system into three independently-deployable pipelines (feature, training, inference) communicating only via a feature store and a model registry.

Context

A long-running agent accumulates facts and observations over time, and on each step it needs to find the small subset of past items that is relevant to the current situation. Relevance is best judged by semantic similarity rather than by exact term match or chronological recency: 'find the past notes whose meaning is close to what is happening now'.

Problem

An append-only log of everything the agent has seen grows without bound and quickly becomes too large to search by linear scan. Without a semantic retrieval layer the agent cannot find the relevant past: keyword search misses paraphrases, and ordering by recency misses older but topically relevant items. The team needs a memory store that supports similarity queries against an embedding of the current context, so that the agent can pull back the items most relevant to what it is doing now.

Forces

Embedding choice constrains retrieval quality.
Index updates have non-trivial latency.
Forgetting is achieved by deletion or decay; both have failure modes.
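
The third force can be made concrete with a small sketch. It assumes an exponential decay with a 30-day half-life and a hard age cutoff; both values, and the names `Scored`, `decay_weight`, `rank_with_decay` and `prune`, are illustrative choices rather than anything the pattern prescribes.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    text: str
    similarity: float  # cosine similarity to the query, assumed precomputed
    age_days: float    # time since the memory was written


def decay_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: the weight halves every half_life_days (assumed value)."""
    return 0.5 ** (age_days / half_life_days)


def rank_with_decay(candidates: list[Scored], half_life_days: float = 30.0) -> list[Scored]:
    # Soft forgetting: old items stay in the index but are down-weighted.
    return sorted(
        candidates,
        key=lambda c: c.similarity * decay_weight(c.age_days, half_life_days),
        reverse=True,
    )


def prune(candidates: list[Scored], max_age_days: float = 180.0) -> list[Scored]:
    # Hard forgetting: items past the cutoff are deleted and cannot be recalled.
    return [c for c in candidates if c.age_days <= max_age_days]


old = Scored("latency decision", similarity=0.9, age_days=90.0)
new = Scored("lunch order", similarity=0.3, age_days=1.0)
print(rank_with_decay([old, new])[0].text)            # decay buries the relevant item
print([c.text for c in prune([old, new], 60.0)])      # deletion loses it entirely
```

Both failure modes show up here: with a short half-life the 90-day-old but highly similar item ranks below a recent, weakly related one, and with a 60-day cutoff it is gone for good.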

Example

A long-running personal agent's append-only thought log grows past a million entries; finding the relevant past by scanning is hopeless, and dumping the whole log into context is impossible. The team embeds each memory item, indexes it in a vector store, and at query time retrieves the top-k semantically similar items (optionally with a recency boost). Now 'what did I decide about latency three months ago' returns the entries that record that decision rather than the most recent entries or nothing, and prompt size stays bounded as memory grows.

Diagram

flowchart TD
    Mem[New memory item] --> Emb[Embed]
    Emb --> Idx[(Vector index)]
    Q[Query / current state] --> QEmb[Embed]
    QEmb --> Top[Retrieve top-k similar]
    Idx --> Top
    Top --> Decay[Apply decay / salience weighting]
    Decay --> Ctx[Prepend to context]

Solution

Therefore:

Embed each memory item when it is written and add it to a vector index. At query time, embed the query (or a summary of the current state), retrieve the top-k most similar memories, and prepend them to the context. Optionally, apply decay (boost recent items, down-weight old ones) and salience weighting to the retrieved set.
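
A minimal sketch of the embed, index, retrieve and prepend steps, under loud assumptions: `embed` is a toy hashed bag-of-words stand-in for a real embedding model (it captures word overlap only, not paraphrase), and `VectorMemory` is a hypothetical in-memory index that scores every item by brute force where a production system would query a vector store. The names and the prompt layout are illustrative.

```python
import math
import re
import zlib
from dataclasses import dataclass, field


def embed(text: str, dim: int = 1024) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


@dataclass
class VectorMemory:
    """Hypothetical in-memory index; a real deployment would use a vector store."""
    items: list[tuple[str, list[float]]] = field(default_factory=list)

    def add(self, text: str) -> None:
        # Embed once at write time and keep the vector next to the text.
        self.items.append((text, embed(text)))

    def retrieve(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        # Vectors are unit length, so the dot product is the cosine similarity.
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, vec)), text) for text, vec in self.items]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(memory: VectorMemory, query: str, k: int = 3) -> str:
    # Prepend the retrieved memories to the current input.
    recalled = "\n".join(f"- {text}" for _, text in memory.retrieve(query, k))
    return f"Relevant memories:\n{recalled}\n\nCurrent input: {query}"


memory = VectorMemory()
memory.add("decided to cap p99 latency at 200 ms for the search API")
memory.add("user prefers dark mode in the dashboard")
memory.add("booked flights to Lisbon for the conference")
print(build_prompt(memory, "what did I decide about latency for the search API", k=1))
```

Only `k` items reach the prompt however many have been stored, which is what keeps prompt size bounded; swapping `embed` for a real encoder and `retrieve` for an approximate nearest-neighbour query changes the quality and cost, not the shape of the loop.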

What this pattern forbids. The agent reads memory only through the retriever; full-store scans are not part of the loop.
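
One way to enforce this constraint in code is to hand the agent loop a narrow retriever interface instead of the store itself. The sketch below is an assumption about how that might look: `Retriever`, `TopKRetriever`, `agent_step` and the `max_k` cap are hypothetical names, and `search` stands for whatever query function the underlying vector store provides.

```python
from typing import Callable, Protocol


class Retriever(Protocol):
    """The only memory capability handed to the agent loop."""

    def retrieve(self, query: str, k: int) -> list[str]: ...


class TopKRetriever:
    """Hypothetical wrapper: holds the store privately, exposes bounded top-k reads."""

    def __init__(self, search: Callable[[str, int], list[str]], max_k: int = 10) -> None:
        self._search = search  # e.g. a vector-store query function (assumed)
        self._max_k = max_k

    def retrieve(self, query: str, k: int) -> list[str]:
        # Clamp k so a caller cannot turn retrieval into a full-store dump.
        return self._search(query, max(1, min(k, self._max_k)))


def agent_step(observation: str, memory: Retriever) -> str:
    # The loop sees a Retriever, never the underlying store or its item list.
    recalled = memory.retrieve(observation, k=3)
    return "\n".join(recalled + [observation])
```

Because `agent_step` depends only on the `Retriever` protocol, there is no code path from the loop to an unbounded scan, and the store behind it can be replaced without touching the agent.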

And the patterns that stand alongside it, or against it —

alternative-to Knowledge Graph Memory ★: Persist agent memory as entities and relations in a structured graph so symbolic queries (path, neighbour, type) become possible.
complements Salience Attention Mechanism ★: Score every candidate memory item with a weighted salience function so each tick attends to a small, relevant top-k subset rather than re-reading all memory.
complements Self-Corpus Vocabulary ·: Mine a small bounded vocabulary from the agent's own writing and cache it as the conceptual axis for scoring new thoughts, so relevance reflects the agent's actual frame rather than a generic embedding space.
composes-with Agentic Memory ★: Expose memory management as first-class tool actions (ADD, UPDATE, DELETE, RETRIEVE, SUMMARY, FILTER) the LLM chooses at every step, trained end-to-end so short-term and long-term memory live under one learned policy.
complements Memory-Type Storage Specialization ★★: Use different storage technologies optimized per memory type (fast in-memory stores (Redis-class) for episodic, vector databases (Pinecone/Weaviate) for semantic, relational or workflow engines for procedural) instead of one general store for everything.

Used in recipes

Production LLM Platform (core)

References

Generative Agents: Interactive Simulacra of Human Behavior (paper)

Provenance

Source: patterns/vector-memory.md on GitHub · commit 4fa1213
Added to catalog: 2026-04-30
Last updated: 2026-05-22
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.