Letta
also known as MemGPT
Type: full-code · Vendor: Letta · Language: Python · License: Apache-2.0 · Status: active · Status in practice: mature
Build stateful LLM agents that remember, learn, and improve over time by self-managing a tiered memory (in-context blocks plus archival/recall stores) via tool calls.
Description. Letta (formerly MemGPT) is the open-source platform created by the authors of the MemGPT paper for building stateful agents with advanced memory. Letta agents process messages through a tool-calling loop in which the model can read and edit its own memory: in-context memory blocks live inside the prompt, archival memory is a vector store queried on demand, and recall memory logs full conversational history. All agent state — memory blocks, messages, reasoning, and tool calls — is persisted in a database so nothing is lost even when content is evicted from the context window. Letta exposes its agents as a stateful REST API.
Agent loop shape. Tool-calling loop on top of MemGPT-style virtual context. Memory blocks are always-visible XML-like sections prepended to the prompt; the agent can call memory tools to edit them, archive content, or search recall/archival memory. All state — memory, messages, reasoning, tool calls — is written to a database after each step, so the agent persists across server restarts and is addressable through the Letta REST API.
Primary use cases
- long-running personal/companion agents that retain user context across sessions
- research agents that accumulate facts in archival memory
- stateful customer-support agents exposed as a REST service
- MemGPT-style virtual-context experiments and continual learning
Key concepts
- Memory blocks (core memory) → cross-session-memory (docs) — Structured sections of the prompt that persist across all interactions and are always visible to the agent; the agent edits them via memory tools.
- Archival memory → vector-memory (docs) — Vector-DB store of long-term facts and knowledge; queried on demand via tools because contents cannot be pinned to the context window.
- Recall memory (docs) — Database table that logs the agent's full conversational history; surfaces older messages when needed.
- MemGPT virtual context → memgpt-paging (docs) — OS-inspired paging mechanism from the MemGPT paper: the agent self-manages what lives in the limited context window via tool calls.
- Letta REST API / stateful messages.create (docs) — Agents are addressed as stateful resources; client SDKs hit a REST endpoint that processes one user message and returns assistant, reasoning, tool-call, and tool-return messages.
Patterns this full-code implements —
- ★MemGPT-Style Paging
Letta is the production reimplementation of the MemGPT paper: the agent self-manages a finite context window via tool calls that page content between core, recall, and archival memory.
- ★★Cross-Session Memory
Memory blocks plus archival and recall memory persist across all interactions; the agent retains user/world facts between sessions.
- ★★Tool Use
Letta agents act exclusively through tool calls — memory edits, searches, and user-defined tools; the messages.create response surfaces ToolCallMessage and ToolReturnMessage.
- ★★Vector Memory
Archival memory is a semantically searchable vector DB that the agent queries on demand via tools when content cannot fit in core memory.
- ★★Agent Resumption
All agent state (memory, messages, reasoning, tool calls) is persisted in a database after every interaction, so the agent resumes from where it was on the next API call or after a restart.
- ★★Structured Output
Letta supports JSON mode and JSON-schema strict outputs via response_format in model_settings; the setting persists as part of agent state. The messages.create response is also a typed envelope of As…
- ·Five-Tier Memory Cascade
Letta exposes a tiered memory (in-context blocks, recall, archival) inherited from MemGPT; not exactly five tiers but the same cascade idea. Mapping is approximate, not vendor-named.