Augmented LLM

also known as Augmented Model, LLM + Tools + Memory, Foundational Agent Block

Build the foundational agent block as an LLM augmented with retrieval, tools, and memory that the model actively chooses to use, rather than a bare-model call.

This pattern helps complete certain larger patterns —

used-by Prompt Chaining ★★: Decompose a task into a fixed sequence of LLM calls where each step's output becomes the next step's input.
used-by Routing ★★: Classify an incoming request and dispatch it to the specialist (lane / agent / model) best suited to handle it.
used-by Orchestrator-Workers ★★: An orchestrator dynamically breaks a task into subtasks at runtime and delegates each to a worker LLM, then synthesises results.
specialises ReAct ★★: Interleave a single thought, a single tool call, and a single observation per step so the agent reasons over fresh evidence.

Context

A team is building any non-trivial agentic system: a support assistant, a coding agent, a research agent, an internal workflow runner. They need a uniform building block so that higher-level patterns (chaining, routing, orchestrator-worker setups, multi-agent loops) can compose it without reinventing the basics each time.

Problem

A bare large language model call cannot look up fresh facts, change state in any external system, or remember anything between turns. If each higher-level pattern wires up retrieval, tool calling, and memory in its own ad-hoc way, the building blocks stop being interoperable: a routing layer cannot drop in a worker that was built against a different memory shape, and observability has to be re-implemented per integration.
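The interoperability problem can be made concrete with a small sketch. Everything named here (`Memory`, `Block`, `DictMemory`, `EchoWorker`, `route`) is hypothetical and not taken from any framework; the keyword check inside `route` is only a stand-in for a real LLM classifier. The point is that a routing layer can swap workers freely only when they all agree on one memory shape and one entry point.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Protocol


class Memory(Protocol):
    """Hypothetical shared memory shape that every block agrees on."""

    def read(self, key: str) -> Optional[str]: ...
    def write(self, key: str, value: str) -> None: ...


class Block(Protocol):
    """Hypothetical uniform entry point for any worker."""

    def run(self, request: str, memory: Memory) -> str: ...


@dataclass
class DictMemory:
    """Simplest possible memory that satisfies the Memory protocol."""

    store: Dict[str, str] = field(default_factory=dict)

    def read(self, key: str) -> Optional[str]:
        return self.store.get(key)

    def write(self, key: str, value: str) -> None:
        self.store[key] = value


@dataclass
class EchoWorker:
    """Stub worker; a real one would wrap an augmented LLM."""

    name: str

    def run(self, request: str, memory: Memory) -> str:
        memory.write("last_worker", self.name)
        return f"{self.name} handled: {request}"


def route(request: str, workers: Dict[str, Block], memory: Memory) -> str:
    # Stand-in for an LLM classifier: pick a lane by keyword.
    lane = "billing" if "invoice" in request.lower() else "general"
    return workers[lane].run(request, memory)


mem = DictMemory()
workers = {"billing": EchoWorker("billing"), "general": EchoWorker("general")}
reply = route("Where is my invoice?", workers, mem)
```

Because both workers accept the same `Memory`, replacing one of them does not touch the router.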

Forces

Each augmentation (retrieval, tools, memory) is independently useful but composes badly if not tailored to the specific use case.
The model must decide when to retrieve, when to call a tool, and what to remember — pushing this decision out of the prompt into surrounding code defeats the augmentation.
Adding all three augmentations naively bloats every prompt; capabilities should be exposed only where they pay off.
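One way to keep prompts lean, sketched under assumptions: describe a capability to the model only when the block has it switched on. The `Capabilities` class, its field names, and the `SEARCH` / `REMEMBER` instruction wording are all invented for illustration and do not come from any particular SDK.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical per-block switchboard; field names are illustrative."""

    retrieval: bool = False
    tools: Tuple[str, ...] = ()
    memory: bool = False


def capability_prompt(caps: Capabilities) -> str:
    """Describe only the enabled capabilities, so unused ones add no prompt text."""
    lines = []
    if caps.retrieval:
        lines.append("You may search the knowledge base with SEARCH(<query>).")
    for name in caps.tools:
        lines.append(f"You may call the tool {name}(<argument>).")
    if caps.memory:
        lines.append("You may store notes with REMEMBER(<key>, <value>).")
    return "\n".join(lines)


# A triage block that only classifies gets no extra prompt text at all.
triage = Capabilities()
# A support block pays for all three because it uses all three.
support = Capabilities(retrieval=True, tools=("lookup_order",), memory=True)
```

The model still decides when to use each capability; this only controls which ones it is offered.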

Example

A support agent is built as one augmented LLM: it can call a tool to look up the customer's order, retrieve a knowledge-base article via vector search, and read/write a session memory of the conversation so far. Every higher-level workflow (routing tickets, escalating to a human, parallel ranking of suggested replies) composes instances of this block rather than rewiring the model with capabilities each time.

Solution

Therefore:

Wire the model with three capabilities and expose each through a model-driven interface: (1) retrieval queries the model can issue against external corpora; (2) tool calls the model can emit, with the results returned to it; (3) memory the model can read from and write to across turns. The model, not the surrounding code, decides which augmentation to invoke at each step. Other workflow patterns (Prompt Chaining, Routing, Orchestrator-Workers, etc.) compose instances of this block, not bare model calls.
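A minimal sketch of such a block, using the support agent from the Example as the instance. This is an illustration under stated assumptions, not the API of any SDK: the `AugmentedLLM` class, the action dictionary format, and `scripted_model` are all hypothetical, and `scripted_model` replays a fixed plan in place of a real LLM. The order status and refund text are made-up stub data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical action format emitted by the model, e.g.
# {"type": "tool", "name": "lookup_order", "arg": "A17"}.
Action = Dict[str, str]
# Hypothetical model interface: transcript so far in, next action out.
Model = Callable[[List[str]], Action]


@dataclass
class AugmentedLLM:
    model: Model
    retrieve: Callable[[str], str]
    tools: Dict[str, Callable[[str], str]]
    memory: Dict[str, str] = field(default_factory=dict)
    max_steps: int = 8

    def run(self, request: str) -> str:
        transcript = [f"user: {request}"]
        for _ in range(self.max_steps):
            # The model, not this loop, chooses which augmentation to use.
            action = self.model(transcript)
            kind = action["type"]
            if kind == "retrieve":
                result = self.retrieve(action["query"])
            elif kind == "tool":
                result = self.tools[action["name"]](action["arg"])
            elif kind == "remember":
                self.memory[action["key"]] = action["value"]
                result = "stored"
            elif kind == "recall":
                result = self.memory.get(action["key"], "")
            elif kind == "answer":
                return action["text"]
            else:
                result = f"unknown action {kind!r}"
            # Feed the result back so the next decision sees it.
            transcript.append(f"{kind}: {result}")
        return "step limit reached"


def scripted_model(transcript: List[str]) -> Action:
    """Stand-in for a real LLM: replays a fixed plan, one action per step."""
    plan = [
        {"type": "tool", "name": "lookup_order", "arg": "A17"},
        {"type": "retrieve", "query": "refund policy"},
        {"type": "remember", "key": "order_id", "value": "A17"},
    ]
    step = len(transcript) - 1
    if step < len(plan):
        return plan[step]
    return {"type": "answer", "text": " | ".join(transcript[1:])}


agent = AugmentedLLM(
    model=scripted_model,
    retrieve=lambda query: "Refunds are issued within 14 days.",  # stub KB
    tools={"lookup_order": lambda order_id: f"order {order_id}: shipped"},
)
reply = agent.run("Where is my order A17?")
```

The surrounding loop only executes what the model asks for and returns the observation; swapping `scripted_model` for a real model call would leave the loop unchanged.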

What this pattern forbids. Higher-level patterns may not compose raw model calls in place of this block, and surrounding code may not hardcode when a capability is used; that decision belongs to the model.

The smaller patterns that complete this one —

uses Tool Use ★★: Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.
uses Naive RAG ★★: Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
uses Short-Term Thread Memory ★★: Carry the relevant slice of conversation context across turns within a session.
generalises Talker-Reasoner ★: Split an interactive agent into a fast Talker for conversational responses and a slow Reasoner for deliberative planning and tool use, so the conversational loop never blocks on reasoning.

And the patterns that stand alongside it, or against it —

alternative-to Multi-Agent on Sequential Workloads ✕ (anti-pattern): Split a fundamentally sequential workload across multiple agents, degrading accuracy by 39–70% with no parallelization benefit.
complements MRKL Systems (Modular Neuro-Symbolic) ★★: Route each request through an LLM dispatcher to specialized symbolic or neural expert modules (calculator, knowledge base, code executor) rather than asking one LLM to do everything; integrate the modules' results for the final response.
complements Business + LLM Microservice Split ★★: Split an LLM application into a CPU-bound business microservice (retrieval, prompt assembly, orchestration) and a GPU-bound LLM microservice (only model.generate behind REST), so each tier scales on its own hardware budget.
complements FTI LLM Pipeline Split ★★: Decompose an LLM/RAG system into three independently deployable pipelines (feature, training, inference) communicating only via a feature store and a model registry.
complements Crawler Dispatcher ★★: Route each incoming URL to a domain-specific crawler through a central dispatcher mapping URL patterns to registered crawler classes.

Used in frameworks

Claude Agent SDK
core · 18 patterns · Agent SDKs · ★ emerging
Anthropic defines the foundational agentic building block as an LLM enhanced with retrieval, tools, and memory that the model itself drives.

References

Building Effective Agents
blog

Provenance

Source: patterns/augmented-llm.md on GitHub · commit 00fc059
Added to catalog: 2026-05-17
Last updated: 2026-05-21
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.