Augmented LLM
also known as Augmented Model, LLM + Tools + Memory, Foundational Agent Block
Build the foundational agent block as an LLM augmented with retrieval, tools, and memory that the model actively chooses to use, rather than a bare-model call.
This pattern helps complete certain larger patterns:
- used-by Prompt Chaining ★★: Decompose a task into a fixed sequence of LLM calls where each step's output becomes the next step's input.
- used-by Routing ★★: Classify an incoming request and dispatch it to the specialist (lane / agent / model) best suited to handle it.
- used-by Orchestrator-Workers ★★: An orchestrator dynamically breaks a task into subtasks at runtime and delegates each to a worker LLM, then synthesises results.
- specialises ReAct ★★: Interleave a single thought, a single tool call, and a single observation per step so the agent reasons over fresh evidence.
Context
A team is building any non-trivial agentic system: a support assistant, a coding agent, a research agent, an internal workflow runner. They need a uniform building block so that higher-level patterns (chaining, routing, orchestrator-worker setups, multi-agent loops) can compose it without reinventing the basics each time.
Problem
A bare large language model call cannot look up fresh facts, change state in any external system, or remember anything between turns. If each higher-level pattern wires up retrieval, tool calling, and memory in its own ad-hoc way, the building blocks stop being interoperable: a routing layer cannot drop in a worker that was built against a different memory shape, and observability has to be re-implemented per integration.
Forces
- Each augmentation (retrieval, tools, memory) is useful on its own, but the three compose badly unless each is tailored to the specific use case.
- The model must decide when to retrieve, when to call a tool, and what to remember; pushing this decision out of the prompt and into surrounding code defeats the augmentation.
- Adding all three augmentations naively bloats every prompt; capabilities should be exposed only where they pay off.
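One way to keep prompts lean, sketched under assumptions: each block instance declares which augmentations it exposes, and only those are described to the model. The `Capabilities` class, its field names, and the prompt wording are hypothetical illustrations, not part of the pattern's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical per-block switchboard; field names are illustrative."""
    retrieval: bool = False
    tools: tuple[str, ...] = ()
    memory: bool = False


def capability_prompt(caps: Capabilities) -> str:
    """Describe only the enabled augmentations, so unused ones cost no tokens."""
    lines = []
    if caps.retrieval:
        lines.append("You may call search(query) to retrieve reference passages.")
    for tool in caps.tools:
        lines.append(f"You may call the tool {tool}(...).")
    if caps.memory:
        lines.append("You may read and write session memory.")
    return "\n".join(lines)


# A classifier block needs no augmentations; a support block needs all three.
classifier_caps = Capabilities()
support_caps = Capabilities(retrieval=True, tools=("lookup_order",), memory=True)
```

With this shape, `capability_prompt(classifier_caps)` is an empty string, while the support block's prompt carries one line per enabled capability.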
Example
A support agent is built as one augmented LLM: it can call a tool to look up the customer's order, retrieve a knowledge-base article via vector search, and read/write a session memory of the conversation so far. Every higher-level workflow (routing tickets, escalating to a human, parallel ranking of suggested replies) composes instances of this block rather than rewiring the model with capabilities each time.
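The three capabilities of the support agent can be sketched as plain Python callables. Everything here is an illustrative assumption: the in-memory `ORDERS` and `KB` data, the function names, and in particular the token-overlap ranking, which stands in for the vector search mentioned above so the example runs without an embedding model.

```python
import re
from collections import deque

# Toy data standing in for an order system and a knowledge base.
ORDERS = {"A17": "shipped", "B42": "processing"}
KB = [
    "Refunds are issued within 5 days of a return.",
    "Shipping takes 3 to 7 business days.",
    "Passwords can be reset from the login page.",
]


def lookup_order(order_id: str) -> str:
    """Tool: report the status of a customer's order."""
    status = ORDERS.get(order_id)
    if status is None:
        return f"order {order_id}: not found"
    return f"order {order_id}: {status}"


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_kb(query: str, k: int = 1) -> list[str]:
    """Retrieval: rank articles by shared tokens (stand-in for vector search)."""
    q = _tokens(query)
    ranked = sorted(KB, key=lambda doc: len(q & _tokens(doc)), reverse=True)
    return ranked[:k]


class SessionMemory:
    """Memory: a bounded log of the conversation so far."""

    def __init__(self, max_turns: int = 20) -> None:
        self._turns: deque[tuple[str, str]] = deque(maxlen=max_turns)

    def write(self, role: str, text: str) -> None:
        self._turns.append((role, text))

    def read(self, last: int = 5) -> list[tuple[str, str]]:
        return list(self._turns)[-last:]
```

In a real deployment each callable would be described to the model, which then chooses when to invoke it; the surrounding workflow only hands the block its input.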
Solution
Therefore:
Wire the model with three capabilities and expose each via a model-driven interface: (1) retrieval queries the model can issue against external corpora; (2) tool calls the model can emit and whose results stream back; (3) memory the model can read from and write to across turns. The model — not the surrounding code — decides which augmentation to invoke at each step. Other workflow patterns (prompt-chaining, routing, orchestrator-workers, etc.) compose instances of this block, not bare model calls.
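A minimal sketch of such a block, assuming a model that returns one structured `Action` per step. The `Action` shape, the `Model.decide` method, and `ScriptedModel` (a stand-in that replays a fixed list of decisions so the example runs offline) are all hypothetical; a real model client would take the place of `ScriptedModel`.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class Action:
    """One model decision: use an augmentation, or give the final answer."""
    kind: str       # "retrieve" | "tool" | "remember" | "answer"
    name: str = ""  # tool name or memory key, when relevant
    arg: str = ""   # query, tool argument, memory value, or answer text


class Model(Protocol):
    def decide(self, transcript: list[str]) -> Action: ...


@dataclass
class AugmentedLLM:
    model: Model
    retrieve: Callable[[str], list[str]]
    tools: dict[str, Callable[[str], str]]
    memory: dict[str, str] = field(default_factory=dict)
    max_steps: int = 8

    def run(self, user_input: str) -> str:
        transcript = [f"memory: {self.memory}", f"user: {user_input}"]
        for _ in range(self.max_steps):
            # The model, not this loop, picks the augmentation for each step.
            action = self.model.decide(transcript)
            if action.kind == "answer":
                return action.arg
            if action.kind == "retrieve":
                result = "; ".join(self.retrieve(action.arg))
            elif action.kind == "tool":
                tool = self.tools.get(action.name)
                result = tool(action.arg) if tool else f"unknown tool {action.name}"
            elif action.kind == "remember":
                self.memory[action.name] = action.arg
                result = "stored"
            else:
                result = f"unknown action {action.kind}"
            transcript.append(f"{action.kind}({action.name or action.arg}) -> {result}")
        return "step budget exhausted"


class ScriptedModel:
    """Stand-in model that replays fixed decisions (for illustration only)."""

    def __init__(self, actions: list[Action]) -> None:
        self._actions = iter(actions)

    def decide(self, transcript: list[str]) -> Action:
        return next(self._actions)


block = AugmentedLLM(
    model=ScriptedModel([
        Action("tool", name="lookup_order", arg="A17"),
        Action("remember", name="last_order", arg="A17"),
        Action("answer", arg="Order A17 has shipped."),
    ]),
    retrieve=lambda query: ["Refunds are issued within 5 days of a return."],
    tools={"lookup_order": lambda order_id: f"order {order_id}: shipped"},
)
reply = block.run("Where is order A17?")
```

The loop only executes what the model asks for and feeds the result back; it never decides on its own to retrieve, call a tool, or store a memory.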
What this pattern forbids. Higher-level patterns must compose this block, not raw model calls; capability use is decided by the model, not hardcoded in surrounding code.
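To illustrate the composition rule, the following sketch shows a routing layer written only against a shared block interface. The `Block` protocol, `Router`, and the two stub classes are hypothetical names; the stubs stand in for augmented LLM instances so the example runs without a model.

```python
from typing import Protocol


class Block(Protocol):
    """The single interface every higher-level pattern composes against."""
    def run(self, user_input: str) -> str: ...


class Router:
    """Routing built from blocks: a classifier block picks a lane block."""

    def __init__(self, classifier: Block, lanes: dict[str, Block], fallback: str) -> None:
        if fallback not in lanes:
            raise ValueError(f"fallback lane {fallback!r} is not registered")
        self.classifier = classifier
        self.lanes = lanes
        self.fallback = fallback

    def run(self, user_input: str) -> str:
        label = self.classifier.run(user_input).strip().lower()
        lane = self.lanes.get(label, self.lanes[self.fallback])
        return lane.run(user_input)


class KeywordClassifier:
    """Stub standing in for an augmented LLM that labels the request."""

    def run(self, user_input: str) -> str:
        return "billing" if "invoice" in user_input.lower() else "general"


class TaggedStub:
    """Stub standing in for a specialist augmented LLM instance."""

    def __init__(self, tag: str) -> None:
        self.tag = tag

    def run(self, user_input: str) -> str:
        return f"{self.tag}: {user_input}"


router = Router(
    classifier=KeywordClassifier(),
    lanes={"billing": TaggedStub("billing"), "general": TaggedStub("general")},
    fallback="general",
)
```

Because `Router` itself exposes `run`, it satisfies the same interface and can be nested inside another router or chain without special handling.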
The smaller patterns that complete this one:
- uses Tool Use ★★: Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.
- uses Naive RAG ★★: Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
- uses Short-Term Thread Memory ★★: Carry the relevant slice of conversation context across turns within a session.
- generalises Talker-Reasoner ★: Split an interactive agent into a fast Talker for conversational responses and a slow Reasoner for deliberative planning and tool use, so the conversational loop never blocks on reasoning.
And the patterns that stand alongside it, or against it:
- alternative-to Multi-Agent on Sequential Workloads ✕: Anti-pattern. Splitting a fundamentally sequential workload across multiple agents degrades accuracy by 39–70% with no parallelization benefit.
- complements MRKL Systems (Modular Neuro-Symbolic) ★★: Route each request through an LLM dispatcher to specialized symbolic or neural expert modules (calculator, knowledge base, code executor) rather than asking one LLM to do everything; integrate the modules' results for the final response.
- complements Business + LLM Microservice Split ★★: Split an LLM application into a CPU-bound business microservice (retrieval, prompt assembly, orchestration) and a GPU-bound LLM microservice (only model.generate behind REST), so each tier scales on its own hardware budget.
- complements FTI LLM Pipeline Split ★★: Decompose an LLM/RAG system into three independently deployable pipelines (feature, training, inference) that communicate only via a feature store and a model registry.
- complements Crawler Dispatcher ★★: Route each incoming URL to a domain-specific crawler through a central dispatcher mapping URL patterns to registered crawler classes.