III · Tool Use & Environment · Mature ★★

Augmented LLM

also known as Augmented Model, LLM + Tools + Memory, Foundational Agent Block

Build the foundational agent block as an LLM augmented with retrieval, tools, and memory that the model actively chooses to use, rather than a bare-model call.

This pattern helps complete certain larger patterns —

  • used-by · Prompt Chaining ★★ · Decompose a task into a fixed sequence of LLM calls where each step's output becomes the next step's input.
  • used-by · Routing ★★ · Classify an incoming request and dispatch it to the specialist (lane / agent / model) best suited to handle it.
  • used-by · Orchestrator-Workers ★★ · An orchestrator dynamically breaks a task into subtasks at runtime and delegates each to a worker LLM, then synthesises results.
  • specialises · ReAct ★★ · Interleave a single thought, a single tool call, and a single observation per step so the agent reasons over fresh evidence.

Context

A team is building any non-trivial agentic system: a support assistant, a coding agent, a research agent, an internal workflow runner. They need a uniform building block so that higher-level patterns (chaining, routing, orchestrator-worker setups, multi-agent loops) can compose it without reinventing the basics each time.

Problem

A bare large language model call cannot look up fresh facts, change state in any external system, or remember anything between turns. If each higher-level pattern wires up retrieval, tool calling, and memory in its own ad-hoc way, the building blocks stop being interoperable: a routing layer cannot drop in a worker that was built against a different memory shape, and observability has to be re-implemented per integration.

Forces

  • Each augmentation (retrieval, tools, memory) is useful on its own, but the three compose badly unless each is tailored to the specific use case.
  • The model must decide when to retrieve, when to call a tool, and what to remember — pushing this decision out of the prompt into surrounding code defeats the augmentation.
  • Adding all three augmentations naively bloats every prompt; capabilities should be exposed only where they pay off.
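The last force, exposing capabilities only where they pay off, can be sketched as a capability manifest that is assembled per block. This is a minimal illustration, not a prescribed API: `build_manifest`, `ToolSpec`, and the manifest line format are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical shape: each tool is a (description, function) pair.
ToolSpec = Tuple[str, Callable[[str], str]]


def build_manifest(
    tools: Dict[str, ToolSpec],
    retriever: Optional[Callable[[str], List[str]]] = None,
    memory_enabled: bool = False,
) -> str:
    """Describe to the model only the capabilities this block actually has."""
    lines = []
    for name, (description, _fn) in sorted(tools.items()):
        lines.append(f"tool {name}: {description}")
    if retriever is not None:
        lines.append("retrieve <query>: search the external corpus")
    if memory_enabled:
        lines.append("remember <key> <value>: store a fact for later turns")
    return "\n".join(lines)


# A lightweight classifier block needs no augmentations, so its prompt stays small.
classifier_manifest = build_manifest({})

# A support block pays for all three because it uses all three.
support_manifest = build_manifest(
    {"lookup_order": ("fetch an order by id", lambda order_id: f"order {order_id}")},
    retriever=lambda query: [],
    memory_enabled=True,
)
```

The block's interface stays the same in both cases; only the text describing available capabilities grows, which keeps prompts for simple blocks from being bloated.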

Example

A support agent is built as one augmented LLM: it can call a tool to look up the customer's order, retrieve a knowledge-base article via vector search, and read/write a session memory of the conversation so far. Every higher-level workflow (routing tickets, escalating to a human, parallel ranking of suggested replies) composes instances of this block rather than rewiring the model with capabilities each time.
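The composition described in this example might look roughly like the sketch below, where a ticket router dispatches to block instances through one uniform signature. Everything here is assumed for illustration: the `Block` type, `make_support_block`, `keyword_classifier`, and the lane names are hypothetical, and the stub block stands in for a fully wired augmented LLM.

```python
from typing import Callable, Dict

# Hypothetical uniform signature shared by every block: request in, reply out.
Block = Callable[[str], str]


def make_support_block(lane: str) -> Block:
    """Stand-in for an augmented LLM (tools, retrieval, memory) serving one lane."""
    def block(ticket: str) -> str:
        return f"[{lane}] handled: {ticket}"
    return block


def keyword_classifier(ticket: str) -> str:
    """Toy classifier; in practice this step would itself be a block instance."""
    return "orders" if "order" in ticket.lower() else "general"


def route(ticket: str, lanes: Dict[str, Block], classify: Callable[[str], str]) -> str:
    """Routing composes blocks; it never calls a bare model itself."""
    lane = classify(ticket)
    return lanes.get(lane, lanes["general"])(ticket)


lanes = {
    "orders": make_support_block("orders"),
    "general": make_support_block("general"),
}
reply = route("Where is my order 42?", lanes, keyword_classifier)
```

Because every lane exposes the same signature, a block can be swapped or added without touching the router.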

Solution

Therefore:

Wire the model with three capabilities and expose each via a model-driven interface: (1) retrieval queries the model can issue against external corpora; (2) tool calls the model can emit and whose results stream back; (3) memory the model can read from and write to across turns. The model — not the surrounding code — decides which augmentation to invoke at each step. Other workflow patterns (prompt-chaining, routing, orchestrator-workers, etc.) compose instances of this block, not bare model calls.
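One way to picture the block is as a small loop in which the model emits an action at each step and the surrounding code only executes it. The sketch below is an assumption-laden illustration: the `AugmentedLLM` class, the action dictionary shape, and `scripted_model` (a fake that replays canned actions in place of a real LLM call) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Action = Dict[str, str]                  # hypothetical shape of one model decision
Model = Callable[[List[str]], Action]    # transcript in, next action out


@dataclass
class AugmentedLLM:
    model: Model                                  # stand-in for a real LLM call
    retrieve: Callable[[str], List[str]]          # query -> passages
    tools: Dict[str, Callable[[str], str]]        # tool name -> function
    memory: Dict[str, str] = field(default_factory=dict)  # persists across turns

    def run(self, user_input: str, max_steps: int = 8) -> str:
        transcript = [f"memory: {self.memory}", f"user: {user_input}"]
        for _ in range(max_steps):
            action = self.model(transcript)       # the model picks the next step
            kind = action["type"]
            if kind == "retrieve":
                transcript.append(f"retrieved: {self.retrieve(action['query'])}")
            elif kind == "tool":
                result = self.tools[action["name"]](action["input"])
                transcript.append(f"tool {action['name']}: {result}")
            elif kind == "remember":
                self.memory[action["key"]] = action["value"]
                transcript.append(f"remembered: {action['key']}")
            elif kind == "answer":
                return action["text"]
            else:
                transcript.append(f"error: unknown action {kind!r}")
        return "step budget exhausted"


def scripted_model(script: List[Action]) -> Model:
    """Fake model that replays a fixed list of actions, one per call."""
    steps = iter(script)
    return lambda transcript: next(steps)


block = AugmentedLLM(
    model=scripted_model([
        {"type": "retrieve", "query": "refund policy"},
        {"type": "tool", "name": "lookup_order", "input": "42"},
        {"type": "remember", "key": "order_id", "value": "42"},
        {"type": "answer", "text": "Order 42 is eligible for a refund."},
    ]),
    retrieve=lambda query: [f"passage about {query}"],
    tools={"lookup_order": lambda order_id: f"order {order_id}: delivered"},
)
answer = block.run("Can I get a refund for order 42?")
```

Note that `run` contains no rule such as "always retrieve first": the order and choice of augmentations come entirely from the model's actions, and the loop only enforces a step budget.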

What this pattern forbids. Higher-level patterns must compose this block, not raw model calls; capability use is decided by the model, not hardcoded in surrounding code.

The smaller patterns that complete this one —

  • uses · Tool Use ★★ · Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.
  • uses · Naive RAG ★★ · Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
  • uses · Short-Term Thread Memory ★★ · Carry the relevant slice of conversation context across turns within a session.
  • generalises · Talker-Reasoner · Split an interactive agent into a fast Talker for conversational responses and a slow Reasoner for deliberative planning and tool use, so the conversational loop never blocks on reasoning.

And the patterns that stand alongside it, or against it —

  • alternative-to · Multi-Agent on Sequential Workloads · Anti-pattern: split a fundamentally sequential workload across multiple agents, degrading accuracy by 39–70% with no parallelization benefit.
  • complements · MRKL Systems (Modular Neuro-Symbolic) ★★ · Route each request through an LLM dispatcher to specialized symbolic or neural expert modules (calculator, knowledge base, code executor) rather than asking one LLM to do everything; integrate the modules' results for the final response.
  • complements · Business + LLM Microservice Split ★★ · Split an LLM application into a CPU-bound business microservice (retrieval, prompt assembly, orchestration) and a GPU-bound LLM microservice (only model.generate behind REST), so each tier scales on its own hardware budget.
  • complements · FTI LLM Pipeline Split ★★ · Decompose an LLM/RAG system into three independently deployable pipelines (feature, training, inference) communicating only via a feature store and a model registry.
  • complements · Crawler Dispatcher ★★ · Route each incoming URL to a domain-specific crawler through a central dispatcher mapping URL patterns to registered crawler classes.
