Tool Use & Environment
How the agent reaches outside itself.
34 patterns in this book.
When to reach for each
01. Tool Use
Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.
Best for: The model must affect external state or query authoritative systems.
Tradeoff: Tool palette design becomes the bottleneck; bad tools propagate to every call site.
Watch for: The deliverable is free prose; structuring it as a tool call is overhead.
02. Model Context Protocol
Standardise how agents discover and call tools so that a tool written once is usable by any conformant agent.
Best for: Tool palettes need to be portable across multiple host applications.
Tradeoff: Adds a process boundary; latency and operational surface increase.
Watch for: Single host, single language, no portability requirement; native function calls are simpler.
03. Code-as-Action Agent
Have the agent emit a code snippet as its action each step, executed in a constrained interpreter, instead of emitting JSON tool calls; tool composition becomes function nesting and control flow inside the snippet.
Best for: Tool composition is natural in code (filter, map, conditional chains) and clumsy as JSON tool calls.
Tradeoff: Sandbox correctness is load-bearing; a weak sandbox means arbitrary code execution.
Watch for: The deployment cannot host or trust a sandboxed interpreter.
04. Code Execution
Let the model emit code, run it in a sandbox, and treat the run as the answer instead of trusting the model to compute in its head.
Best for: The task involves calculation, parsing, or transformations that LLMs hallucinate.
Tradeoff: Sandbox security is its own engineering problem.
Watch for: The task is pure language with no computation that benefits from running code.
05. Computer Use
Let the model drive a desktop end-to-end via screenshots plus virtual mouse/keyboard tool calls instead of bespoke per-app APIs.
Best for: The target software has no clean API and the agent must drive a real desktop visually.
Tradeoff: Slow and brittle on dynamic UIs.
Watch for: A clean API exists and is faster, cheaper, and more reliable than visual control.
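To make the first entry concrete, here is a minimal sketch of Tool Use: the model's output is treated as a typed call, checked against a declared schema, and only then dispatched. Everything named here (`Tool`, `REGISTRY`, `dispatch`, the `get_weather` tool, and the `{"tool": ..., "args": ...}` JSON shape) is a hypothetical illustration, not the format of any particular provider or framework.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    """A hypothetical tool declaration: a name, typed parameters, and a handler."""
    name: str
    params: dict[str, type]  # parameter name -> required Python type
    fn: Callable[..., Any]


def get_weather(city: str, days: int) -> str:
    # Stand-in for a call to a real external system.
    return f"{city}: forecast for {days} day(s)"


REGISTRY = {
    t.name: t
    for t in [Tool("get_weather", {"city": str, "days": int}, get_weather)]
}


def dispatch(raw_call: str) -> Any:
    """Parse a model-emitted call, validate it against the schema, then run it."""
    call = json.loads(raw_call)
    tool = REGISTRY.get(call.get("tool"))
    if tool is None:
        raise ValueError(f"unknown tool: {call.get('tool')!r}")
    args = call.get("args", {})
    if set(args) != set(tool.params):
        raise ValueError(f"expected arguments {sorted(tool.params)}, got {sorted(args)}")
    for name, expected in tool.params.items():
        # Exact type match, so that a JSON boolean is not accepted as an int.
        if type(args[name]) is not expected:
            raise TypeError(f"argument {name!r} must be {expected.__name__}")
    return tool.fn(**args)


if __name__ == "__main__":
    print(dispatch('{"tool": "get_weather", "args": {"city": "Oslo", "days": 2}}'))
```

The point of the sketch is the tradeoff named above: the surrounding system never parses prose, but every call site inherits whatever the tool declaration gets wrong.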
All patterns in this book
Tool Use
×122 Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.
Model Context Protocol
×52 Standardise how agents discover and call tools so that a tool written once is usable by any conformant agent.
Code-as-Action Agent
×32 Have the agent emit a code snippet as its action each step, executed in a constrained interpreter, instead of emitting JSON tool calls; tool composition becomes function nesting and control flow insi…
Code Execution
×31 Let the model emit code, run it in a sandbox, and treat the run as the answer instead of trusting the model to compute in its head.
Computer Use
×20 Let the model drive a desktop end-to-end via screenshots plus virtual mouse/keyboard tool calls instead of bespoke per-app APIs.
Agent-Computer Interface
×19 Design the tool surface for an LLM agent specifically, with affordances different from human-facing CLIs.
Browser Agent
×17 Expose websites to the agent through a structured DOM/accessibility tree plus a small action vocabulary, sitting between raw HTML and pixel-level Computer Use.
Multilingual Voice Agent Stack
×17 Compose a voice agent as a tightly co-located pipeline of speech-to-text, language-aware LLM reasoning, and text-to-speech, where one vendor owns all three so language and dialect propagate cleanly a…
Sandbox Isolation
×13 Run agent-emitted code or actions in a contained environment with restricted filesystem, network, and process privileges.
Agent Skills
×10 Package author-time procedures (markdown + optional resources) the agent loads on demand for specific task types.
Skill Library
×7 Let the agent grow its own toolkit by writing reusable skills that subsequent runs can call.
Prompt Caching
×4 Order prompts so the unchanging prefix can be cached by the provider, cutting per-call cost and latency.
Tool Result Caching
×3 Cache the result of expensive deterministic tool calls keyed by their arguments so repeat calls within a session return immediately.
Dual-System GUI Agent
×3 Split a GUI agent into a decision model that plans and recovers from errors and a grounding model that observes pixels and emits the precise action; route each subproblem to the better-suited model.
Tool Loadout
×2 Select a small task-relevant subset of available tools per request rather than exposing the full registry to the model.
Crawler Dispatcher
×1 Route each incoming URL to a domain-specific crawler through a central dispatcher mapping URL patterns to registered crawler classes.
Mobile UI Agent
×1 Drive a smartphone end-to-end through a small, touch-native action vocabulary (tap, long-press, swipe, type, back, home) over screenshots, as a distinct interaction surface from desktop Computer Use…
Tool Discovery
×1 Let the agent discover available tools at runtime rather than hardcoding the tool list at agent build time.
Tool Transition Fusion
×1 Mine tool-call telemetry for high-probability X-then-Y transitions and fuse those pairs into a single composite tool, shrinking the planner's step count.
Agent Adapter
An interface layer connecting an agent's tool-calling protocol to heterogeneous external tools, normalizing their schemas into one the agent expects.
Augmented LLM
Build the foundational agent block as an LLM augmented with retrieval, tools, and memory that the model actively chooses to use, rather than a bare-model call.
Translation Layer
Insert a typed boundary between the agent's clean domain model and a messy or legacy external API.
Agent-Initiated Payment
Give an agent a bounded wallet so it can settle a payment mid-request to unlock a resource — answering a payment-required challenge with a verifiable proof — instead of routing every purchase through…
Hierarchical Tool Selection
Organise tools into a tree of categories so the agent first picks a branch and then a specific tool within it.
MCP Bidirectional Bridge
Run a framework as both MCP client (consuming external MCP servers as tools) and MCP server (publishing its own agents, tools, and workflows back over MCP) so capabilities flow both directions across…
MCP-as-Code-API
Materialize MCP servers as a directory of typed code wrappers so the agent writes code that imports them and large tool outputs flow between calls inside the sandbox without ever entering the model's…
Policy-Localizer-Validator
Split a GUI agent into three specialist models — a Policy that plans, a Localizer that grounds elements to pixels, and a Validator that judges completion — so each role uses the smallest sufficient m…
Tool Search Lazy Loading
Defer loading tool schemas into the context window until a search step shows they are needed.
Tool/Agent Registry
Maintain a single queryable catalogue of both available tools and available agents, with metadata (capability, cost, latency, quality) the agent can use to pick the right one for a task.
App Exploration Phase
Before deploying an agent against an opaque app, have it explore (or watch a human demonstrate) the app, generating a per-element documentation knowledge base; at deployment, retrieve element docs to…
Large Action Models (LAMs)
Use a model class specifically trained for action execution (tool calls, UI navigation, workflow steps) rather than text generation, when the workload is dominated by reliably completing actions in r…
Synthetic Filesystem Overlay
Project heterogeneous enterprise data sources into a single Unix-like tree exposed through filesystem primitives so the agent reuses path semantics it already knows instead of learning a bespoke API…
WebAssembly Skill Runtime
Package each agent skill as a WebAssembly module with a capability manifest, and run it inside a Wasm runtime that enforces those capabilities, so untrusted skills cannot weaken the host's sandbox.
Toolformer
Train the model to learn when and how to call tools through self-supervised data, without human annotation.
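As one illustration from the catalogue above, the Tool Result Caching entry (cache deterministic tool results keyed by their arguments, scoped to a session) can be sketched in a few lines. The class name `SessionToolCache`, its `call` method, and the `slow_lookup` tool are hypothetical, and the sketch assumes the tool's arguments are JSON-serialisable and that the tool really is deterministic.

```python
import json
from typing import Any, Callable


class SessionToolCache:
    """Hypothetical per-session cache for deterministic tool calls."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}
        self.hits = 0

    @staticmethod
    def _key(tool_name: str, args: dict[str, Any]) -> str:
        # sort_keys makes the key independent of argument order.
        return tool_name + ":" + json.dumps(args, sort_keys=True)

    def call(self, tool_name: str, fn: Callable[..., Any], args: dict[str, Any]) -> Any:
        key = self._key(tool_name, args)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = fn(**args)  # only reached on a cache miss
        self._store[key] = result
        return result


if __name__ == "__main__":
    invocations = []

    def slow_lookup(sku: str, region: str) -> str:
        # Stand-in for an expensive, deterministic external call.
        invocations.append((sku, region))
        return f"{sku}@{region}"

    cache = SessionToolCache()
    cache.call("lookup", slow_lookup, {"sku": "A1", "region": "eu"})
    cache.call("lookup", slow_lookup, {"region": "eu", "sku": "A1"})  # same call, reordered
    print(len(invocations), cache.hits)
```

Creating one cache object per session keeps results from leaking across sessions; a tool whose output depends on time or external state should not be routed through it.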