Book III

Tool Use & Environment

How the agent reaches outside itself.

34 patterns in this book.

When to reach for each

01. Tool Use
Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.
Best for: The model must affect external state or query authoritative systems.
Tradeoff: Tool palette design becomes the bottleneck; bad tools propagate to every call site.
Watch for: The deliverable is free prose; structuring it as a tool call is overhead.

02. Model Context Protocol
Standardise how agents discover and call tools so that a tool written once is usable by any conformant agent.
Best for: Tool palettes need to be portable across multiple host applications.
Tradeoff: Adds a process boundary; latency and operational surface increase.
Watch for: Single host, single language, no portability requirement; native function calls are simpler.

03. Code-as-Action Agent
Have the agent emit a code snippet as its action each step, executed in a constrained interpreter, instead of emitting JSON tool calls; tool composition becomes function nesting and control flow inside the snippet.
Best for: Tool composition is natural in code (filter, map, conditional chains) and clumsy as JSON tool calls.
Tradeoff: Sandbox correctness is load-bearing; a weak sandbox means arbitrary code execution on the host.
Watch for: The deployment cannot host or trust a sandboxed interpreter.

04. Code Execution
Let the model emit code, run it in a sandbox, and treat the run as the answer instead of trusting the model to compute in its head.
Best for: The task involves calculation, parsing, or transformations that LLMs hallucinate.
Tradeoff: Sandbox security is its own engineering problem.
Watch for: The task is pure language with no computation that benefits from running code.

05. Computer Use
Let the model drive a desktop end-to-end via screenshots plus virtual mouse/keyboard tool calls instead of bespoke per-app APIs.
Best for: The target software has no clean API and the agent must drive a real desktop visually.
Tradeoff: Slow and brittle on dynamic UIs.
Watch for: A clean API exists and is faster, cheaper, and more reliable than visual control.

All patterns in this book

Tool Use

×122

Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.
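
One way to picture a typed call is a small dispatcher that validates the model's JSON against a declared parameter table before running anything. This is a minimal sketch, not any particular framework's API: the `Tool` dataclass, the `TOOLKIT` table, the `get_weather` function, and the `{"name": ..., "arguments": ...}` call shape are all hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool record: a name, typed parameters, and an implementation."""
    name: str
    params: dict[str, type]  # parameter name -> expected Python type
    fn: Callable[..., Any]


def get_weather(city: str, days: int) -> str:
    # Stand-in for a real external system.
    return f"{days}-day forecast for {city}"


TOOLKIT = {
    "get_weather": Tool("get_weather", {"city": str, "days": int}, get_weather),
}


def dispatch(raw_call: str) -> Any:
    """Parse a model-emitted call, check it against the tool's types, then run it."""
    call = json.loads(raw_call)
    tool = TOOLKIT.get(call.get("name"))
    if tool is None:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    args = call.get("arguments", {})
    if set(args) != set(tool.params):
        raise ValueError(f"expected parameters {sorted(tool.params)}, got {sorted(args)}")
    for key, expected in tool.params.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    return tool.fn(**args)
```

Because malformed calls are rejected at the boundary, the surrounding system never has to guess at the meaning of free-form text.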

Model Context Protocol

×52

Standardise how agents discover and call tools so that a tool written once is usable by any conformant agent.

Code-as-Action Agent

×32

Have the agent emit a code snippet as its action each step, executed in a constrained interpreter, instead of emitting JSON tool calls; tool composition becomes function nesting and control flow insi…

Code Execution

×31

Let the model emit code, run it in a sandbox, and treat the run as the answer instead of trusting the model to compute in its head.
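
The run-and-read-the-output loop can be sketched with a child interpreter. Note the loud assumption: a plain subprocess with a timeout is NOT a sandbox, and a real deployment would need the filesystem, network, and process restrictions described under Sandbox Isolation. `run_snippet` is a hypothetical helper name.

```python
import subprocess
import sys


def run_snippet(code: str, timeout_s: float = 5.0) -> str:
    """Run model-emitted code in a separate interpreter and return its stdout.

    Assumption: this only separates processes and bounds runtime; it does
    not restrict what the code can touch, so it is not a security boundary.
    """
    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env vars and user site
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout.strip()
```

The caller treats the returned text as the answer, for example `run_snippet("print(17 * 23)")`, rather than asking the model to do the arithmetic itself.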

Computer Use

×20

Let the model drive a desktop end-to-end via screenshots plus virtual mouse/keyboard tool calls instead of bespoke per-app APIs.

Agent-Computer Interface

×19

Design the tool surface for an LLM agent specifically, with affordances different from human-facing CLIs.

Browser Agent

×17

Expose websites to the agent through a structured DOM/accessibility tree plus a small action vocabulary, sitting between raw HTML and pixel-level Computer Use.

Multilingual Voice Agent Stack

×17

Compose a voice agent as a tightly co-located pipeline of speech-to-text, language-aware LLM reasoning, and text-to-speech, where one vendor owns all three so language and dialect propagate cleanly a…

Sandbox Isolation

×13

Run agent-emitted code or actions in a contained environment with restricted filesystem, network, and process privileges.

Agent Skills

×10

Package author-time procedures (markdown + optional resources) the agent loads on demand for specific task types.

Skill Library

×7

Let the agent grow its own toolkit by writing reusable skills that subsequent runs can call.

Prompt Caching

×4

Order prompts so the unchanging prefix can be cached by the provider, cutting per-call cost and latency.
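
The ordering idea can be sketched without any provider API: put the parts that never change first and the per-request parts last, so consecutive requests share a long common prefix. Whether and how that prefix is actually cached is provider-specific and not shown here; the constants and function names below are hypothetical.

```python
import os

# Hypothetical static content: identical on every request.
STATIC_SYSTEM = "You are a support agent. Follow the policy below.\n"
STATIC_TOOLS = "Tools: lookup_order(order_id), refund(order_id)\n"


def build_prompt(user_message: str, timestamp: str) -> str:
    """Static prefix first, volatile fields (time, user text) last."""
    return STATIC_SYSTEM + STATIC_TOOLS + f"Time: {timestamp}\n" + f"User: {user_message}\n"


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the leading text two prompts have in common."""
    return len(os.path.commonprefix([a, b]))
```

Had the timestamp been placed at the very top, the shared prefix between two requests would shrink to almost nothing, which is the failure this pattern is meant to avoid.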

Tool Result Caching

×3

Cache the result of expensive deterministic tool calls keyed by their arguments so repeat calls within a session return immediately.
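
A minimal sketch of argument-keyed caching, assuming tool arguments are JSON-serialisable and the tool is deterministic for the lifetime of the session. `ToolResultCache` is a hypothetical class name.

```python
import json
from typing import Any, Callable


class ToolResultCache:
    """Session-scoped cache keyed by tool name plus canonicalised arguments."""

    def __init__(self) -> None:
        self._store: dict = {}
        self.hits = 0
        self.misses = 0

    def call(self, name: str, fn: Callable[..., Any], **kwargs: Any) -> Any:
        # sort_keys makes the key independent of argument order.
        key = (name, json.dumps(kwargs, sort_keys=True))
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = fn(**kwargs)
        self._store[key] = result
        return result
```

Creating one cache per session keeps stale results from leaking across sessions; tools with side effects or time-dependent output should bypass it.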

Dual-System GUI Agent

×3

Split a GUI agent into a decision model that plans and recovers from errors and a grounding model that observes pixels and emits the precise action; route each subproblem to the better-suited model.

Tool Loadout

×2

Select a small task-relevant subset of available tools per request rather than exposing the full registry to the model.
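
As a rough illustration, a loadout selector can score each registered tool against the request and keep only the top few. The scoring below is a deliberately crude word-overlap heuristic chosen for this sketch (an embedding-based retriever would be a natural substitute), and the registry contents are hypothetical.

```python
def select_loadout(request: str, registry: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k tool names whose descriptions share words with the request."""
    words = set(request.lower().split())
    scored = []
    for name, description in registry.items():
        overlap = len(words & set(description.lower().split()))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Hypothetical registry: tool name -> one-line description.
REGISTRY = {
    "search_flights": "search flights between two airports",
    "book_hotel": "book a hotel room in a city",
    "send_email": "send an email message",
    "convert_currency": "convert an amount between two currencies",
}
```

Only the selected names' schemas would then be placed in the model's context for that request.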

Crawler Dispatcher

×1

Route each incoming URL to a domain-specific crawler through a central dispatcher mapping URL patterns to registered crawler classes.
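
A small sketch of the dispatcher, assuming regex URL patterns checked in registration order and a generic fallback crawler. All class names here are hypothetical.

```python
import re


class BaseCrawler:
    def crawl(self, url: str) -> str:
        raise NotImplementedError


class GitHubCrawler(BaseCrawler):
    def crawl(self, url: str) -> str:
        return f"github repository content from {url}"


class GenericCrawler(BaseCrawler):
    def crawl(self, url: str) -> str:
        return f"generic page content from {url}"


class CrawlerDispatcher:
    """Maps URL patterns to registered crawler classes."""

    def __init__(self) -> None:
        self._routes: list = []  # (compiled pattern, crawler class) pairs

    def register(self, pattern: str, crawler_cls: type) -> None:
        self._routes.append((re.compile(pattern), crawler_cls))

    def get_crawler(self, url: str) -> BaseCrawler:
        # First matching pattern wins; unknown domains fall back to the generic crawler.
        for pattern, crawler_cls in self._routes:
            if pattern.search(url):
                return crawler_cls()
        return GenericCrawler()


dispatcher = CrawlerDispatcher()
dispatcher.register(r"^https?://(www\.)?github\.com/", GitHubCrawler)
```

Adding support for a new domain then means registering one more pattern and class, with no change to the calling code.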

Mobile UI Agent

×1

Drive a smartphone end-to-end through a small, touch-native action vocabulary (tap, long-press, swipe, type, back, home) over screenshots, as a distinct interaction surface from desktop Computer Use…

Tool Discovery

×1

Let the agent discover available tools at runtime rather than hardcoding the tool list at agent build time.

Tool Transition Fusion

×1

Mine tool-call telemetry for high-probability X-then-Y transitions and fuse those pairs into a single composite tool, shrinking the planner's step count.
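
The mining step can be sketched as counting adjacent pairs in recorded traces and estimating P(Y follows X). The threshold values, the trace format (a list of tool names per run), and the `fuse` helper are assumptions made for this illustration.

```python
from collections import Counter
from typing import Any, Callable


def frequent_transitions(
    traces: list[list[str]], threshold: float = 0.8, min_count: int = 2
) -> list[tuple[str, str, float]]:
    """Return (x, y, probability) for pairs where y usually follows x."""
    pair_counts: Counter = Counter()
    first_counts: Counter = Counter()
    for trace in traces:
        for x, y in zip(trace, trace[1:]):
            pair_counts[(x, y)] += 1
            first_counts[x] += 1  # times x was followed by anything
    result = []
    for (x, y), n in pair_counts.items():
        probability = n / first_counts[x]
        if n >= min_count and probability >= threshold:
            result.append((x, y, probability))
    return sorted(result)


def fuse(first: Callable[[Any], Any], second: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Build a composite tool that feeds the first tool's output into the second."""
    def composite(arg: Any) -> Any:
        return second(first(arg))
    return composite
```

The `min_count` guard keeps a pair seen only once from being fused just because its estimated probability happens to be 1.0.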

Agent Adapter

An interface layer connecting an agent's tool-calling protocol to heterogeneous external tools, normalizing their schemas into one the agent expects.

Augmented LLM

Build the foundational agent block as an LLM augmented with retrieval, tools, and memory that the model actively chooses to use, rather than a bare-model call.

Translation Layer

Insert a typed boundary between the agent's clean domain model and a messy or legacy external API.
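
For illustration, the boundary can be a single function that converts a legacy record into a typed domain object, so nothing downstream sees the raw field names or string encodings. The legacy field names (`CUST_NO`, `NM`, `SGN_DT`, `STAT`) and the `Customer` model are invented for this sketch.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Customer:
    """Clean domain model the agent works with."""
    customer_id: int
    full_name: str
    signup_date: date
    active: bool


def from_legacy(record: dict[str, str]) -> Customer:
    """Translate one record from the hypothetical legacy API into the domain model."""
    return Customer(
        customer_id=int(record["CUST_NO"].strip()),
        full_name=record["NM"].strip().title(),
        signup_date=datetime.strptime(record["SGN_DT"], "%Y%m%d").date(),
        active=record["STAT"].strip().upper() == "A",
    )
```

A malformed record fails loudly here, at the boundary, instead of surfacing later as a confusing tool result.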

Agent-Initiated Payment

Give an agent a bounded wallet so it can settle a payment mid-request to unlock a resource — answering a payment-required challenge with a verifiable proof — instead of routing every purchase through…

Hierarchical Tool Selection

Organise tools into a tree of categories so the agent first picks a branch and then a specific tool within it.
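
The two-stage pick can be sketched with a pluggable chooser. In practice the chooser would be a model call; here a toy keyword matcher stands in for it, and the tree contents are hypothetical.

```python
from typing import Callable

# A chooser takes the task and a list of options and returns one option.
Chooser = Callable[[str, list[str]], str]

# Hypothetical two-level tool tree: category -> tools.
TOOL_TREE = {
    "files": ["read_file", "write_file"],
    "web": ["fetch_url", "search_web"],
}


def select_tool(task: str, tree: dict[str, list[str]], choose: Chooser) -> tuple[str, str]:
    """Pick a category first, then a tool within that category."""
    category = choose(task, sorted(tree))
    if category not in tree:
        raise ValueError(f"chooser returned unknown category: {category!r}")
    tool = choose(task, tree[category])
    if tool not in tree[category]:
        raise ValueError(f"chooser returned unknown tool: {tool!r}")
    return category, tool


def keyword_chooser(task: str, options: list[str]) -> str:
    """Toy stand-in for the model: first option sharing a word fragment with the task."""
    lowered = task.lower()
    for option in options:
        if any(part in lowered for part in option.split("_")):
            return option
    return options[0]
```

Each choice sees only a handful of options, which is the point: the model never has to weigh the full flat list at once.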

MCP Bidirectional Bridge

Run a framework as both MCP client (consuming external MCP servers as tools) and MCP server (publishing its own agents, tools, and workflows back over MCP) so capabilities flow both directions across…

MCP-as-Code-API

Materialize MCP servers as a directory of typed code wrappers so the agent writes code that imports them and large tool outputs flow between calls inside the sandbox without ever entering the model's…

Policy-Localizer-Validator

Split a GUI agent into three specialist models — a Policy that plans, a Localizer that grounds elements to pixels, and a Validator that judges completion — so each role uses the smallest sufficient m…

Tool Search Lazy Loading

Defer loading tool schemas into the context window until a search step shows they are needed.

Tool/Agent Registry

Maintain a single queryable catalogue of both available tools and available agents, with metadata (capability, cost, latency, quality) the agent can use to pick the right one for a task.
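
A minimal in-memory sketch of such a catalogue, assuming each entry carries the four metadata fields named above. The units (cost per call, latency in milliseconds, quality on a 0 to 1 scale) and the ranking rule are assumptions of this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    name: str
    kind: str         # "tool" or "agent"
    capability: str
    cost: float       # assumed: cost units per call
    latency_ms: int
    quality: float    # assumed: score between 0 and 1


class Registry:
    """One catalogue for both tools and agents, queryable by capability."""

    def __init__(self) -> None:
        self._entries: list = []

    def register(self, entry: Entry) -> None:
        self._entries.append(entry)

    def find(self, capability: str, max_latency_ms: Optional[int] = None) -> list:
        matches = [
            e for e in self._entries
            if e.capability == capability
            and (max_latency_ms is None or e.latency_ms <= max_latency_ms)
        ]
        # Best quality first; cheaper entry wins a quality tie.
        return sorted(matches, key=lambda e: (-e.quality, e.cost))
```

Because tools and agents share one entry type, a caller can trade a slow, high-quality agent for a fast tool simply by tightening the latency bound.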

App Exploration Phase

Before deploying an agent against an opaque app, have it explore (or watch a human demonstrate) the app, generating a per-element documentation knowledge base; at deployment, retrieve element docs to…

Large Action Models (LAMs)

Use a model class specifically trained for action execution (tool calls, UI navigation, workflow steps) rather than text generation, when the workload is dominated by reliably completing actions in r…

Synthetic Filesystem Overlay

Project heterogeneous enterprise data sources into a single Unix-like tree exposed through filesystem primitives so the agent reuses path semantics it already knows instead of learning a bespoke API…

WebAssembly Skill Runtime

Package each agent skill as a WebAssembly module with a capability manifest, and run it inside a Wasm runtime that enforces those capabilities, so untrusted skills cannot weaken the host's sandbox.

Toolformer

Train the model to learn when and how to call tools through self-supervised data, without human annotation.