III · Tool Use & Environment · Emerging

Tool Search Lazy Loading

also known as Lazy Tool Loading, On-Demand Tool Schema Loading, ToolSearch Primitive

Defer loading tool schemas into the context window until a search step shows they are needed.

Context

A team is running an agent connected to many Model Context Protocol (MCP) servers, plugin endpoints, or API gateways, where the combined tool catalogue holds fifty or more tools. The full set of tool schemas, if loaded eagerly into the system prompt, would consume a substantial fraction of the context window before the user has even spoken.

Problem

Injecting every available tool definition into the system prompt up front spends tokens on tools that will never be used in this session, slows every request because the prompt is larger, and forces the model to pick a relevant tool out of a long list of mostly irrelevant ones. Static per-request loadouts can help, but they require choosing the subset before the user's intent is fully known. Neither approach keeps a large catalogue discoverable without paying for all of it on every call.

Forces

  • Tool definitions are large; a catalogue of 50+ tools can dominate the prompt budget.
  • The model needs enough description to pick the right tool, but only when it is actually about to call one.
  • Searching for tools at runtime adds an extra round trip before the first tool call.
  • Hidden tools must still be discoverable — otherwise the model behaves as if they do not exist.

Example

An assistant is wired to seven MCP servers exposing 60 tools combined. Preloading every schema costs roughly 30k tokens before the user has even spoken. Instead, the host advertises only a ToolSearch tool plus a one-line index. When the user asks to file a Linear ticket, the model calls ToolSearch with the query "linear issue create", receives the schemas for two relevant tools, and only then calls the real create-issue tool. The schemas of the other 58 tools never enter the context.
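The arithmetic behind this example can be sketched roughly as follows. The 60-tool and 30k-token figures come from the example; the size of the ToolSearch schema and of each index line are hypothetical values chosen only for illustration, so the totals are indicative rather than measured.

```python
# Figures taken from the example above.
TOTAL_TOOLS = 60
EAGER_TOKENS = 30_000  # approximate cost of preloading every schema

# Assumptions for illustration only (not from the example).
SEARCH_TOOL_TOKENS = 300    # hypothetical size of the ToolSearch schema
INDEX_TOKENS_PER_TOOL = 15  # hypothetical size of one index line


def avg_schema_tokens() -> float:
    """Average schema size implied by the example's totals."""
    return EAGER_TOKENS / TOTAL_TOOLS


def lazy_tokens(loaded_tools: int) -> float:
    """Prompt cost when only `loaded_tools` schemas were pulled in by search."""
    baseline = SEARCH_TOOL_TOKENS + INDEX_TOKENS_PER_TOOL * TOTAL_TOOLS
    return baseline + loaded_tools * avg_schema_tokens()


if __name__ == "__main__":
    print(f"average schema: {avg_schema_tokens():.0f} tokens")
    print(f"lazy, 2 tools loaded: {lazy_tokens(2):.0f} tokens")
    print(f"lazy, all tools loaded: {lazy_tokens(TOTAL_TOOLS):.0f} tokens")
```

Under these assumptions the lazy path stays far below the eager cost while few tools are loaded, and only exceeds it if a session ends up loading nearly the whole catalogue.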

Solution

Therefore:

Replace the eager tool list with a single search primitive (for example a ToolSearch tool) that returns matching tool schemas by query. The system prompt lists only the search primitive plus a short index of tool names or categories. When the model decides it needs a tool, it calls the search primitive, receives the full schema for the matching tools, and only then calls the tool by name. Schemas loaded by search are kept in context for the rest of the session so repeat use does not pay the lookup cost again.
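A minimal sketch of such a search primitive is shown here, assuming a host that keeps the full catalogue in memory. The names `ToolSchema`, `ToolRegistry`, `tool_search`, and the demo tools are hypothetical, and the keyword-overlap scoring is a stand-in for whatever ranking a real host would use (embeddings, BM25, and so on).

```python
from dataclasses import dataclass, field


@dataclass
class ToolSchema:
    name: str
    description: str
    parameters: dict


@dataclass
class ToolRegistry:
    """Holds the full catalogue outside the context window (hypothetical host-side type)."""
    catalogue: dict[str, ToolSchema]
    # Schemas already returned by search; kept for the rest of the session.
    loaded: dict[str, ToolSchema] = field(default_factory=dict)

    def index(self) -> list[str]:
        """Short index for the system prompt: tool names only."""
        return sorted(self.catalogue)

    def tool_search(self, query: str, limit: int = 3) -> list[ToolSchema]:
        """Return the best-matching schemas and mark them as loaded."""
        terms = query.lower().split()
        scored = []
        for schema in self.catalogue.values():
            text = f"{schema.name} {schema.description}".lower()
            score = sum(term in text for term in terms)  # naive keyword overlap
            if score:
                scored.append((score, schema.name))
        # Highest score first; ties broken by name for stable output.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        hits = [self.catalogue[name] for _, name in scored[:limit]]
        for schema in hits:
            self.loaded[schema.name] = schema
        return hits


def build_demo_registry() -> ToolRegistry:
    """Tiny illustrative catalogue; tool names and descriptions are made up."""
    tools = [
        ToolSchema("linear_create_issue", "Create a new issue in Linear", {"title": "string"}),
        ToolSchema("linear_update_issue", "Update an existing Linear issue", {"id": "string"}),
        ToolSchema("github_create_pr", "Create a GitHub pull request", {"branch": "string"}),
        ToolSchema("slack_post_message", "Post a message to a Slack channel", {"text": "string"}),
    ]
    return ToolRegistry({tool.name: tool for tool in tools})


if __name__ == "__main__":
    registry = build_demo_registry()
    hits = registry.tool_search("linear issue create", limit=2)
    print([schema.name for schema in hits])
    print(sorted(registry.loaded))
```

Keeping `loaded` on the registry is one way to honour the rule that schemas stay in context for the rest of the session: the host can render `loaded` into every later request without repeating the search.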

What this pattern forbids. Tool schemas are not in context until the search primitive has returned them; the model may not call a tool whose schema has not yet been loaded by search or preloaded by the host.
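One way a host might enforce this rule is a gate in front of tool dispatch, sketched below. `ToolGate`, `ToolNotLoadedError`, and the handler signature are hypothetical names introduced for illustration; the sketch assumes the host tracks which schemas were preloaded or returned by search.

```python
from typing import Any, Callable, Iterable


class ToolNotLoadedError(Exception):
    """Raised when a tool is called before its schema has entered the context."""


class ToolGate:
    """Hypothetical dispatcher that rejects calls to tools not yet loaded."""

    def __init__(
        self,
        handlers: dict[str, Callable[..., Any]],
        preloaded: Iterable[str] = (),
    ) -> None:
        self._handlers = dict(handlers)
        self._loaded = set(preloaded)  # schemas the host put in context up front

    def mark_loaded(self, names: Iterable[str]) -> None:
        """Record schemas returned by the search primitive; unknown names are ignored."""
        self._loaded.update(name for name in names if name in self._handlers)

    def call(self, name: str, arguments: dict[str, Any]) -> Any:
        if name not in self._handlers:
            raise KeyError(f"unknown tool: {name}")
        if name not in self._loaded:
            raise ToolNotLoadedError(
                f"{name}: schema not loaded; call the search primitive first"
            )
        return self._handlers[name](**arguments)


if __name__ == "__main__":
    gate = ToolGate({"create_issue": lambda title: f"created {title}"})
    try:
        gate.call("create_issue", {"title": "Bug"})
    except ToolNotLoadedError as err:
        print(err)
    gate.mark_loaded(["create_issue"])
    print(gate.call("create_issue", {"title": "Bug"}))
```

Returning a clear error rather than silently executing the call gives the model a chance to run the search step and retry with the real schema in view.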

The smaller patterns that complete this one —

  • uses: Model Context Protocol ★★. Standardise how agents discover and call tools so that a tool written once is usable by any conformant agent.

And the patterns that stand alongside it, or against it —

  • alternative-to: Tool Loadout ★★. Select a small task-relevant subset of available tools per request rather than exposing the full registry to the model.
  • complements: Tool Discovery. Let the agent discover available tools at runtime rather than hardcoding the tool list at agent build time.
  • complements: Context Window Packing ★★. Choose what fits in the context window each turn given a fixed token budget.
  • complements: MCP-as-Code-API. Materialize MCP servers as a directory of typed code wrappers so the agent writes code that imports them and large tool outputs flow between calls inside the sandbox without ever entering the model's context window.
  • alternative-to: Tool Loadout Hot-Swap. Anti-pattern: add or remove tool definitions during a running task so the tool set the model sees changes from turn to turn.
