Tool Search Lazy Loading

Also known as: Lazy Tool Loading, On-Demand Tool Schema Loading, ToolSearch Primitive

Defer loading tool schemas into the context window until a search step shows they are needed.

Context

A team is running an agent connected to many Model Context Protocol (MCP) servers, plugin endpoints, or API gateways, where the combined tool catalogue holds fifty or more tools. The full set of tool schemas, if loaded eagerly into the system prompt, would consume a substantial fraction of the context window before the user has even spoken.

Problem

Injecting every available tool definition into the system prompt up front spends tokens on tools that will never be used in the session, slows every request because the prompt is larger, and forces the model to pick a relevant tool out of a long list of mostly irrelevant ones. Static per-request loadouts help, but they require choosing the subset before the user's intent is fully known. With eager loading, a large catalogue cannot stay discoverable without paying for all of it on every call.

Forces

Tool definitions are large; a catalogue of 50+ tools can dominate the prompt budget.
The model needs enough description to pick the right tool, but only when it is actually about to call one.
Searching for tools at runtime adds an extra round trip before the first tool call.
Hidden tools must still be discoverable — otherwise the model behaves as if they do not exist.

Example

An assistant is wired to seven MCP servers exposing 60 tools combined. Preloading every schema costs roughly 30k tokens before the user has even spoken. Instead, the host advertises only a ToolSearch tool plus a one-line index. When the user asks to file a Linear ticket, the model calls ToolSearch with the query "linear issue create", receives the schemas for two relevant tools, and only then calls the real create-issue tool. The other 58 schemas never enter the context.
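The saving in this example can be estimated with back-of-envelope arithmetic. The sketch below takes the 60 tools, the roughly 30k tokens, and the two returned schemas from the example. The size of an index line (15 tokens) and of the ToolSearch schema itself (300 tokens) are illustrative assumptions, not measurements.

```python
# Back-of-envelope context cost for the example above.
# From the example: 60 tools, roughly 30k tokens when preloaded.
# Assumed for illustration only: index line size and ToolSearch schema size.
TOTAL_TOOLS = 60
EAGER_TOKENS = 30_000
TOKENS_PER_SCHEMA = EAGER_TOKENS // TOTAL_TOOLS  # average of about 500 per tool

INDEX_LINE_TOKENS = 15    # assumption: one short index line per tool
SEARCH_TOOL_TOKENS = 300  # assumption: schema of the ToolSearch tool itself


def lazy_cost(schemas_loaded: int) -> int:
    """Tokens in context: ToolSearch schema, the index, and any loaded schemas."""
    baseline = SEARCH_TOOL_TOKENS + TOTAL_TOOLS * INDEX_LINE_TOKENS
    return baseline + schemas_loaded * TOKENS_PER_SCHEMA


if __name__ == "__main__":
    print("eager:", EAGER_TOKENS)
    print("lazy, before any search:", lazy_cost(0))
    print("lazy, after loading 2 schemas:", lazy_cost(2))
```

Under these assumptions the lazy setup stays a small fraction of the eager cost until a large share of the catalogue has been loaded. The break-even point depends entirely on the assumed index and schema sizes.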

Diagram

sequenceDiagram
    participant Model
    participant Host
    participant Index as Tool Index
    participant Tool
    Model->>Host: ToolSearch("linear issue create")
    Host->>Index: rank by query
    Index-->>Host: matching schemas
    Host-->>Model: schemas for 2 tools
    Model->>Host: call create_issue(…)
    Host->>Tool: invoke
    Tool-->>Host: result
    Host-->>Model: result

Solution

Therefore:

Replace the eager tool list with a single search primitive (for example a ToolSearch tool) that returns matching tool schemas by query. The system prompt lists only the search primitive plus a short index of tool names or categories. When the model decides it needs a tool, it calls the search primitive, receives the full schema for the matching tools, and only then calls the tool by name. Schemas loaded by search are kept in context for the rest of the session so repeat use does not pay the lookup cost again.
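A minimal sketch of such a search primitive follows. It is one possible shape, not a reference implementation: the names `ToolSpec`, `LazyToolHost`, `tool_search`, and `demo_host`, the tool names in the demo catalogue, and the naive keyword-overlap ranking are all hypothetical. A real host might rank by embeddings or another retrieval method.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    name: str
    description: str
    schema: dict  # full definition; only sent to the model after a search hit


@dataclass
class LazyToolHost:
    catalogue: dict[str, ToolSpec]
    loaded: set[str] = field(default_factory=set)  # schemas already in context

    def index(self) -> str:
        """Short index for the system prompt: tool names only, no schemas."""
        return "\n".join(sorted(self.catalogue))

    def tool_search(self, query: str, top_k: int = 2) -> list[dict]:
        """Rank tools by keyword overlap with the query; return the top schemas."""
        terms = set(query.lower().split())
        scored = []
        for spec in self.catalogue.values():
            text = spec.name.replace("_", " ") + " " + spec.description
            score = len(terms & set(text.lower().split()))
            if score:
                scored.append((score, spec.name))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        hits = [name for _, name in scored[:top_k]]
        self.loaded.update(hits)  # kept for the rest of the session
        return [self.catalogue[name].schema for name in hits]


def demo_host() -> LazyToolHost:
    """Build a tiny hypothetical catalogue for illustration."""
    entries = [
        ("linear_create_issue", "Create a new issue in Linear"),
        ("linear_update_issue", "Update an existing Linear issue"),
        ("github_create_pull_request", "Open a pull request on GitHub"),
        ("calendar_list_events", "List upcoming calendar events"),
    ]
    return LazyToolHost({
        name: ToolSpec(name, desc, {"name": name, "description": desc,
                                    "parameters": {"type": "object"}})
        for name, desc in entries
    })


if __name__ == "__main__":
    host = demo_host()
    print(host.index())
    for schema in host.tool_search("linear issue create"):
        print(schema["name"])
```

The `loaded` set is what lets repeat use skip the lookup: once a schema has been returned, the host can keep it in the prompt for later turns.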

What this pattern forbids. Tool schemas are not in context until the search primitive has returned them; the model may not call a tool whose schema has not yet been loaded by search or preloaded by the host.
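One way a host could enforce this rule is to refuse any call to a tool whose schema has not been loaded. The sketch below is an assumption about how such a guard might look; `GuardedDispatcher`, `ToolNotLoadedError`, and `mark_loaded` are hypothetical names, not part of any particular framework.

```python
class ToolNotLoadedError(Exception):
    """Raised when a tool is called before its schema is in context."""


class GuardedDispatcher:
    def __init__(self, tools, preloaded=()):
        self._tools = dict(tools)      # tool name -> callable
        self._loaded = set(preloaded)  # schemas preloaded by the host

    def mark_loaded(self, names):
        """Record that the search primitive returned these schemas."""
        unknown = set(names) - set(self._tools)
        if unknown:
            raise KeyError(f"unknown tools: {sorted(unknown)}")
        self._loaded.update(names)

    def call(self, name, **arguments):
        """Invoke a tool only if its schema was loaded or preloaded."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        if name not in self._loaded:
            raise ToolNotLoadedError(
                f"{name} has no schema in context; run the search primitive first"
            )
        return self._tools[name](**arguments)


if __name__ == "__main__":
    dispatcher = GuardedDispatcher({"create_issue": lambda title: {"title": title}})
    try:
        dispatcher.call("create_issue", title="Fix login bug")
    except ToolNotLoadedError as err:
        print("rejected:", err)
    dispatcher.mark_loaded(["create_issue"])
    print(dispatcher.call("create_issue", title="Fix login bug"))
```

Returning the error text to the model, rather than failing silently, gives it a cue to run the search step and retry.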

The smaller patterns that complete this one —

uses Model Context Protocol ★★: Standardise how agents discover and call tools so that a tool written once is usable by any conformant agent.

And the patterns that stand alongside it, or against it —

alternative-to Tool Loadout ★★: Select a small task-relevant subset of available tools per request rather than exposing the full registry to the model.
complements Tool Discovery ★: Let the agent discover available tools at runtime rather than hardcoding the tool list at agent build time.
complements Context Window Packing ★★: Choose what fits in the context window each turn given a fixed token budget.
complements MCP-as-Code-API ★: Materialize MCP servers as a directory of typed code wrappers so the agent writes code that imports them and large tool outputs flow between calls inside the sandbox without ever entering the model's context window.
alternative-to Tool Loadout Hot-Swap ✕: Anti-pattern: add or remove tool definitions during a running task so the tool set the model sees changes from turn to turn.
alternative-to Dependency-Aware Skill Retrieval ·: Retrieve from a large skill library by returning each relevant skill together with its prerequisite dependency closure as an ordered subgraph, so the bundle the agent receives is executable rather than topically relevant but incomplete.
complements Retrieval-Saturation Tool Attack ✕: Anti-pattern: trust a tool-retrieval layer to surface tools, while an adversary injects a few crafted tools whose embeddings cover the query space and saturate the top-k, so benign tools never reach the agent's context.

Used in frameworks

DeerFlow 2.0 (SuperAgent harness)
first-class, 11 patterns, Research Agents ★ emerging
Tools (including MCP-provided ones) are exposed as deferred entries the agent can see by name but cannot call until it fetches their full schema through a `tool_search` tool, keep…

Provenance

Source: patterns/tool-search-lazy-loading.md on GitHub · commit b7f6659
Added to catalog: 2026-05-15
Last updated: 2026-05-21
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.