III · Tool Use & Environment · Experimental

On-Demand Tool Synthesis

also known as Tool Creation, LLMs as Tool Makers

When no available tool fits a subtask, have the agent write, validate, and register a new tool on the spot, separating the tool-creating role from the tool-using role.

Context

An agent faces an open-ended task space, but its toolset is fixed at deployment. Sooner or later a subtask needs a capability no available tool provides — parse an unusual format, call an API with no wrapper, or run a computation the tools cannot express. The agent either gives up, fakes the result, or contorts existing tools into something brittle.

Problem

A fixed toolset cannot cover an open task space, yet shipping every conceivable tool is impossible and bloats tool selection. When the agent hits a capability gap mid-task it has no clean way forward: hallucinating a tool fails, and forcing the wrong tool produces wrong results. The agent needs a way to manufacture the missing capability as a proper, callable tool, and to do so without blindly trusting code it just wrote.

Forces

  • A fixed toolset cannot cover an open task space, but shipping every possible tool bloats discovery and selection.
  • Letting the agent author and run its own code adds an untrusted-code surface that needs validation and sandboxing.
  • A tool created for one subtask is wasted effort unless it is registered for reuse, yet over-eager tool creation clutters the registry.

Example

An agent answering data questions is asked to compute a Gini coefficient, but it has only a generic SQL tool and a calculator. No tool fits, so the creator role writes a small gini(values) function, runs it against a known example to confirm the output, and registers it. The user role then calls gini on the query result, and the tool stays available for the next inequality question.
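A minimal sketch of what the creator role might produce in this example. The rank-weighted formula and the two reference cases are standard for the Gini coefficient; the helper name validate_gini and the decision to reject empty or negative input are assumptions made for illustration, not part of the pattern.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values: 0 means perfect equality."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("gini() needs at least one value")
    if xs[0] < 0:
        raise ValueError("gini() is defined here for non-negative values only")
    total = sum(xs)
    if total == 0:
        return 0.0  # everyone holds nothing, so the distribution is equal
    # Rank-weighted form over ascending values x_1..x_n:
    #   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def validate_gini() -> bool:
    """Hypothetical validation step: check two cases with known answers."""
    equal = abs(gini([1, 1, 1, 1]) - 0.0) < 1e-9        # perfect equality
    one_holder = abs(gini([0, 0, 0, 1]) - 0.75) < 1e-9  # (n - 1) / n for n = 4
    return equal and one_holder
```

Only if validate_gini() returns True would the creator register gini; the user role then calls it on the values returned by the SQL tool.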

Solution

Therefore:

Split the work into a tool-creator role and a tool-user role. When the user role finds that no tool fits the subtask, it hands a specification to the creator role, which writes the tool's code and interface, runs a test (generating one if none is supplied) to confirm the tool behaves as specified, and registers it in the tool registry. The user role then calls the new tool exactly as it would a built-in one, and the tool persists for later reuse. Synthesized code always executes in a sandbox, and a tool that fails its validation check is discarded rather than registered.
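The creator/user split can be sketched roughly as follows. Every name here (ToolRegistry, create_tool, use_tool, synthesize) is hypothetical, and synthesize stands in for whatever model call turns a specification into source code. The in-process exec() is only a placeholder so the sketch stays short: it is not a sandbox, and a real system would run that step in an isolated environment.

```python
from typing import Any, Callable, Dict, List, Tuple

Example = Tuple[tuple, Any]  # (positional args, expected result)


class ToolRegistry:
    """Holds validated tools by name so later subtasks can reuse them."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def has(self, name: str) -> bool:
        return name in self._tools

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, *args: Any) -> Any:
        return self._tools[name](*args)


def create_tool(name: str, source: str, examples: List[Example],
                registry: ToolRegistry) -> bool:
    """Creator role: load the candidate, validate it, register only on success."""
    namespace: Dict[str, Any] = {}
    try:
        # ASSUMPTION: exec() here is a stand-in for sandboxed execution.
        exec(source, namespace)
        fn = namespace[name]
        passed = all(fn(*args) == expected for args, expected in examples)
    except Exception:
        passed = False  # any crash during load or test counts as a failure
    if passed:
        registry.register(name, fn)
    return passed  # a failing candidate is simply dropped


def use_tool(name: str, spec: str, args: tuple, examples: List[Example],
             registry: ToolRegistry, synthesize: Callable[[str], str]) -> Any:
    """User role: call an existing tool, or request one when none fits."""
    if not registry.has(name):
        source = synthesize(spec)  # hypothetical model call
        if not create_tool(name, source, examples, registry):
            raise LookupError(f"no validated tool available for {name!r}")
    return registry.call(name, *args)
```

On a second request for the same tool, use_tool finds it in the registry and skips synthesis entirely, which is where the reuse benefit comes from.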

What this pattern forbids. A synthesized tool may not be called until it has passed a validation check. Generated code that has not passed is never registered or invoked for real work, and all execution of synthesized code, the validation run included, is confined to a sandbox.
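One way to enforce this rule is to keep only the source of validated tools and to run every call, validation included, outside the agent's own process. The sketch below uses a separate Python process with a timeout as the isolation boundary. That is an assumption chosen for brevity and is weaker than a real sandbox, which would also restrict filesystem, network, and memory access. The names run_isolated and GatedRegistry are hypothetical, and arguments and results are assumed to be JSON-serializable.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List, Tuple


def run_isolated(source: str, func_name: str, args: List[Any],
                 timeout_s: float = 5.0) -> Tuple[bool, Any]:
    """Run func_name(*args) from source in a child process; return (ok, result)."""
    harness = (
        source
        + "\nimport json, sys\n"
        + f"print(json.dumps({func_name}(*json.loads(sys.argv[1]))))\n"
    )
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", harness, json.dumps(args)],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, None  # runaway code is treated as a failure
    if proc.returncode != 0:
        return False, None  # syntax errors and crashes land here
    try:
        return True, json.loads(proc.stdout)
    except json.JSONDecodeError:
        return False, None  # unexpected extra output also fails


class GatedRegistry:
    """Stores only tools that passed validation; refuses to call anything else."""

    def __init__(self) -> None:
        self._sources: Dict[str, str] = {}

    def register_if_valid(self, name: str, source: str,
                          args: List[Any], expected: Any) -> bool:
        ok, result = run_isolated(source, name, args)
        if not (ok and result == expected):
            return False  # failed candidates are discarded, never stored
        self._sources[name] = source
        return True

    def call(self, name: str, args: List[Any]) -> Any:
        if name not in self._sources:
            raise KeyError(f"tool {name!r} has not passed validation")
        ok, result = run_isolated(self._sources[name], name, args)
        if not ok:
            raise RuntimeError(f"tool {name!r} failed at call time")
        return result
```

Because call looks tools up only in the validated set, there is no code path by which unvalidated source reaches execution on behalf of the user role.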

And the patterns that stand alongside it, or against it —

  • alternative-to: Skill Library. Let the agent grow its own toolkit by writing reusable skills that subsequent runs can call.
  • complements: Code-as-Action Agent. Have the agent emit a code snippet as its action each step, executed in a constrained interpreter, instead of emitting JSON tool calls; tool composition becomes function nesting and control flow inside the snippet.
  • complements: Tool Discovery. Let the agent discover available tools at runtime rather than hardcoding the tool list at agent build time.
