III · Tool Use & Environment

Toolformer

also known as Self-Supervised Tool Learning

Train the model to learn when and how to call tools through self-supervised data, without human annotation.

This pattern helps complete certain larger patterns —

specialises Tool Use ★★: Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.

Context

A team is deploying tool use at scale and has noticed that prompt-based function-calling — telling the model in the system prompt what tools are available and hoping it calls them well — underperforms in production. They do not have a dataset of human-labelled tool-use traces showing when each tool should have been called and with what arguments, and creating one at scale is not affordable.

Problem

Prompt-based tool calling is brittle: the model often forgets to call a tool when it should, calls the wrong one, or invents wrong arguments. The natural alternative — supervised fine-tuning on tool-use traces — requires costly human-labelled data the team does not have. They need a way to teach the model when and how to call tools using only self-supervised signals derived from outputs the model can already produce, so that the training data scales without human annotation.

Forces

Self-supervised data must distinguish helpful from unhelpful tool calls.
The training-time tool surface diverges from runtime over time.
Filtering out noisy candidate calls dominates training cost.

Example

A team wants their model to call a calculator and a search tool reliably without writing thousands of human-labelled tool-use traces. They use Toolformer-style self-supervision: at training time, candidate tool calls are inserted into many contexts, and each insertion is scored by whether the call and its result lower the model's perplexity on the gold continuation; helpful insertions become training data. The fine-tuned model learns when and how to call tools without any human annotation.
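The scoring step in this example can be written as a filter rule. The notation below follows our reading of the Toolformer paper listed under References; the symbols $L_i$, $c_i$, $r_i$, $e(\cdot,\cdot)$ and $\tau_f$ are not defined elsewhere in this entry and should be treated as assumptions. Because perplexity is the exponential of the average loss, a drop in this loss corresponds to a drop in perplexity.

```latex
% L_i(z): the model's loss on the gold continuation after position i
%         when the text z is inserted at that position.
% c_i: candidate tool call, r_i: the tool's response,
% e(c, r): the call and response rendered as text, \varepsilon: empty string,
% \tau_f: a filtering threshold chosen by the team.
\begin{aligned}
L_i^{+} &= L_i\bigl(e(c_i, r_i)\bigr) \\
L_i^{-} &= \min\bigl(L_i(\varepsilon),\; L_i(e(c_i, \varepsilon))\bigr) \\
\text{keep } c_i &\iff L_i^{-} - L_i^{+} \ge \tau_f
\end{aligned}
```

Comparing against the smaller of the two baselines means a call is kept only when its result helps, not merely when the presence of a call changes the prediction.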

Diagram

flowchart TD
    Ctx[Training context] --> Cand[Generate candidate tool calls]
    Cand --> Ins[Insert into context]
    Ins --> Score[Score: perplexity drop on gold continuation?]
    Score -- helpful --> Keep[Keep as training example]
    Score -- not --> Drop[Drop]
    Keep --> FT[Fine-tune model to emit calls in those positions]

Solution

Therefore:

Generate candidate tool calls during training. Insert each into a context. Score each insertion by whether it lowers the model's perplexity on the gold continuation. Keep helpful insertions as training data. Fine-tune the model to emit tool calls in those positions.
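The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `Candidate`, `toy_loss`, `render`, `loss_drop`, `filter_candidates` and `to_training_text` are hypothetical names, the bracketed call syntax is illustrative, and `toy_loss` is a deliberately crude stand-in for a real language-model loss so that the example runs without a model. The choice of baseline (the better of "no call" and "call without result") is also an assumption about how to make the perplexity-drop test concrete.

```python
from dataclasses import dataclass
from typing import Callable, List

# (prefix, continuation) -> loss; lower means the continuation is easier to predict.
LossFn = Callable[[str, str], float]


@dataclass(frozen=True)
class Candidate:
    prefix: str        # text before the insertion point
    call: str          # candidate tool call, e.g. "Calculator(400 / 1400)"
    result: str        # what the tool returned for that call
    continuation: str  # gold text that follows the insertion point


def toy_loss(prefix: str, continuation: str) -> float:
    """Toy stand-in for a language-model loss (NOT a real perplexity):
    the fraction of continuation tokens that never appear in the prefix."""
    seen = set(prefix.lower().split())
    tokens = continuation.lower().split()
    if not tokens:
        return 0.0
    return sum(tok not in seen for tok in tokens) / len(tokens)


def render(c: Candidate, with_result: bool) -> str:
    """Prefix with the call spliced in; the bracket syntax is illustrative."""
    inner = f"{c.call} -> {c.result}" if with_result else c.call
    return f"{c.prefix} [ {inner} ]"


def loss_drop(c: Candidate, loss: LossFn) -> float:
    """How much the call plus its result lowers the loss, compared with the
    better of two baselines: no call at all, or the call without its result."""
    baseline = min(loss(c.prefix, c.continuation),
                   loss(render(c, with_result=False), c.continuation))
    return baseline - loss(render(c, with_result=True), c.continuation)


def filter_candidates(cands: List[Candidate], loss: LossFn,
                      threshold: float) -> List[Candidate]:
    """Keep only the insertions whose loss drop reaches the threshold."""
    return [c for c in cands if loss_drop(c, loss) >= threshold]


def to_training_text(c: Candidate) -> str:
    """Fine-tuning example: the original text with the helpful call inlined."""
    return f"{render(c, with_result=True)} {c.continuation}"


prefix = "Out of 1400 participants, 400 passed the test, a share of"
candidates = [
    Candidate(prefix, "Calculator(400 / 1400)", "0.29", "0.29 overall"),
    Candidate(prefix, "Search(test participants)", "people who take part", "0.29 overall"),
]
kept = filter_candidates(candidates, toy_loss, threshold=0.25)
for c in kept:
    print(to_training_text(c))
```

In a real pipeline, `toy_loss` would be replaced by the model's own loss on the continuation tokens, and the threshold would be tuned per tool; only the calculator candidate survives here because its result actually appears in the gold continuation.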

What this pattern forbids. Tool use is bound to positions where self-supervised filtering judged the call helpful; ungrounded tool calls are not reinforced.

And the patterns that stand alongside it, or against it —

complements Agent Skills ★: Package author-time procedures (markdown + optional resources) the agent loads on demand for specific task types.
alternative-to Tool Discovery ★: Let the agent discover available tools at runtime rather than hardcoding the tool list at agent build time.
complements MRKL Systems (Modular Neuro-Symbolic) ★★: Route each request through an LLM dispatcher to specialized symbolic or neural expert modules (calculator, knowledge base, code executor) rather than asking one LLM to do everything; integrate the modules' results for the final response.

References

Toolformer: Language Models Can Teach Themselves to Use Tools (paper)

Provenance

Source: patterns/toolformer.md on GitHub · commit 4fa1213
Added to catalog: 2026-04-30
Last updated: 2026-05-21
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.