Toolformer
also known as Self-Supervised Tool Learning
Train the model to learn when and how to call tools through self-supervised data, without human annotation.
This pattern helps complete certain larger patterns —
- specialises Tool Use ★★: Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.
Context
A team is deploying tool use at scale and has noticed that prompt-based function-calling — telling the model in the system prompt what tools are available and hoping it calls them well — underperforms in production. They do not have a dataset of human-labelled tool-use traces showing when each tool should have been called and with what arguments, and creating one at scale is not affordable.
Problem
Prompt-based tool calling is brittle: the model often forgets to call a tool when it should, calls the wrong one, or invents wrong arguments. The natural alternative — supervised fine-tuning on tool-use traces — requires costly human-labelled data the team does not have. They need a way to teach the model when and how to call tools using only self-supervised signals derived from outputs the model can already produce, so that the training data scales without human annotation.
Forces
- Self-supervised data must distinguish helpful from unhelpful tool calls.
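- The set of tools seen at training time drifts away from the set available at runtime as tools are added, changed, or retired.
- Most candidate tool calls are noise, so generating and filtering them dominates the cost of building the training set.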
Example
A team wants their model to call a calculator and a search tool reliably without writing thousands of human-labelled tool-use traces. They use Toolformer-style self-supervision: when building the training set, candidate tool calls are inserted into many contexts, and each is scored by whether the call and its result lower the model's perplexity on the gold continuation (the text that originally followed). Helpful insertions become training data. The fine-tuned model learns when and how to call tools without any human annotation.
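As an illustration of the insertion step in this example, the sketch below executes a candidate call and splices it, with its result, into a text at a chosen position. Everything here is an assumption made for illustration: the `[Tool(arg) -> result]` inline format, the `TOOLS` registry, the `insert_call` helper, and the stubbed `search` function are hypothetical, and a real pipeline would sample positions and arguments from the model rather than pass them in by hand.

```python
import ast
import operator
from typing import Callable, Dict

# Arithmetic operators the hypothetical calculator tool accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expr: str) -> str:
    """Evaluate +, -, *, / over numeric literals without calling eval()."""

    def ev(node: ast.AST) -> float:
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")

    # Round to two decimals so the inserted result stays short.
    return f"{ev(ast.parse(expr, mode='eval').body):.2f}"


def search(query: str) -> str:
    """Placeholder only: a real deployment would query a search backend."""
    return "<no search backend configured>"


# Hypothetical registry mapping tool names to callables.
TOOLS: Dict[str, Callable[[str], str]] = {
    "Calculator": calculator,
    "Search": search,
}


def insert_call(text: str, position: int, tool: str, arg: str) -> str:
    """Run the tool and splice '[Tool(arg) -> result] ' into text at position."""
    result = TOOLS[tool](arg)
    return f"{text[:position]}[{tool}({arg}) -> {result}] {text[position:]}"
```

For instance, calling `insert_call` on "400 out of 1400 is 0.29 of the total." at the index where "0.29" begins, with tool "Calculator" and argument "400/1400", places the call and its result immediately before the number it helps predict. Whether that insertion is kept is decided by the filtering step, not here.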
Solution
Therefore:
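Generate candidate tool calls when building the training set. Execute each call and insert it, together with its result, into a context. Score whether the insertion improves the model's prediction of the text that follows (a perplexity drop on the gold continuation compared with having no call). Keep only the helpful insertions as training data. Fine-tune the model on them so that it emits tool calls in those positions.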
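The scoring and filtering steps can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: `LossFn` is a hypothetical callable that returns the model's loss on a continuation given a prefix (in practice it would wrap a language model), the `Candidate` fields and the `[call -> result]` rendering are illustrative, and the threshold `tau` is a tunable assumption. The baseline follows the Toolformer idea of comparing against both "no call" and "call without result", so a call is kept only when its result is what helps.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical signature: loss(prefix, continuation) -> float, lower is better.
LossFn = Callable[[str, str], float]


@dataclass
class Candidate:
    prefix: str        # text before the insertion point
    call: str          # e.g. "Calculator(400/1400)"
    result: str        # tool output, e.g. "0.29"
    continuation: str  # gold text that originally followed the prefix


def render(c: Candidate, with_result: bool) -> str:
    """Prefix plus the inline call, with or without the tool's result."""
    body = f"{c.call} -> {c.result}" if with_result else c.call
    return f"{c.prefix}[{body}] "


def is_helpful(c: Candidate, loss: LossFn, tau: float) -> bool:
    """Keep the call only if it lowers loss on the continuation by at least tau."""
    loss_with_result = loss(render(c, True), c.continuation)
    baseline = min(
        loss(c.prefix, c.continuation),                # no call at all
        loss(render(c, False), c.continuation),        # call but no result
    )
    return baseline - loss_with_result >= tau


def build_training_set(
    candidates: List[Candidate], loss: LossFn, tau: float = 1.0
) -> List[str]:
    """Return augmented texts for the candidates that pass the filter."""
    return [
        render(c, True) + c.continuation
        for c in candidates
        if is_helpful(c, loss, tau)
    ]
```

The returned strings are the fine-tuning examples. Raising `tau` keeps fewer, higher-confidence insertions; lowering it keeps more data at the price of more noise.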
What this pattern forbids. Tool use is bound to positions where self-supervised filtering judged the call helpful; ungrounded tool calls are not reinforced.
And the patterns that stand alongside it, or against it —
- complements Agent Skills ★: Package author-time procedures (markdown + optional resources) the agent loads on demand for specific task types.
- alternative-to Tool Discovery ★: Let the agent discover available tools at runtime rather than hardcoding the tool list at agent build time.
- complements MRKL Systems (Modular Neuro-Symbolic) ★★: Route each request through an LLM dispatcher to specialized symbolic or neural expert modules (calculator, knowledge base, code executor) rather than asking one LLM to do everything; integrate the modules' results for the final response.