Toolformer
Train the model to learn when and how to call tools through self-supervised data, without human annotation.
Problem
Prompt-based tool calling is brittle: the model often forgets to call a tool when it should, calls the wrong one, or invents wrong arguments. The natural alternative — supervised fine-tuning on tool-use traces — requires costly human-labelled data the team does not have. They need a way to teach the model when and how to call tools using only self-supervised signals derived from outputs the model can already produce, so that the training data scales without human annotation.
Solution
Have the model propose candidate tool calls at positions in existing text. Execute each call and insert the call and its result into the context. Score whether the insertion helps, measured as a perplexity drop on the gold continuation compared with the same context without the call. Keep the helpful insertions as training data, then fine-tune the model to emit tool calls at those positions.
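The filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `Candidate`, `render_call`, `filter_candidates`, `build_training_example` and the `threshold` parameter are hypothetical names, and `toy_loss` is a stand-in for the language model's loss on the gold continuation so that the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical container for one proposed tool call (illustrative only).
@dataclass
class Candidate:
    position: int  # token index where the call would be inserted
    call: str      # textual form of the call, e.g. "Calculator(2+3)"
    result: str    # what the tool returned when the call was executed

# (prefix tokens, continuation tokens) -> loss on the continuation.
LossFn = Callable[[List[str], List[str]], float]

def render_call(c: Candidate) -> List[str]:
    """Render a call and its result as tokens to splice into the text."""
    return ["[", c.call, "->", c.result, "]"]

def filter_candidates(tokens: List[str], candidates: List[Candidate],
                      loss_fn: LossFn, threshold: float) -> List[Candidate]:
    """Keep candidates whose insertion lowers the continuation loss by >= threshold."""
    kept = []
    for c in candidates:
        prefix, continuation = tokens[:c.position], tokens[c.position:]
        baseline = loss_fn(prefix, continuation)
        with_call = loss_fn(prefix + render_call(c), continuation)
        if baseline - with_call >= threshold:
            kept.append(c)
    return kept

def build_training_example(tokens: List[str], kept: List[Candidate]) -> str:
    """Splice the kept calls into the text to form one fine-tuning example."""
    out = list(tokens)
    # Insert from right to left so earlier positions stay valid.
    for c in sorted(kept, key=lambda c: c.position, reverse=True):
        out[c.position:c.position] = render_call(c)
    return " ".join(out)

def toy_loss(prefix: List[str], continuation: List[str]) -> float:
    """Stand-in for a model loss: counts continuation tokens absent from the prefix."""
    seen = set(prefix)
    return float(sum(1 for t in continuation if t not in seen))
```

In a real pipeline `loss_fn` would be the model's negative log-likelihood of the continuation, and the threshold would be tuned so that only insertions with a clear benefit survive. The text produced by `build_training_example` is what the model is then fine-tuned on.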
When to use
- Tool use is deployed at scale and prompt-based function-calling underperforms.
- Human-labelled tool-use traces are unavailable.
- Self-supervised data can be generated by inserting candidate tool calls and scoring them.