Tool Use & Environment

Toolformer

Teach a language model when and how to call tools from self-supervised data, without human annotation.

Problem

Prompt-based tool calling is brittle: the model often forgets to call a tool when it should, calls the wrong tool, or invents incorrect arguments. The natural alternative, supervised fine-tuning on tool-use traces, requires costly human-labelled data that the team does not have. They need a way to teach the model when and how to call tools using only self-supervised signals derived from outputs the model can already produce, so that the training data scales without human annotation.

Solution

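Generate candidate tool calls as a data-generation step before fine-tuning: sample calls at positions in unlabelled text and execute them to get their results. Insert each call, together with its result, into the text. Score whether the insertion helps: keep it only if it lowers the model's loss (perplexity) on the gold continuation, the tokens that follow it, by at least a threshold compared with making no call or making the call without its result. Keep the helpful insertions as training data. Fine-tune the model on this augmented text so it learns to emit tool calls at those positions.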
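The generate, score and keep loop above can be sketched in Python. This is a minimal, simplified sketch under stated assumptions, not Toolformer's actual implementation: the candidates are written by hand (in the real approach they are sampled from the model and executed), `toy_loss` is a hypothetical stand-in for the language model's loss on the continuation, and the `[ call -> result ]` inline syntax and the `tau` and `window` values are illustrative. A call is kept only when inserting it with its result beats both baselines (no call, and the call without its result) by at least `tau`, so the call is credited only when its result is what helps.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical signature: loss of a continuation given a prefix (lower is better).
# A real system would use the language model's cross-entropy here.
LossFn = Callable[[Sequence[str], Sequence[str]], float]


@dataclass
class Candidate:
    position: int  # token index where the call is inserted
    call: str      # hypothetical call text, e.g. "QA(capital of France)"
    result: str    # output obtained by executing the call


def call_tokens(call: str, result: Optional[str]) -> List[str]:
    # Hypothetical inline syntax: "[ call ]" or "[ call -> result ]".
    if result is None:
        return ["[", call, "]"]
    return ["[", call, "->", result, "]"]


def toy_loss(prefix: Sequence[str], continuation: Sequence[str]) -> float:
    # Toy stand-in for LM loss: fraction of continuation tokens absent from the prefix.
    if not continuation:
        return 0.0
    seen = set(prefix)
    return sum(tok not in seen for tok in continuation) / len(continuation)


def filter_calls(tokens: Sequence[str], candidates: Sequence[Candidate],
                 loss_fn: LossFn, tau: float = 0.1, window: int = 5) -> List[Candidate]:
    kept = []
    for c in candidates:
        prefix = list(tokens[:c.position])
        continuation = list(tokens[c.position:c.position + window])
        loss_no_call = loss_fn(prefix, continuation)
        loss_call_only = loss_fn(prefix + call_tokens(c.call, None), continuation)
        loss_with_result = loss_fn(prefix + call_tokens(c.call, c.result), continuation)
        # Keep the call only if its result beats both baselines by at least tau.
        if min(loss_no_call, loss_call_only) - loss_with_result >= tau:
            kept.append(c)
    return kept


def augment(tokens: Sequence[str], kept: Sequence[Candidate]) -> List[str]:
    # Splice kept calls (with results) into the text, right to left,
    # so that earlier insertion positions remain valid.
    out = list(tokens)
    for c in sorted(kept, key=lambda cand: cand.position, reverse=True):
        out[c.position:c.position] = call_tokens(c.call, c.result)
    return out


tokens = "The capital of France is Paris .".split()
candidates = [
    Candidate(position=5, call="QA(capital of France)", result="Paris"),
    Candidate(position=5, call="Calendar()", result="Monday"),
]
kept = filter_calls(tokens, candidates, toy_loss)
print([c.call for c in kept])
print(" ".join(augment(tokens, kept)))
```

Only the question-answering call survives: its result makes `Paris` predictable, while the calendar call changes nothing. In the augmented text the call appears as `[ QA(capital of France) -> Paris ]` just before `Paris`, and fine-tuning on such text is what teaches the model to emit the call at that point. The original method's weighting of nearer continuation tokens in the loss is omitted here.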

When to use

  • Tool use is deployed at scale and prompt-based function-calling underperforms.
  • Human-labelled tool-use traces are unavailable.
  • Self-supervised data can be generated by inserting candidate tool calls and scoring them.
