III · Tool Use & Environment · Anti-pattern

Toolformer

also known as Self-Supervised Tool Learning

Train the model to learn when and how to call tools through self-supervised data, without human annotation.

This pattern helps complete certain larger patterns —

  • specialises Tool Use ★★: Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.

Context

A team is deploying tool use at scale and has noticed that prompt-based function-calling — telling the model in the system prompt what tools are available and hoping it calls them well — underperforms in production. They do not have a dataset of human-labelled tool-use traces showing when each tool should have been called and with what arguments, and creating one at scale is not affordable.

Problem

Prompt-based tool calling is brittle: the model often forgets to call a tool when it should, calls the wrong one, or invents wrong arguments. The natural alternative — supervised fine-tuning on tool-use traces — requires costly human-labelled data the team does not have. They need a way to teach the model when and how to call tools using only self-supervised signals derived from outputs the model can already produce, so that the training data scales without human annotation.

Forces

  • Self-supervised data must distinguish helpful from unhelpful tool calls.
  • The training-time tool surface diverges from runtime over time.
  • Filtering noise dominates training cost.
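The first force can be made concrete with the filtering criterion as we understand it from the original Toolformer paper. The document speaks of a perplexity drop; the paper phrases the same idea as a weighted cross-entropy loss over the tokens that follow position $i$. The symbols are not part of this pattern text and are introduced here for illustration: $x_1,\dots,x_n$ is the text, $c_i$ a candidate call at position $i$, $r_i$ its result, $e(\cdot)$ the linearised call, $\varepsilon$ the empty string, $w$ a weighting over following tokens, and $\tau_f$ a filtering threshold.

```latex
L_i(z) = -\sum_{j=i}^{n} w_{j-i} \, \log p_M(x_j \mid z, x_{1:j-1})

L_i^{+} = L_i\big(e(c_i, r_i)\big), \qquad
L_i^{-} = \min\big(L_i(\varepsilon),\; L_i(e(c_i, \varepsilon))\big)

\text{keep } c_i \iff L_i^{-} - L_i^{+} \ge \tau_f
```

Read this way, a call counts as helpful only if seeing both the call and its result beats the better of two baselines: no call at all, and the call without its result.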

Example

A team wants their model to call a calculator and a search tool reliably without writing thousands of human-labelled tool-use traces. They use Toolformer-style self-supervision: at training time, candidate tool calls are inserted at many positions in existing text, executed, and scored by whether the call and its result lower the model's perplexity on the gold continuation (the text that actually follows). Insertions that help become training data; the rest are discarded. The fine-tuned model learns when and how to call tools without any human annotation.
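To make "inserting a candidate tool call into a context" tangible, here is a minimal sketch of the annotation step for the calculator case. Everything in it is an assumption for illustration: the `calculator` and `insert_call` helpers, the bracketed `[Tool(args) -> result]` notation, and the sample sentence are hypothetical and not taken from any particular library.

```python
import ast
import operator

# Hypothetical, minimal calculator tool: supports +, -, *, / on numbers only.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression without using eval()."""

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")

    value = _eval(ast.parse(expression, mode="eval").body)
    return f"{value:.2f}"  # round to two decimals for the inline annotation


def insert_call(text: str, position: int, tool: str, argument: str, result: str) -> str:
    """Splice an inline tool call and its result into text at a character offset."""
    call = f"[{tool}({argument}) -> {result}] "
    return text[:position] + call + text[position:]


# Assumed sample sentence; the candidate call is placed just before the figure it supports.
text = "Out of 1400 participants, 400 (or 29%) passed the test."
position = text.index("29%")
result = calculator("400 / 1400")
annotated = insert_call(text, position, "Calculator", "400 / 1400", result)
print(annotated)
```

In a real pipeline the position and the call would be proposed by the model itself, and the annotated text would only become training data if it survives the filtering step.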

Solution

Therefore:

Generate candidate tool calls during training. Insert each into a context and execute it. Score whether the call and its result improve the model's prediction of the gold continuation (a perplexity drop relative to the same context without the call). Keep helpful insertions as training data. Fine-tune the model to emit tool calls in those positions.
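The filtering step can be sketched as follows. This is a toy illustration under stated assumptions, not the reference implementation: `Candidate`, `is_helpful`, `build_training_set`, and the threshold value are hypothetical names and choices, and `toy_loss` is a deliberately crude stand-in for a real language-model loss (it only counts continuation tokens not already seen in the context).

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    prefix: str        # text before the insertion point
    continuation: str  # the text that actually follows (the gold continuation)
    call: str          # e.g. "Calculator(400 / 1400)"
    result: str        # what the tool returned when executed


# loss_fn(context, continuation) -> loss of the continuation given the context.
# Lower is better. In practice this would come from the language model.
LossFn = Callable[[str, str], float]


def is_helpful(c: Candidate, loss_fn: LossFn, threshold: float) -> bool:
    """Keep a call only if call + result beats both baselines by the threshold."""
    with_result = loss_fn(f"{c.prefix}[{c.call} -> {c.result}] ", c.continuation)
    no_call = loss_fn(c.prefix, c.continuation)
    call_only = loss_fn(f"{c.prefix}[{c.call}] ", c.continuation)
    return min(no_call, call_only) - with_result >= threshold


def build_training_set(candidates: List[Candidate], loss_fn: LossFn,
                       threshold: float = 1.0) -> List[str]:
    """Return annotated texts for the candidates that pass the filter."""
    return [
        f"{c.prefix}[{c.call} -> {c.result}] {c.continuation}"
        for c in candidates
        if is_helpful(c, loss_fn, threshold)
    ]


def _tokens(text: str) -> List[str]:
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z]+", text)


def toy_loss(context: str, continuation: str) -> float:
    """Toy stand-in for an LM loss: continuation tokens not present in the context."""
    seen = set(_tokens(context))
    return float(sum(1 for tok in _tokens(continuation) if tok not in seen))


candidates = [
    # The result "0.29" appears in the continuation, so the call helps.
    Candidate("Out of 1400 participants, 400 (or ", "0.29 of them) passed.",
              "Calculator(400 / 1400)", "0.29"),
    # An irrelevant call: its result tells us nothing about what follows.
    Candidate("The meeting is on ", "Friday at noon.", "Calculator(2 + 2)", "4"),
]

kept = build_training_set(candidates, toy_loss)
print(kept)
```

Swapping `toy_loss` for a real model's loss on the continuation tokens is the expensive part, since each candidate needs several forward passes; that cost is what the Forces section refers to.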

What this pattern forbids. Tool use is bound to positions where self-supervised filtering judged the call helpful; ungrounded tool calls are not reinforced.

And the patterns that stand alongside it, or against it —

  • complements Agent Skills: Package author-time procedures (markdown + optional resources) the agent loads on demand for specific task types.
  • alternative-to Tool Discovery: Let the agent discover available tools at runtime rather than hardcoding the tool list at agent build time.
  • complements MRKL Systems (Modular Neuro-Symbolic) ★★: Route each request through an LLM dispatcher to specialized symbolic or neural expert modules (calculator, knowledge base, code executor) rather than asking one LLM to do everything; integrate the modules' results for the final response.
