VII · Verification & Reflection · Emerging

Tool-Augmented Self-Correction

also known as Tool-Interactive Self-Correction, CRITIC

Self-correct LLM outputs by interactively critiquing them with external tools (search, code execution, calculator).

This pattern helps complete certain larger patterns —

  • specialises Reflection ★★: Have the model review its own output and produce a revised version in one or more passes.

Context

A team runs a large language model on a generation task where mistakes can in principle be caught by an external check: factual claims could be verified by a web search, generated code could be verified by actually running it, and arithmetic could be verified with a calculator. The agent has access to those tools but currently uses them only during drafting, not during review. After producing a draft the model is asked to self-critique, but the critique is itself a model call with no grounding outside the model's own beliefs.

Problem

When self-critique is done by the same model that produced the draft and is not allowed to consult any external tool, the critique recycles the same blind spots that produced the original error. The model that confidently asserted a wrong fact will confidently agree with itself when asked to review the assertion. Without a way to compare the draft against an outside source of truth, the iterative loop is a model talking to itself and slowly converging on whatever it believed at the start. The team needs the critic to be able to actually test claims, not just re-read them.

Forces

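  • Tool selection per critique step: for each suspected error, the critic must decide which tool (search, code execution, calculator) can actually test it.
  • Critique cost adds to generation cost: every critique pass and tool call adds latency and tokens on top of drafting.
  • Tools may themselves be wrong or limited: a search can return stale pages, and a tool that fails or times out proves nothing either way.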
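The three forces can be made concrete in a small sketch, assuming each suspected error arrives already tagged with a kind and that each tool has a fixed cost in abstract budget units. The names `Claim`, `Verdict`, and `select_and_check` are hypothetical, not part of any library. A missing tool, an exhausted budget, or a tool that raises all yield `INCONCLUSIVE` rather than counting as confirmation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    SUPPORTED = "supported"
    REFUTED = "refuted"
    INCONCLUSIVE = "inconclusive"  # the tool could not decide (missing, limited, or failed)


@dataclass
class Claim:
    kind: str  # hypothetical claim categories, e.g. "arithmetic", "code", "fact"
    text: str


# A tool inspects the claim text and returns a verdict.
Tool = Callable[[str], Verdict]


def select_and_check(claims, tools: dict[str, tuple[Tool, int]], budget: int):
    """Check each claim with the tool registered for its kind, spending budget.

    Returns (results, remaining_budget). Claims with no matching tool, or that
    cannot be afforded, are marked INCONCLUSIVE instead of assumed correct.
    """
    results = []
    for claim in claims:
        entry = tools.get(claim.kind)
        if entry is None:  # tool selection: nothing can test this kind of claim
            results.append((claim, Verdict.INCONCLUSIVE))
            continue
        tool, cost = entry
        if cost > budget:  # critique cost: budget exhausted for this claim
            results.append((claim, Verdict.INCONCLUSIVE))
            continue
        budget -= cost
        try:
            verdict = tool(claim.text)
        except Exception:
            verdict = Verdict.INCONCLUSIVE  # a failing tool is not evidence either way
        results.append((claim, verdict))
    return results, budget
```

Treating "could not check" as its own outcome keeps the critic from mistaking silence for agreement, which is the failure the pattern exists to avoid.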

Example

A coding agent confidently answers 'what's the time complexity of this sort?', but its self-critique only talks in circles, reusing the blind spots that produced the answer. The team wires in a critic equipped with external tools: it runs the proposed code on benchmarks, queries an algorithms reference, and uses a calculator to double-check claimed bounds. When the critic obtains a measurement that contradicts the draft, the agent revises against an actual signal instead of recycling its own prior.
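A minimal sketch of the critic's measurement step, under two stated assumptions: it counts comparisons instead of timing wall-clock benchmarks (so the result is deterministic), and it estimates the growth exponent k in cost ~ n^k from a single doubling, log2(cost(2n) / cost(n)). That heuristic separates quadratic from near-linear behaviour but cannot reliably tell O(n) from O(n log n). `insertion_sort` stands in for the agent's proposed code; all function names and the tolerance are hypothetical.

```python
import functools
import math
import random


def insertion_sort(xs, key):
    """Stand-in for the agent's proposed sort (quadratic on random input)."""
    xs = list(xs)
    for i in range(1, len(xs)):
        j = i
        while j > 0 and key(xs[j]) < key(xs[j - 1]):
            xs[j], xs[j - 1] = xs[j - 1], xs[j]
            j -= 1
    return xs


def count_comparisons(sort_fn, data):
    """Sort a copy of data with sort_fn and return how many comparisons it made."""
    count = 0

    def cmp(a, b):
        nonlocal count
        count += 1
        return (a > b) - (a < b)

    sort_fn(list(data), key=functools.cmp_to_key(cmp))
    return count


def estimate_exponent(sort_fn, n=300, seed=0):
    """Estimate k in cost ~ n**k from one doubling: log2(cost(2n) / cost(n))."""
    rng = random.Random(seed)
    small = [rng.random() for _ in range(n)]
    large = [rng.random() for _ in range(2 * n)]
    return math.log2(count_comparisons(sort_fn, large) / count_comparisons(sort_fn, small))


def critique_complexity_claim(sort_fn, claimed_exponent, tolerance=0.35):
    """Return ("supported" or "refuted", measured exponent) for a claimed O(n**k)."""
    measured = estimate_exponent(sort_fn)
    verdict = "supported" if abs(measured - claimed_exponent) <= tolerance else "refuted"
    return verdict, measured
```

If the draft claims O(n) for `insertion_sort`, the critic comes back with "refuted" and a measured exponent near 2: an outside signal that rereading the draft could never have produced.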

Solution

Therefore:

After generating a draft, the model emits a critique that names suspected errors and queries tools to verify each one. The tool results inform the revised output. The loop iterates until the tools find no further issues or the budget is exhausted.

What this pattern forbids. The critic may revise outputs only when an external tool corroborates a defect; ungrounded edits are forbidden.
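The loop above, including the rule that only tool-corroborated defects may drive an edit, might be sketched as follows. It assumes the model calls (`generate`, `critique`, `revise`) are supplied as plain functions; the `toy_*` stubs are hypothetical stand-ins for LLM calls and a real calculator, and `max_rounds` stands in for whatever budget the team actually enforces.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Critique:
    claim: str  # the part of the draft the critic suspects is wrong
    tool: str   # name of the tool that could test the suspicion
    query: str  # input to pass to that tool


@dataclass
class Finding:
    critique: Critique
    observation: str  # what the tool returned that contradicts the claim


# A tool returns an observation when it CONTRADICTS the claim, or None otherwise.
Tool = Callable[[Critique], Optional[str]]


def tool_augmented_self_correct(task, generate, critique, revise,
                                tools: dict[str, Tool], max_rounds=3):
    draft = generate(task)
    for _ in range(max_rounds):
        findings = []
        for c in critique(task, draft):
            tool = tools.get(c.tool)
            if tool is None:
                continue  # untestable suspicion: not allowed to drive an edit
            observation = tool(c)
            if observation is not None:
                findings.append(Finding(c, observation))
        if not findings:
            return draft  # no tool corroborates a defect: stop
        draft = revise(task, draft, findings)  # edit only against tool evidence
    return draft  # round budget exhausted


# Hypothetical toy stubs standing in for LLM calls and a calculator tool.
def toy_generate(task):
    return "17 * 23 = 381"


def toy_critique(task, draft):
    lhs = draft.split("=")[0].strip()
    return [Critique(draft, "calculator", lhs), Critique(draft, "style_checker", "tone")]


def toy_calculator(c):
    a, b = c.query.split("*")
    value = int(a) * int(b)
    return None if value == int(c.claim.split("=")[1]) else str(value)


def toy_revise(task, draft, findings):
    return draft.split("=")[0].strip() + " = " + findings[0].observation


if __name__ == "__main__":
    print(tool_augmented_self_correct("What is 17 * 23?", toy_generate, toy_critique,
                                      toy_revise, {"calculator": toy_calculator}))
```

The `style_checker` suspicion has no registered tool, so it never reaches `revise`. If the calculator is also removed, the wrong draft is returned unchanged: the loop prefers leaving an unverified draft alone to making an ungrounded edit.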

The smaller patterns that complete this one —

  • uses Tool Use ★★: Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.
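With Tool Use, the critic's requests arrive as typed calls rather than free text. A sketch of one such tool, assuming a hypothetical JSON shape `{"tool": ..., "arguments": {...}}` for the model's output: the calculator evaluates arithmetic by walking Python's `ast` instead of calling `eval`, so anything other than numbers and `+ - * /` is rejected. `CalculatorCall`, `parse_tool_call`, and `run_calculator` are illustrative names.

```python
import ast
import json
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class CalculatorCall:
    expression: str  # arithmetic to evaluate, e.g. "17 * 23"


# Only these operators are permitted; anything else is rejected.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def parse_tool_call(text: str) -> CalculatorCall:
    """Turn the model's (hypothetically JSON-shaped) output into a typed call."""
    payload = json.loads(text)
    if payload.get("tool") != "calculator":
        raise ValueError(f"unknown tool: {payload.get('tool')!r}")
    return CalculatorCall(expression=str(payload["arguments"]["expression"]))


def _evaluate(node):
    # type(...) check excludes bool, which is a subclass of int.
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def run_calculator(call: CalculatorCall):
    """Evaluate the expression by walking its AST instead of calling eval()."""
    tree = ast.parse(call.expression, mode="eval")
    return _evaluate(tree.body)
```

Rejecting unknown tool names and unsupported syntax with `ValueError` keeps a malformed call from being mistaken for a verified result; a critic loop can treat the error as inconclusive rather than as agreement.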

And the patterns that stand alongside it, or against it —

  • alternative-to Chain of Verification: Reduce hallucination by drafting an answer, generating independent verification questions, answering them in isolation, and revising.
  • alternative-to Policy-Localizer-Validator: Split a GUI agent into three specialist models (a Policy that plans, a Localizer that grounds elements to pixels, and a Validator that judges completion) so each role uses the smallest sufficient model.

References

Provenance