Verification & Reflection

Evaluator-Optimizer

One LLM generates; another evaluates and feeds back; loop until criteria are met.

Problem

When generation and evaluation happen in one prompt, the model has no incentive to disagree with itself: it produces a draft and then signs off on it. A loop with an explicit evaluator can push quality past what single-shot generation achieves, but a naive loop in which the same prompt does both jobs collapses into self-approval, adding cost without improving quality. The system therefore needs separate roles for proposing and judging, with a bounded loop between them. Without the separation it fails to improve past one pass; without the bound it can run indefinitely, chasing diminishing critique.

Solution

The generator produces a candidate. The evaluator scores it against explicit criteria and returns feedback. The generator revises the candidate using that feedback. The loop repeats until the evaluator passes the candidate or a maximum iteration count is reached.
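A minimal sketch of the loop described above, assuming the generator and evaluator are plain callables. In a real system each would wrap a separate LLM call; here `toy_generate` and `toy_evaluate` are hypothetical offline stand-ins so the control flow can run. The names `evaluator_optimizer`, `Evaluation` and `LoopResult`, the default `max_iterations=3`, and the choice to fall back to the highest-scoring candidate when nothing passes are all illustrative assumptions, not part of the pattern's definition.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Evaluation:
    passed: bool    # evaluator's pass/fail verdict
    score: float    # higher is better; used to pick a fallback when nothing passes
    feedback: str   # actionable critique handed back to the generator


@dataclass
class LoopResult:
    candidate: str
    evaluation: Evaluation
    iterations: int  # generate/evaluate rounds actually run


def evaluator_optimizer(
    task: str,
    generate: Callable[[str, Optional[str], Optional[str]], str],
    evaluate: Callable[[str, str], Evaluation],
    max_iterations: int = 3,
) -> LoopResult:
    """Alternate generator and evaluator until a pass or the iteration cap."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    candidate: Optional[str] = None
    feedback: Optional[str] = None
    best: Optional[tuple[str, Evaluation]] = None
    for i in range(1, max_iterations + 1):
        # Generator role: sees the task plus the previous draft and critique (None on round 1).
        candidate = generate(task, candidate, feedback)
        # Evaluator role: a separate call that only judges, never writes.
        evaluation = evaluate(task, candidate)
        if best is None or evaluation.score > best[1].score:
            best = (candidate, evaluation)
        if evaluation.passed:
            return LoopResult(candidate, evaluation, i)
        feedback = evaluation.feedback
    # Budget exhausted without a pass: fall back to the highest-scoring candidate.
    assert best is not None
    return LoopResult(best[0], best[1], max_iterations)


# Hypothetical toy stand-ins for the two LLM calls, so the loop runs offline.
REQUIRED_TERMS = ("idempotent", "retry")


def toy_generate(task: str, previous: Optional[str], feedback: Optional[str]) -> str:
    draft = previous or f"Draft for: {task}."
    if feedback:
        # Naive "revision": append whatever the evaluator said was missing.
        draft += " " + feedback.removeprefix("Missing: ").replace(",", "") + "."
    return draft


def toy_evaluate(task: str, candidate: str) -> Evaluation:
    missing = [t for t in REQUIRED_TERMS if t not in candidate]
    return Evaluation(
        passed=not missing,
        score=1 - len(missing) / len(REQUIRED_TERMS),
        feedback="Missing: " + ", ".join(missing) if missing else "",
    )


result = evaluator_optimizer("Describe the payment API", toy_generate, toy_evaluate)
print(result.iterations, result.evaluation.passed)  # 2 True
```

Keeping generation and evaluation as two distinct callables is the point of the sketch: the evaluator never sees the generator's reasoning, only its output, which is what prevents the self-approval failure described under Problem. The `max_iterations` cap is what keeps the loop bounded when critique stops converging.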

When to use

Single-shot generation tops out below the quality the task requires.
An evaluator can score candidates against criteria with actionable feedback.
The latency budget can absorb several generate-evaluate rounds, capped by a maximum iteration count and ended early once a pass threshold is met.
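The second condition, an evaluator that scores against criteria and returns actionable feedback, can be made concrete with a weighted rubric. This is a minimal sketch under stated assumptions: the `Criterion` type, the example criteria and weights, and the `0.8` pass threshold are all hypothetical, and the simple string checks stand in for what would often be an LLM judge in practice. Each failed criterion contributes a specific fix instruction rather than a bare score, which is what makes the feedback usable by the generator.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float
    check: Callable[[str], bool]  # hypothetical rule-based check; could be an LLM judge instead
    fix_hint: str                 # actionable instruction sent back when the check fails


def score_against_rubric(
    candidate: str, rubric: List[Criterion], pass_threshold: float = 0.8
) -> Tuple[bool, float, List[str]]:
    """Weighted rubric score in [0, 1], a pass flag, and feedback for failed criteria."""
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    failed = [c for c in rubric if not c.check(candidate)]
    # Score is the fraction of total weight carried by criteria that passed.
    score = 1 - sum(c.weight for c in failed) / total
    feedback = [f"{c.name}: {c.fix_hint}" for c in failed]
    return score >= pass_threshold, score, feedback


rubric = [
    Criterion("length", 1.0, lambda s: len(s.split()) <= 50, "Cut to 50 words or fewer."),
    Criterion("cites_source", 2.0, lambda s: "[source]" in s, "Add a [source] marker."),
    Criterion("no_todo", 1.0, lambda s: "TODO" not in s, "Remove leftover TODO notes."),
]

passed, score, feedback = score_against_rubric("Short answer. TODO add source", rubric)
print(passed, score)  # False 0.25
print(feedback)
```

With this threshold, a single failed low-weight criterion (score 0.75) still blocks a pass, so the threshold effectively decides how many revision rounds a typical candidate needs; tuning it is a trade-off against the latency budget.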
