Verification & Reflection

Evaluator-Optimizer

One LLM generates; another evaluates and feeds back; loop until criteria are met.

Problem

When generation and evaluation happen in one prompt, the model has no incentive to disagree with itself: it produces a draft and then signs off on it. Single-shot generation tops out below what a loop with an explicit evaluator achieves, but a naive loop in which the same prompt does both jobs collapses into self-approval and adds cost without adding quality. The team needs separate roles for proposing and judging, with a bounded loop between them. Otherwise the system either fails to improve past one pass or runs forever chasing diminishing critique.

Solution

The generator produces a candidate. The evaluator scores it against the criteria and returns feedback. The generator revises the candidate using that feedback. The loop repeats until the evaluator passes the candidate or the maximum number of iterations is reached.
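
The loop can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `Evaluation`, `evaluator_optimizer`, `stub_generate` and `stub_evaluate` are hypothetical, and the two stubs stand in for real LLM calls so the control flow can be run and inspected on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Evaluation:
    """Verdict from the evaluator role (hypothetical shape)."""
    passed: bool
    feedback: str


def evaluator_optimizer(
    task: str,
    generate: Callable[[str, Optional[str], Optional[str]], str],
    evaluate: Callable[[str, str], Evaluation],
    max_iterations: int = 3,
) -> Tuple[str, bool, int]:
    """Run generate -> evaluate -> revise until a pass or the budget runs out.

    Returns (final_candidate, passed, iterations_used).
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")

    candidate: Optional[str] = None
    feedback: Optional[str] = None
    for iteration in range(1, max_iterations + 1):
        # Generator role: first draft, or a revision guided by feedback.
        candidate = generate(task, candidate, feedback)
        # Evaluator role: a separate call that judges the candidate.
        verdict = evaluate(task, candidate)
        if verdict.passed:
            return candidate, True, iteration
        feedback = verdict.feedback
    # Budget exhausted: return the last candidate, flagged as not passed.
    return candidate, False, max_iterations


# Stand-ins for the two LLM calls (assumptions for illustration only).
def stub_generate(task: str, previous: Optional[str], feedback: Optional[str]) -> str:
    if previous is None:
        return "The bridge is 40 long."
    return previous.replace("40 long", "40 metres long")


def stub_evaluate(task: str, candidate: str) -> Evaluation:
    if "metres" in candidate:
        return Evaluation(passed=True, feedback="")
    return Evaluation(passed=False, feedback="State the unit of length.")


if __name__ == "__main__":
    print(evaluator_optimizer("Describe the bridge.", stub_generate, stub_evaluate))
```

Keeping `generate` and `evaluate` as separate callables mirrors the separation of roles described above, and the `max_iterations` cap is what keeps the loop bounded when the evaluator never passes a candidate.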

When to use

  • Single-shot generation tops out below the quality the task requires.
  • An evaluator can score candidates against criteria with actionable feedback.
  • The iteration budget (max iterations or pass threshold) fits within the latency model.
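
To check the last point before adopting the pattern, a rough worst-case estimate can help. The sketch below assumes that calls run sequentially and that every iteration makes exactly one generator call and one evaluator call; the function name and the per-call timings are hypothetical inputs to replace with measured values.

```python
from typing import Tuple


def worst_case_budget(
    max_iterations: int, gen_seconds: float, eval_seconds: float
) -> Tuple[int, float]:
    """Upper bound on LLM calls and latency when the evaluator never passes.

    Assumes sequential calls: one generator call and one evaluator call
    per iteration.
    """
    calls = 2 * max_iterations
    latency = max_iterations * (gen_seconds + eval_seconds)
    return calls, latency


if __name__ == "__main__":
    # Example inputs only: 3 iterations, 4 s per generation, 2 s per evaluation.
    print(worst_case_budget(3, 4.0, 2.0))
```

If the resulting latency exceeds what the caller can tolerate, lowering `max_iterations` is the most direct lever under these assumptions.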

