Evaluator-Optimizer
One LLM generates; another evaluates and feeds back; loop until criteria are met.
Problem
When generation and evaluation happen in one prompt, the model has no incentive to disagree with itself: it produces a draft and then signs off on it. Single-shot generation tops out below what a loop with an explicit evaluator achieves, but a naive loop in which the same prompt does both jobs collapses into self-approval, adding cost without quality. The team needs separate roles for proposing and judging, with a bounded loop between them. Without the separation, the system fails to improve past one pass; without the bound, it runs forever chasing diminishing critique.
Solution
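The generator produces a candidate. The evaluator scores it against explicit criteria and returns feedback. The generator revises the candidate using that feedback. The loop repeats until the evaluator passes the candidate or the maximum number of iterations is reached.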
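The loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names `run_loop`, `Evaluation`, and `LoopResult` are hypothetical, the generator and evaluator are passed in as plain callables (in practice each would wrap a separate LLM call with its own prompt), and the toy stand-ins at the bottom exist only so the sketch runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Evaluation:
    passed: bool    # did the candidate meet the criteria?
    feedback: str   # actionable critique for the next revision


@dataclass
class LoopResult:
    candidate: str
    iterations: int  # number of evaluation passes performed
    passed: bool


def run_loop(
    task: str,
    generate: Callable[[str, Optional[str], Optional[str]], str],
    evaluate: Callable[[str, str], Evaluation],
    max_iterations: int = 3,
) -> LoopResult:
    """Alternate generator and evaluator until a pass or the iteration cap."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    # First draft: no previous candidate and no feedback yet.
    candidate = generate(task, None, None)
    for iteration in range(1, max_iterations + 1):
        verdict = evaluate(task, candidate)
        if verdict.passed:
            return LoopResult(candidate, iteration, True)
        # Revise only if another evaluation will follow, so the
        # returned candidate is always one the evaluator has seen.
        if iteration < max_iterations:
            candidate = generate(task, candidate, verdict.feedback)
    return LoopResult(candidate, max_iterations, False)


# Toy stand-ins (assumptions for illustration, not real model calls).
def toy_generate(task: str, previous: Optional[str], feedback: Optional[str]) -> str:
    if previous is None:
        return "Retry failed requests."
    return previous + " Use exponential backoff."


def toy_evaluate(task: str, candidate: str) -> Evaluation:
    if "backoff" in candidate:
        return Evaluation(True, "")
    return Evaluation(False, "Mention a backoff strategy.")


result = run_loop("Describe a retry policy.", toy_generate, toy_evaluate)
print(result.passed, result.iterations)
```

Keeping the two roles as separate callables mirrors the separation of proposing and judging, and the `max_iterations` cap is what keeps the loop bounded when the evaluator never passes a candidate.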
When to use
- Single-shot generation tops out below the quality the task requires.
- An evaluator can score candidates against criteria with actionable feedback.
- The iteration budget (a maximum iteration count or a pass threshold) fits within the latency the application can tolerate.
Related
- Reflection
- Best-of-N Sampling
- Planner-Executor-Observer
- LLM-as-Judge
- Same-Model Self-Critique
- Self-Refine
- CRAG
- Dynamic Expert Recruitment
- Voting-Based Cooperation
- Planner-Generator-Evaluator Harness
- Policy-Localizer-Validator
- Blind Grader with Isolated Context
- Darwin-Gödel Self-Rewrite
- Scorer Live Monitoring
- Human Reflection
- Planner-Executor-Verifier (PEV)
- Compound Error Degradation
- Bayesian Bandit Experimentation