Verification & Reflection
Catching the model's mistakes.
27 patterns in this book.
When to reach for each
01. Evaluator-Optimizer
One LLM generates; another evaluates and feeds back; loop until criteria are met.
Best for: Single-shot generation tops out below the quality the task requires.
Tradeoff: Cost = (generator + evaluator) × iterations.
Watch for: Single-shot generation already meets quality targets.
02. Self-Refine
Iterate generate → feedback (same model) → refine until a stop criterion fires, with no separate critic model.
Best for: The same model can produce useful self-feedback against an explicit improvement target.
Tradeoff: Reinforces same-model blind spots (Reflexion replication studies).
Watch for: A different model family is available and would give independent critique.
03. Best-of-N Sampling
Sample N candidate outputs and select the one ranked highest by a reward model or scorer.
Best for: A scorer or reward model ranks candidates more reliably than the generator's own pick.
Tradeoff: Cost scales linearly with N.
Watch for: No reliable scorer is available to pick among candidates.
04. Process Reward Model
Train a verifier that scores each reasoning step rather than only the final answer.
Best for: Outcome-only reward reinforces shortcut reasoning that reaches the right answer by the wrong route.
Tradeoff: Step-level annotation cost.
Watch for: Outcome reward already produces robust generators on the target task.
05. Self-Modification Diff Gate
Route the agent's edits to its own code or rules through a separate critic persona that reviews the diff before it lands.
Best for: The agent edits its own code, prompts, or rules, and bad edits would be hard to reverse.
Tradeoff: The critic prompt is a load-bearing artefact; a bad critic is worse than no critic.
Watch for: Self-modification is not part of the agent's design.
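A minimal sketch of the Evaluator-Optimizer loop (pattern 01), assuming `generate` and `evaluate` wrap two separate LLM calls. Here both are replaced by hypothetical deterministic stubs (`stub_generate`, `stub_evaluate`) so the control flow runs on its own; `max_iters` is an assumed budget cap, since cost grows with every extra iteration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    passed: bool
    feedback: str


def evaluator_optimizer(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str, str], Verdict],
    max_iters: int = 4,
) -> tuple[str, int]:
    """Loop generate -> evaluate -> feed back until the evaluator passes the
    draft or the iteration budget runs out. Returns (draft, iterations used)."""
    feedback: Optional[str] = None
    draft = ""
    for i in range(1, max_iters + 1):
        draft = generate(task, feedback)  # generator sees the latest critique
        verdict = evaluate(task, draft)   # a separate evaluator judges the draft
        if verdict.passed:
            return draft, i
        feedback = verdict.feedback
    return draft, max_iters               # budget exhausted: return the last draft


# Hypothetical stubs standing in for two different LLM calls.
def stub_generate(task: str, feedback: Optional[str]) -> str:
    return task if feedback is None else task + " (with citations)"


def stub_evaluate(task: str, draft: str) -> Verdict:
    ok = "citations" in draft
    return Verdict(ok, "" if ok else "Add citations.")


if __name__ == "__main__":
    # Passes on the second iteration, after one round of feedback.
    print(evaluator_optimizer("Summarize the report", stub_generate, stub_evaluate))
```

Returning the last draft when the budget runs out, rather than raising, is one design choice among several; a stricter loop could escalate to a human instead.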
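Best-of-N Sampling (pattern 03) reduces to generate-then-rank. The sketch below assumes `sample` wraps one generator call and `score` wraps a reward model or programmatic checker; both, and the exact-match scorer in the usage example, are hypothetical stand-ins.

```python
import random
from typing import Callable


def best_of_n(
    sample: Callable[[], str],
    score: Callable[[str], float],
    n: int,
) -> tuple[str, float]:
    """Draw n candidates, score each, and return the top one with its score.

    Cost is linear in n: n generator calls plus n scorer calls.
    Ties go to the earliest candidate, because max() keeps the first maximum.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample() for _ in range(n)]
    scored = [(score(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


if __name__ == "__main__":
    rng = random.Random(0)
    pool = ["5", "4", "22", "four"]  # hypothetical generator outputs for "2 + 2 = ?"

    def sample_answer() -> str:
        return rng.choice(pool)

    def exact_match(candidate: str) -> float:
        # Hypothetical scorer: 1.0 for the known-correct answer, else 0.0.
        return 1.0 if candidate == "4" else 0.0

    print(best_of_n(sample_answer, exact_match, n=8))
```

The whole pattern stands or falls on `score`: if it ranks candidates no better than chance, the extra N-1 samples buy nothing.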
All patterns in this book
Evaluator-Optimizer
×5 · One LLM generates; another evaluates and feeds back; loop until criteria are met.
Self-Refine
×5 · Iterate generate → feedback (same model) → refine until a stop criterion fires, with no separate critic model.
Best-of-N Sampling
×5 · Sample N candidate outputs and select the one ranked highest by a reward model or scorer.
Process Reward Model
×3 · Train a verifier that scores each reasoning step rather than only the final answer.
Self-Modification Diff Gate
×3 · Route the agent's edits to its own code or rules through a separate critic persona that reviews the diff before it lands.
Reflection
×2 · Have the model review its own output and produce a revised version in one or more passes.
Confidence Reporting
×2 · Surface the agent's uncertainty about its answer alongside the answer itself.
Frozen Rubric Reflection
×2 · Constrain reflection to a fixed, hand-authored rubric of criteria so the reviewer cannot invent new ones each run.
Reflexion
×2 · Have the agent write linguistic lessons from past failures and consult them in future episodes.
Prompt Variant Evaluation
×1 · Author multiple variants of the same prompt node, run them as a batch against a shared dataset, and let an automated evaluation flow score them so the winning variant is selected by measurement.
Deterministic-LLM Sandwich
×1 · Bracket every LLM call with deterministic checks on both sides.
Dimensional Synthetic Eval Set
×1 · Generate evaluation inputs not by free-form LLM prompting (which mode-collapses) but by enumerating tuples over explicitly named dimensions and seeding generation from each tuple.
Darwin-Gödel Self-Rewrite
×1 · An agent rewrites its own source code, archives every successful variant, and samples mutation parents from the archive rather than the latest version, using archive diversity as stepping-stones to e…
Echo Recognition
×1 · Recognize human message repetition as emphasis or a re-ask rather than as an independent input, so the agent does not produce a near-duplicate reply when the human repeats themselves.
Self-Consistency
Sample the same question multiple times at non-zero temperature and aggregate the answers by majority vote or a judge model to reduce hallucination.
Blind Grader with Isolated Context
Run an evaluator in a separately-allocated context window with access only to the artifact and the rubric, never the producing agent's reasoning trace, so the grader cannot be primed by the producer'…
Confidence-Checking Workflow
Always ask the agent, for each part of its output, to state its confidence and identify which parts need human verification, like triaging a junior analyst's work.
Cross-Reflection
Reflection step performed by a *different* agent or foundation model from the original generator, so critique error is decorrelated from generation error.
Generator-Critic Separation
Strict role separation between a Generator agent that produces drafts and a Critic agent that judges them against pre-defined criteria; the Critic never generates.
Human Reflection
Reflection loop that explicitly collects human feedback (not approval) on agent plans to improve them, distinct from approval gates where the human only says yes/no.
Planner-Executor-Verifier (PEV)
Triadic specialization where a planner produces the plan, an executor runs it, and a separate verifier checks each step's effects against the original goal.
Red-Team Sandbox Reproduction
Re-run reproductions of canonical alignment-failure modes inside a sealed sandbox on every release; treat the alignment regression suite as a deployment gate.
Stochastic-Deterministic Boundary (SDB)
Formalize the seam between an LLM proposal and a system action as a four-part contract — proposer, verifier, commit step, reject signal — so the contract itself, not the agent's good intent, gates si…
Tool-Augmented Self-Correction
Self-correct LLM outputs by interactively critiquing them with external tools (search, code execution, calculator).
Agentic Context Engineering Playbook
Treat the agent's system prompt and long-lived memory as a structured, item-addressable playbook that evolves through small delta updates from a Generator/Reflector/Curator loop, so accumulated tacti…
Commitment Tracking
Extract stated intents from each agent turn into a structured ledger with open / followed-through / expired status, making the gap between promise and follow-through visible and auditable.
World Model as Tool
Let a planning agent invoke a generative world model as a tool to roll out hypothetical futures before committing to an action, treating the world model as a callable simulator rather than a training…
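Self-Consistency can be sketched as repeated sampling plus a vote. This version shows only majority-vote aggregation (not the judge variant) and assumes `sample` wraps an LLM call made at non-zero temperature; the whitespace-stripping `normalize` default is an assumption, since real answers usually need task-specific normalization before votes can be counted.

```python
from collections import Counter
from typing import Callable


def self_consistency(
    sample: Callable[[], str],
    k: int,
    normalize: Callable[[str], str] = str.strip,
) -> tuple[str, float]:
    """Ask the same question k times and return (majority answer, vote share).

    The caller's `sample` should run at non-zero temperature; otherwise every
    draw is likely identical and the vote adds nothing. Ties go to the answer
    seen first, because Counter.most_common orders equal counts by first
    appearance.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    votes = Counter(normalize(sample()) for _ in range(k))
    answer, count = votes.most_common(1)[0]
    return answer, count / k


if __name__ == "__main__":
    draws = iter(["42", " 42", "41", "42 ", "40"])  # hypothetical sampled answers
    print(self_consistency(lambda: next(draws), k=5))  # ('42', 0.6)
```

A low vote share is a cheap signal that the question may need escalation, though agreement across samples does not guarantee correctness.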
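A Deterministic-LLM Sandwich wraps the model call in plain code checks. In this sketch the pre-check rejects empty or oversized input and the post-check requires a JSON object with a non-empty string `label` and a numeric `confidence` in [0, 1]; that schema, the length limit, and `SandwichError` are all hypothetical, chosen only to illustrate the shape.

```python
import json
from typing import Callable


class SandwichError(ValueError):
    """Raised when either deterministic check rejects the call."""


def sandwich(user_input: str, llm: Callable[[str], str], max_input_chars: int = 2000) -> dict:
    # Pre-check: deterministic validation before any model call is spent.
    if not user_input.strip():
        raise SandwichError("empty input")
    if len(user_input) > max_input_chars:
        raise SandwichError("input too long")

    raw = llm(user_input)

    # Post-check: parse and validate the output against a fixed schema.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SandwichError(f"output is not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SandwichError("output is not a JSON object")
    label = data.get("label")
    conf = data.get("confidence")
    if not isinstance(label, str) or not label:
        raise SandwichError("'label' must be a non-empty string")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        raise SandwichError("'confidence' must be a number")
    if not 0.0 <= conf <= 1.0:
        raise SandwichError("'confidence' must lie in [0, 1]")
    return data


if __name__ == "__main__":
    def fake_llm(text: str) -> str:  # hypothetical stand-in for a model call
        return '{"label": "positive", "confidence": 0.92}'

    print(sandwich("The update fixed my issue.", fake_llm))
```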
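For the Dimensional Synthetic Eval Set, the enumeration step is a Cartesian product over named dimensions. The dimension names, values, and prompt template below are hypothetical, and the LLM call that would turn each seed prompt into an evaluation input is left out.

```python
import itertools


def enumerate_seeds(dimensions: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return every combination of dimension values as a named tuple-like dict."""
    names = list(dimensions)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*(dimensions[n] for n in names))
    ]


def seed_prompt(seed: dict[str, str]) -> str:
    # Hypothetical template: one generation call per seed, not one free-form call.
    spec = "; ".join(f"{k}={v}" for k, v in seed.items())
    return f"Write one realistic user query matching exactly: {spec}."


if __name__ == "__main__":
    dims = {
        "persona": ["novice", "expert"],
        "intent": ["refund", "bug report", "how-to"],
        "tone": ["polite", "frustrated"],
    }
    seeds = enumerate_seeds(dims)
    print(len(seeds))  # 2 * 3 * 2 = 12
    print(seed_prompt(seeds[0]))
    # Write one realistic user query matching exactly: persona=novice; intent=refund; tone=polite.
```

Because coverage comes from the enumeration rather than from the model's sampling, every cell of the grid is represented, which is the point of seeding instead of free-form prompting.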
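Commitment Tracking needs a ledger before it needs any extraction. This sketch assumes intents have already been extracted from each turn (by an LLM or a parser, not shown) and that a commitment expires after a fixed number of turns; `ttl_turns` and the exact-text matching in `mark_followed_through` are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    FOLLOWED_THROUGH = "followed-through"
    EXPIRED = "expired"


@dataclass
class Commitment:
    text: str
    made_at_turn: int
    due_by_turn: int
    status: Status = Status.OPEN


@dataclass
class Ledger:
    items: list[Commitment] = field(default_factory=list)

    def record(self, text: str, turn: int, ttl_turns: int = 3) -> Commitment:
        """Log a stated intent; it stays open until `turn + ttl_turns`."""
        c = Commitment(text, turn, turn + ttl_turns)
        self.items.append(c)
        return c

    def mark_followed_through(self, text: str) -> bool:
        """Close the first open commitment with matching text, if any."""
        for c in self.items:
            if c.text == text and c.status is Status.OPEN:
                c.status = Status.FOLLOWED_THROUGH
                return True
        return False

    def tick(self, current_turn: int) -> list[Commitment]:
        """Expire open commitments whose deadline has passed; return them."""
        expired = []
        for c in self.items:
            if c.status is Status.OPEN and current_turn > c.due_by_turn:
                c.status = Status.EXPIRED
                expired.append(c)
        return expired

    def open_items(self) -> list[Commitment]:
        return [c for c in self.items if c.status is Status.OPEN]
```

Expired entries are kept rather than deleted, so the gap between promise and follow-through stays auditable.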