Writer-Critic Iterative Loop Construction
Also known as: generator-judge construction, writer-reviewer pairing
Build a loop with two agents. One agent produces something, such as code, a plan, a summary, or a draft. A separate agent checks it against a clear pass/fail rubric. They go back and forth until the checker passes the output or the loop hits a set number of rounds. This methodology is how you build the Generator-Critic Separation pattern: it specifies how to wire the pair together, what the rubric has to contain, and how to cap the loop so it cannot run forever.
Methodology process overview
Intent. Wire a maker agent and a checker agent into a loop with a clear rubric and a hard round limit, so quality climbs through bounded review instead of a single shot.
When to apply. Use this when the output has checkable quality (for example, code that must pass tests, a plan that must meet named constraints, or a summary that must cover set points) and a single shot is not good enough. It applies only when you can actually write the rubric down. Don't apply it when 'good' is purely a matter of taste and the rubric boils down to 'the user likes it'; at that point you need a person in the loop, not a checker agent. One exception: even taste-based tasks can use a checker if you can spell out what to avoid.
Inputs
- Task specification — What the maker must produce. Include the input format and what counts as success.
- Pass/fail rubric — The checklist the checker applies. It must be clear enough that two sensible checkers would reach the same verdict.
- Maximum iteration count — A hard limit on the number of maker-checker rounds, usually 3 to 8. Past that limit the loop stops and returns the best draft so far.
Outputs
- Generator agent — The agent that produces the output and takes in the checker's feedback on later rounds.
- Critic agent — The agent that applies the rubric and returns pass or fail with structured feedback. It never quietly rewrites the output itself.
- Loop controller — The driver that connects maker and checker, enforces the round limit, and returns the best output if the loop runs out of rounds.
Steps (6)
Author the rubric the critic will apply
Write the rubric down before either agent exists. List what to check, the verdict shape (pass/fail or a score), and what counts as 'fail with hints' versus 'fail terminally'. A rubric written after the maker is skewed toward what the maker already does well.
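One way to make the rubric concrete before either agent exists is to write it as data. The sketch below is illustrative only: the names `Verdict`, `Criterion`, `Rubric`, and `SUMMARY_RUBRIC` are hypothetical, and the checks are plain Python predicates standing in for whatever the real rubric would test.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    PASS = "pass"
    FAIL_WITH_HINTS = "fail_with_hints"  # the maker can revise and retry
    FAIL_TERMINAL = "fail_terminal"      # no revision can fix this; stop the loop


@dataclass(frozen=True)
class Criterion:
    name: str
    check: Callable[[str], bool]  # True when the output satisfies the criterion
    hint: str                     # what to tell the maker on failure
    terminal: bool = False        # True if a failure here cannot be repaired


@dataclass(frozen=True)
class Rubric:
    criteria: tuple[Criterion, ...]

    def evaluate(self, output: str) -> tuple[Verdict, list[str]]:
        """Apply every criterion and return a verdict plus hints for the failures."""
        failed = [c for c in self.criteria if not c.check(output)]
        if not failed:
            return Verdict.PASS, []
        hints = [f"{c.name}: {c.hint}" for c in failed]
        if any(c.terminal for c in failed):
            return Verdict.FAIL_TERMINAL, hints
        return Verdict.FAIL_WITH_HINTS, hints


# Hypothetical rubric for a short summary task (assumed criteria, for illustration).
SUMMARY_RUBRIC = Rubric(criteria=(
    Criterion("mentions_deadline", lambda s: "deadline" in s.lower(),
              "State the deadline explicitly."),
    Criterion("under_50_words", lambda s: len(s.split()) <= 50,
              "Cut the summary to 50 words or fewer."),
    Criterion("non_empty", lambda s: bool(s.strip()),
              "The output is empty.", terminal=True),
))
```

Writing the criteria as named, independent checks makes it easier for two checkers to reach the same verdict, and the `terminal` flag records the 'fail with hints' versus 'fail terminally' distinction up front.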
Build the generator agent
On the first turn the maker produces the output from the task spec alone. On later turns it takes in the checker's structured feedback. The maker never grades itself.
Uses: Augmented LLM
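A minimal sketch of the maker's two modes, assuming the model is reachable through a hypothetical `call_model(prompt) -> str` function. The prompt wording and the names `build_generator_prompt` and `generate` are illustrative and not taken from any framework.

```python
from typing import Callable, Optional, Sequence

ModelFn = Callable[[str], str]  # hypothetical: prompt in, completion out


def build_generator_prompt(task_spec: str,
                           previous_draft: Optional[str] = None,
                           hints: Sequence[str] = ()) -> str:
    """First turn: task spec only. Later turns: spec, prior draft, and checker hints."""
    if previous_draft is None:
        return f"Task:\n{task_spec}\n\nProduce the output."
    hint_lines = "\n".join(f"- {h}" for h in hints)
    return (
        f"Task:\n{task_spec}\n\n"
        f"Previous draft:\n{previous_draft}\n\n"
        f"Reviewer feedback to address:\n{hint_lines}\n\n"
        # The maker revises; it is told not to grade itself.
        "Revise the draft. Do not assess whether it now passes."
    )


def generate(call_model: ModelFn,
             task_spec: str,
             previous_draft: Optional[str] = None,
             hints: Sequence[str] = ()) -> str:
    return call_model(build_generator_prompt(task_spec, previous_draft, hints))
```

Passing the model in as a function keeps the sketch independent of any particular SDK and lets you test the prompt logic with a stub.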
Build the critic agent with role separation
The checker only judges. It does not produce the output. It returns a structured verdict, and on a fail it returns the smallest useful hint that would let the maker improve. If one model both makes and checks, the two roles collapse into one agent. Use a different model, or at least a prompt that hard-separates the roles.
Uses: Generator-Critic Separation, Tool-Augmented Self-Correction
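The sketch below shows one way to hold the checker to a structured verdict. It assumes the checker is prompted to reply in JSON and is reached through a hypothetical `call_model` function; the JSON shape, `Review`, `build_critic_prompt`, `parse_review`, and `critique` are all illustrative names. Only the verdict and hints are kept, so a checker that tries to return a rewritten draft has that rewrite discarded.

```python
import json
from dataclasses import dataclass
from typing import Callable

ModelFn = Callable[[str], str]  # hypothetical: prompt in, raw completion out


@dataclass(frozen=True)
class Review:
    passed: bool
    hints: tuple[str, ...] = ()


def build_critic_prompt(rubric_text: str, draft: str) -> str:
    return (
        "You are a reviewer. Judge the draft against the rubric. "
        "Do not rewrite the draft.\n\n"
        f"Rubric:\n{rubric_text}\n\nDraft:\n{draft}\n\n"
        'Reply with JSON only: {"verdict": "pass" or "fail", "hints": [...]}'
    )


def parse_review(raw: str) -> Review:
    """Parse the checker's reply; anything malformed counts as a fail."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return Review(False, ("Critic reply was not valid JSON.",))
    if not isinstance(data, dict) or data.get("verdict") not in ("pass", "fail"):
        return Review(False, ("Critic reply had no valid verdict.",))
    if data["verdict"] == "pass":
        return Review(True)
    raw_hints = data.get("hints")
    hints = raw_hints if isinstance(raw_hints, list) else []
    # Keep only verdict and hints; extra keys (such as a rewrite) are dropped.
    return Review(False, tuple(str(h) for h in hints if str(h).strip()))


def critique(call_model: ModelFn, rubric_text: str, draft: str) -> Review:
    return parse_review(call_model(build_critic_prompt(rubric_text, draft)))
```

Treating an unparseable reply as a fail is a design choice in this sketch: it errs toward another round rather than a false pass.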
Wire the bounded loop
The loop controller alternates maker and checker. It exits with success as soon as the checker passes an output. Otherwise it stops at the round limit and returns the highest-rated output seen so far. It emits a trace for each round so you can audit the loop afterward.
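A bounded controller along these lines can be sketched as follows. It assumes the checker returns a numeric `score` next to its pass/fail verdict so that "highest-rated" is well defined. The types `Review`, `RoundTrace`, `LoopResult` and the function `run_loop` are hypothetical, and the maker and checker are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Review:
    score: float                 # assumed: higher is better, e.g. fraction of criteria met
    passed: bool
    hints: tuple[str, ...] = ()


@dataclass(frozen=True)
class RoundTrace:
    round: int
    draft: str
    review: Review


@dataclass(frozen=True)
class LoopResult:
    output: str
    exit_reason: str             # "passed" or "max_rounds"
    trace: tuple[RoundTrace, ...]


GenerateFn = Callable[[Optional[str], Sequence[str]], str]  # (previous draft, hints) -> draft
CritiqueFn = Callable[[str], Review]


def run_loop(generate: GenerateFn, critique: CritiqueFn, max_rounds: int) -> LoopResult:
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    trace: list[RoundTrace] = []
    best: Optional[RoundTrace] = None
    draft: Optional[str] = None
    hints: Sequence[str] = ()
    for round_no in range(1, max_rounds + 1):
        draft = generate(draft, hints)
        review = critique(draft)
        entry = RoundTrace(round_no, draft, review)
        trace.append(entry)  # one trace entry per round, for later audit
        if best is None or review.score > best.review.score:
            best = entry
        if review.passed:
            return LoopResult(draft, "passed", tuple(trace))
        hints = review.hints
    # Round limit reached: fall back to the best draft seen, not the last one.
    return LoopResult(best.draft, "max_rounds", tuple(trace))
```

Returning the best draft rather than the last one matters because a revision can score lower than the draft it replaced.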
Calibrate the rubric against ground truth
Run a sample of outputs through both the checker and a human grader. Tighten the rubric until the checker's verdicts agree with the human at an acceptable rate. A poorly calibrated checker either passes too much, which gives false confidence, or fails too much, which exhausts the loop.
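As an illustration of the comparison, the sketch below scores the checker's verdicts against a human grader's verdicts on the same sample. The function name `calibration_report` and the choice of metrics are assumptions, and what counts as an acceptable rate is left for you to decide.

```python
from typing import Sequence


def calibration_report(checker: Sequence[bool], human: Sequence[bool]) -> dict[str, float]:
    """Compare checker pass/fail verdicts with human verdicts on the same outputs.

    True means 'pass'. The human verdict is treated as ground truth.
    """
    if len(checker) != len(human) or not checker:
        raise ValueError("need two non-empty verdict lists of equal length")
    n = len(checker)
    agree = sum(c == h for c, h in zip(checker, human))
    false_pass = sum(c and not h for c, h in zip(checker, human))
    false_fail = sum(h and not c for c, h in zip(checker, human))
    human_fail = sum(not h for h in human)
    human_pass = n - human_fail
    return {
        "agreement": agree / n,
        # Share of human-failed outputs the checker let through (false confidence).
        "false_pass_rate": false_pass / human_fail if human_fail else 0.0,
        # Share of human-passed outputs the checker rejected (wasted rounds).
        "false_fail_rate": false_fail / human_pass if human_pass else 0.0,
    }
```

Reporting the two error rates separately shows which way the rubric is miscalibrated: a high false-pass rate calls for stricter criteria, while a high false-fail rate suggests the criteria are too strict or too ambiguous.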
Instrument failure modes
Log round counts, exit reasons, and pass rates per task type. Watch for task types where the loop nearly always runs out of rounds. That signals the rubric cannot be met, or the maker cannot act on the checker's hints.
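A small aggregation over logged runs might look like the sketch below. The `LoopRecord` shape, the function names, and the 0.8 threshold are assumptions chosen for illustration, not recommended values.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LoopRecord:
    task_type: str
    rounds: int
    exit_reason: str  # "passed" or "max_rounds"


def summarize(records: Iterable[LoopRecord]) -> dict[str, dict[str, float]]:
    """Group loop runs by task type and compute round counts and exit rates."""
    by_type: dict[str, list[LoopRecord]] = defaultdict(list)
    for rec in records:
        by_type[rec.task_type].append(rec)
    summary: dict[str, dict[str, float]] = {}
    for task_type, recs in by_type.items():
        n = len(recs)
        exhausted = sum(r.exit_reason == "max_rounds" for r in recs)
        summary[task_type] = {
            "runs": n,
            "mean_rounds": sum(r.rounds for r in recs) / n,
            "pass_rate": (n - exhausted) / n,
            "exhaustion_rate": exhausted / n,
        }
    return summary


def flag_exhausting(summary: dict[str, dict[str, float]],
                    threshold: float = 0.8) -> list[str]:
    """Task types whose loops run out of rounds at or above the threshold rate."""
    return sorted(t for t, s in summary.items() if s["exhaustion_rate"] >= threshold)
```

Task types returned by `flag_exhausting` are the ones to inspect first, since either the rubric cannot be met for them or the maker cannot act on the hints it receives.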
Principles
- The checker and the maker are separate roles. When one model does both, quality is capped by that model's blind spots.
- Freeze the rubric before you build either agent. A rubric written afterward is tainted by what the maker already does well.
- Every loop has a hard round limit and a 'best so far' fallback. An open-ended loop is an outage waiting to happen.
- On a fail, the checker returns useful hints, not just a verdict. The maker has to be able to revise on the next turn.
Known failure modes (3)
- Same-Model Self-Critique
Generator and critic share the model and prompt family, so verdicts collapse to 'the generator already thought this was good.'
- Unbounded Loop
No iteration cap or 'best so far' fallback, so the loop runs forever when the rubric is unsatisfiable.
- Sycophancy
Critic rewards agreement with the generator's tone instead of applying the rubric, which inflates pass rates.
Related patterns (5)
- ★ Generator-Critic Separation
Strict role separation between a Generator agent that produces drafts and a Critic agent that judges them against pre-defined criteria; the Critic never generates.
- ★ Tool-Augmented Self-Correction
Self-correct LLM outputs by interactively critiquing them with external tools (search, code execution, calculator).
- ★★ Evaluator-Optimizer
One LLM generates; another evaluates and feeds back; loop until criteria are met.
- ★★ Reflection
Have the model review its own output and produce a revised version in one or more passes.
- ★★ Self-Refine
Iterate generate → feedback (same model) → refine until a stop criterion fires, with no separate critic model.
Related methodologies (2)
Sources (2)
AI Agents in Action (Micheal Lanham, Manning, 2024, ISBN 9781633436343)
Ch 4 'Exploring multi-agent systems' — §4.2 Exploring AutoGen, §4.2.1 Installing and consuming AutoGen, §4.2.2 Enhancing code output with agent critics, §4.2.3 Understanding the AutoGen cache; Ch 6 'Building autonomous assistants' — §6.3 Introducing agentic behavior trees, §6.3.1 Managing assistants with assistants, §6.3.2 Building a coding challenge ABT, §6.3.3 Conversational AI systems vs. other methods “4.2.2 Enhancing code output with agent critics ... 6.3.2 Building a coding challenge ABT”
AutoGen — companion framework cited throughout Lanham Ch 4 (writer + critic example)
“A framework for creating multi-agent AI applications that can act autonomously or work alongside humans.”
Provenance
- Added to catalog:
- Last updated:
- Verification status: verified