Methodology · Multi-Agent Design · proven · verified

Writer-Critic Iterative Loop Construction

also known as generator-judge construction, writer-reviewer pairing

Applies to: multi-agent-system, agent, coding-agent

Tags: writer-critic, iterative-loop, rubric, bounded-iteration

Build a loop with two agents. One agent produces something, such as code, a plan, a summary, or a draft. A separate agent checks it against a clear pass/fail rubric. They go back and forth until the checker passes the output or you hit a set number of rounds. This methodology is the construction recipe for the generator-critic-separation pattern: it covers how to wire the pair together, what the rubric has to contain, and how to cap the loop so it cannot run forever.

Methodology process overview

sequenceDiagram
    participant U as Caller
    participant L as Loop controller
    participant W as Writer (generator)
    participant C as Critic (judge)
    U->>L: task spec + rubric + max_iters
    L->>W: generate(task spec)
    W-->>L: draft 1
    loop until pass or cap
        L->>C: judge(draft, rubric)
        alt pass
            C-->>L: pass
        else fail with hints
            C-->>L: fail + actionable hints
            L->>W: revise(draft, hints)
            W-->>L: next draft
        end
    end
    L-->>U: passing artefact or best-so-far

Intent. Wire a maker agent and a checker agent into a loop with a clear rubric and a hard round limit, so quality climbs through bounded review instead of a single shot.

When to apply. Use this when the output has checkable quality: for example, code that must pass tests, a plan that must meet named constraints, or a summary that must cover set points. Use it when one shot is not good enough and when you can actually write the rubric down. Don't apply it when 'good' is purely a matter of taste and the rubric boils down to 'the user likes it'; at that point you need a person in the loop, not a checker agent. One exception: even taste-based tasks can use a checker if you can spell out what to avoid.

Example scenario

A small platform team is building an internal 'release-notes drafter' agent. It reads a list of merged PRs and produces release notes the docs team will accept without rewriting. Single-shot generation gave drafts that were technically correct but wrong in tone, and the docs team rewrote 80% of them.

The team decided to wire a maker-checker loop. They wrote the rubric first, before either agent existed. Every entry must do four things: name the user-facing capability, not the internal module; link to the relevant PR; avoid 'refactor' and 'improvement' as standalone bullets; and stay under 25 words. The verdict had three shapes: pass, fail-with-hints, and fail-terminally. The maker was Sonnet with a release-notes system prompt. The checker was a different model, GPT-4.1, with a strict rubric-only prompt that emits structured JSON. The round limit was 4, and the loop kept the highest-rated draft if it ran out of rounds.

To calibrate, they ran 30 historical PR lists through the loop and had the docs lead grade the results. The checker agreed with the docs lead 26 times out of 30. The 4 disagreements traced to two unclear rubric items, which they tightened.

After launch, 70% of notes pass on round 2, 22% on round 3, and 8% hit the limit. The ones that hit the limit were almost all huge PR batches where the rubric could not be met in 25 words, so they added a 'split batch' path. Earlier, they had hit a sycophancy bug in which the checker praised drafts just for matching tone. Without the round limit, that bug would have looped forever. With the limit, the worst case was a slightly-off draft, not an outage.
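Three of the four rubric items in this scenario are mechanical, so they can be checked without a model at all. The sketch below is one way that might look; it is not the team's code. The names `check_entry` and `EntryReport`, the GitHub-style PR URL pattern, and the choice to exclude the link from the word count are all assumptions made for illustration. The fourth item (naming the user-facing capability) needs judgment and is left to the model-based checker.

```python
import re
from dataclasses import dataclass, field

# Hypothetical names throughout; the scenario does not show the team's code.
FORBIDDEN_STANDALONE = {"refactor", "improvement"}
PR_LINK = re.compile(r"https?://\S+/pull/\d+")  # assumed GitHub-style PR URL
MAX_WORDS = 25


@dataclass
class EntryReport:
    entry: str
    hints: list = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.hints


def check_entry(entry: str) -> EntryReport:
    """Apply the three mechanical rubric items to one release-note bullet."""
    report = EntryReport(entry)
    # Assumption: the link itself does not count toward the word limit.
    text = PR_LINK.sub("", entry).strip(" ()-.")
    words = text.split()
    if not PR_LINK.search(entry):
        report.hints.append("add a link to the relevant PR")
    if len(words) >= MAX_WORDS:
        report.hints.append(f"shorten to under {MAX_WORDS} words (now {len(words)})")
    # A bullet that is only 'Refactor' or 'Improvements' says nothing to users.
    if text.lower().rstrip("s") in FORBIDDEN_STANDALONE:
        report.hints.append("replace the standalone word with the user-facing change")
    return report
```

Running cheap deterministic checks like these before the model-based checker keeps the hints precise and saves a model call on drafts that fail for trivial reasons.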

Inputs

Task specification — What the maker must produce. Include the input format and what counts as success.
Pass/fail rubric — The checklist the checker applies. It must be clear enough that two sensible checkers would reach the same verdict.
Maximum iteration count — A hard limit on the number of maker-checker rounds, usually 3 to 8. Past that limit the loop stops and returns the best draft so far.

Outputs

Generator agent — The agent that produces the output and takes in the checker's feedback on later rounds.
Critic agent — The agent that applies the rubric and returns pass or fail with structured feedback. It never quietly rewrites the output itself.
Loop controller — The driver that connects maker and checker, enforces the round limit, and returns the best output if the loop runs out of rounds.

Steps (6)

Author the rubric the critic will apply
Write the rubric down before either agent exists. List what to check, the verdict shape (pass/fail or a score), and what counts as 'fail with hints' versus 'fail terminally'. A rubric written after the maker is skewed toward what the maker already does well.
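A rubric written down this way can be held as frozen data so it cannot drift once the agents exist. The following is a minimal sketch under assumed names (`Check`, `Rubric`, `Outcome`, `terminal_on_fail`); the methodology itself does not prescribe a schema. It shows the three verdict shapes and how a per-check flag could separate 'fail with hints' from 'fail terminally'.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL_WITH_HINTS = "fail_with_hints"
    FAIL_TERMINALLY = "fail_terminally"


@dataclass(frozen=True)
class Check:
    id: str
    description: str
    terminal_on_fail: bool = False  # True: revising the draft cannot fix this


@dataclass(frozen=True)
class Rubric:
    version: str
    checks: tuple

    def outcome(self, failed_ids: set) -> Outcome:
        """Map the set of failed check ids to one of the three verdict shapes."""
        unknown = failed_ids - {c.id for c in self.checks}
        if unknown:
            raise ValueError(f"unknown check ids: {sorted(unknown)}")
        if not failed_ids:
            return Outcome.PASS
        if any(c.terminal_on_fail for c in self.checks if c.id in failed_ids):
            return Outcome.FAIL_TERMINALLY
        return Outcome.FAIL_WITH_HINTS
```

The `version` field is there so that traces and calibration results can record which rubric produced a verdict.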
Build the generator agent
On the first turn the maker produces the output from the task spec alone. On later turns it takes in the checker's structured feedback. The maker never grades itself.
Uses: Augmented LLM
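The first-turn versus later-turn behaviour can be sketched as a single prompt builder. This is an illustration only: `LLM` stands in for whatever model client you use (a plain callable from prompt to text), and the prompt wording and the names `build_writer_prompt` and `write` are assumptions.

```python
from typing import Callable, Optional, Sequence

# Stand-in for a real model client: prompt in, text out (an assumption).
LLM = Callable[[str], str]


def build_writer_prompt(task_spec: str, draft: Optional[str] = None,
                        hints: Sequence[str] = ()) -> str:
    """First turn: task spec alone. Later turns: spec, prior draft, and hints."""
    if draft is None:
        return f"Task:\n{task_spec}\n\nProduce the output."
    hint_lines = "\n".join(f"- {h}" for h in hints)
    return (
        f"Task:\n{task_spec}\n\n"
        f"Previous draft:\n{draft}\n\n"
        f"Reviewer hints:\n{hint_lines}\n\n"
        "Revise the draft to address every hint. Return only the revised output."
    )


def write(llm: LLM, task_spec: str, draft: Optional[str] = None,
          hints: Sequence[str] = ()) -> str:
    # The writer never sees the rubric verdict logic and never grades itself.
    return llm(build_writer_prompt(task_spec, draft, hints))
```

Note that the prompt asks for output only; nothing in it invites the maker to assess its own work.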
Build the critic agent with role separation
The checker only judges. It does not produce the output. It returns a structured verdict, and on a fail it returns the smallest useful hint that would let the maker improve. If one model both makes and checks, the two roles collapse into one agent. Use a different model, or at least a prompt that hard-separates the roles.
Uses: Generator-Critic Separation, Tool-Augmented Self-Correction
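One way to enforce "the checker only judges" is to validate its structured output and reject anything beyond a verdict and hints. The sketch below assumes a JSON shape with a `verdict` field and an optional `hints` list; that schema and the name `parse_verdict` are hypothetical, not part of the methodology.

```python
import json
from dataclasses import dataclass

ALLOWED_KEYS = {"verdict", "hints"}   # assumed schema
ALLOWED_VERDICTS = {"pass", "fail"}


@dataclass(frozen=True)
class Verdict:
    passed: bool
    hints: tuple


def parse_verdict(raw: str) -> Verdict:
    """Parse the critic's JSON reply and enforce the judge-only contract."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("verdict must be a JSON object")
    extra = set(data) - ALLOWED_KEYS
    if extra:
        # A field such as 'rewritten_draft' means the critic is doing the writer's job.
        raise ValueError(f"unexpected fields from critic: {sorted(extra)}")
    verdict = data.get("verdict")
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"verdict must be one of {sorted(ALLOWED_VERDICTS)}")
    hints = data.get("hints", [])
    if not isinstance(hints, list) or not all(isinstance(h, str) for h in hints):
        raise ValueError("hints must be a list of strings")
    if verdict == "fail" and not hints:
        raise ValueError("a fail must carry at least one hint")
    return Verdict(verdict == "pass", tuple(hints))
```

A reply that fails validation can be treated as a checker error and logged, rather than silently counted as a pass or a fail.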
Wire the bounded loop
The loop controller alternates maker and checker. It exits with a pass as soon as the checker passes a draft. Otherwise it stops at the round limit and returns the highest-rated output. It emits a trace for each round so you can audit the loop afterward.
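A minimal controller along these lines might look as follows. It is a sketch under assumptions: the checker is taken to return a numeric `score` alongside pass/fail so that "highest-rated" is well defined, one round means one checker call, and the names `run_loop`, `Verdict`, and `Result` are made up for the example. Terminal failures are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence


@dataclass
class Verdict:
    passed: bool
    score: float                 # assumed: critic also rates the draft
    hints: Sequence[str] = ()


@dataclass
class Result:
    output: str
    exit_reason: str             # "pass" or "cap"
    rounds: int
    trace: list = field(default_factory=list)


Writer = Callable[[str, Optional[str], Sequence[str]], str]
Critic = Callable[[str], Verdict]


def run_loop(task_spec: str, write: Writer, judge: Critic, max_iters: int) -> Result:
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    trace = []
    best_draft, best_score = None, float("-inf")
    draft = write(task_spec, None, ())          # first turn: task spec alone
    for round_no in range(1, max_iters + 1):
        verdict = judge(draft)
        trace.append({"round": round_no, "passed": verdict.passed,
                      "score": verdict.score, "hints": list(verdict.hints)})
        if verdict.passed:
            return Result(draft, "pass", round_no, trace)
        if verdict.score > best_score:          # remember best-so-far
            best_draft, best_score = draft, verdict.score
        if round_no < max_iters:                # no wasted revision after the cap
            draft = write(task_spec, draft, verdict.hints)
    return Result(best_draft, "cap", max_iters, trace)
```

Because the round limit is the `for` loop's own bound, a checker that never passes can cost at most `max_iters` checker calls and `max_iters` maker calls.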
Calibrate the rubric against ground truth
Run a sample of outputs through both the checker and a human grader. Tighten the rubric until the checker's verdicts agree with the human at an acceptable rate. A poorly calibrated checker either passes too much, which gives false confidence, or fails too much, which exhausts the loop.
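The agreement check reduces to comparing two lists of pass/fail verdicts over the same sample. The helper below is an illustrative sketch; the name `agreement` and the returned keys are assumptions, and the data in the usage note is synthetic.

```python
from typing import Sequence


def agreement(critic: Sequence[bool], human: Sequence[bool]) -> dict:
    """Compare critic verdicts with human grades on the same outputs."""
    if len(critic) != len(human) or not critic:
        raise ValueError("need two non-empty verdict lists of equal length")
    n = len(critic)
    agree = sum(c == h for c, h in zip(critic, human))
    # False pass: critic passed what the human failed (false confidence).
    false_pass = sum(1 for c, h in zip(critic, human) if c and not h)
    # False fail: critic failed what the human passed (exhausts the loop).
    false_fail = sum(1 for c, h in zip(critic, human) if h and not c)
    return {"agreement": agree / n, "false_pass": false_pass, "false_fail": false_fail}
```

For example, `agreement([True, True, False, True], [True, False, False, True])` reports an agreement of 0.75 with one false pass. Tracking the two error types separately shows which way the rubric needs tightening.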
Instrument failure modes
Log round counts, exit reasons, and pass rates per task type. Watch for task types where the loop nearly always runs out of rounds. That signals the rubric cannot be met, or the maker cannot act on the checker's hints.
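A small aggregation over logged runs is enough to surface task types that keep exhausting the loop. The sketch assumes each run is logged as a `(task_type, exit_reason, rounds)` tuple with exit reasons `"pass"` and `"cap"`; the function names and the 0.5 threshold are placeholders, not recommended values.

```python
from collections import Counter, defaultdict


def summarise(runs):
    """runs: iterable of (task_type, exit_reason, rounds) tuples."""
    by_type = defaultdict(list)
    for task_type, exit_reason, rounds in runs:
        by_type[task_type].append((exit_reason, rounds))
    summary = {}
    for task_type, rows in by_type.items():
        reasons = Counter(reason for reason, _ in rows)
        n = len(rows)
        summary[task_type] = {
            "runs": n,
            "pass_rate": reasons["pass"] / n,
            "cap_rate": reasons["cap"] / n,
            "mean_rounds": sum(rounds for _, rounds in rows) / n,
        }
    return summary


def flag_exhausting(summary, cap_rate_threshold=0.5, min_runs=5):
    """Task types that hit the round limit more often than the threshold."""
    return sorted(t for t, s in summary.items()
                  if s["runs"] >= min_runs and s["cap_rate"] > cap_rate_threshold)
```

A flagged task type is a prompt to inspect its traces: either the rubric cannot be met for that input, or the hints are not something the maker can act on.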

Principles

The checker and the maker are separate roles. One model doing both is capped by that model's blind spots.
Freeze the rubric before you build either agent. A rubric written afterward is tainted by what the maker already does well.
Every loop has a hard round limit and a 'best so far' fallback. An open-ended loop is an outage waiting to happen.
On a fail, the checker returns useful hints, not just a verdict. The maker has to be able to revise on the next turn.
