VII · Verification & ReflectionEmerging

Cross-Reflection

also known as Different-Model Reflection, Heterogeneous Critic

Reflection step performed by a *different* agent or foundation model from the original generator, so critique error is decorrelated from generation error.

This pattern helps complete certain larger patterns —

  • specialisesReflection★★Have the model review its own output and produce a revised version in one or more passes.

Context

A team uses reflection to improve agent outputs. Same-model self-critique is the default — the generator critiques its own draft. Errors in critique and errors in generation share the same blind spots when the same model performs both.

Problem

Self-critique by the same model misses correlated failure modes: the generator's hallucinations get reproduced in its own review of those hallucinations. After one or two iterations, the loop self-approves. The fix requires a critic with different blind spots — a different model architecture, different training data, or both.

Forces

  • Same-model self-critique is cheaper (one model in production).
  • Cross-model reflection requires running two models, doubling cost.
  • Heterogeneous models may disagree on style/format issues that are not real errors.

Example

A legal-drafting agent uses Claude Opus as generator and GPT-5.x as cross-reflection critic. Generator drafts a contract clause. Critic reviews against a fixed checklist (governing law cited, parties named, severability present). On disagreement, the agent revises. Same-model self-critique missed a missing governing-law clause that Opus had hallucinated as 'standard'; the cross-model critic caught it because its training data weighted contract templates differently.

Diagram

Solution

Therefore:

Generator (Model A) produces draft. Critic (Model B, distinct architecture) reviews draft against named criteria. If Model B accepts, ship. If Model B rejects, either revise (back to Model A with critique) or escalate. Pair with frozen-rubric-reflection so the critic uses fixed criteria, not free-form. Distinct from same-model-self-critique and llm-as-judge (which is judge-only without iteration).

What this pattern forbids. The critic must be a different model from the generator; same-model critique falls back to same-model-self-critique.

And the patterns that stand alongside it, or against it —

  • alternative-toSame-Model Self-CritiqueAnti-pattern: have the same model both produce an answer and critique it, expecting independence.
  • complementsLLM-as-Judge★★Use an LLM to score open-ended outputs against rubric criteria when no exact-match metric applies.
  • complementsFrozen Rubric ReflectionConstrain reflection to a fixed, hand-authored rubric of criteria so the reviewer cannot invent new ones each run.
  • complementsHeterogeneous-Model Council with Synthesis JudgeThree or more role-specialized personas run on different model architectures in parallel; a synthesis judge — given only their structured JSON, not the original input — produces the final verdict.
  • complementsGenerator-Critic SeparationStrict role separation between a Generator agent that produces drafts and a Critic agent that judges them against pre-defined criteria; the Critic never generates.

Neighbourhood

Click any neighbour to follow the language. Scroll to zoom, drag to pan.

References

Provenance