Cross-Reflection
also known as Different-Model Reflection, Heterogeneous Critic
Reflection step performed by a *different* agent or foundation model from the original generator, so critique error is decorrelated from generation error.
This pattern helps complete certain larger patterns —
- specialisesReflection★★— Have the model review its own output and produce a revised version in one or more passes.
Context
A team uses reflection to improve agent outputs. Same-model self-critique is the default — the generator critiques its own draft. Errors in critique and errors in generation share the same blind spots when the same model performs both.
Problem
Self-critique by the same model misses correlated failure modes: the generator's hallucinations get reproduced in its own review of those hallucinations. After one or two iterations, the loop self-approves. The fix requires a critic with different blind spots — a different model architecture, different training data, or both.
Forces
- Same-model self-critique is cheaper (one model in production).
- Cross-model reflection requires running two models, doubling cost.
- Heterogeneous models may disagree on style/format issues that are not real errors.
Example
A legal-drafting agent uses Claude Opus as generator and GPT-5.x as cross-reflection critic. Generator drafts a contract clause. Critic reviews against a fixed checklist (governing law cited, parties named, severability present). On disagreement, the agent revises. Same-model self-critique missed a missing governing-law clause that Opus had hallucinated as 'standard'; the cross-model critic caught it because its training data weighted contract templates differently.
Diagram
Solution
Therefore:
Generator (Model A) produces draft. Critic (Model B, distinct architecture) reviews draft against named criteria. If Model B accepts, ship. If Model B rejects, either revise (back to Model A with critique) or escalate. Pair with frozen-rubric-reflection so the critic uses fixed criteria, not free-form. Distinct from same-model-self-critique and llm-as-judge (which is judge-only without iteration).
What this pattern forbids. The critic must be a different model from the generator; same-model critique falls back to same-model-self-critique.
And the patterns that stand alongside it, or against it —
- alternative-toSame-Model Self-Critique✕— Anti-pattern: have the same model both produce an answer and critique it, expecting independence.
- complementsLLM-as-Judge★★— Use an LLM to score open-ended outputs against rubric criteria when no exact-match metric applies.
- complementsFrozen Rubric Reflection★— Constrain reflection to a fixed, hand-authored rubric of criteria so the reviewer cannot invent new ones each run.
- complementsHeterogeneous-Model Council with Synthesis Judge★— Three or more role-specialized personas run on different model architectures in parallel; a synthesis judge — given only their structured JSON, not the original input — produces the final verdict.
- complementsGenerator-Critic Separation★— Strict role separation between a Generator agent that produces drafts and a Critic agent that judges them against pre-defined criteria; the Critic never generates.
Neighbourhood
Click any neighbour to follow the language. Scroll to zoom, drag to pan.