VII · Verification & ReflectionEmerging★

Blind Grader with Isolated Context

also known as Fresh-Eyes Evaluator, Trace-Blind Judge, Outcomes-Style Verification, Context-Isolated Grader

Run an evaluator in a separately-allocated context window with access only to the artifact and the rubric, never the producing agent's reasoning trace, so the grader cannot be primed by the producer's framing.

This pattern helps complete certain larger patterns —

specialisesLLM-as-Judge★★— Use an LLM to score open-ended outputs against rubric criteria when no exact-match metric applies.

Context

A team builds an agent workflow in which a producer agent runs a long chain of reasoning and tool calls to construct some artefact (a plan, a patch, a written answer, a sequence of tool calls) and then a downstream evaluator is asked to judge whether the artefact is correct. The natural implementation hands the evaluator the producer's full reasoning trace alongside the artefact, on the assumption that more context produces a better judgement. The evaluator may be a separate prompt or even a separate model.

Problem

When the evaluator can see the producer's full reasoning trace, it tends to inherit the producer's framing and rationalise the artefact rather than evaluate it on its own merits. The producer's chain of thought makes mistaken choices look deliberate, and the evaluator ends up agreeing with the very priming that caused the mistake. The errors a fresh, uninformed reader would notice immediately are exactly the ones the trace-aware evaluator misses. Routing to a different model family is expensive and does not reliably break the priming, because the framing leaks through the trace itself rather than through any shared weights.

Forces

Reasoning traces carry useful context but also carry priming that biases evaluation.
Some failures are only visible from outside the producer's framing.
Fully retraining or routing to a different model is expensive and may not actually break the priming.
Rubrics must be precise enough to apply without the producer's reasoning as context.
Logs and trajectories must still be auditable, even if the grader does not see them.

Example

A coding agent produces a fix for a flaky integration test. A naive critic reading the producer's reasoning agrees the fix is sound. The team instead routes the patch to a blind grader: a fresh context window containing only the patch diff and a rubric asking 'does this change the test's intent?' and 'does it suppress the underlying race?'. The blind grader flags that the patch widens a timeout and suppresses the race instead of fixing it — a verdict the trace-aware critic missed because the producer's reasoning made the widening sound deliberate.

Diagram

sequenceDiagram participant PROD as Producer (context A) participant ORCH as Orchestrator participant GRAD as Grader (context B, freshly allocated) participant LOG as Audit log PROD->>PROD: think, scratch, build artefact PROD->>ORCH: artefact ORCH->>GRAD: NEW context: {artefact, rubric, grader instructions only} Note over GRAD: producer trace, scratchpad,<br/>prior turns deliberately excluded GRAD-->>ORCH: verdict + structured findings ORCH->>LOG: verdict logged against trace A (post hoc)

Solution

Therefore:

When the producer finishes, the orchestrator allocates a new context window (a new conversation, a new agent invocation, a new prompt instance) and constructs a grader call that contains only the artefact and the rubric. The producing agent's reasoning chain, scratchpad, and prior turns are deliberately excluded. The grader is instructed to judge against the rubric on its own terms and to flag what is missing or wrong. The grader's output is logged against the artefact and against the producer's trace for audit, but the grader itself was blind to the trace at decision time. The same model may be used as both producer and grader — context isolation is the load-bearing element, not a different model.

What it gives you

Catches a class of failures that same-context critique systematically misses.
Works with the same model — no second-vendor cost or routing complexity required.
Rubric becomes a first-class artefact, since the grader has nothing else to lean on.
Clean audit story: producer trace and grader verdict are independently attributable.

What it costs you

Grader cannot use legitimate context from the producer's reasoning, so some judgements need information the rubric must explicitly carry.
Rubric authoring becomes the bottleneck — a vague rubric in an isolated context is worse than a tight rubric with trace context.
Extra context allocation costs tokens and latency per check.
Discipline is required: leaking even a summary of the producer's trace into the grader's context defeats the pattern.

What this pattern forbids. The grader's context window must contain only the artefact, the rubric, and grader instructions; the producing agent's reasoning trace, scratchpad, prior turns, and tool-call history must be excluded; summaries of the producer's reasoning must not be injected into the grader context.

And the patterns that stand alongside it, or against it —

alternative-toAgent-as-a-Judge★— Evaluate an agent's full trajectory (steps, tool calls, intermediate states) by another agent rather than scoring only the final output.
alternative-toSame-Model Self-Critique✕— Anti-pattern: have the same model both produce an answer and critique it, expecting independence.
complementsEvaluator-Optimizer★★— One LLM generates; another evaluates and feeds back; loop until criteria are met.
complementsFrozen Rubric Reflection★— Constrain reflection to a fixed, hand-authored rubric of criteria so the reviewer cannot invent new ones each run.
alternative-toSandbagging✕— Anti-pattern: rely on evaluation suites that probe model capability assuming the model is trying its best.
alternative-toAlignment Faking✕— Anti-pattern: assume the agent behaves the same whether it believes it is being evaluated or not, and trust eval scores to predict deployment behaviour.
complementsSimulate Before Actuate★— Before issuing an irreversible action, run a deterministic simulation that computes pre-conditions, invariants, and expected deltas; require a verifier — automated or human — to green-light the simulated outcome before the real command is sent.
alternative-toVerifier-Aware Reward Hacking✕— Anti-pattern: hand the agent read access to its own grader or test harness and assume a passing score means the task was actually done.