Rigor Relocation

Also known as: Relocating Rigor, Rigor Migration, Discipline at a Higher Abstraction

Relocate engineering rigor from the model loop to the surrounding scaffolding (context files, machine-enforced rules, evals, judges, decision logs, policy gates) so failures are caught by the wrapper rather than left to the agent.

Context

A team has handed real code-writing work to coding agents. The keystrokes that used to carry the engineer's discipline — careful naming, defensive checks, hand-written tests — are now produced at a different speed and by a different author. Senior engineers worry that quality is collapsing; the productivity numbers say the opposite. Both can be true if nobody asks where the rigor went.

Problem

Treating agentic coding as if rigor itself were optional produces drift: undocumented conventions the agent re-invents each session, invariants that exist only in code review folklore, and verification that runs by hand when somebody remembers. The opposite mistake — preserving every prior practice unchanged — applies rigor at the wrong layer, so reviewers grade tokens the agent wrote on autopilot while the load-bearing decisions go unexamined. The team is forced to choose between performative discipline at the old layer and accepting that discipline has quietly left the building.

Forces

Engineering rigor does not vanish when a constraint is removed; it relocates to whichever surface still binds behaviour.
Agents read context files, configs, and tests far more reliably than they read human folklore.
Verification cost falls as compute gets cheaper, so 'check it every time' becomes affordable where 'check it once at review' used to be the practical ceiling.

Example

A team's senior engineer is alarmed that nobody is hand-writing argument validation anymore — the agent just generates the function and merges. Instead of forbidding agent-authored code, the team relocates the rigor: a CLAUDE.md rule says 'public functions take validated inputs and the agent must add the type or assertion that enforces it', a policy-as-code gate rejects PRs that introduce unchecked public entry points, and an eval-as-contract case fails closed if the validation is silently dropped. The senior's discipline is still in the codebase; it just lives on three new surfaces instead of in one human's habit.
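The policy-as-code gate in this example can be sketched in a few lines. This is a minimal illustration, not a real policy engine: it assumes the code under review is Python and encodes a hypothetical rule that a public function (name without a leading underscore) counts as checked if every parameter other than `self`/`cls` has a type annotation, or if its body contains an `assert` or `raise`. The names `unchecked_public_functions` and `gate` are illustrative.

```python
import ast


def unchecked_public_functions(source: str) -> list[str]:
    """Return names of public functions that neither annotate every
    parameter nor contain an assert/raise (hypothetical rule)."""
    tree = ast.parse(source)
    offenders = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name.startswith("_"):
            continue  # private helpers are out of scope for this rule
        args = node.args
        params = args.posonlyargs + args.args + args.kwonlyargs
        if args.vararg:
            params.append(args.vararg)
        if args.kwarg:
            params.append(args.kwarg)
        # Ignore the conventional self/cls receiver on methods.
        params = [p for p in params if p.arg not in ("self", "cls")]
        fully_typed = all(p.annotation is not None for p in params)
        # Any assert/raise anywhere inside the function counts as a check.
        has_check = any(
            isinstance(n, (ast.Assert, ast.Raise)) for n in ast.walk(node)
        )
        if not (fully_typed or has_check):
            offenders.append(node.name)
    return offenders


def gate(source: str) -> bool:
    """Fail closed: the change passes only if no public function is unchecked."""
    offenders = unchecked_public_functions(source)
    for name in offenders:
        print(f"REJECT: public function '{name}' has no enforced input check")
    return not offenders
```

Run in CI over a PR's changed files, `gate("def area(w, h):\n    return w * h\n")` prints a REJECT line and returns `False`, while an annotated or asserting version passes. Python annotations are not enforced at runtime, so a stricter team might accept only the `assert`/`raise` branch for some parameters.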

Diagram

flowchart TD
    L[Rigor expected inside the agent loop] --> R{Relocate each practice}
    R --> C[Tacit conventions to context file CLAUDE.md or AGENTS.md]
    R --> G[Hand-enforced invariants to machine-enforced lint and structural tests]
    R --> J[Verification to evals, judges, decision logs, policy gates]
    C --> W[Wrapper catches failures, not the agent]
    G --> W
    J --> W

Solution

Therefore:

Identify, for each existing rigor practice, which agent-readable surface now carries it, and relocate it there. Three concrete relocations:

(a) Tacit conventions and architecture decisions move into the agent's context file (CLAUDE.md, AGENTS.md, system prompt), so they are read every session rather than learned once by a human.
(b) Hand-enforced invariants move into machine-enforced rules (types, assertions, schema validators, policy-as-code gates), so they bind every generated change, not only the reviewed ones.
(c) Periodic verification moves into continuous evaluation (Eval as Contract on every PR, Agent-as-a-Judge on trajectories, Scorer Live Monitoring in production), so the bar is enforced on every change instead of every release.

Pair with Decision Log and Provenance Ledger so the relocations themselves are auditable.
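Relocation (c) can be sketched as an eval runner that treats the suite as a contract and fails closed. This illustration assumes a hypothetical interface in which each case is a named predicate over the agent's output, and the list of required case names lives somewhere the agent cannot edit. A required case that is missing, raises, or returns `False` blocks the change, so a silently deleted case counts as a failure rather than as one fewer check.

```python
from typing import Callable, Mapping

# A case maps the agent's output to pass/fail (hypothetical interface).
Case = Callable[[str], bool]


def run_contract(
    required: list[str],
    cases: Mapping[str, Case],
    output: str,
) -> tuple[bool, dict[str, str]]:
    """Evaluate every required case; any missing, crashing, or failing
    case makes the whole contract fail (fail closed). Cases present in
    `cases` but not listed in `required` are ignored."""
    results: dict[str, str] = {}
    for name in required:
        case = cases.get(name)
        if case is None:
            results[name] = "missing"  # a dropped case is a failure, not a skip
            continue
        try:
            results[name] = "pass" if case(output) else "fail"
        except Exception as exc:  # a crashing eval must not count as a pass
            results[name] = f"error: {exc!r}"
    passed = all(status == "pass" for status in results.values())
    return passed, results
```

Keeping `required` outside the agent's reach is the design choice that matters: if the agent could edit that list, removing a case would look like a pass.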

What it gives you

Discipline survives the shift to agentic generation instead of degrading into review folklore.
Context files turn one-time onboarding into per-session enforcement.
Machine-enforced invariants catch deviations the human reviewer would miss in a 2000-line diff.
Continuous evaluation surfaces regressions on the change that caused them, not on the release that shipped them.
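To make a regression land on the change that caused it, a continuous-evaluation job can compare each change's per-case scores with the scores recorded for its parent. The sketch below assumes scores in [0, 1], a hypothetical default tolerance of 0.02, and illustrative change ids such as `"pr-1"`; it reports exactly which cases a given change made worse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regression:
    change_id: str
    case: str
    before: float
    after: float


def regressions_for_change(
    change_id: str,
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.02,
) -> list[Regression]:
    """Attribute score drops larger than `tolerance` to this change.
    A case that disappears from the candidate run is scored 0.0."""
    found = []
    for case, before in sorted(baseline.items()):
        after = candidate.get(case, 0.0)
        if before - after > tolerance:
            found.append(Regression(change_id, case, before, after))
    return found
```

Because the comparison is per change rather than per release, the report names the PR that dropped the score instead of a release that bundled dozens of changes.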

What it costs you

Authoring and maintaining context files is real engineering work, and stale context files actively mislead the agent.
Machine-enforced invariants are only as good as the rules; missing rules produce a false sense of safety.
Continuous evaluation has cost and calibration overhead; bad evals fail loud and block legitimate work.
Relocating the wrong practice (e.g. relocating taste to a linter) produces ritual without rigor.
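One cheap guard against the first cost, stale context files, is a CI check that every file path a context file mentions still exists. This sketch assumes, hypothetically, that paths are written in backticks and either contain a slash or look like `name.ext`. It can only catch references that now point at nothing; a rule whose wording has gone stale still needs a human to notice.

```python
import re
from pathlib import Path

# Backticked tokens that look like paths: contain "/" or look like name.ext.
PATH_TOKEN = re.compile(r"`([\w.-]+(?:/[\w.-]+)+/?|[\w-]+\.\w+)`")


def stale_references(context_text: str, repo_root: Path) -> list[str]:
    """Return backticked paths in a context file (e.g. CLAUDE.md) that
    no longer exist under repo_root, sorted and de-duplicated."""
    missing = []
    for token in PATH_TOKEN.findall(context_text):
        if not (repo_root / token).exists():
            missing.append(token)
    return sorted(set(missing))
```

Failing the build on a non-empty result turns "keep the context file current" from a habit into a machine-enforced rule, which is the same relocation the pattern applies to code.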

What this pattern forbids. Any rigor practice the team claims to hold must be expressible on a surface the agent reads or is checked against — context file, machine-enforced rule, or continuous evaluation. Practices that live only in human habit are not counted as rigor in agentic mode.
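The rule can itself be made checkable. A minimal sketch, assuming the team keeps a hypothetical registry that maps each claimed practice to the surfaces that carry it; any practice with no agent-readable surface is reported as not counted.

```python
from enum import Enum


class Surface(Enum):
    CONTEXT_FILE = "context file"
    MACHINE_RULE = "machine-enforced rule"
    CONTINUOUS_EVAL = "continuous evaluation"
    HUMAN_HABIT = "human habit"  # may appear in the registry, but never counts


AGENT_READABLE = {Surface.CONTEXT_FILE, Surface.MACHINE_RULE, Surface.CONTINUOUS_EVAL}


def uncounted_practices(registry: dict[str, set[Surface]]) -> list[str]:
    """Practices whose only home is human habit (or nowhere) are not rigor."""
    return sorted(
        practice
        for practice, surfaces in registry.items()
        if not surfaces & AGENT_READABLE
    )
```

In the example above, input validation would be registered against all three agent-readable surfaces and pass; a practice recorded only as human habit would be listed.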

The smaller patterns that complete this one:

Eval as Contract ★★ (uses): Treat the eval suite as the contract the agent must satisfy; releases ship only if evals pass.
Policy-as-Code Gate ★ (uses): Evaluate every proposed agent action against externally managed, machine-readable policies before dispatch, so compliance authorship lives outside the prompt and outside the agent code.
Agentic Context Engineering Playbook · (uses): Treat the agent's system prompt and long-lived memory as a structured, item-addressable playbook that evolves through small delta updates from a Generator/Reflector/Curator loop, so accumulated tactics resist the context collapse that monolithic rewrites cause.
Agent-as-a-Judge ★ (uses): Have another agent evaluate an agent's full trajectory (steps, tool calls, intermediate states) rather than scoring only the final output.

And the patterns that stand alongside it, or against it:

Spec-Driven Loop ★ (complements): Run the same prompt against a fixed spec in a deterministic outer loop until the spec is satisfied.
Spec-First Agent ★ (complements): Drive the agent loop from a human-authored specification document rather than free-form prompts.
Decision Log ★★ (complements): Persist the agent's reasoning trace alongside its actions so post-hoc review can explain why.
Provenance Ledger ★★ (complements): Log every agent decision and state change with enough metadata to explain or reverse it later.
Scorer Live Monitoring ★ (complements): Score agent outputs asynchronously in production with non-blocking scorers that observe, alert, and log but do not regenerate the output.
Errors Swept Under the Rug ✕ (alternative to; anti-pattern): scrub failed actions, stack traces, and error observations from the agent's own context so the trace looks clean, leaving the model with no evidence of what did not work.
Perma-Beta ✕ (alternative to; anti-pattern): ship the agent in 'beta' indefinitely so that quality regressions are someone else's problem.
Automating a Broken Process ✕ (alternative to; anti-pattern): deploy agents on top of a workflow that is already dysfunctional, so the dysfunction is amplified at machine speed instead of resolved.
Agentic Skill Atrophy ✕ (alternative to; anti-pattern): let agents take over routine architectural and debugging decisions in code until developers no longer form the implicit knowledge that lets them review the agent's output or recover when it fails.