Verification & Reflection

Behavior-Pinning Test Before Agent Edit

Capture the current behaviour of agent-touchable code as golden characterization tests before an agent edits it, with load-bearing values computed deterministically and only prose left to the model, run as a regression gate.

Problem

Letting an agent edit undocumented code is risky because there is no recorded statement of what the code currently does, so a refactor has nothing to be checked against. Hand-written tests after the fact tend to encode what the model thinks the code should do, not what it did, and a suite written by the same model that wrote the change inherits the model's blind spots. The team needs a baseline that fixes the existing behaviour exactly, separates the values that must not move from the explanations that may, and fails the moment an edit drifts.

Solution

Run the legacy code against representative inputs and capture its outputs as golden fixtures while the code is still untouched, so the baseline reflects what the code does rather than what anyone believes it should do. Partition each captured output: the load-bearing values, such as totals, tier boundaries, and rounding, are produced by deterministic components and asserted for exact equality, while any model-written narration that explains or annotates the result is compared loosely or excluded from the gate. Wire the suite into the agent's loop so that finishing an edit triggers the characterization tests automatically, detected from policy keywords rather than left to the agent's discretion, and treat any exact-match failure as a blocked change. Once the refactor is proven behaviour-preserving, the golden values may be updated deliberately by a human when a rule is meant to change.

When to use

  • An agent is about to edit legacy code whose behaviour is undocumented but encodes real business rules.
  • Outputs contain load-bearing numeric or structured values that a deterministic component can compute and assert exactly.
  • The change is meant to preserve behaviour (a refactor or extension), so the existing outputs are the correct baseline.

Open the full interactive page

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Related