Governance & Observability

Scaffold Ablation on Model Upgrade

On each model upgrade, treat every harness component as an encoded assumption about a model weakness and ablate the components the new model no longer needs, gated by evals.

Problem

Every harness component encodes an assumption about what the model cannot do on its own, and those assumptions expire silently as models improve. Carried-over scaffolding that the new model no longer needs is not free: it is dead complexity to maintain, it adds cost and latency, and at worst it actively suppresses the stronger model's capability by forcing it down a path built for a weaker one. Because nothing fails loudly when an assumption expires, the harness only grows; no event prompts anyone to remove a component, so workarounds outlive the limitation that justified them.

Solution

Have each harness component declare the assumption it encodes (for example, 'the model cannot keep a long plan straight' or 'the model will not emit valid JSON'). On a model upgrade, walk the components and stress-test each assumption against the new model: temporarily remove the component and run the eval suite. If the evals hold, the assumption has expired and the component comes out; if they regress, the assumption survives and the component stays. Anthropic demonstrates the move concretely: on an upgrade, it deleted a sprint construct once the model could plan without it. The eval suite is the gate. The corresponding anti-pattern is keeping stale workaround scaffolding that now constrains the stronger model. Compose with eval-as-contract for the gate and with dynamic-scaffolding for components that should be conditional rather than removed.
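The ablation walk can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `Component` record, the `run_evals` callable (taking the set of enabled component names and returning a score), the `tolerance` parameter, and the component names `sprint_planner` and `json_repair` are all hypothetical, and `fake_evals` is a stub standing in for a real eval suite run against the new model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Component:
    """A harness component together with the assumption it encodes."""
    name: str
    assumption: str  # the model weakness this component compensates for


# Hypothetical gate signature: given the set of enabled component names,
# run the eval suite against the new model and return an aggregate score.
EvalFn = Callable[[FrozenSet[str]], float]


def ablate(components: List[Component],
           run_evals: EvalFn,
           tolerance: float = 0.0) -> Dict[str, str]:
    """Remove each component in turn; keep it only if the evals regress."""
    enabled = frozenset(c.name for c in components)
    baseline = run_evals(enabled)  # full harness on the new model
    decisions: Dict[str, str] = {}
    for component in components:
        trial = enabled - {component.name}
        score = run_evals(trial)
        if score >= baseline - tolerance:
            # Evals hold: the assumption has expired, the component comes out.
            decisions[component.name] = "remove"
            enabled = trial
        else:
            # Evals regress: the assumption survives, the component stays.
            decisions[component.name] = "keep"
    return decisions


def fake_evals(enabled: FrozenSet[str]) -> float:
    """Stub eval suite (assumption): the new model still needs JSON repair,
    and the sprint planner slightly suppresses its own planning ability."""
    score = 0.9 if "json_repair" in enabled else 0.6
    if "sprint_planner" not in enabled:
        score += 0.05
    return score


HARNESS = [
    Component("sprint_planner", "the model cannot keep a long plan straight"),
    Component("json_repair", "the model will not emit valid JSON"),
]

if __name__ == "__main__":
    for name, decision in ablate(HARNESS, fake_evals).items():
        print(f"{name}: {decision}")
```

In this sketch removals are cumulative: once a component comes out, later ablations run against the already-reduced harness, while the baseline stays fixed at the full-harness score so that small tolerated regressions cannot compound. Testing each component independently against the full harness is an equally plausible design and would catch different interactions.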

When to use

A harness has accreted scaffolding across several model generations.
A model upgrade is being adopted and the team owns an eval suite to gate changes.
There is evidence or suspicion that carried-over scaffolding is suppressing the new model's capability.
