Methodology · Iteration Management

Shadow Canary Bandit Rollout

Move an agent change through stages that widen exposure as results hold up. Run it in shadow, then on a small canary slice, then let traffic shift toward the better version. A regression in the metrics at any stage halts the rollout automatically.

Description

Roll out a change to an agent in stages that expose more users as confidence grows. First, run the new version next to the live one and compare them offline, so no user is affected (a shadow run). Next, send a small slice of real traffic to the new version, so that only that slice is exposed if it misbehaves (a canary). Finally, let the system send more traffic to whichever version gets better results from real users. Each stage is a gate. If the metrics regress at any stage, the rollout stops and the failing cases are saved for study. This works on one change at a time, such as a new prompt, model, or retrieval tweak, rather than on a whole action type.
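
The gate logic described above can be sketched as a small state machine. This is an illustrative sketch, not a reference implementation: the names `Rollout`, `Stage`, `gate`, and the `max_drop` tolerance are hypothetical, and it assumes each stage produces one score per case for the new version plus a mean score for the live version.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    SHADOW = 1
    CANARY = 2
    BANDIT = 3


@dataclass
class Rollout:
    max_drop: float = 0.02  # hypothetical tolerance on the mean score
    stage: Stage = Stage.SHADOW
    halted: bool = False
    saved_cases: list = field(default_factory=list)

    def gate(self, baseline_mean: float, candidate: dict) -> Stage:
        """Evaluate one stage. `candidate` maps case id -> new-version score."""
        if self.halted or not candidate:
            return self.stage  # nothing to decide
        mean = sum(candidate.values()) / len(candidate)
        if baseline_mean - mean > self.max_drop:
            # Regression: stop here and keep the failing cases for study.
            self.halted = True
            self.saved_cases = sorted(
                cid for cid, score in candidate.items() if score < baseline_mean
            )
        elif self.stage is not Stage.BANDIT:
            # Gate passed: widen exposure to the next stage.
            self.stage = Stage(self.stage.value + 1)
        return self.stage


rollout = Rollout()
rollout.gate(0.90, {"a": 0.92, "b": 0.91})  # shadow holds up, advance
rollout.gate(0.90, {"c": 0.95, "d": 0.70})  # canary regresses, halt
```

In this sketch a halted rollout ignores later calls to `gate`, so a single bad stage cannot be overridden by a later good one without an explicit reset.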

When to apply

Use this for any live agent or LLM app where changes ship often and a bad change can degrade the user experience or drive up cost. It works best when traffic is high, because the small canary slice needs enough volume to catch problems. Don't apply the traffic-shifting stage on low-volume systems, where the learning method (a multi-armed bandit) cannot gather enough signal. There, fall back to a plain shadow run plus a manual switch.
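
To make "enough volume" concrete, one rough way to size the canary is the standard normal-approximation formula for comparing two proportions. The sketch below assumes the metric is a per-request success rate and that a one-sided test is acceptable; the function names and the default `alpha` and `power` values are illustrative choices, not part of the pattern.

```python
import math
from statistics import NormalDist


def canary_sample_size(baseline_rate, min_drop, alpha=0.05, power=0.8):
    """Approximate requests needed per version to detect a drop of
    `min_drop` in a success rate (one-sided two-proportion z-test)."""
    if not 0 < min_drop < baseline_rate < 1:
        raise ValueError("need 0 < min_drop < baseline_rate < 1")
    p1 = baseline_rate
    p2 = baseline_rate - min_drop
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / min_drop ** 2)


def days_to_fill(requests_per_day, canary_fraction, needed):
    """Days of traffic required for the canary slice to reach `needed`."""
    return needed / (requests_per_day * canary_fraction)


needed = canary_sample_size(baseline_rate=0.90, min_drop=0.02)
slow = days_to_fill(1_000, 0.05, needed)       # low-volume system
fast = days_to_fill(1_000_000, 0.05, needed)   # high-volume system
```

With these inputs the estimate is roughly 3,000 requests per version. A 5% canary on 1,000 requests a day would need about two months to reach that, while the same slice on a million requests a day gets there in under two hours, which is the gap behind the low-volume caveat.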

What it involves

Shadow the new build
Define the canary slice and exit criteria
Open the canary
Collect regression traces on any failure
Promote to bandit or A/B
Watch the results and auto-rollback on regression
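
The canary, bandit, and rollback steps above might look like the following sketch. It assumes a binary success signal per request and uses Beta-Bernoulli Thompson sampling as the bandit, which is one common choice rather than something the pattern prescribes. `in_canary`, `ThompsonRouter`, `min_samples`, and `max_drop` are hypothetical names and thresholds.

```python
import hashlib
import random


def in_canary(user_id: str, fraction: float) -> bool:
    """Deterministically place a stable fraction of users in the canary."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < fraction


class ThompsonRouter:
    """Two-arm Thompson sampling with an automatic rollback guard."""

    def __init__(self, min_samples=50, max_drop=0.05, seed=None):
        # Beta(1, 1) priors stored as [successes + 1, failures + 1].
        self.stats = {"live": [1, 1], "candidate": [1, 1]}
        self.min_samples = min_samples
        self.max_drop = max_drop
        self.rolled_back = False
        self.rng = random.Random(seed)

    def choose(self) -> str:
        """Pick the arm for the next request."""
        if self.rolled_back:
            return "live"
        draws = {
            arm: self.rng.betavariate(a, b) for arm, (a, b) in self.stats.items()
        }
        return max(draws, key=draws.get)

    def record(self, arm: str, success: bool) -> None:
        """Store one outcome, then re-check the rollback condition."""
        self.stats[arm][0 if success else 1] += 1
        self._check_rollback()

    def _rate(self, arm: str) -> float:
        a, b = self.stats[arm]
        return a / (a + b)  # posterior mean success rate

    def _check_rollback(self) -> None:
        a, b = self.stats["candidate"]
        seen = a + b - 2  # subtract the prior pseudo-counts
        gap = self._rate("live") - self._rate("candidate")
        if seen >= self.min_samples and gap > self.max_drop:
            self.rolled_back = True
```

One design point worth noting: a bandit starves a clearly worse version of traffic on its own, so the candidate may take a long time to reach `min_samples`. Keeping that threshold small, or running the same check during the fixed-size canary, keeps the rollback guard from being delayed.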
