Methodology · Iteration Management · proven · verified

Shadow Canary Bandit Rollout

also known as staged exposure rollout, shadow-to-bandit promotion

Applies to: agent, llm-app, rag-system, coding-agent

Tags: shadow, canary, bandit, rollout, experimentation

Roll out a change to an agent in stages that expose more users as confidence grows. First, run the new version next to the live one and compare them offline, so no user is affected (a shadow run). Next, send a small slice of real traffic to the new version, so its responses reach real users (a canary). Last, let the system shift more traffic to whichever version earns the better reward from real users. Each stage is a gate. If the metrics regress at any stage, the rollout stops and the failing cases are saved for study. This works on one change at a time, such as a new prompt, model, or retrieval tweak, rather than on a whole action type.
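The gating described above can be sketched as a small loop over stages. This is a minimal illustration, not a prescribed implementation: `StageResult`, `RolloutOutcome`, and `run_rollout` are hypothetical names, and each stage is assumed to be a callable that reports whether its gate passed and which cases failed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class StageResult:
    """Outcome of one stage (hypothetical shape)."""
    passed: bool
    failing_cases: List[dict] = field(default_factory=list)


@dataclass
class RolloutOutcome:
    reached: List[str]             # stages that passed, in order
    stopped_at: Optional[str]      # stage whose gate failed, or None
    regression_bundle: List[dict]  # failing cases saved for offline study


def run_rollout(stages: Dict[str, Callable[[], StageResult]]) -> RolloutOutcome:
    """Run stages in insertion order and stop at the first failed gate."""
    reached: List[str] = []
    for name, run_stage in stages.items():
        result = run_stage()
        if not result.passed:
            # Exposure never widens past a failed gate; keep the evidence.
            return RolloutOutcome(reached, name, result.failing_cases)
        reached.append(name)
    return RolloutOutcome(reached, None, [])


# Illustrative run: the canary fails, so the bandit stage never opens.
outcome = run_rollout({
    "shadow": lambda: StageResult(True),
    "canary": lambda: StageResult(False, [{"request_id": "r-17", "error": "timeout"}]),
    "bandit": lambda: StageResult(True),
})
print(outcome.stopped_at, len(outcome.regression_bundle))
```

The stage names and the failing case are made up for the example; in practice each callable would wrap the real shadow comparison, canary check, or bandit monitor.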


Intent. Move an agent change through stages that widen exposure as results hold up. Run it in shadow, then on a small canary slice, then let traffic shift toward the better version. A regression in the metrics halts the rollout automatically.

When to apply. Use this for any live agent or LLM app where changes ship often and a bad change can degrade the user experience or raise cost. It works best when traffic is high, because the small canary slice needs enough volume to catch problems. Don't apply the traffic-shifting stage on low-volume systems, where the learning method (a multi-armed bandit) cannot gather enough signal. There, fall back to a plain shadow run plus a manual switch.
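To judge whether traffic is "high enough", a rough sample-size estimate helps. The sketch below uses the standard normal-approximation formula for comparing two proportions; the function names, the default significance level and power, and the example rates are all assumptions chosen for illustration, not values from this methodology.

```python
from math import ceil
from statistics import NormalDist


def requests_per_version(p_current: float, p_new: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate requests needed per version to tell p_current from p_new.

    Two-sided two-proportion z-test, normal approximation.
    """
    if p_current == p_new:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_current * (1 - p_current) + p_new * (1 - p_new)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_current - p_new) ** 2)


def days_to_fill_canary(daily_requests: int, canary_fraction: float,
                        needed: int) -> float:
    """Days until the canary slice alone has seen `needed` requests."""
    return needed / (daily_requests * canary_fraction)


# Hypothetical numbers: 2% error rate today, and we want to catch a rise to 3%.
needed = requests_per_version(0.02, 0.03)
print(needed, days_to_fill_canary(10_000, 0.02, needed))
```

With these made-up inputs the estimate is roughly 3,800 requests per version. At 10,000 requests a day and a 2 percent canary slice, that is about 19 days of canary traffic, which is the kind of result that argues for the shadow-plus-manual-switch fallback.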

Inputs

  • Production traffic stream: Live requests you can copy to the new version for a shadow run and split for the canary slice.
  • Reward signal: A result you can measure on each request, such as a user thumbs-up, a completed task, or a downstream metric. The traffic-shifting stage uses this to decide which version to favor.
  • Regression detector: Automatic checks that flag when a new version's reward falls below the current one's or its error rate rises above it.
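A regression detector of the kind listed above could start as a plain threshold check over per-version counters. This is a sketch under assumptions: `VersionStats`, `detect_regressions`, and the tolerance values are hypothetical, and a production detector would likely add a statistical test so that noise on small samples does not trip it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VersionStats:
    """Aggregated counters for one version (hypothetical shape)."""
    requests: int
    errors: int
    reward_sum: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests

    @property
    def mean_reward(self) -> float:
        return self.reward_sum / self.requests


def detect_regressions(current: VersionStats, candidate: VersionStats,
                       max_error_increase: float = 0.005,
                       max_reward_drop: float = 0.01) -> List[str]:
    """Return the names of metrics on which the candidate regressed."""
    flags: List[str] = []
    if candidate.error_rate - current.error_rate > max_error_increase:
        flags.append("error_rate")
    if current.mean_reward - candidate.mean_reward > max_reward_drop:
        flags.append("reward")
    return flags


current = VersionStats(requests=1000, errors=20, reward_sum=700.0)
candidate = VersionStats(requests=1000, errors=40, reward_sum=650.0)
print(detect_regressions(current, candidate))
```

An empty list means no check fired; any non-empty list is a reason to stop the stage and save the failing cases.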

Outputs

  • Promotion log: A record you can audit. It shows which change passed which stage and on what evidence.
  • Regression trace bundle: The failing cases saved from any stage that did not pass, ready for offline debugging.
  • Traffic allocation policy: The current rule for how much traffic each version gets. This is the bandit's belief or the A/B split.

Steps (6)

  1. Shadow the new build

    Copy production traffic to the new version. Compare its outputs and metrics offline. No user is affected. This catches obvious crashes, slower responses, and large output drift before any user sees the change.

    uses: Shadow Canary

  2. Define the canary slice and exit criteria

    Pick a small live slice, usually 1 to 5 percent. Write down what 'pass' means before you open it: the minimum sample size, the regression allowed on each metric, and the time window. Freeze these rules first.

  3. Open the canary

    Route the small slice to the new version. Its responses now reach real users, so the risk is real. Watch the error rate, the latency, the reward signal, and how often guardrails trip.

    uses: Shadow Canary, Scorer Live Monitoring

  4. Collect regression traces on any failure

    If the canary misses any of its pass rules, stop the rollout and save the failing cases in a bundle. That bundle becomes a new test for the next round.

    uses: Decision Log, Lineage Tracking

  5. Promote to bandit or A/B

    When the canary clears its bar, hand traffic control either to a learning method that shifts traffic toward the better version (a Bayesian bandit) or to a simple fixed A/B split. The bandit gives the new version more traffic as its results pull ahead; a fixed split holds the allocation constant until you change it.

    uses: Bayesian Bandit Experimentation

  6. Watch the results and auto-rollback on regression

    Keep watching the bandit's view of each version. If the new version's reward clearly drops below the current one, the bandit moves traffic away from it on its own. The bandit only shrinks the build's share, so add a firm rollback threshold that retires the build completely.
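Steps 5 and 6 can be sketched with Thompson sampling over two versions plus a hard rollback check. This is one common way to build a Bayesian bandit, not necessarily the one this methodology has in mind. It assumes a binary reward (for example thumbs-up or not) with Beta(1, 1) priors; `TwoVersionBandit`, its method names, and the 0.05 threshold are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Arm:
    successes: int = 0
    failures: int = 0

    def sample(self, rng: random.Random) -> float:
        # Draw from the Beta posterior under a uniform Beta(1, 1) prior.
        return rng.betavariate(self.successes + 1, self.failures + 1)


class TwoVersionBandit:
    def __init__(self, rollback_threshold: float = 0.05,
                 seed: Optional[int] = None) -> None:
        self.arms = {"current": Arm(), "candidate": Arm()}
        self.rollback_threshold = rollback_threshold
        self.retired = False
        self.rng = random.Random(seed)

    def choose(self) -> str:
        """Pick the version for the next request (Thompson sampling)."""
        if self.retired:
            return "current"
        draws = {name: arm.sample(self.rng) for name, arm in self.arms.items()}
        return max(draws, key=draws.get)

    def record(self, version: str, rewarded: bool) -> None:
        arm = self.arms[version]
        if rewarded:
            arm.successes += 1
        else:
            arm.failures += 1

    def prob_candidate_better(self, draws: int = 2000) -> float:
        """Monte Carlo estimate of P(candidate reward rate > current)."""
        wins = sum(
            self.arms["candidate"].sample(self.rng)
            > self.arms["current"].sample(self.rng)
            for _ in range(draws)
        )
        return wins / draws

    def check_rollback(self) -> bool:
        """Retire the candidate for good once it is very unlikely to be better."""
        if not self.retired and self.prob_candidate_better() < self.rollback_threshold:
            self.retired = True
        return self.retired


bandit = TwoVersionBandit(seed=0)
for _ in range(80):
    bandit.record("current", True)
    bandit.record("candidate", False)
for _ in range(20):
    bandit.record("current", False)
    bandit.record("candidate", True)
print(bandit.check_rollback(), bandit.choose())
```

Thompson sampling alone keeps sending an occasional request to a weak candidate. The explicit `retired` flag is what turns "less traffic" into "no traffic", which is the role of the firm rollback threshold in step 6.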

Principles

  • Exposure widens only on evidence. Every stage gates the next.
  • A regression is a useful output. Save it as traces and fold it into the test set. Do not just roll back.
  • A learning method (bandit) replaces a fixed A/B split once traffic is high enough. Below that, A/B or a canary with a manual switch is the honest choice.
  • Every stage has a pass rule. Freeze it before the stage opens.
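One way to make "freeze it before the stage opens" concrete is an immutable record of the pass rule plus a fingerprint that can be written to the promotion log. This is a sketch under assumptions: `ExitCriteria`, its field names, and the example values are hypothetical, and the fields simply mirror the items step 2 asks you to write down (minimum sample size, allowed regression per metric, time window).

```python
import hashlib
import json
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class ExitCriteria:
    """Pass rule for one stage; frozen so it cannot be edited after opening."""
    min_requests: int
    max_error_rate_increase: float
    max_reward_drop: float
    window_hours: int

    def fingerprint(self) -> str:
        """Short stable hash to record in the promotion log."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()[:12]

    def ready_to_judge(self, requests: int, hours_elapsed: float) -> bool:
        """Only judge the stage once sample size and time window are both met."""
        return requests >= self.min_requests and hours_elapsed >= self.window_hours


criteria = ExitCriteria(min_requests=4000, max_error_rate_increase=0.005,
                        max_reward_drop=0.01, window_hours=48)
try:
    criteria.min_requests = 100  # attempt to loosen the rule mid-stage
except FrozenInstanceError:
    print("criteria are frozen:", criteria.fingerprint())
```

If the fingerprint logged when the stage opened matches the one logged when it was judged, an auditor can confirm the bar did not move in between.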

Known failure modes (3)

Related patterns (5)

Related compositions (2)

Related methodologies (2)

Sources (2)

Provenance

  • Added to catalog:
  • Last updated:
  • Verification status: verified