Crawl-Walk-Run Automation Gating
also known as progressive autonomy, autonomy tiers
Roll an agent out in three stages (Crawl, Walk, Run), with a clear gate between each one. In Crawl the agent only suggests, and a person acts. In Walk the agent acts, but its actions land on internal staff, who can fix mistakes. In Run the agent acts directly on outside customers. Each stage is set per action type, not for the whole agent: the same agent can be in Run for safe read-only actions and in Crawl for refunds. To move up a stage, an action type must clear a published metric bar and hold it for a set time. If its numbers drop, it moves back down automatically.
Methodology process overview
Intent. Separate what an agent can do from what it is allowed to do on its own. A system that could plausibly act gets to act only after the data earns it, one action type at a time.
When to apply. Use this for any agent that could plausibly act on its own in ways customers feel. Examples are replying to tickets, refunding orders, sending outbound messages, or changing production resources. The right fit is when one bad action does real harm and there is no reliable way to undo it. Don't apply it for read-only or sandboxed agents, where a bad action causes no harm.
Inputs
- Catalog of action types — A list of every distinct action the agent can take. Each one has its own level of risk if it goes wrong.
- Promotion bar per tier — Published targets an action type must hit to move up a stage. These cover human acceptance, internal completion, and customer outcomes.
- Per-action-type metric pipeline — Logging that records, for every action, which stage it ran at and what happened next.
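As a rough illustration of the metric pipeline input, the sketch below logs one record per action and aggregates a success rate per action type and stage. The `ActionEvent` class, its field names, and the `success_rate_by_type` helper are hypothetical names chosen for this example, not part of any particular framework.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    CRAWL = 1  # agent suggests, a person acts
    WALK = 2   # agent acts, internal staff can fix mistakes
    RUN = 3    # agent acts directly on outside customers


@dataclass(frozen=True)
class ActionEvent:
    """One logged action: which stage it ran at and what happened next."""
    action_type: str
    tier: Tier
    succeeded: bool  # accepted, completed, or good customer outcome


def success_rate_by_type(events):
    """Return the success rate for each (action_type, tier) pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [successes, count]
    for event in events:
        bucket = totals[(event.action_type, event.tier)]
        bucket[0] += int(event.succeeded)
        bucket[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


# Illustrative data only.
events = [
    ActionEvent("reply_to_ticket", Tier.CRAWL, True),
    ActionEvent("reply_to_ticket", Tier.CRAWL, True),
    ActionEvent("reply_to_ticket", Tier.CRAWL, False),
    ActionEvent("issue_refund_up_to_50", Tier.CRAWL, True),
]
rates = success_rate_by_type(events)
```

Keying the aggregate on both action type and stage keeps a type's Crawl history from being mixed into its Walk numbers.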
Outputs
- Per-action-type tier assignment — A live map from each action type to its current stage (Crawl, Walk, or Run).
- Promotion log — A record you can audit. It shows which action types moved up or down, when, and on what evidence.
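The two outputs can be pictured as a small mutable map plus an append-only list. This is a minimal sketch under assumed names: `tier_assignment`, `promotion_log`, `TierChange`, and `change_tier` are all hypothetical, and a real system would likely persist both structures rather than keep them in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Live map from action type to current stage (hypothetical action names).
tier_assignment = {
    "reply_to_ticket": "Crawl",
    "issue_refund_up_to_50": "Crawl",
}

# Append-only audit trail of every promotion and demotion.
promotion_log = []


@dataclass(frozen=True)
class TierChange:
    action_type: str
    from_tier: str
    to_tier: str
    at: datetime
    evidence: str  # free-text pointer to the metrics that justified the move


def change_tier(action_type, to_tier, evidence):
    """Record the change in the log first, then update the live map."""
    entry = TierChange(
        action_type=action_type,
        from_tier=tier_assignment[action_type],
        to_tier=to_tier,
        at=datetime.now(timezone.utc),
        evidence=evidence,
    )
    promotion_log.append(entry)
    tier_assignment[action_type] = to_tier
    return entry
```

Routing every change through one function is one way to make sure the map and the log cannot drift apart.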
Steps (6)
Enumerate action types
List every distinct action the agent can take. 'Reply to ticket', 'issue refund up to $50', 'issue refund above $50', and 'escalate to human' are four separate types, not one.
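One way to make the four example types concrete is an enumeration, with a small helper that decides which refund type a given amount falls under. The `ActionType` enum, its member names, and `refund_action_type` are illustrative assumptions; only the four types and the $50 split come from the example above.

```python
from decimal import Decimal
from enum import Enum


class ActionType(Enum):
    REPLY_TO_TICKET = "reply_to_ticket"
    REFUND_UP_TO_50 = "issue_refund_up_to_50"
    REFUND_ABOVE_50 = "issue_refund_above_50"
    ESCALATE_TO_HUMAN = "escalate_to_human"


REFUND_THRESHOLD = Decimal("50")


def refund_action_type(amount: Decimal) -> ActionType:
    """Classify a refund so each size band is gated as its own action type."""
    if amount <= 0:
        raise ValueError("refund amount must be positive")
    if amount <= REFUND_THRESHOLD:
        return ActionType.REFUND_UP_TO_50
    return ActionType.REFUND_ABOVE_50
```

Classifying at the point of action means a small refund and a large refund never share a stage by accident.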
Publish the metric bar per tier
For each stage, write down the metric and the target an action type must hit to move up. Use human acceptance rate for Crawl, internal completion rate for Walk, and a customer-outcome metric for the move into Run. Also set how long an action type must hold each bar before you consider promotion.
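A published bar could be expressed as plain configuration: a metric name, a target, and a minimum hold time. In the sketch below the `PromotionBar` class and the `BARS` table are hypothetical, and the targets and durations are placeholder values for illustration, not recommendations.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class PromotionBar:
    metric: str          # name of the metric the gate reads
    target: float        # minimum value required to move up
    min_hold: timedelta  # how long the action type must sit in the stage

    def is_cleared(self, observed: float, time_in_stage: timedelta) -> bool:
        """Both conditions must hold: metric at target and hold time served."""
        return observed >= self.target and time_in_stage >= self.min_hold


# Placeholder numbers, assumed for this example only.
BARS = {
    ("Crawl", "Walk"): PromotionBar("human_acceptance_rate", 0.95, timedelta(days=14)),
    ("Walk", "Run"): PromotionBar("customer_outcome_rate", 0.98, timedelta(days=30)),
}
```

For example, `BARS[("Crawl", "Walk")].is_cleared(0.97, timedelta(days=7))` is false: the metric is above target, but the hold time has not been served.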
Start every action type at Crawl
On day one, no action type starts in Walk or Run, no matter how it was built. In Crawl the agent only suggests. A human accepts, rejects, or edits each suggestion.
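The default-to-Crawl rule can be enforced at the dispatch point. In this sketch, any action type without an explicit assignment is treated as Crawl, so its action is queued as a suggestion instead of being executed. The names `tiers`, `review_queue`, and `handle` are hypothetical.

```python
from collections import defaultdict

# Any action type not explicitly assigned falls back to Crawl.
tiers = defaultdict(lambda: "Crawl")

# Suggestions waiting for a human to accept, reject, or edit.
review_queue = []


def handle(action_type, payload, execute):
    """Queue a suggestion in Crawl; call `execute` only in Walk or Run."""
    if tiers[action_type] == "Crawl":
        review_queue.append((action_type, payload))
        return "suggested"
    execute(payload)
    return "executed"
```

Because the fallback is Crawl, a newly added action type cannot act on its own until someone deliberately promotes it.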
Promote one action type at a time
When an action type clears the Crawl-to-Walk bar, move up that one type only. For that action the agent now acts without prior approval, but only where internal staff are on the receiving end and can fix mistakes. Everything else stays in Crawl.
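A promotion that touches exactly one action type and moves it exactly one stage might look like the following. The `ORDER` list and the `promote` function are assumptions made for this sketch; the check that the bar was cleared is assumed to have happened before the call.

```python
ORDER = ["Crawl", "Walk", "Run"]


def promote(tiers: dict, action_type: str) -> dict:
    """Return a copy of `tiers` with one action type moved up one stage."""
    index = ORDER.index(tiers[action_type])
    if index == len(ORDER) - 1:
        raise ValueError(f"{action_type} is already at the last stage")
    updated = dict(tiers)  # leave the caller's map untouched
    updated[action_type] = ORDER[index + 1]
    return updated
```

Returning a new map rather than mutating in place makes it easy to diff before and after, and to confirm that no other action type moved.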
Watch for regression and auto-demote
If a stage's metric drops below its bar, move that action type back down. Demote only once the metric is below the bar by a small margin, so that a type hovering right at the bar does not bounce between stages. A demotion is not a failure. It is the system working as designed.
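The margin works as hysteresis: promotion requires the metric to reach the bar, while demotion triggers only when it falls below the bar minus the margin. The sketch below assumes the relevant bar is the one the action type cleared to enter its current stage; `check_regression` and the default margin of 0.02 are illustrative choices, not prescribed values.

```python
ORDER = ["Crawl", "Walk", "Run"]


def check_regression(tier: str, observed: float, bar: float, margin: float = 0.02) -> str:
    """Return the stage the action type should be in after a regression check.

    `bar` is assumed to be the target the type cleared to enter `tier`.
    """
    if tier == ORDER[0]:
        return tier  # Crawl is the floor; nothing to demote to
    if observed < bar - margin:
        return ORDER[ORDER.index(tier) - 1]  # demote one stage
    return tier  # at, above, or within the margin of the bar
```

With a bar of 0.95, an observed 0.94 leaves a Walk action type where it is, while 0.92 sends it back to Crawl.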
Advance to Run with the customer-outcome metric
Moving from Walk to Run needs more than internal acceptance. It needs a real customer outcome, such as a ticket that stays resolved, a refund that is not reversed, or a message that is not flagged. Internal acceptance is not the same as customer success.
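The gap between the two metrics is easy to see when both are computed over the same records. In this sketch a Walk-stage record counts as a customer success only if the ticket was not reopened, the refund was not reversed, and the message was not flagged. The `WalkRecord` fields and the `ready_for_run` helper are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalkRecord:
    accepted_internally: bool
    ticket_reopened: bool = False
    refund_reversed: bool = False
    message_flagged: bool = False

    @property
    def customer_success(self) -> bool:
        return not (self.ticket_reopened or self.refund_reversed or self.message_flagged)


def rates(records):
    """Return (internal acceptance rate, customer outcome rate)."""
    if not records:
        raise ValueError("no records to evaluate")
    n = len(records)
    acceptance = sum(r.accepted_internally for r in records) / n
    outcome = sum(r.customer_success for r in records) / n
    return acceptance, outcome


def ready_for_run(records, outcome_bar: float) -> bool:
    """Gate on the customer outcome rate only; acceptance is ignored here."""
    _, outcome = rates(records)
    return outcome >= outcome_bar
```

Four records that were all accepted internally, one of which had its ticket reopened, give an acceptance rate of 1.0 but an outcome rate of 0.75, so a 0.9 outcome bar would block promotion.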
Principles
- Autonomy belongs to action types, not to the agent as a whole.
- Every stage publishes both a metric bar and a hold time. The hold time stops one good week from promoting a risky action.
- Promote one action at a time. Demote automatically.
- Internal acceptance is not customer success. The Walk-to-Run gate uses a different metric from the Crawl-to-Walk gate.
Related patterns (3)
- ★★ Human-in-the-Loop
Require explicit human approval at defined points before the agent performs an action.
- ★★ Shadow Canary
Run a candidate agent version in shadow alongside the champion, comparing outputs without affecting users.
- ★ Kill Switch
Provide an out-of-band control plane to halt running agent instances without redeploy.
Related compositions (2)
- Safety Hardening (recipe · abstract shape)
The minimum set of constraints to put around any production agent before it touches the world: budgets, gates, charters, kill-switches, approvals.
- Production LLM Platform (recipe · abstract shape)
Stand up a production LLM/RAG system whose data pipeline, model pipeline, and inference path scale and deploy independently.
Sources (3)
AI Engineering
Ch 10 'AI Engineering Architecture and User Feedback': “Architecture Steps: Enhance Context, Put in Guardrails, Add Model Router and Gateway, Reduce Latency with Caches, Add Agent Patterns”
Building A Generative AI Platform (Chip Huyen, author's blog precursor to AI Engineering Ch 10)
“The initial expansion of a platform usually involves adding mechanisms to allow the system to augment each query with the necessary information ... Guardrails help reduce AI risks and protect not just your users but also you, the developer…”
AI Engineering Architecture and User Feedback (Alex Strick van Linschoten)
“User feedback systems inherently contain various biases ... negative experience bias and self-selection bias”
Provenance
- Added to catalog:
- Last updated:
- Verification status: verified