Agent Sprawl

also known as Ungoverned Agent Fleet, Agent Fleet Sprawl

Anti-pattern: every team ships its own agents while ownership, success metrics, monitoring, and a decommissioning path stay an afterthought, so the fleet outgrows governance and most agents end up unwatched, unowned, and impossible to retire.

Context

An organisation moves from a pilot agent or two to dozens. Building one is now easy enough that individual teams do it themselves: marketing stands up a content-drafting agent, sales wires one up for lead scoring, support adds a triage bot. Each is deployed to solve an immediate local problem, on whatever stack that team already uses, and there is no central record of what exists, who owns it, what it touches, or how it will eventually be turned off.

Problem

Building agents is fast and decentralised; governing them — assigning an owner, defining what success looks like, monitoring behaviour, and retiring them when they stop earning their keep — stays slow and centralised, and the gap compounds. Agents accumulate faster than anyone catalogues them, so no one can say how many are in production or what systems they reach. Most run unwatched: there is no owner to notice when one degrades, no success metric to judge it against, and no decommissioning path, so a half-finished agent keeps making autonomous decisions on sensitive systems long after the team that shipped it has moved on. The fleet becomes legacy debt that nobody fully understands and nobody is accountable for, and a single misbehaving agent can act for weeks before anyone notices.

Forces

Building an agent is now cheap and any team can do it, so creation is decentralised and effectively unbounded.
Ownership, success metrics, monitoring, and decommissioning are governance work that stays centralised and human-speed.
Each agent individually solves a real business problem, which makes the local decision to ship it look obviously correct.
The further creation outpaces ownership and retirement, the larger the share of the fleet that runs unwatched.

Example

Over a year a mid-sized company goes from one pilot agent to roughly seventy, each stood up by a different team to solve a local problem — content drafting, lead scoring, ticket triage, invoice matching. None was registered centrally, given a success metric, or assigned an owner beyond the engineer who built it. A security review finds that more than half run unmonitored, several still call production systems with the credentials of people who have since left, and no one can say which are still useful. The platform team responds by making an owner, a success metric, monitoring, and a decommissioning trigger mandatory at deployment, and by reconciling a central agent registry against what is actually running — putting governance back in step with how fast teams create agents.
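The reconciliation step in this example can be sketched in a few lines of Python. This is a minimal illustration, not a real registry API: `RegistryEntry`, `reconcile`, the agent names, and the idea that running agents and active staff arrive as plain collections are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """One row of a hypothetical central agent registry."""
    agent_id: str
    owner: str        # accountable person or team
    monitored: bool   # whether anyone is watching this agent


def reconcile(registry, running_ids, active_staff):
    """Compare what the registry claims with what is actually running."""
    registered = {entry.agent_id: entry for entry in registry}
    known = set(registered)
    running = set(running_ids)
    live = running & known  # registered agents that are really running
    return {
        # Running in production but never registered.
        "unregistered": sorted(running - known),
        # Registered but no longer running: candidates for cleanup.
        "stale_entries": sorted(known - running),
        # Running under an owner who has left the organisation.
        "orphaned": sorted(a for a in live if registered[a].owner not in active_staff),
        # Running with no monitoring attached.
        "unmonitored": sorted(a for a in live if not registered[a].monitored),
    }


# Hypothetical data for illustration only.
registry = [
    RegistryEntry("lead-scoring", "alice", True),
    RegistryEntry("ticket-triage", "bob", False),
    RegistryEntry("old-pilot", "carol", True),
]
report = reconcile(
    registry,
    running_ids=["lead-scoring", "ticket-triage", "invoice-matching"],
    active_staff={"alice", "carol"},
)
for finding, agents in report.items():
    print(finding, agents)
```

Each bucket in the report corresponds to a finding of the kind the security review surfaced: agents nobody catalogued, agents whose owner has left, and agents nobody is watching.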

Diagram

flowchart TD
    T[Teams across the org] -->|self-serve, fast| C[Ship agents to solve local problems]
    C --> Fleet[Growing agent fleet]
    G[Assign owner / metric / monitoring / retire] -.centralised, slow.-> Fleet
    Fleet --> U[Unwatched, unowned agents]
    U --> D[Acts on sensitive systems; nobody accountable]
    D --> L[Fleet becomes ungovernable legacy debt]
    classDef bad fill:#fee,stroke:#c33;
    class U,D,L bad;

Solution

Therefore:

Govern the agent fleet at the rate it grows. Make it a deployment gate that every production agent declares an owner, the business outcome it is accountable for, and the conditions under which it is paused or retired, and register it in a central inventory that can be reconciled against what is actually running. The order matters: the business case and an accountable owner come first and the technical platform second, because a governance tool layered onto an already-sprawling fleet only inventories the mess. Mitigation patterns: Tool/Agent Registry for a reconciled inventory of agents and their owners, and Kill Switch for the pause-and-decommission path each agent must carry. Agent Sprawl is the organisational, fleet-scale lifecycle failure that those per-agent controls do not by themselves prevent.
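One way to picture the deployment gate is as a check that refuses to admit an agent until its declarations are complete. The sketch below is an assumption-laden illustration: `AgentManifest`, `DeploymentBlocked`, `admit`, and the in-memory `inventory` dictionary are hypothetical names, and a real gate would more likely live in a CI pipeline or admission controller.

```python
from dataclasses import dataclass


@dataclass
class AgentManifest:
    """Declarations a production agent must carry (hypothetical schema)."""
    name: str
    owner: str = ""             # accountable person or team
    business_outcome: str = ""  # what the agent is accountable for
    retire_when: tuple = ()     # conditions that pause or retire it


class DeploymentBlocked(Exception):
    """Raised when an agent fails the governance gate."""


def missing_declarations(manifest):
    """List the required declarations that are absent or blank."""
    missing = []
    if not manifest.owner.strip():
        missing.append("owner")
    if not manifest.business_outcome.strip():
        missing.append("business_outcome")
    if not manifest.retire_when:
        missing.append("retire_when")
    return missing


def admit(manifest, inventory):
    """Register the agent in the inventory only if the gate passes."""
    missing = missing_declarations(manifest)
    if missing:
        raise DeploymentBlocked(f"{manifest.name}: missing {', '.join(missing)}")
    inventory[manifest.name] = manifest


# Hypothetical usage: one complete manifest, one incomplete.
inventory = {}
admit(
    AgentManifest(
        name="invoice-matching",
        owner="finance-ops",
        business_outcome="match supplier invoices to purchase orders",
        retire_when=("match rate below agreed threshold", "owner team disbanded"),
    ),
    inventory,
)
try:
    admit(AgentManifest(name="content-drafting"), inventory)
except DeploymentBlocked as err:
    print(err)
```

Because registration happens only inside `admit`, the inventory cannot contain an agent that skipped the gate, which keeps the later reconciliation against running agents meaningful.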

What this pattern forbids. No useful constraint; the missing constraint is fleet-scale governance that keeps pace with creation — every production agent bound to an owner, a success metric, monitoring, and a decommissioning trigger, reconciled against a central inventory.

The patterns that counter or replace it:

complements: Agent Identity Sprawl. Anti-pattern: an agent fleet mints non-human identities at machine speed while scoping, rotation, ownership, and revocation stay human-speed, so over-privileged long-lived credentials accumulate, outlive their agents, and widen an ungovernable attack surface.
complements: Shadow AI. Anti-pattern: leave the corporate LLM offering so restrictive, slow, or narrow that employees bypass it with personal accounts and unapproved agent tools, creating data leakage and ungoverned tool calls that security cannot see.
complements: Perma-Beta. Anti-pattern: ship the agent in 'beta' indefinitely so that quality regressions are someone else's problem.
alternative-to: Tool/Agent Registry. Maintain a single queryable catalogue of both available tools and available agents, with metadata (capability, cost, latency, quality) the agent can use to pick the right one for a task.
alternative-to: Kill Switch. Provide an out-of-band control plane to halt running agent instances without redeploy.