Infrastructure Burst Bottleneck (Agent Scale-Out)
also known as Agent-Triggered Infra Saturation, Burst-Capacity Cliff
Anti-pattern: deploy agents whose scale-out behavior triggers sudden data-and-compute bursts that on-prem or under-provisioned cloud infrastructure cannot absorb; agents work at small scale and freeze in production.
Context
An organization moves a successful pilot agent to wide rollout. The agent's bursty workload pattern (parallel sub-agents, fan-out tool calls, large context loads) saturates the underlying databases, vector stores, embedding services, or model gateways. Fewer than 30% of enterprises have infrastructure that flexes elastically to absorb the burst.
Problem
The agent works fine at pilot scale (10–100 RPM). At production scale (1000+ RPM) the underlying infrastructure saturates: the Postgres connection pool is exhausted, vector store latency spikes, and the embeddings backlog grows. Agents start queueing on infrastructure, response times grow from 5s to 5min, and retries amplify the saturation. This differs from Orchestrator as Bottleneck, where the ceiling is the orchestrator process itself; here the saturation is in the *upstream-infra* that the agents depend on.
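The claim that retries amplify the saturation can be made concrete with a rough model. The sketch below assumes that every failed attempt is retried immediately and that each attempt fails independently with the same probability; the function name `effective_load`, the 90% failure rate, and the retry budget of 3 are illustrative assumptions, not figures from this pattern.

```python
def effective_load(offered_rpm: float, failure_rate: float, max_retries: int) -> float:
    """Requests per minute that reach the dependency once retries are counted.

    Assumption: immediate retries, independent failures with probability
    `failure_rate`, and at most `max_retries` retries per original request.
    """
    # Attempt k happens only if the previous k attempts all failed.
    attempts_per_request = sum(failure_rate ** k for k in range(max_retries + 1))
    return offered_rpm * attempts_per_request


# Healthy dependency: retries add nothing.
print(effective_load(1000, 0.0, 3))
# Saturated dependency failing 90% of attempts: load grows to roughly 3.4x.
print(effective_load(1000, 0.9, 3))
```

Under these assumptions a dependency that is already failing most calls receives more than three times the offered traffic, which is why an unthrottled retry policy tends to deepen the saturation rather than relieve it.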
Forces
- Agent fan-out patterns are bursty — N sub-agents call simultaneously.
- Vector stores, embedding services, and DBs were sized for the pre-agent baseline.
- Auto-scale rules tuned for steady traffic miss agent bursts that arrive in seconds.
Example
A research agent uses a 12-way fan-out on each query, with each sub-agent embedding 50 documents. At 100 concurrent users that is 12 × 50 × 100 = 60,000 embedding calls, which amounts to 60,000 calls per second when the queries land within the same second. The embedding service was sized for 5,000 RPS, so the burst is 12 times its capacity. Latency spikes from 50ms to 8s. Agents queue. Users see 10min response times. Postmortem: nobody load-tested the embedding service at projected fan-out before rollout.
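The arithmetic in this example can be written down as a small calculation. The sketch below uses the figures from the example (100 users, 12-way fan-out, 50 documents, 5,000 RPS); the names `FanOutShape`, `overload_factor`, and `drain_time_s` are hypothetical, and the one-second burst window is an assumption about how tightly the queries arrive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FanOutShape:
    concurrent_users: int
    sub_agents_per_query: int
    calls_per_sub_agent: int

    def calls_per_burst(self) -> int:
        # Every user triggers every sub-agent, and every sub-agent makes every call.
        return self.concurrent_users * self.sub_agents_per_query * self.calls_per_sub_agent


def overload_factor(shape: FanOutShape, capacity_rps: float, burst_window_s: float = 1.0) -> float:
    """How many times the burst exceeds what the service can absorb in the window."""
    return shape.calls_per_burst() / (capacity_rps * burst_window_s)


def drain_time_s(shape: FanOutShape, capacity_rps: float) -> float:
    """Seconds needed to work off one burst at full capacity, ignoring new arrivals."""
    return shape.calls_per_burst() / capacity_rps


shape = FanOutShape(concurrent_users=100, sub_agents_per_query=12, calls_per_sub_agent=50)
print(shape.calls_per_burst())         # 60000
print(overload_factor(shape, 5000))    # 12.0
print(drain_time_s(shape, 5000))       # 12.0
```

A single burst would take about 12 seconds to drain even with no further traffic, so sustained arrivals at this rate leave the queue growing without bound.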
Solution
Therefore:
Map the agent's fan-out shape (number of concurrent sub-agents × calls per sub-agent × per-call infra cost). Load-test the dependency tree at projected fan-out. Provision burst capacity. Use connection pooling with circuit-breaker fallback. Throttle agent fan-out at the orchestrator when infra signals back-pressure. Pair with circuit-breaker, rate-limiting, and graceful-degradation.
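Throttling fan-out on back-pressure might look like the following sketch, which is one possible shape rather than a prescribed implementation. It assumes an asyncio-based orchestrator and uses an additive-increase, multiplicative-decrease cap on in-flight calls; `BackPressure`, `AdaptiveFanOutLimiter`, `embed_all`, and the `embed` callable are all hypothetical names, with `BackPressure` standing in for whatever saturation signal the dependency emits (an HTTP 429, a pool timeout, and so on).

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


class BackPressure(Exception):
    """Hypothetical signal that a dependency is saturated."""


class AdaptiveFanOutLimiter:
    """Caps in-flight calls; halves the cap on back-pressure, regrows by one per success."""

    def __init__(self, max_limit: int, min_limit: int = 1) -> None:
        self.max_limit = max_limit
        self.min_limit = min_limit
        self.limit = max_limit
        self.in_flight = 0
        self._cond = asyncio.Condition()

    async def run(self, call: Callable[[], Awaitable[T]]) -> T:
        # Wait until there is room under the current cap, then claim a slot.
        async with self._cond:
            await self._cond.wait_for(lambda: self.in_flight < self.limit)
            self.in_flight += 1
        outcome = "hold"  # unrelated errors leave the cap unchanged
        try:
            result = await call()
            outcome = "grow"
            return result
        except BackPressure:
            outcome = "shrink"
            raise
        finally:
            await self._release(outcome)

    async def _release(self, outcome: str) -> None:
        async with self._cond:
            self.in_flight -= 1
            if outcome == "shrink":
                self.limit = max(self.min_limit, self.limit // 2)
            elif outcome == "grow":
                self.limit = min(self.max_limit, self.limit + 1)
            self._cond.notify_all()


async def embed_all(
    limiter: AdaptiveFanOutLimiter,
    docs: List[str],
    embed: Callable[[str], Awaitable[T]],
) -> List[T]:
    # All sub-agent calls share one limiter, so the cap applies to the whole fan-out.
    return await asyncio.gather(*(limiter.run(lambda d=d: embed(d)) for d in docs))
```

Sharing one limiter per dependency across all sub-agents is the design choice that matters here: the cap then tracks what the dependency can absorb rather than what any single agent would like to send. A production version would likely combine this with a circuit breaker and a bounded retry policy.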
What this pattern forbids. No useful constraint; the missing constraint is full-dependency-tree capacity-testing at projected agent fan-out.
And the patterns that stand alongside it, or against it —
- complements: Orchestrator as Bottleneck ✕. Anti-pattern: route all agent runs through a single-process orchestrator that becomes the system-wide concurrency ceiling.
- complements: Circuit Breaker ★★. Stop calling a failing dependency for a cooldown period after error rates exceed a threshold.
- complements: Rate Limiting ★★. Cap the number of requests, tokens, or tool calls per user (or session) within a time window.
- complements: Graceful Degradation ★★. When a dependency fails, downgrade the user-facing experience to a working subset rather than failing entirely.
- complements: Blocking Sync Calls in Agent Loop ✕. Anti-pattern: run synchronous, blocking I/O inside the agent loop or HTTP handler, capping concurrency at the number of OS threads.