XIV · Anti-Patterns · Anti-pattern

Infrastructure Burst Bottleneck (Agent Scale-Out)

also known as Agent-Triggered Infra Saturation, Burst-Capacity Cliff

Anti-pattern: deploy agents whose scale-out behavior triggers sudden data-and-compute bursts that on-prem or under-provisioned cloud infrastructure cannot absorb; agents work at small scale and freeze in production.

Context

An organization moves a successful pilot agent to wide rollout. The agent's bursty workload pattern (parallel sub-agents, fan-out tool calls, large context loads) saturates the underlying databases, vector stores, embedding services, or model gateways. Fewer than 30% of enterprises have infrastructure that flexes elastically to absorb the burst.

Problem

The agent works fine at pilot scale (10–100 RPM). At production scale (1,000+ RPM) the underlying infrastructure saturates: the Postgres connection pool is exhausted, vector store latency spikes, and the embeddings backlog grows. Agents start queueing on infrastructure, response times grow from 5 s to 5 min, and retries amplify the saturation. This differs from orchestrator-as-bottleneck, where the orchestrator process itself is the ceiling; here the saturation is in the *upstream infrastructure*.
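The retry amplification can be estimated with a rough model. The sketch below assumes, purely for illustration, that each attempt fails independently with a fixed probability while the dependency is saturated, and that clients retry immediately up to a fixed limit. The function names are hypothetical.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Expected attempts per logical call when every failure is retried.

    Assumes independent failures at a fixed rate (an illustrative
    simplification): attempt k+1 happens only if the first k all failed,
    so the expectation is 1 + p + p^2 + ... + p^max_retries.
    """
    return sum(failure_rate ** k for k in range(max_retries + 1))


def offered_load(base_rps: float, failure_rate: float, max_retries: int) -> float:
    """Requests per second the dependency actually sees once retries are included."""
    return base_rps * expected_attempts(failure_rate, max_retries)


# With half of all attempts failing and up to 3 retries, 1,000 logical
# calls per second become 1,875 attempts per second on the dependency.
amplified = offered_load(1000, 0.5, 3)  # 1875.0
```

In a real system the failure rate itself climbs as offered load climbs, so this fixed-rate model understates the feedback loop; it is meant only to show why retries without back-off make saturation worse.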

Forces

  • Agent fan-out patterns are bursty — N sub-agents call simultaneously.
  • Vector stores, embedding services, and DBs were sized for the pre-agent baseline.
  • Auto-scale rules tuned for steady traffic miss agent bursts that arrive in seconds.

Example

A research agent uses a 12-way fan-out on each query, with each sub-agent embedding 50 documents. With 100 concurrent users that is 12 × 50 × 100 = 60,000 embedding calls per wave of queries, landing within about a second when the queries arrive together. The embedding service was sized for 5,000 RPS, a twelfth of that burst. Latency spikes from 50 ms to 8 s. Agents queue. Users see 10-minute response times. Postmortem: nobody load-tested the embedding service at projected fan-out before rollout.
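The arithmetic in this example is simple enough to check before rollout. A minimal sketch, using the example's own numbers; the class and function names are hypothetical, and the drain time assumes the service runs at exactly its sized capacity with no retries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FanOutShape:
    """Shape of one burst: queries in flight x sub-agents x calls per sub-agent."""
    concurrent_queries: int
    sub_agents_per_query: int
    calls_per_sub_agent: int

    def burst_calls(self) -> int:
        """Total dependency calls generated when all queries fan out together."""
        return (self.concurrent_queries
                * self.sub_agents_per_query
                * self.calls_per_sub_agent)


def drain_seconds(burst_calls: int, capacity_rps: float) -> float:
    """Best-case time for the dependency to work through one burst."""
    return burst_calls / capacity_rps


# Numbers from the example: 100 users, 12-way fan-out, 50 documents each.
shape = FanOutShape(concurrent_queries=100,
                    sub_agents_per_query=12,
                    calls_per_sub_agent=50)
burst = shape.burst_calls()            # 60000 calls in one wave
overload = burst / 5000                # 12.0 times the sized capacity
backlog = drain_seconds(burst, 5000)   # 12.0 seconds to drain, at best
```

Even in this best case a single wave takes 12 seconds to drain, so a second wave arriving in the meantime lands on an existing backlog.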

Solution

Therefore:

Map the agent's fan-out shape (number of concurrent sub-agents × calls per sub-agent × per-call infra cost). Load-test the dependency tree at projected fan-out. Provision burst capacity. Use connection pooling with circuit-breaker fallback. Throttle agent fan-out at the orchestrator when infra signals back-pressure. Pair with circuit-breaker, rate-limiting, and graceful-degradation.
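Throttling fan-out at the orchestrator might look like the following sketch. It assumes the dependency client signals back-pressure by raising an exception, represented here by a hypothetical `BackPressure` class (for instance, mapped from an HTTP 429 or a pool-acquire timeout). The additive-increase, multiplicative-decrease rule is one choice made for illustration, not something the pattern prescribes, and `AdaptiveLimiter` and `fan_out` are illustrative names.

```python
import asyncio


class BackPressure(Exception):
    """Hypothetical signal that a dependency is saturated."""


class AdaptiveLimiter:
    """Caps in-flight calls; halves the cap on back-pressure, adds one on success.

    Create instances inside a running event loop (for example within the
    coroutine passed to asyncio.run).
    """

    def __init__(self, initial: int, minimum: int = 1, maximum: int = 64) -> None:
        self.limit = initial
        self.minimum = minimum
        self.maximum = maximum
        self.in_flight = 0
        self._cond = asyncio.Condition()

    async def run(self, call):
        """Await call() once a slot is free, then adjust the cap by outcome."""
        async with self._cond:
            await self._cond.wait_for(lambda: self.in_flight < self.limit)
            self.in_flight += 1
        outcome = None
        try:
            result = await call()
            outcome = "ok"
            return result
        except BackPressure:
            outcome = "pressure"
            raise
        finally:
            async with self._cond:
                self.in_flight -= 1
                if outcome == "ok":
                    self.limit = min(self.maximum, self.limit + 1)
                elif outcome == "pressure":
                    self.limit = max(self.minimum, self.limit // 2)
                self._cond.notify_all()


async def fan_out(limiter: AdaptiveLimiter, calls):
    """Run sub-agent calls through the limiter; failures are returned, not raised."""
    return await asyncio.gather(
        *(limiter.run(call) for call in calls), return_exceptions=True
    )
```

An orchestrator would keep one limiter per dependency and pass each sub-agent's embedding or database call through `fan_out`. The limiter covers only the throttling step; a circuit breaker and a degraded fallback path would still sit around the calls themselves.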

What this pattern forbids. Nothing useful: the anti-pattern imposes no constraint. The missing constraint is capacity testing of the full dependency tree at projected agent fan-out.

And the patterns that stand alongside it, or against it —

  • complements · Orchestrator as Bottleneck · Anti-pattern: route all agent runs through a single-process orchestrator that becomes the system-wide concurrency ceiling.
  • complements · Circuit Breaker ★★ · Stop calling a failing dependency for a cooldown period after error rates exceed a threshold.
  • complements · Rate Limiting ★★ · Cap the number of requests, tokens, or tool calls per user (or session) within a time window.
  • complements · Graceful Degradation ★★ · When a dependency fails, downgrade the user-facing experience to a working subset rather than failing entirely.
  • complements · Blocking Sync Calls in Agent Loop · Anti-pattern: run synchronous, blocking I/O inside the agent loop or HTTP handler, capping concurrency at the number of OS threads.
