IX · Routing & Composition · Mature · ★★

Circuit Breaker

also known as Failure Trip, Rate-Limit Trip

Stop calling a failing dependency for a cooldown period once its error rate exceeds a threshold.

This pattern helps complete certain larger patterns —

  • used-by: Graceful Degradation ★★: When a dependency fails, downgrade the user-facing experience to a working subset rather than failing entirely.

Context

An agent calls external services as part of every request — third-party APIs, vector databases, model providers, internal microservices — and those dependencies fail from time to time through rate limiting, vendor outages, regional incidents, or transient bugs. The agent itself does not control when these failures happen, but it does control how it reacts when one of them starts returning errors. Retries are the natural first instinct because most transient errors clear on their own.

Problem

When a dependency is genuinely down or rate-limited, naive retry logic hammers it with the same failing call over and over, burning token budget and wall-clock latency on calls that will never succeed. Worse, the retry storm can push a partially degraded vendor past its rate limits and block legitimate traffic from other tenants, turning a small incident into a larger one. Unless callers make a coordinated decision to back off, the upstream never gets a chance to recover.

Forces

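  • Threshold tuning trades fast detection for false trips: a low threshold catches outages quickly but also trips on transient blips.
  • Cooldown duration trades availability for stability: a long cooldown gives the upstream room to recover but keeps a recovered dependency unavailable for longer.
  • Per-endpoint vs global breakers differ on blast radius: a per-endpoint breaker isolates one failing route, while a global breaker also cuts off calls that would have succeeded.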
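
The blast-radius force comes down to which key the breaker state is stored under. The sketch below is one way to picture it, not a prescribed design: `breaker_key`, `is_blocked`, the `scope` values, and the example URLs are all hypothetical names introduced here for illustration.

```python
from typing import Set
from urllib.parse import urlsplit


def breaker_key(url: str, scope: str) -> str:
    """Return the key a call's breaker state is stored under (hypothetical helper)."""
    parts = urlsplit(url)
    if scope == "global":
        return parts.netloc                   # one breaker for the whole host
    if scope == "endpoint":
        return f"{parts.netloc}{parts.path}"  # one breaker per route
    raise ValueError(f"unknown scope: {scope!r}")


def is_blocked(url: str, open_keys: Set[str], scope: str) -> bool:
    """True when the breaker covering this URL is currently open."""
    return breaker_key(url, scope) in open_keys


# Hypothetical URLs: /enrich is failing, /lookup is healthy.
failing = "https://api.example.com/v1/enrich"
healthy = "https://api.example.com/v1/lookup"

# The same incident, recorded at two different granularities.
per_endpoint_open = {breaker_key(failing, "endpoint")}
global_open = {breaker_key(failing, "global")}
```

With the per-endpoint key, `is_blocked(healthy, per_endpoint_open, "endpoint")` is false, so the healthy route keeps working. With the global key, the same incident blocks both routes.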

Example

A tool-using agent calls a third-party enrichment API that suddenly starts returning 500s. Without protection it retries every call, burning token budget on failed responses and tripping the vendor's per-key rate limit. The team puts a Circuit Breaker in front of the tool: once the error rate over the last minute exceeds 30%, the breaker opens and short-circuits subsequent calls with a structured 'dependency unavailable' result for sixty seconds before probing again. Cost stops climbing and the agent can pivot to a fallback strategy.
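
The trip condition in this example can be sketched as a rolling window of call outcomes. This is a minimal illustration under stated assumptions, not the team's actual implementation: the class and function names, the `min_calls` guard against tiny samples, and the shape of the 'dependency unavailable' result are all hypothetical. Only the 60-second window, the 30% threshold, and the sixty-second cooldown come from the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ErrorRateWindow:
    """Rolling record of call outcomes over the last `window_s` seconds."""
    window_s: float = 60.0    # "the last minute" from the example
    threshold: float = 0.30   # trip when the error rate exceeds 30%
    min_calls: int = 10       # assumption: do not trip on a tiny sample
    _events: deque = field(default_factory=deque, init=False, repr=False)

    def record(self, now: float, failed: bool) -> None:
        """Store one outcome as a (timestamp, failed) pair."""
        self._events.append((now, failed))
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop outcomes that have aged out of the window.
        while self._events and self._events[0][0] <= now - self.window_s:
            self._events.popleft()

    def error_rate(self, now: float) -> float:
        self._evict(now)
        if not self._events:
            return 0.0
        failures = sum(1 for _, failed in self._events if failed)
        return failures / len(self._events)

    def should_trip(self, now: float) -> bool:
        self._evict(now)
        enough = len(self._events) >= self.min_calls
        return enough and self.error_rate(now) > self.threshold


def unavailable_result(dependency: str, retry_after_s: float) -> dict:
    """Structured stand-in handed to the agent instead of calling the dependency."""
    return {
        "ok": False,
        "error": "dependency_unavailable",
        "dependency": dependency,
        "retry_after_s": retry_after_s,
    }
```

Timestamps are passed in explicitly so the window can be exercised without waiting on a real clock. In production they would typically come from a monotonic clock such as `time.monotonic()`.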

Solution

Therefore:

Track each dependency's error rate over a rolling window. When the rate exceeds a threshold, 'open' the breaker: route calls to a fallback (or fail fast) for a cooldown period. After the cooldown, allow a small number of trial calls; close the breaker if they succeed and reopen it if they fail.
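
The open, cooldown, and trial-call cycle can be sketched as a small state machine. This is a hedged illustration rather than a reference implementation. It assumes a count-based window (the last `window` calls) instead of a time-based one, a single-threaded caller, one trial call after the cooldown, and that any exception counts as a dependency failure. The names `CircuitBreaker`, `State`, and the "half-open" label for the trial phase are conventional but not taken from the text above.

```python
import time
from collections import deque
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class State(Enum):
    CLOSED = "closed"        # calls flow through and outcomes are recorded
    OPEN = "open"            # the dependency is not called at all
    HALF_OPEN = "half_open"  # cooldown elapsed; a trial call is allowed


class CircuitBreaker:
    def __init__(self, threshold: float = 0.5, window: int = 20,
                 cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.state = State.CLOSED
        self._outcomes: deque = deque(maxlen=window)  # True means the call failed
        self._opened_at = 0.0

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        now = self.clock()
        if self.state is State.OPEN:
            if now - self._opened_at < self.cooldown_s:
                return fallback()          # open: never touch the dependency
            self.state = State.HALF_OPEN   # cooldown over: allow a trial call
        try:
            result = fn()
        except Exception:
            # Assumption: every exception signals dependency failure. A real
            # breaker would usually filter for timeouts, 5xx, 429 and similar.
            self._on_failure(now)
            return fallback()
        self._on_success()
        return result

    def _on_success(self) -> None:
        if self.state is State.HALF_OPEN:
            self.state = State.CLOSED      # trial succeeded: close the breaker
            self._outcomes.clear()
        else:
            self._outcomes.append(False)

    def _on_failure(self, now: float) -> None:
        if self.state is State.HALF_OPEN:
            self._trip(now)                # trial failed: reopen
            return
        self._outcomes.append(True)
        window_full = len(self._outcomes) == self._outcomes.maxlen
        if window_full and sum(self._outcomes) / len(self._outcomes) > self.threshold:
            self._trip(now)

    def _trip(self, now: float) -> None:
        self.state = State.OPEN
        self._opened_at = now
        self._outcomes.clear()
```

A caller wraps each dependency call, for example `breaker.call(lambda: fetch_profile(user_id), lambda: None)`, where `fetch_profile` is a hypothetical tool function. Injecting `clock` keeps the cooldown testable without sleeping.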

What this pattern forbids. When the breaker is open, the dependency must not be called; only fallback paths may run.

The smaller patterns that complete this one —

  • generalises: Degenerate-Output Detection: Detect when the agent is about to emit a near-duplicate of its own recent output and either drop, replace, or escalate to a stronger model rather than ship the loop.
  • generalises: Typed Tool-Loop Failure Detector: Lift tool-loop detection from prompt-level rules to a mechanical dispatch-boundary veto with typed failure modes and per-tool caps that returns a formatted refusal the model must consume.

And the patterns that stand alongside it, or against it —

  • composes-with: Fallback Chain ★★: Try a primary handler; on failure or low confidence, fall through to a sequence of fallback handlers.
  • complements: Rate Limiting ★★: Cap the number of requests, tokens, or tool calls per user (or session) within a time window.
  • complements: Exception Handling and Recovery ★★: Catch and react to predictable failure modes (tool errors, rate limits, validation failures) with structured recovery paths.
  • complements: Provider Fallback ★★: When one provider's API errors mid-stream, transparently switch to another provider while preserving state.
  • composes-with: Kill Switch: Provide an out-of-band control plane to halt running agent instances without redeploy.
  • complements: Pre-Generative Loop Gate ·: Before the next generation fires, detect divergence signatures (narration loops, frustration paths, repetition pressure) and inject a diagnostic steering hint into the prompt rather than veto the call.
  • complements: Infrastructure Burst Bottleneck (Agent Scale-Out): Anti-pattern: deploy agents whose scale-out behavior triggers sudden data-and-compute bursts that on-prem or under-provisioned cloud infrastructure cannot absorb; agents work at small scale and freeze in production.
  • alternative-to: Missing Idempotency on Agent Calls: Anti-pattern: retry state-mutating agent tool calls without idempotency keys, so retries multiply real-world side effects.
  • complements: Naive Retry Without Backoff: Anti-pattern: retry failed model or tool calls immediately, amplifying load on systems that are already failing.
  • complements: Agentic Behavior Tree ·: Borrow the behavior-tree formalism: leaves are LLM calls or tools that return success/failure; a tree of selectors and sequences orchestrates control flow.
