Naive Retry Without Backoff
Anti-pattern: retry failed model or tool calls immediately, amplifying load on systems that are already failing.
Problem
The retry loop fires immediately on failure, so every instance of the agent piles onto the failing upstream at the same instant. Recovery is delayed because the upstream cannot drain its queue. When many agent instances share a backend, the retry storm itself becomes the outage. Distinct from unbounded-loop, which is about logical step counts; this is about call-attempt pacing inside one step.
Solution
Use exponential backoff (e.g. 1s, 2s, 4s, 8s, with ±25% jitter), cap attempt count (typically 3–5), and honor `Retry-After` headers. Distinguish retryable errors (5xx, 429, timeout) from non-retryable (4xx other than 429). Pair with circuit-breaker so once attempts exhaust, the agent stops calling the failing dependency entirely until a health probe succeeds.
When to use
- Never. Cite when reviewing retry implementations.
- Use exponential backoff with jitter, bounded attempts, and Retry-After respect.
- Pair with circuit-breaker for outage isolation.
Open the full interactive page →
Diagram, neighbourhood map, code examples, related patterns and full provenance.