IX · Routing & CompositionMature★★

Provider Fallback

also known as Mid-Request Failover, Cross-Provider Recovery

When one provider's API errors mid-stream, transparently switch to another provider while preserving state.

This pattern helps complete certain larger patterns —

  • specialisesFallback Chain★★Try a primary handler; on failure or low confidence, fall through to a sequence of fallback handlers.

Context

A production agent product streams long responses to the user — multi-paragraph answers, generated code, structured documents — and is willing to integrate with more than one LLM provider to keep that experience working. The team already accepts that any single provider will have rate-limit windows, regional incidents, and the occasional mid-stream disconnect that drops the second half of a response. They control a gateway layer between the client and the upstream providers and can hold conversation state there.

Problem

A single-provider deployment is hostage to that provider's worst hour: when its stream fails halfway through a generation, the user sees a half-rendered answer followed by an error and has to start over. A request-boundary fallback chain handles the case where a whole call fails before any output, but it cannot recover a stream that began on provider A and died after some tokens were already delivered. Without mid-stream failover, the team's only options are to lose the partial output or to lock in to whichever provider was most reliable last week.

Forces

  • Provider tool-call schemas differ; cross-provider continuation needs schema translation.
  • Partial output reconciliation across providers.
  • Routing logic must not amplify provider quirks.

Example

A code-review agent product runs on a single provider whose us-east region begins returning 529 errors mid-stream during peak hours. Users see half-rendered reviews abandoned with stack traces. The team puts a gateway in front: it holds conversation state, normalises tool-call schemas across two providers, and on stream error reconnects the user to the fallback provider continuing from the last clean delta. Uptime moves from the underlying provider's SLA to the union of two providers' SLAs, and the support inbox stops filling on incident days.

Diagram

Solution

Therefore:

A gateway proxy holds the conversation state. On stream error, it switches to a fallback provider, optionally preserving partial output, and continues with translated message format. Tool-call schemas are normalised at the gateway. Streaming clients see one continuous stream.

What this pattern forbids. Clients must not see the underlying provider; only the provider-agnostic interface is exposed, and failover happens behind it.

And the patterns that stand alongside it, or against it —

  • complementsCircuit Breaker★★Stop calling a failing dependency for a cooldown period after error rates exceed a threshold.
  • complementsMulti-Model Routing★★Send each request to the cheapest model that can handle it well.
  • complementsOpen-Weight CascadeBuild a multi-model cascade where lower tiers are open-weight, self-hostable models that run inside the operator's boundary, and only escalations cross to a hosted frontier model — giving cost arbitrage *and* sovereignty.
  • complementsDegenerate-Output DetectionDetect when the agent is about to emit a near-duplicate of its own recent output and either drop, replace, or escalate to a stronger model rather than ship the loop.
  • complementsProvider-String RoutingSelect the model and provider for a request through a single namespaced string (`provider/model`) backed by env-var credentials, so the caller specifies what to run with one parameter rather than a typed provider object.
  • alternative-toVendor Lock-InAnti-pattern: couple application code directly to one model provider's SDK, request shape, and proprietary features so that switching providers requires rewriting application code rather than swapping an adapter.
  • complementsComplexity-Based RoutingEstimate a request's difficulty up front and bind it to the cheapest model tier that can answer well, using an explicit complexity classifier as the routing key.

Neighbourhood

Click any neighbour to follow the language. Scroll to zoom, drag to pan.