Routing & Composition
Sending requests to the right specialist.
21 patterns in this book.
When to reach for each
01. Multi-Model Routing
Send each request to the cheapest model that can handle it well.
Best for: Cost and quality goals diverge across request types.
Tradeoff: Two-model debug surface.
Watch for: A single model already meets the price-performance target.
02. Fallback Chain
Try a primary handler; on failure or low confidence, fall through to a sequence of fallback handlers.
Best for: Single-handler failure would cascade to the user as an outage.
Tradeoff: Cumulative latency on full cascade.
Watch for: Only one handler exists and there is nothing to fall back to.
03. Circuit Breaker
Stop calling a failing dependency for a cooldown period after error rates exceed a threshold.
Best for: A dependency fails often enough that hammering it wastes cost or blocks legitimate traffic.
Tradeoff: False trips degrade availability when the error was transient.
Watch for: Failures are correlated across all dependencies and there is no useful fallback to route to.
04. Pipes and Filters
Compose stream-shaped processing as a chain of small filters connected by pipes.
Best for: A transformation can be decomposed into small filters with single responsibilities.
Tradeoff: Pipeline visibility: hard to see end-to-end behaviour.
Watch for: The transformation is small enough that a single function is clearer.
05. Automatic Workflow Search
Treat the agent's workflow (a graph of LLM-invoking nodes) as an artefact to search; use Monte Carlo Tree Search guided by an eval benchmark to discover the best workflow, then deploy it.
Best for: You have a stable eval benchmark that can score full workflows end-to-end.
Tradeoff: Eval set quality bounds discovered workflow quality.
Watch for: No reliable eval exists to guide the search.
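
As a rough illustration of how patterns 02 and 03 above can compose, the sketch below wraps each handler in a fallback chain with its own circuit breaker. Everything here is an assumption made for illustration: the names `CircuitBreaker`, `run_chain`, `max_failures`, and `cooldown_s` are hypothetical, and the breaker trips on a count of consecutive failures as a simplification of the error-rate threshold described above. The low-confidence branch of the fallback chain is left out to keep the sketch short.

```python
import time
from typing import Callable, Optional, Sequence, Tuple

Handler = Callable[[str], str]


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, then blocks calls
    until `cooldown_s` seconds have passed (simplified: no error-rate window)."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so the cooldown can be tested without sleeping
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Cooldown elapsed: close the breaker and let calls through again.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def run_chain(request: str,
              handlers: Sequence[Tuple[Handler, CircuitBreaker]]) -> str:
    """Try handlers in order, skipping any whose breaker is open."""
    last_error: Optional[Exception] = None
    for handler, breaker in handlers:
        if not breaker.allow():
            continue  # dependency is cooling down; fall through to the next handler
        try:
            result = handler(request)
        except Exception as exc:
            breaker.record(False)
            last_error = exc
            continue
        breaker.record(True)
        return result
    raise RuntimeError("all handlers failed or were skipped") from last_error
```

In this arrangement an open breaker removes the failing handler's latency from the cascade, which softens the cumulative-latency tradeoff noted for the fallback chain, at the cost of the false-trip risk noted for the circuit breaker.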
All patterns in this book
Multi-Model Routing
×20 Send each request to the cheapest model that can handle it well.
Fallback Chain
×11 Try a primary handler; on failure or low confidence, fall through to a sequence of fallback handlers.
Circuit Breaker
×3 Stop calling a failing dependency for a cooldown period after error rates exceed a threshold.
Pipes and Filters
×3 Compose stream-shaped processing as a chain of small filters connected by pipes.
Automatic Workflow Search
×3 Treat the agent's workflow (a graph of LLM-invoking nodes) as an artefact to search; use Monte Carlo Tree Search guided by an eval benchmark to discover the best workflow, then deploy it.
Graceful Degradation
×2 When a dependency fails, downgrade the user-facing experience to a working subset rather than failing entirely.
Provider Fallback
×2 When one provider's API errors mid-stream, transparently switch to another provider while preserving state.
Complexity-Based Routing
×2 Estimate a request's difficulty up front and bind it to the cheapest model tier that can answer well, using an explicit complexity classifier as the routing key.
Mixture of Experts Routing
×2 Route each request to one or more domain-expert agents, where each expert holds deep capability in a narrow area.
Open-Weight Cascade
×2 Build a multi-model cascade where lower tiers are open-weight, self-hostable models that run inside the operator's boundary, and only escalations cross to a hosted frontier model, giving cost arbitr…
Parallel Tool Calls
×1 Allow the model to emit several independent tool calls in one assistant turn; the host executes them in parallel.
Parallelization
×1 Run independent LLM calls concurrently and combine results.
Prompt Chaining
×1 Decompose a task into a fixed sequence of LLM calls where each step's output becomes the next step's input.
Routing
×1 Classify an incoming request and dispatch it to the specialist (lane / agent / model) best suited to handle it.
Agent Persona Profile
×1 Treat agent identity as a structured profile object (persona, primary motivator, allowed actions, knowledge bindings) rather than a free-form role sentence in the system prompt.
Provider-String Routing
×1 Select the model and provider for a request through a single namespaced string (`provider/model`) backed by env-var credentials, so the caller specifies what to run with one parameter rather than a t…
Trust and Reputation Routing
×1 Maintain a per-agent reputation score updated from outcome quality and peer feedback, and route new tasks preferentially to high-reputation agents.
MRKL Systems (Modular Neuro-Symbolic)
Route each request through an LLM dispatcher to specialized symbolic or neural expert modules (calculator, knowledge base, code executor) rather than asking one LLM to do everything; integrate the mo…
BPMN/DMN Deterministic Shell Around Agent
BPMN processes and DMN decision tables form the deterministic spine; LLM-driven agents are invoked only at explicit 'unstructured problem' nodes inside the process.
Dynamic Scaffolding
Inject task-specific scaffolding (examples, hints, schemas) into the prompt only when the task type warrants it.
Hybrid Symbolic-Neural Routing
Per query, route between a symbolic path (rule engine, knowledge graph) and a neural path (LLM), using the LLM for interpretation and the symbolic layer for exact constraints.
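
To make two of the entries above more concrete, here is a minimal sketch that combines Complexity-Based Routing with Provider-String Routing: a classifier picks a tier, and each tier is bound to a `provider/model` string. All names are assumptions made for illustration. The tier table, the model identifiers, and the functions `classify_complexity`, `parse_provider_string`, and `route` are hypothetical, and the keyword heuristic merely stands in for whatever classifier a real system would use. Credential lookup from environment variables is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical tier table: placeholder names, not real model identifiers.
TIERS: Dict[str, str] = {
    "simple": "local/small-model",
    "moderate": "hosted/mid-model",
    "hard": "hosted/frontier-model",
}


def classify_complexity(request: str) -> str:
    """Toy classifier using naive substring cues and word count.
    A stand-in for a trained classifier; the cues are illustrative only."""
    reasoning_cues = ("prove", "derive", "step by step", "refactor")
    if any(cue in request.lower() for cue in reasoning_cues):
        return "hard"
    if len(request.split()) > 40:
        return "moderate"
    return "simple"


def parse_provider_string(spec: str) -> Tuple[str, str]:
    """Split 'provider/model' at the first slash and reject malformed input."""
    provider, sep, model = spec.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {spec!r}")
    return provider, model


@dataclass(frozen=True)
class Route:
    tier: str
    provider: str
    model: str


def route(request: str) -> Route:
    """Use the complexity tier as the routing key, then resolve the target."""
    tier = classify_complexity(request)
    provider, model = parse_provider_string(TIERS[tier])
    return Route(tier, provider, model)
```

Keeping the tier table as plain data means the mapping from difficulty to model can change without touching the classifier or the callers.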