Book IX

Routing & Composition

Sending requests to the right specialist.

21 patterns in this book.


When to reach for each

01. Multi-Model Routing: Send each request to the cheapest model that can handle it well.
Best for: Cost and quality goals diverge across request types.
Tradeoff: Debugging spans two models instead of one.
Watch for: A single model already meets the price-performance target.

02. Fallback Chain: Try a primary handler; on failure or low confidence, fall through to a sequence of fallback handlers.
Best for: A single handler's failure would otherwise reach the user as an outage.
Tradeoff: Latency accumulates when a request falls through the whole chain.
Watch for: Only one handler exists, so there is nothing to fall back to.

03. Circuit Breaker: Stop calling a failing dependency for a cooldown period once its error rate exceeds a threshold.
Best for: A dependency fails often enough that hammering it wastes cost or blocks legitimate traffic.
Tradeoff: False trips reduce availability when the errors were transient.
Watch for: Failures are correlated across all dependencies and there is no useful fallback to route to.

04. Pipes and Filters: Compose stream-shaped processing as a chain of small filters connected by pipes.
Best for: A transformation decomposes into small filters with single responsibilities.
Tradeoff: Reduced visibility; end-to-end behaviour is hard to see from any one filter.
Watch for: The transformation is small enough that a single function is clearer.

05. Automatic Workflow Search: Treat the agent's workflow (a graph of LLM-invoking nodes) as an artefact to search; use Monte Carlo Tree Search guided by an eval benchmark to discover the best workflow, then deploy it.
Best for: You have a stable eval benchmark that can score full workflows end-to-end.
Tradeoff: The quality of the eval set bounds the quality of the discovered workflow.
Watch for: No reliable eval exists to guide the search.
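Multi-Model Routing (entry 01 above) has no card of its own below, so here is a minimal Python sketch of it. It assumes a hand-maintained table of per-model prices and per-request-type quality scores taken from your own offline evals; every model name and number is a hypothetical placeholder, not a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_1k_tokens: float   # hypothetical price
    quality: dict[str, float]   # hypothetical eval score per request type, in [0, 1]


# Hypothetical catalogue; real numbers would come from your evals and price sheets.
MODELS = [
    ModelOption("small-model", 0.1, {"faq": 0.92, "code": 0.55, "legal": 0.40}),
    ModelOption("medium-model", 0.5, {"faq": 0.95, "code": 0.83, "legal": 0.70}),
    ModelOption("large-model", 2.0, {"faq": 0.96, "code": 0.91, "legal": 0.88}),
]


def route(request_type: str, min_quality: float = 0.8) -> str:
    """Return the cheapest model whose score for this request type clears the bar."""
    eligible = [m for m in MODELS if m.quality.get(request_type, 0.0) >= min_quality]
    if not eligible:
        # Nothing clears the bar: fall back to the highest-quality option.
        return max(MODELS, key=lambda m: m.quality.get(request_type, 0.0)).name
    return min(eligible, key=lambda m: m.cost_per_1k_tokens).name
```

With these placeholder numbers, `route("faq")` picks `small-model` while `route("code")` picks `medium-model`. Keeping the quality bar as one explicit parameter makes the cost/quality trade easy to audit and tune.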

All patterns in this book

Fallback Chain

×11

Try a primary handler; on failure or low confidence, fall through to a sequence of fallback handlers.
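A minimal sketch of a Fallback Chain, assuming each handler returns an `(answer, confidence)` pair or raises on failure. The handler names and the `min_confidence` default are hypothetical; in practice confidence might come from a model's self-report, a verifier, or a classifier.

```python
from typing import Callable

# Each handler returns (answer, confidence) or raises an exception on failure.
Handler = Callable[[str], tuple[str, float]]


def fallback_chain(request: str, handlers: list[tuple[str, Handler]],
                   min_confidence: float = 0.7) -> tuple[str, str]:
    """Try handlers in order, falling through on an exception or low confidence.

    Returns (handler_name, answer); raises RuntimeError if every handler falls through.
    """
    reasons: list[str] = []
    for name, handler in handlers:
        try:
            answer, confidence = handler(request)
        except Exception as exc:  # any handler error counts as a failure
            reasons.append(f"{name}: {exc!r}")
            continue
        if confidence >= min_confidence:
            return name, answer
        reasons.append(f"{name}: confidence {confidence:.2f} below {min_confidence}")
    raise RuntimeError("all handlers fell through; " + "; ".join(reasons))


# Hypothetical stand-ins for a primary model, a backup model, and a canned reply.
def primary(request: str) -> tuple[str, float]:
    raise TimeoutError("primary timed out")


def backup(request: str) -> tuple[str, float]:
    return "a hedged guess", 0.4


def canned(request: str) -> tuple[str, float]:
    return "Sorry, I can't answer that right now.", 1.0
```

Called as `fallback_chain("hi", [("primary", primary), ("backup", backup), ("canned", canned)])`, the timeout and the low-confidence answer both fall through and the canned reply is returned. Collecting the reasons makes a full cascade debuggable after the fact.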

Circuit Breaker

×3

Stop calling a failing dependency for a cooldown period after error rates exceed a threshold.
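A minimal single-threaded Circuit Breaker sketch. As a simplification it trips on a count of consecutive failures rather than an error rate, and allows one trial call once the cooldown has elapsed (the "half-open" state). The injectable `clock` parameter is an assumption added so the cooldown can be tested without sleeping.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0                      # consecutive failures seen
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown_s:
            return "half-open"                 # cooldown over: allow a trial call
        return "open"

    def call(self, fn: Callable[[], object]) -> object:
        if self.state == "open":
            raise CircuitOpenError("dependency disabled during cooldown")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures in a row, (re)opens the breaker.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the breaker and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the breaker is open, callers get a fast `CircuitOpenError` they can route to a fallback instead of waiting on a dependency that is likely to fail again.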

Pipes and Filters

×3

Compose stream-shaped processing as a chain of small filters connected by pipes.
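In Python, generators map naturally onto this pattern: each filter is a generator function that consumes one stream and yields another, and chaining the generator objects acts as the pipe. The three filters below are hypothetical examples of single-responsibility steps.

```python
from typing import Callable, Iterable, Iterator

Filter = Callable[[Iterable[str]], Iterator[str]]


def strip_blank(lines: Iterable[str]) -> Iterator[str]:
    """Filter: drop empty or whitespace-only lines."""
    for line in lines:
        if line.strip():
            yield line


def normalise(lines: Iterable[str]) -> Iterator[str]:
    """Filter: trim surrounding whitespace and lowercase."""
    for line in lines:
        yield line.strip().lower()


def redact_digits(lines: Iterable[str]) -> Iterator[str]:
    """Filter: mask every digit, e.g. account or order numbers."""
    for line in lines:
        yield "".join("#" if ch.isdigit() else ch for ch in line)


def pipeline(*filters: Filter) -> Filter:
    """Connect filters with pipes: each filter lazily consumes the previous filter's stream."""
    def run(source: Iterable[str]) -> Iterator[str]:
        stream: Iterable[str] = source
        for f in filters:
            stream = f(stream)
        return iter(stream)
    return run
```

For example, `pipeline(strip_blank, normalise, redact_digits)` turns `["  Order 42 ", "", "OK"]` into `["order ##", "ok"]`. Because each stage is lazy, items flow through one at a time, which also illustrates the visibility tradeoff: no single function shows the end-to-end behaviour.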

Automatic Workflow Search

×3

Treat the agent's workflow (a graph of LLM-invoking nodes) as an artefact to search; use Monte Carlo Tree Search guided by an eval benchmark to discover the best workflow, then deploy it.
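A heavily simplified sketch of the search loop, under stated assumptions: workflows are fixed-length sequences of hypothetical step types rather than general graphs, `toy_benchmark` is a made-up stand-in for a real eval, rollouts are uniformly random, and the function returns the best-scoring workflow seen rather than the most-visited path. It shows the four MCTS phases (selection with UCB1, expansion, simulation, backpropagation), not any specific published system.

```python
import math
import random

STEPS = ("draft", "critique", "revise", "verify")  # hypothetical node types
DEPTH = 3  # workflows are fixed-length sequences in this simplified search space


def toy_benchmark(workflow: tuple[str, ...]) -> float:
    """Hypothetical eval: fraction of positions matching a hidden 'ideal' workflow."""
    ideal = ("draft", "critique", "revise")
    return sum(a == b for a, b in zip(workflow, ideal)) / len(ideal)


class Node:
    def __init__(self, prefix: tuple[str, ...]):
        self.prefix = prefix                 # partial workflow chosen so far
        self.children: dict[str, "Node"] = {}
        self.visits = 0
        self.total = 0.0                     # sum of rewards backed up through this node


def ucb1(parent: Node, child: Node, c: float = 1.4) -> float:
    exploit = child.total / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def search(score, iterations: int = 2000, seed: int = 0):
    """MCTS over workflows; returns the best-scoring workflow seen and its score."""
    rng = random.Random(seed)
    root = Node(())
    best_score, best = -1.0, ()
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: follow UCB1 through fully expanded nodes.
        while len(node.prefix) < DEPTH and len(node.children) == len(STEPS):
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb1(parent, ch))
            path.append(node)
        # Expansion: add one untried step as a new child.
        if len(node.prefix) < DEPTH:
            step = rng.choice([s for s in STEPS if s not in node.children])
            node.children[step] = Node(node.prefix + (step,))
            node = node.children[step]
            path.append(node)
        # Simulation: finish the workflow with random steps and score it end-to-end.
        rollout = node.prefix + tuple(rng.choice(STEPS) for _ in range(DEPTH - len(node.prefix)))
        reward = score(rollout)
        if reward > best_score:
            best_score, best = reward, rollout
        # Backpropagation: credit every node on the path.
        for n in path:
            n.visits += 1
            n.total += reward
    return best, best_score
```

With a real benchmark each `score` call is an expensive end-to-end eval run, so the iteration budget, not the search code, usually dominates cost; and as the card notes, the result can only be as good as the eval.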

Graceful Degradation

×2

When a dependency fails, downgrade the user-facing experience to a working subset rather than failing entirely.
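A minimal Graceful Degradation sketch, assuming a hypothetical answer path of `retrieve` then `generate`, with a static FAQ dictionary as the last working subset. Each failure drops one capability and records what was lost, so the UI can tell the user the answer is reduced rather than silently pretending it is complete.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Answer:
    text: str
    degraded: bool = False
    notes: list[str] = field(default_factory=list)  # what was dropped, for the UI


def answer_with_degradation(question: str,
                            retrieve: Callable[[str], list[str]],
                            generate: Callable[[str, list[str]], str],
                            cached_faq: dict[str, str]) -> Answer:
    """Full path is retrieve then generate; each failure drops to a smaller working subset."""
    notes: list[str] = []
    try:
        context = retrieve(question)
    except Exception:
        context = []  # keep going without retrieved context
        notes.append("search unavailable; answering without citations")
    try:
        return Answer(generate(question, context), degraded=bool(notes), notes=notes)
    except Exception:
        notes.append("generator unavailable; showing the closest cached FAQ entry")
    fallback = cached_faq.get(question, "The assistant is busy; please try again shortly.")
    return Answer(fallback, degraded=True, notes=notes)
```

The `degraded` flag and `notes` list are design choices for this sketch; the point is that every tier still returns a usable `Answer` instead of an error page.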

Provider Fallback

×2

When one provider's API errors mid-stream, transparently switch to another provider while preserving state.
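A minimal Provider Fallback sketch, assuming each provider is a hypothetical function that streams text chunks given the prompt and the text already emitted. The state preserved across the switch is the partial output: it is passed to the next provider as a continuation prefix, so the user sees one uninterrupted stream.

```python
from typing import Callable, Iterator

# A provider streams text chunks for (prompt, text_already_emitted).
StreamFn = Callable[[str, str], Iterator[str]]


def stream_with_failover(prompt: str, providers: list[tuple[str, StreamFn]]) -> Iterator[str]:
    """Yield chunks; if a provider raises mid-stream, continue on the next provider.

    The preserved state is the text already sent to the user, passed to the next
    provider as a continuation prefix so the output is not restarted or duplicated.
    """
    emitted = ""
    failed: list[str] = []
    for name, provider in providers:
        try:
            for chunk in provider(prompt, emitted):
                emitted += chunk
                yield chunk
            return  # this provider finished the stream cleanly
        except Exception:
            failed.append(name)  # switch providers, keeping `emitted`
    raise RuntimeError(f"all providers failed: {failed}")


# Hypothetical providers: one drops the connection, one continues from a prefix.
def flaky(prompt: str, prefix: str) -> Iterator[str]:
    yield "Hello, "
    raise ConnectionError("stream dropped")


def steady(prompt: str, prefix: str) -> Iterator[str]:
    full = "Hello, world."
    yield full[len(prefix):]  # a real model would be prompted to continue `prefix`
```

Joining `stream_with_failover("greet", [("a", flaky), ("b", steady)])` yields `"Hello, world."`. How well a second provider continues someone else's partial output depends on the providers and prompting; this sketch only shows where the state lives.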

Complexity-Based Routing

×2

Estimate a request's difficulty up front and bind it to the cheapest model tier that can answer well, using an explicit complexity classifier as the routing key.
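A minimal sketch of the routing key, assuming a hypothetical heuristic classifier (request length plus a few reasoning cue words) in place of the trained classifier a production router would more likely use. Tier names and thresholds are placeholders.

```python
import re

# Tiers ordered cheapest first: (highest complexity score the tier accepts, hypothetical tier).
TIERS = [(0.3, "tier-small"), (0.7, "tier-medium"), (1.0, "tier-large")]

REASONING_CUES = re.compile(r"\b(prove|derive|step by step|compare|trade-?offs?|why)\b", re.I)


def complexity_score(request: str) -> float:
    """Hypothetical heuristic classifier returning a score in [0, 1].

    Half the score comes from length (capped at 200 words), half from reasoning cue words.
    """
    length_part = min(len(request.split()) / 200, 1.0) * 0.5
    cue_part = min(len(REASONING_CUES.findall(request)) * 0.25, 0.5)
    return length_part + cue_part


def route_by_complexity(request: str) -> str:
    """Bind the request to the cheapest tier whose ceiling covers its score."""
    score = complexity_score(request)
    for ceiling, tier in TIERS:
        if score <= ceiling:
            return tier
    return TIERS[-1][1]
```

Separating `complexity_score` from `route_by_complexity` keeps the classifier swappable: the routing table stays the same when the heuristic is replaced by a learned model.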

Mixture of Experts Routing

×2

Route each request to one or more domain-expert agents, where each expert holds deep capability in a narrow area.
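A minimal Mixture of Experts Routing sketch. The gate here is a hypothetical keyword-overlap scorer standing in for an LLM or embedding classifier; it selects up to `top_k` experts, and the chosen experts' outputs are simply joined, which is one of several possible ways to combine them.

```python
import re

# Hypothetical expert agents and the vocabulary that signals each one's domain.
EXPERTS = {
    "tax": {"tax", "deduction", "vat", "invoice"},
    "legal": {"contract", "liability", "clause", "gdpr"},
    "code": {"python", "bug", "stacktrace", "compile"},
}


def gate(request: str, top_k: int = 2, min_score: int = 1) -> list[str]:
    """Score experts by keyword overlap; return up to top_k that clear min_score."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    scores = {name: len(words & vocab) for name, vocab in EXPERTS.items()}
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return [name for name in ranked[:top_k] if scores[name] >= min_score]


def answer(request: str, experts: dict) -> str:
    chosen = gate(request)
    if not chosen:
        return "generalist: " + request  # hypothetical default path
    # Each chosen expert answers independently; here the outputs are simply joined.
    return " | ".join(experts[name](request) for name in chosen)
```

A question such as "Is this VAT clause in the contract enforceable?" reaches both the legal and tax experts, while a request that matches no domain goes to the generalist default.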

Open-Weight Cascade

×2

Build a multi-model cascade where lower tiers are open-weight, self-hostable models that run inside the operator's boundary, and only escalations cross to a hosted frontier model — giving cost arbitr…

Parallel Tool Calls

×1

Allow the model to emit several independent tool calls in one assistant turn; the host executes them in parallel.
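A minimal host-side sketch using `asyncio.gather`. It assumes each tool call arrives as a dict with `id`, `name`, and JSON-encoded `arguments` (a shape similar to several chat APIs, but treat it as an assumption), and the two tools are hypothetical. Results keep their call `id` so they can be matched back to the model's calls whatever order they finish in.

```python
import asyncio
import json


async def get_weather(city: str) -> dict:
    await asyncio.sleep(0.05)  # simulated I/O latency
    return {"city": city, "temp_c": 18}


async def get_time(tz: str) -> dict:
    await asyncio.sleep(0.05)  # simulated I/O latency
    return {"tz": tz, "time": "09:00"}


TOOLS = {"get_weather": get_weather, "get_time": get_time}  # hypothetical tools


async def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Execute every tool call from one assistant turn concurrently.

    Each result keeps its originating call id so it can be returned to the model
    as a tool result in the next turn.
    """
    async def run_one(call: dict) -> dict:
        try:
            args = json.loads(call["arguments"])
            result = await TOOLS[call["name"]](**args)
            return {"id": call["id"], "content": json.dumps(result)}
        except Exception as exc:  # report the error to the model instead of failing the turn
            return {"id": call["id"], "content": f"error: {exc}"}

    return list(await asyncio.gather(*(run_one(c) for c in tool_calls)))
```

Because both tools sleep concurrently, the turn takes roughly as long as the slowest call rather than the sum. Catching errors per call means one bad tool call does not discard its siblings' results.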

Parallelization

×1

Run independent LLM calls concurrently and combine results.

Prompt Chaining

×1

Decompose a task into a fixed sequence of LLM calls where each step's output becomes the next step's input.
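A minimal Prompt Chaining sketch, with plain functions as hypothetical stand-ins for the LLM calls. The optional per-step gate is an addition in this sketch: a programmatic check that stops the chain before a bad intermediate output is fed downstream.

```python
from typing import Callable, Optional

Step = Callable[[str], str]


def run_chain(task: str, steps: list[tuple[str, Step]],
              gates: Optional[dict[str, Callable[[str], bool]]] = None) -> str:
    """Run a fixed sequence of steps, feeding each output into the next step.

    An optional gate per step stops the chain early if an intermediate output
    fails a programmatic check, instead of passing a bad result downstream.
    """
    gates = gates or {}
    value = task
    for name, step in steps:
        value = step(value)
        check = gates.get(name)
        if check is not None and not check(value):
            raise ValueError(f"gate failed after step {name!r}: {value!r}")
    return value


# Hypothetical stand-ins for three LLM calls.
def outline(topic: str) -> str:
    return f"outline: {topic}"


def draft(outline_text: str) -> str:
    return outline_text.replace("outline:", "draft:", 1)


def format_heading(draft_text: str) -> str:
    return draft_text.title()
```

Running the three steps on `"caching"` with a gate on the draft step returns `"Draft: Caching"`. The sequence is fixed in code, which is what distinguishes this pattern from routing or agent loops.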

Routing

×1

Classify an incoming request and dispatch it to the specialist (lane / agent / model) best suited to handle it.

Agent Persona Profile

×1

Treat agent identity as a structured profile object — persona, primary motivator, allowed actions, knowledge bindings — rather than a free-form role sentence in the system prompt.
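A minimal sketch of a persona profile as a frozen dataclass. The field names follow the four elements listed above; the rendering format and the `permits` check are assumptions of this sketch. Keeping allowed actions as data means the host can enforce them in code, not only state them in the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonaProfile:
    name: str
    persona: str
    primary_motivator: str
    allowed_actions: frozenset[str]
    knowledge_bindings: tuple[str, ...] = ()

    def system_prompt(self) -> str:
        """Render the structured profile into a system prompt deterministically."""
        lines = [
            f"You are {self.name}, {self.persona}.",
            f"Primary motivator: {self.primary_motivator}.",
            "Allowed actions: " + ", ".join(sorted(self.allowed_actions)) + ".",
        ]
        if self.knowledge_bindings:
            lines.append("Knowledge sources: " + ", ".join(self.knowledge_bindings) + ".")
        return "\n".join(lines)

    def permits(self, action: str) -> bool:
        """Enforce allowed actions in code, not only in prompt text."""
        return action in self.allowed_actions
```

Because the profile is structured, it can be versioned, diffed, and validated like any other config, whereas a free-form role sentence cannot.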

Provider-String Routing

×1

Select the model and provider for a request through a single namespaced string (`provider/model`) backed by env-var credentials, so the caller specifies what to run with one parameter rather than a t…
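A minimal sketch of resolving a `provider/model` string. The provider table and environment-variable names are assumptions for illustration. Splitting only on the first `/` keeps model names that themselves contain slashes intact.

```python
import os
from typing import Optional

# Hypothetical mapping from provider prefix to the env var holding its credential.
PROVIDER_ENV: dict[str, Optional[str]] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "local": None,  # e.g. a self-hosted endpoint that needs no key
}


def resolve(model_string: str) -> tuple[str, str, Optional[str]]:
    """Split 'provider/model' and look up the provider's credential from the environment."""
    provider, sep, model = model_string.partition("/")  # split on the first slash only
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_string!r}")
    if provider not in PROVIDER_ENV:
        raise ValueError(f"unknown provider {provider!r}")
    env_var = PROVIDER_ENV[provider]
    api_key = None
    if env_var is not None:
        api_key = os.environ.get(env_var)
        if not api_key:
            raise RuntimeError(f"set {env_var} to use provider {provider!r}")
    return provider, model, api_key
```

For example, `resolve("local/org/model-7b")` yields provider `local` and model `org/model-7b` with no key. Failing loudly on a missing credential surfaces misconfiguration at the call site rather than as an opaque authentication error later.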

Trust and Reputation Routing

×1

Maintain a per-agent reputation score updated from outcome quality and peer feedback, and route new tasks preferentially to high-reputation agents.
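A minimal sketch assuming reputation is an exponentially weighted moving average of outcome quality, optionally blended with a peer rating. The `alpha`, `prior`, and `peer_weight` defaults are hypothetical tuning choices.

```python
from typing import Optional


class ReputationRouter:
    """Route tasks to the agent with the highest exponentially weighted reputation."""

    def __init__(self, agents: list[str], alpha: float = 0.2, prior: float = 0.5):
        self.alpha = alpha  # weight given to the newest observation
        self.scores = {agent: prior for agent in agents}

    def record(self, agent: str, outcome_quality: float,
               peer_rating: Optional[float] = None, peer_weight: float = 0.3) -> None:
        """Fold one outcome (and optional peer feedback), both in [0, 1], into the score."""
        signal = outcome_quality
        if peer_rating is not None:
            signal = (1 - peer_weight) * outcome_quality + peer_weight * peer_rating
        self.scores[agent] = (1 - self.alpha) * self.scores[agent] + self.alpha * signal

    def route(self, eligible: Optional[list[str]] = None) -> str:
        """Pick the highest-reputation eligible agent; ties go to the first listed."""
        candidates = eligible if eligible else list(self.scores)
        return max(candidates, key=lambda agent: self.scores[agent])
```

One design caveat: pure argmax routing never retries a low-scoring agent, so its reputation cannot recover. Occasionally routing to a non-top agent (for example, epsilon-greedy exploration) is a common remedy.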

MRKL Systems (Modular Neuro-Symbolic)

Route each request through an LLM dispatcher to specialized symbolic or neural expert modules (calculator, knowledge base, code executor) rather than asking one LLM to do everything; integrate the mo…
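A minimal MRKL-style sketch with two hypothetical modules: a symbolic calculator that evaluates arithmetic by walking a whitelisted Python AST (no `eval`), and a toy knowledge-base lookup. A regex-based `dispatch` stands in for the LLM dispatcher; only the routing structure is the point here.

```python
import ast
import operator
import re

# Whitelisted arithmetic operators for the calculator module (no ** to avoid huge powers).
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expr: str):
    """Symbolic expert: exact arithmetic by walking a whitelisted AST instead of eval()."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))


KNOWLEDGE_BASE = {"capital of france": "Paris"}  # hypothetical knowledge-base module
MODULES = {
    "calculator": calculator,
    "kb": lambda key: KNOWLEDGE_BASE.get(key, "unknown"),
}


def dispatch(query: str) -> tuple[str, str]:
    """Stand-in for the LLM dispatcher: choose a module and extract its input."""
    if re.fullmatch(r"[\d\s.+\-*/()]+", query):
        return "calculator", query
    return "kb", query.lower().rstrip("?").removeprefix("what is the ").strip()


def mrkl(query: str):
    module, argument = dispatch(query)
    return module, MODULES[module](argument)
```

`mrkl("12 * (3 + 4)")` is answered by the calculator rather than by a model's guess at arithmetic, while a factual question goes to the knowledge base. The module registry is a plain dict, so adding an expert does not change the dispatcher's interface.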

BPMN/DMN Deterministic Shell Around Agent

BPMN processes and DMN decision tables form the deterministic spine; LLM-driven agents are invoked only at explicit 'unstructured problem' nodes inside the process.
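A plain-Python sketch of the idea, not a BPMN or DMN engine: a decision table with a "first" hit policy (the first matching rule wins) drives a hypothetical expense-claim process, and the agent is called only when the table routes a claim to the explicit unstructured node. The rules, field names, and outcomes are invented for illustration.

```python
from typing import Callable

Claim = dict

# DMN-style decision table with a "first" hit policy: the first matching rule wins.
CLAIM_RULES: list[tuple[Callable[[Claim], bool], str]] = [
    (lambda c: c["amount"] <= 500 and c["has_receipt"], "auto_approve"),
    (lambda c: c["amount"] > 10_000, "manual_review"),
    (lambda c: not c["has_receipt"], "unstructured"),  # explicit agent node
]
ALLOWED_OUTCOMES = {"auto_approve", "manual_review"}


def decide(claim: Claim) -> str:
    for condition, decision in CLAIM_RULES:
        if condition(claim):
            return decision
    return "manual_review"  # default when no rule matches


def process_claim(claim: Claim, agent: Callable[[str], str]) -> str:
    """Deterministic spine; the agent is invoked only at the 'unstructured' node."""
    decision = decide(claim)
    if decision == "unstructured":
        verdict = agent(claim["description"])
        # Map agent output back into the process vocabulary; anything else escalates.
        decision = verdict if verdict in ALLOWED_OUTCOMES else "manual_review"
    return decision
```

Routine claims never touch the model, and whatever the agent returns is forced back into the process's own outcome set, so the deterministic spine stays in control of the final state.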

Dynamic Scaffolding

Inject task-specific scaffolding (examples, hints, schemas) into the prompt only when the task type warrants it.
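A minimal sketch, assuming the task type is already known (for example from an upstream router) and that scaffolding is kept in a hypothetical lookup table. Task types without an entry get the bare prompt, so simple requests do not pay the token cost of examples they do not need.

```python
# Hypothetical scaffolding snippets keyed by task type.
SCAFFOLDS = {
    "extract_json": 'Respond with JSON matching: {"name": string, "date": "YYYY-MM-DD"}',
    "sql": "Example:\nQ: count users\nA: SELECT COUNT(*) FROM users;",
}


def build_prompt(task_type: str, user_input: str,
                 base: str = "You are a helpful assistant.") -> str:
    """Add scaffolding only for task types that need it; other tasks get the bare prompt."""
    parts = [base]
    scaffold = SCAFFOLDS.get(task_type)
    if scaffold:
        parts.append(scaffold)
    parts.append(user_input)
    return "\n\n".join(parts)
```

A `"chat"` request produces just the base prompt and the user's text, while an `"sql"` request also carries the worked example.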

Hybrid Symbolic-Neural Routing

Per query, route between a symbolic path (rule engine, knowledge graph) and a neural path (LLM), using the LLM for interpretation and the symbolic layer for exact constraints.
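A minimal sketch with hypothetical flight data. Queries that are already structured filters take the symbolic path directly; free-text queries go to `llm_interpret`, a hypothetical LLM call assumed to return a constraint dict. In both cases the final filtering is done by the symbolic layer, so exact limits such as price and stops are enforced in code rather than trusted to the model.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Flight:
    code: str
    price: float
    stops: int


FLIGHTS = [Flight("AB1", 420.0, 0), Flight("CD2", 310.0, 1), Flight("EF3", 280.0, 2)]  # hypothetical


def apply_constraints(c: dict) -> list[str]:
    """Symbolic layer: exact, auditable checks; unknown keys are ignored."""
    return [f.code for f in FLIGHTS
            if f.price <= c.get("max_price", float("inf"))
            and f.stops <= c.get("max_stops", 10**9)]


def route_query(query: str, llm_interpret) -> tuple[str, list[str]]:
    """Take the symbolic path for structured filters, else let the LLM interpret first."""
    structured = re.fullmatch(r"max_price=(\d+(?:\.\d+)?)(?:;max_stops=(\d+))?", query.strip())
    if structured:
        constraints = {"max_price": float(structured[1])}
        if structured[2] is not None:
            constraints["max_stops"] = int(structured[2])
        return "symbolic", apply_constraints(constraints)
    constraints = llm_interpret(query)  # hypothetical LLM call returning a dict
    return "neural+symbolic", apply_constraints(constraints)
```

If the interpreter misreads a query, the damage is limited to choosing the wrong constraints; it cannot return a flight that violates the constraints it did produce.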