Routing
also known as Mode Selector, Intent Classifier, Task Router
Classify an incoming request and dispatch it to the specialist (lane / agent / model) best suited to handle it.
This pattern helps complete certain larger patterns:
- used-by Supervisor ★★: Place a coordinating agent above a set of specialised agents and route work to them.
- used-by Dynamic Scaffolding ★: Inject task-specific scaffolding (examples, hints, schemas) into the prompt only when the task type warrants it.
- used-by Disambiguation ★★: Have the agent ask a clarifying question before acting on an ambiguous request.
- used-by Tool Loadout ★★: Select a small task-relevant subset of available tools per request rather than exposing the full registry to the model.
- used-by Hierarchical Retrieval ★★: Route a query through a multi-level cascade (coarse source or index selection, then narrower per-source retrieval, then chunk-level retrieval) so each retrieval decision is pushed to the cheapest tier that can answer it.
Context
An agent product receives a heterogeneous mix of incoming requests: short deterministic commands ("open settings"), open-ended chats with no tool use, and longer multi-step tasks that need a planner, retrieval, and several tool calls. Each kind of request benefits from a different prompt, a different tool palette, and sometimes a different model. The team has the option of building several specialist lanes behind a single front door.
Problem
If every request goes through one all-purpose prompt that can handle the hardest case, the cheap and simple requests over-pay on tokens and latency for capabilities they never use. If every request goes through a prompt tuned for cheap cases, the complex requests are stuck without the planning and tools they need and the product feels incompetent on anything non-trivial. A single shared prompt forces the team to pay for the worst case on every request or under-serve the hard cases.
Forces
- Routing itself costs a model call.
- Misrouting can be worse than not routing at all.
- The router needs visibility into the capabilities of each downstream specialist.
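The first two forces can be made concrete with a small cost model. The sketch below is only illustrative: the function name `expected_cost`, the flat per-misroute penalty, and every number in the usage lines are assumptions chosen for the example, not measurements from any real system.

```python
def expected_cost(router_cost, lanes, misroute_rate=0.0, misroute_penalty=0.0):
    """Average cost per request when a router sits in front of several lanes.

    router_cost      -- cost of the classifier call, paid on every request
    lanes            -- list of (traffic_share, cost_per_request) pairs
    misroute_rate    -- fraction of requests sent to the wrong lane (assumed)
    misroute_penalty -- extra cost of one misroute, e.g. a retry (assumed)
    """
    total_share = sum(share for share, _ in lanes)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    lane_cost = sum(share * cost for share, cost in lanes)
    return router_cost + lane_cost + misroute_rate * misroute_penalty


# Hypothetical units: the single all-purpose prompt costs 1.0 per request.
BASELINE = 1.0
LANES = [(0.8, 0.05), (0.2, 1.0)]  # 80% cheap lane, 20% heavy lane

accurate = expected_cost(0.02, LANES, misroute_rate=0.05, misroute_penalty=1.0)
sloppy = expected_cost(0.02, LANES, misroute_rate=0.8, misroute_penalty=1.0)

print(accurate < BASELINE)  # routing pays off with an accurate classifier
print(sloppy > BASELINE)    # heavy misrouting costs more than no routing
```

Under these made-up numbers the accurate router averages roughly 0.31 per request against a baseline of 1.0, while the sloppy router ends up above the baseline. The point is the shape of the trade-off, not the specific figures.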
Example
A help-desk product sends both cheap FAQ lookups and rare deep-research queries through one expensive prompt, so most queries pay for capability they never use. The team puts a small classifier in front: it returns one of `command`, `agent`, `research`, or `human`, and the host dispatches to the matching lane. Eighty percent of traffic lands in the cheap deterministic command lane, the heavy agent runs only when needed, and average per-query cost falls by an order of magnitude.
Solution
Therefore:
A lightweight classifier model (often the cheapest available) returns a label. The host dispatches the request to the specialist for that label. Common lanes: command (deterministic action), agent (multi-step), chat (no tools).
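A minimal sketch of the classify-then-dispatch step follows. The keyword-based `classify` function is a hypothetical stand-in for the lightweight classifier model, and the handler names and the `LANES` table are assumptions made for illustration. A real system would call a model and parse its label.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def classify(request: str) -> str:
    """Stand-in for the classifier model (assumption): returns a lane label."""
    text = request.lower().strip()
    if text.startswith(("open ", "close ", "set ")):
        return "command"
    if any(word in text for word in ("plan", "research", "compare")):
        return "agent"
    return "chat"


def run_command(request: str) -> str:
    return f"[command] executed: {request}"


def run_agent(request: str) -> str:
    return f"[agent] planning steps for: {request}"


def run_chat(request: str) -> str:
    return f"[chat] replying to: {request}"


# The host's dispatch table: one specialist per lane label.
LANES: Dict[str, Handler] = {
    "command": run_command,
    "agent": run_agent,
    "chat": run_chat,
}


def route(request: str, classifier: Callable[[str], str] = classify) -> str:
    """Classify the request, then hand it to the specialist for that label."""
    label = classifier(request)
    handler = LANES.get(label)
    if handler is None:
        # An unknown label is a router bug; fail loudly instead of guessing.
        raise ValueError(f"classifier returned unknown lane: {label!r}")
    return handler(request)


print(route("open settings"))
print(route("plan a trip to Lisbon"))
print(route("hello there"))
```

Keeping the classifier behind a plain callable means the keyword stub can be swapped for a model call without touching the dispatch logic.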
What this pattern forbids. A request gets exactly one lane; downstream specialists cannot accept work outside their declared lane.
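One way to enforce this constraint is to tag each request with its single lane and have every specialist check the tag before doing any work. The sketch below assumes hypothetical `RoutedRequest`, `Specialist`, and `LaneViolation` types. It shows one possible enforcement style, not a prescribed one.

```python
from dataclasses import dataclass
from typing import Callable


class LaneViolation(Exception):
    """Raised when a specialist is handed work tagged for another lane."""


@dataclass(frozen=True)
class RoutedRequest:
    text: str
    lane: str  # exactly one lane label; frozen, so it cannot be reassigned


@dataclass(frozen=True)
class Specialist:
    lane: str                      # the lane this specialist declares
    handle: Callable[[str], str]   # the work it does inside that lane

    def accept(self, request: RoutedRequest) -> str:
        # Refuse anything outside the declared lane instead of improvising.
        if request.lane != self.lane:
            raise LaneViolation(
                f"{self.lane!r} specialist refused {request.lane!r} work"
            )
        return self.handle(request.text)


chat = Specialist(lane="chat", handle=lambda text: f"[chat] {text}")
print(chat.accept(RoutedRequest("hello", lane="chat")))

try:
    chat.accept(RoutedRequest("open settings", lane="command"))
except LaneViolation as err:
    print(f"rejected: {err}")
```

Making both records immutable keeps the lane assignment a one-time decision of the router: a specialist cannot quietly relabel a request to widen its own remit.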
The smaller patterns that complete this one:
- generalises Multi-Model Routing ★★: Send each request to the cheapest model that can handle it well.
- generalises Mixture of Experts Routing ★: Route each request to one or more domain-expert agents, where each expert holds deep capability in a narrow area.
- uses Augmented LLM ★★: Build the foundational agent block as an LLM augmented with retrieval, tools, and memory that the model actively chooses to use, rather than a bare-model call.
- generalises Hybrid Symbolic-Neural Routing ★: Per query, route between a symbolic path (rule engine, knowledge graph) and a neural path (LLM), using the LLM for interpretation and the symbolic layer for exact constraints.
- generalises Complexity-Based Routing ★: Estimate a request's difficulty up front and bind it to the cheapest model tier that can answer well, using an explicit complexity classifier as the routing key.
And the patterns that stand alongside it, or against it:
- complements Fallback Chain ★★: Try a primary handler; on failure or low confidence, fall through to a sequence of fallback handlers.
- alternative-to Hero Agent ✕: Anti-pattern: stuff every capability into one agent with one giant prompt.
- complements Prompt Chaining ★★: Decompose a task into a fixed sequence of LLM calls where each step's output becomes the next step's input.
- complements Trust and Reputation Routing ★: Maintain a per-agent reputation score updated from outcome quality and peer feedback, and route new tasks preferentially to high-reputation agents.