Complexity-Based Routing
also known as Difficulty-Aware Routing, Cost-Quality Routing, Query-Difficulty Routing
Estimate a request's difficulty up front and bind it to the cheapest model tier that can answer well, using an explicit complexity classifier as the routing key.
This pattern helps complete certain larger patterns —
- specialisesRouting★★— Classify an incoming request and dispatch it to the specialist (lane / agent / model) best suited to handle it.
- specialisesMulti-Model Routing★★— Send each request to the cheapest model that can handle it well.
Context
A team runs an agent against a heterogeneous mix of requests where some queries are trivially solvable by a small model and others genuinely need a frontier model's depth. The team already has access to several model tiers across one or more providers, and treats difficulty as the dominant driver of per-request quality and cost — orthogonal to topic, modality, or which provider hosts the weights. They are willing to pay for an extra classification step if it lets the bulk of traffic land on a cheap tier without hurting the hard cases.
Problem
Sending everything to the strong tier overpays on the easy majority of traffic. Sending everything to the cheap tier silently degrades the hard minority. Topic-based or provider-based routing does not help when two queries on the same topic differ by orders of magnitude in difficulty — 'what is 2+2' and 'prove this lemma' are both maths. Without an explicit difficulty signal, the team has no way to make spend track the property that actually matters.
Forces
- Difficulty is not directly observable; the classifier is approximating a latent variable.
- Classifier cost has to stay well under the saving it unlocks, or the routing destroys its own value.
- Misclassifying a hard query as easy is much costlier than the reverse, because the user sees a wrong answer instead of an unnecessary spend.
Example
A coding-assistant product pays frontier-model prices on every request, including 'rename this variable' and 'fix this typo'. The team trains a small complexity classifier on logged outcomes: features include prompt length, presence of multi-file context, and whether the user previously rejected a small-model answer. The classifier scores each new request and routes simple edits to an 8B open-weight model, mid-complexity tasks to a hosted mid-tier model, and hard architectural questions to a frontier model. Average cost per request drops by roughly 60%; hard-query quality on the eval set holds within 1 point of the strong-tier-only baseline.
Diagram
Solution
Therefore:
Define a small set of model tiers (small/medium/large, or open-weight/hosted-mid/hosted-frontier). Build a complexity classifier that scores each request on a difficulty axis — a learned router trained on win-rate data, a heuristic over query features (length, presence of operators, retrieval-hit count), or an LLM-judge on a cheap model. Dispatch each request to the tier matched to its score. Log per-tier outcomes and re-train the classifier on observed wins and losses. Distinct from open-weight-cascade (which tries cheap first and escalates on failure or low confidence) and multi-model-routing (which mixes class- and tier-based dispatch): here the routing decision is taken once, up front, from a difficulty signal — there is no cheap-first attempt to escalate from.
What this pattern forbids. A request reaches a tier only through the complexity classifier's decision; ad-hoc bypasses or per-call overrides are forbidden, or the routing key stops being difficulty.
And the patterns that stand alongside it, or against it —
- alternative-toOpen-Weight Cascade★— Build a multi-model cascade where lower tiers are open-weight, self-hostable models that run inside the operator's boundary, and only escalations cross to a hosted frontier model — giving cost arbitrage *and* sovereignty.
- complementsMixture of Experts Routing★— Route each request to one or more domain-expert agents, where each expert holds deep capability in a narrow area.
- complementsTopic-Based Routing★— Route inter-agent messages through named topics that agents subscribe to, instead of having senders address each other by id.
- complementsProvider-String Routing★— Select the model and provider for a request through a single namespaced string (`provider/model`) backed by env-var credentials, so the caller specifies what to run with one parameter rather than a typed provider object.
- complementsProvider Fallback★★— When one provider's API errors mid-stream, transparently switch to another provider while preserving state.
- complementsFallback Chain★★— Try a primary handler; on failure or low confidence, fall through to a sequence of fallback handlers.
- complementsAdaptive Compute Allocation★— Allocate inference-time compute (thinking tokens, samples, depth, model size) per query based on input difficulty, rather than using a fixed budget across all queries.
- alternative-toTop-Tier Model For Everything (Cost)✕— Anti-pattern: route every request through the highest-tier model regardless of difficulty, treating cost as a model-choice problem instead of a routing one.
- complementsLarge Action Models (LAMs)·— Use a model class specifically trained for action execution (tool calls, UI navigation, workflow steps) rather than text generation, when the workload is dominated by reliably completing actions in real systems.
- complementsLarge Reasoning Model (LRM) Paradigm★— Route reasoning-heavy tasks to a reasoning-tuned model that trades inference time for deliberation, rather than to a fast LLM that exhibits premature-closure.
Neighbourhood
Click any neighbour to follow the language. Scroll to zoom, drag to pan.