Complexity-Based Routing
Estimate a request's difficulty up front and bind it to the cheapest model tier that can answer well, using an explicit complexity classifier as the routing key.
Problem
Sending everything to the strong tier overpays on the easy majority of traffic. Sending everything to the cheap tier silently degrades the hard minority. Topic-based or provider-based routing does not help when two queries on the same topic differ by orders of magnitude in difficulty — 'what is 2+2' and 'prove this lemma' are both maths. Without an explicit difficulty signal, the team has no way to make spend track the property that actually matters.
Solution
Define a small set of model tiers (small/medium/large, or open-weight/hosted-mid/hosted-frontier). Build a complexity classifier that scores each request on a difficulty axis: a learned router trained on win-rate data, a heuristic over query features (length, presence of operators, retrieval-hit count), or an LLM judge running on a cheap model. Dispatch each request to the tier matched to its score. Log per-tier outcomes and retrain the classifier on observed wins and losses.
Distinct from open-weight-cascade (which tries cheap first and escalates on failure or low confidence) and multi-model-routing (which mixes class- and tier-based dispatch): here the routing decision is taken once, up front, from a difficulty signal. There is no cheap-first attempt to escalate from.
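The heuristic variant of the classifier can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the tier names, score thresholds, feature weights and the `OPERATORS` word list are all hypothetical, as is the assumption that zero retrieval hits signals a harder query. In practice each would be tuned against logged outcomes.

```python
# Hypothetical tiers: (upper score bound, tier name), checked in order.
TIERS = [(0.3, "small"), (0.7, "medium"), (1.0, "large")]

# Hypothetical words hinting at multi-step reasoning.
OPERATORS = ("prove", "derive", "compare", "optimize", "why")


def complexity_score(query: str, retrieval_hits: int) -> float:
    """Heuristic difficulty score in [0, 1] built from simple query features."""
    words = [w.strip(".,?!") for w in query.lower().split()]
    length_term = min(len(words) / 50.0, 1.0)  # longer queries score higher
    operator_term = 1.0 if any(w in OPERATORS for w in words) else 0.0
    # Assumption: no retrieval hits means the answer is not a simple lookup.
    retrieval_term = 1.0 if retrieval_hits == 0 else 0.0
    return 0.3 * length_term + 0.5 * operator_term + 0.2 * retrieval_term


def route(query: str, retrieval_hits: int) -> str:
    """Pick a tier once, up front, from the difficulty score alone."""
    score = complexity_score(query, retrieval_hits)
    for upper_bound, tier in TIERS:
        if score <= upper_bound:
            return tier
    return TIERS[-1][1]


if __name__ == "__main__":
    print(route("what is 2+2", retrieval_hits=3))
    print(route("prove this lemma", retrieval_hits=0))
```

Under these assumed weights the two maths queries from the Problem section land on different tiers despite sharing a topic. A learned router or an LLM judge would replace `complexity_score` while `route` stays the same.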
When to use
- Traffic mixes trivial and hard requests at meaningful volume and the cost gap between tiers is large.
- Outcome labels (judge scores, win rates, human grades) exist or can be collected to train and monitor the classifier.
- A small extra latency hop is acceptable on every request.
- Difficulty is a stronger signal than topic, provider, or modality for the workload at hand.
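The first and third conditions above can be checked with back-of-envelope arithmetic before building anything. The sketch below assumes a two-tier setup and a classifier that never misroutes, so it gives an upper bound on savings; the function name and all cost figures are hypothetical and use arbitrary units.

```python
def savings_per_request(easy_share: float, cheap_cost: float,
                        strong_cost: float, classifier_cost: float) -> float:
    """Expected saving per request versus sending everything to the strong tier.

    Assumes two tiers and a perfect classifier; misroutes would lower the result.
    """
    routed_cost = (classifier_cost
                   + easy_share * cheap_cost
                   + (1 - easy_share) * strong_cost)
    return strong_cost - routed_cost


if __name__ == "__main__":
    # Hypothetical: mostly easy traffic and a wide cost gap between tiers.
    print(savings_per_request(0.8, cheap_cost=1.0, strong_cost=20.0, classifier_cost=0.5))
    # Hypothetical: mostly hard traffic and a narrow gap; the classifier hop costs more than it saves.
    print(savings_per_request(0.1, cheap_cost=1.0, strong_cost=1.5, classifier_cost=0.2))
```

With the first set of made-up figures the saving is about 14.7 units per request; with the second it is negative, which is the case where this pattern does not apply.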