Multi-Model Routing
Send each request to the cheapest model that can handle it well.
Problem
If every request goes to the frontier model, the bill is far larger than it needs to be, because the cheap model would have handled most of the traffic at the same quality. If every request goes to the cheap model, the hard cases come back wrong with no signal that a better model was available. A static single-model choice forces a bad compromise. Naive escalation (always try the cheap model first, fall back to the strong one on failure) can cost more than starting with the strong model, because every escalated request pays for both calls.
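The cost trap in naive escalation is simple arithmetic. Every request pays for the cheap call, and the escalated fraction also pays for the strong call. So a cascade only wins when the escalation rate stays below `1 - cheap_cost / strong_cost`. The minimal sketch below makes that explicit. The function names are hypothetical, costs are in arbitrary relative units, and latency is ignored.

```python
def cascade_cost(cheap_cost: float, strong_cost: float, escalation_rate: float) -> float:
    """Expected per-request cost of 'try cheap, fall back to strong'.

    Every request pays for the cheap call; the escalated fraction
    additionally pays for the strong call.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be in [0, 1]")
    return cheap_cost + escalation_rate * strong_cost


def break_even_escalation_rate(cheap_cost: float, strong_cost: float) -> float:
    """Escalation rate at which the cascade costs exactly as much as strong-only."""
    return 1.0 - cheap_cost / strong_cost


def cascade_beats_strong_only(cheap_cost: float, strong_cost: float, escalation_rate: float) -> bool:
    """True if the cascade is cheaper on average than always calling the strong model."""
    return cascade_cost(cheap_cost, strong_cost, escalation_rate) < strong_cost


if __name__ == "__main__":
    # Illustrative numbers only: cheap call costs 0.2, strong call costs 1.0.
    for rate in (0.3, 0.85):
        print(rate, cascade_cost(0.2, 1.0, rate), cascade_beats_strong_only(0.2, 1.0, rate))
    print("break-even:", break_even_escalation_rate(0.2, 1.0))
```

With a cheap call at one fifth of the strong price, the cascade only pays off while fewer than about 80% of requests escalate. Past that point, routing straight to the strong model is cheaper.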
Solution
Combine routing (classifying the request) with a per-class model preference. Routing and filter extraction go to the cheap model; the screen-aware dialog or final answer goes to the strong model. Optionally, cascade: try the cheap model first and fall back to the strong model only when its confidence is low.
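A minimal sketch of a per-class model preference with an optional confidence-gated cascade follows. Several pieces are hypothetical: the model identifiers, the class names, the `Completion` type, and the 0.7 threshold. The request class is assumed to come from an upstream classifier, which is not shown. `call_model` stands in for whatever client you actually use. How a `confidence` score is obtained (log-probabilities, a self-rating, a verifier) is left open.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical model identifiers; substitute the ones your provider exposes.
CHEAP = "cheap-model"
STRONG = "strong-model"

# Hypothetical per-class preference mirroring the split described above.
MODEL_FOR_CLASS: Dict[str, str] = {
    "routing": CHEAP,
    "filter_extraction": CHEAP,
    "dialog": STRONG,
    "final_answer": STRONG,
}


@dataclass
class Completion:
    text: str
    confidence: float  # assumed to lie in [0, 1]


# A model call takes (model_id, prompt) and returns a Completion.
CallModel = Callable[[str, str], Completion]


def route(request_class: str) -> str:
    """Pick a model for a class; unknown classes default to the strong model."""
    return MODEL_FOR_CLASS.get(request_class, STRONG)


def handle(
    request_class: str,
    prompt: str,
    call_model: CallModel,
    cascade: bool = False,
    threshold: float = 0.7,
) -> Tuple[str, Completion]:
    """Route a request; optionally escalate low-confidence cheap answers."""
    model = route(request_class)
    result = call_model(model, prompt)
    # Only cheap-model answers are candidates for escalation.
    if cascade and model == CHEAP and result.confidence < threshold:
        model = STRONG
        result = call_model(STRONG, prompt)
    return model, result


if __name__ == "__main__":
    # Stub client: the cheap model is unsure, the strong model is confident.
    def fake_call(model: str, prompt: str) -> Completion:
        return Completion(f"{model}: {prompt}", 0.5 if model == CHEAP else 0.95)

    print(handle("routing", "which tool?", fake_call))
    print(handle("routing", "which tool?", fake_call, cascade=True))
    print(handle("final_answer", "summarise", fake_call))
```

Defaulting unknown classes to the strong model trades cost for safety. A deployment that trusts its classifier more might default to the cheap model with the cascade enabled instead.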
When to use
- Cost and quality goals diverge across request types.
- A classifier can route requests to a cheap or strong model with acceptable accuracy.
- A cascade is feasible: the cheap model yields a usable confidence signal for deciding when to fall back to the strong model.
Related
- Routing
- Cost Gating
- Fallback Chain
- Hero Agent
- Provider Fallback
- Hidden Mode Switching
- Dual-System GUI Agent
- Open-Weight Cascade
- Multilingual Voice Agent Stack
- Degenerate-Output Detection
- RL-Trained Conductor Orchestrator
- Provider-String Routing
- Vendor Lock-In
- Adaptive Compute Allocation
- Hybrid Symbolic-Neural Routing
- Complexity-Based Routing
- Hierarchical Retrieval
- Top-Tier Model For Everything (Cost)
- Large Action Models (LAMs)
- MRKL Systems (Modular Neuro-Symbolic)
- Large Reasoning Model (LRM) Paradigm