Routing & Composition

Multi-Model Routing

Send each request to the cheapest model that can handle it well.

Problem

If every request goes to the frontier model, the bill is far larger than it needs to be, because the cheap model would have handled most of the traffic at the same quality. If every request goes to the cheap model, the hard cases come back wrong, with no signal that a better model was available. A static single-model choice forces a bad compromise. Naive escalation, which always tries the cheap model first and falls back to the strong one on failure, is not a free fix either: every escalated request pays for both calls, so when a large share of traffic escalates it can cost more than starting with the strong model.
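The cost argument can be checked with a little arithmetic. The sketch below is illustrative only and assumes a simple additive cost model: every request pays for the cheap call, and a fraction of requests (the hypothetical `escalation_rate`) also pays for the strong call. The function names and the cost units are assumptions, not part of the pattern.

```python
def cascade_expected_cost(cheap_cost: float, strong_cost: float,
                          escalation_rate: float) -> float:
    """Expected per-request cost of a try-cheap-first cascade.

    Assumes every request pays cheap_cost and a fraction
    escalation_rate of requests additionally pays strong_cost.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return cheap_cost + escalation_rate * strong_cost


def break_even_escalation_rate(cheap_cost: float, strong_cost: float) -> float:
    """Escalation rate above which the cascade costs more than strong-only.

    Solves cheap_cost + rate * strong_cost = strong_cost for rate.
    """
    if strong_cost <= 0:
        raise ValueError("strong_cost must be positive")
    return 1.0 - cheap_cost / strong_cost


# Hypothetical costs: cheap call = 1 unit, strong call = 10 units.
half_escalated = cascade_expected_cost(1.0, 10.0, 0.5)
mostly_escalated = cascade_expected_cost(1.0, 10.0, 0.95)
threshold = break_even_escalation_rate(1.0, 10.0)
```

Under these assumed costs the cascade wins only while fewer than 90% of requests escalate. The closer the cheap model's cost (or added latency) is to the strong model's, the lower that threshold becomes.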

Solution

Combine routing (classifying the request) with a per-class model preference. For example, routing and filter extraction go to the cheap model, while the screen-aware dialog or final answer goes to the strong model. Optionally add a cascade: try the cheap model first and fall back to the strong model if confidence is low.
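A minimal sketch of this combination follows, under stated assumptions. The `Router` class, the model names "cheap" and "strong", the class labels, the 0.7 confidence threshold, and the stub models and keyword classifier are all hypothetical stand-ins. A real system would call actual model APIs and would need its own way to obtain a confidence score.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical interface: a model takes a request and returns
# (answer, confidence in [0, 1]).
ModelFn = Callable[[str], Tuple[str, float]]


@dataclass
class Router:
    classify: Callable[[str], str]   # request -> request class
    preference: Dict[str, str]       # request class -> model name
    models: Dict[str, ModelFn]       # model name -> callable
    strong: str = "strong"
    min_confidence: float = 0.7      # assumed threshold, tune per workload

    def handle(self, request: str) -> Tuple[str, str]:
        """Return (name of the model that answered, answer)."""
        request_class = self.classify(request)
        # Unknown classes default to the strong model.
        model_name = self.preference.get(request_class, self.strong)
        answer, confidence = self.models[model_name](request)
        # Optional cascade: escalate low-confidence cheap answers.
        if model_name != self.strong and confidence < self.min_confidence:
            answer, _ = self.models[self.strong](request)
            model_name = self.strong
        return model_name, answer


# Stub models standing in for real API calls.
def cheap_model(request: str) -> Tuple[str, float]:
    confidence = 0.9 if len(request) < 40 else 0.3
    return f"cheap:{request}", confidence


def strong_model(request: str) -> Tuple[str, float]:
    return f"strong:{request}", 0.95


def keyword_classifier(request: str) -> str:
    # Toy classifier; a real one might itself run on the cheap model.
    if request.lower().startswith("filter"):
        return "filter_extraction"
    return "final_answer"


router = Router(
    classify=keyword_classifier,
    preference={"filter_extraction": "cheap", "final_answer": "strong"},
    models={"cheap": cheap_model, "strong": strong_model},
)
```

With these stubs, a short filter request is answered by the cheap model, a final-answer request goes straight to the strong model, and a long filter request starts on the cheap model and escalates because the stub reports low confidence. Sending unknown classes to the strong model is a deliberate choice in this sketch: it trades some cost for not silently degrading quality on unclassified traffic.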

When to use

  • Cost and quality goals diverge across request types.
  • A classifier can route requests to a cheap or strong model with acceptable accuracy.
  • A cascade with low-confidence fallback to the strong model is feasible.
