RouteLLM
Research framework for training and serving LLM routers that dispatch each query to either a stronger, more expensive model or a cheaper, weaker one, based on a learned win-rate score and a per-request cost threshold.
Description
RouteLLM is an open-source framework from LMSYS (GitHub organisation lm-sys) for cost-effective LLM serving. It trains learned routers on preference data: matrix factorisation (mf), similarity-weighted Elo ranking (sw_ranking), and a BERT classifier (bert). Each incoming query is dispatched to a strong (expensive) or weak (cheap) model by comparing the router's predicted win-rate score against a caller-supplied cost threshold. The project ships an OpenAI-compatible server that drops in as a routing layer in front of two model endpoints, plus benchmarks on MT-Bench, MMLU, and GSM8K reporting cost reductions of more than 2x in certain cases without compromising response quality. Companion paper: arXiv 2406.18665.
Solution
Routing proxy in front of two model endpoints. A trained router (mf / sw_ranking / bert) scores each incoming request for predicted strong-vs-weak win rate; if the score exceeds the request's cost threshold, the query is dispatched to the strong model, otherwise to the weak model. The response is returned through an OpenAI-compatible API. Routing is stateless per request; the router is trained offline on preference data and loaded as a model artefact.
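The threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration, not RouteLLM's own code: `route`, `RoutingDecision`, `toy_score`, and the model names `strong-model` / `weak-model` are hypothetical, and `toy_score` is a stand-in heuristic where a real deployment would load a trained router (mf, sw_ranking, or bert).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RoutingDecision:
    model: str        # endpoint chosen for this request
    score: float      # predicted strong-vs-weak win rate
    threshold: float  # cost threshold supplied with the request


def route(
    prompt: str,
    score_fn: Callable[[str], float],
    threshold: float,
    strong_model: str = "strong-model",  # hypothetical endpoint name
    weak_model: str = "weak-model",      # hypothetical endpoint name
) -> RoutingDecision:
    """Pick an endpoint for one request. Stateless: nothing is kept between calls."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    score = score_fn(prompt)
    # Strong model only when the score exceeds the threshold, as described above.
    chosen = strong_model if score > threshold else weak_model
    return RoutingDecision(model=chosen, score=score, threshold=threshold)


def toy_score(prompt: str) -> float:
    """Assumption for illustration only: treats longer prompts as harder.

    A trained router would replace this with a learned win-rate prediction.
    """
    return min(len(prompt.split()) / 20.0, 1.0)


if __name__ == "__main__":
    for text in ["What is 2 + 2?", " ".join(["prove"] * 30)]:
        decision = route(text, toy_score, threshold=0.5)
        print(decision.model, round(decision.score, 2))
```

Raising the threshold sends fewer requests to the strong model and so lowers cost; lowering it favours quality. Because the decision depends only on the prompt and the threshold, the same function can sit behind any number of proxy workers.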
Primary use cases
- research on learned LLM routing
- cost-aware serving across strong/weak model pairs
- drop-in OpenAI-compatible routing proxy
- benchmarking routing strategies on MT-Bench / MMLU / GSM8K