Routing & Composition

Open-Weight Cascade

Build a multi-model cascade where lower tiers are open-weight, self-hostable models that run inside the operator's boundary, and only escalations cross to a hosted frontier model — giving cost arbitrage *and* sovereignty.

Problem

A naive cheap-first cascade sends easy requests to an open-weight model and escalates hard ones to a hosted frontier model. Because escalation depends only on difficulty, any borderline request, sensitive or not, can leak its data to a vendor outside the regulated boundary. An open-weight-only cascade keeps everything in-house but loses capability on the rare hard request that genuinely needs the frontier model. Neither extreme serves an operator who needs cost arbitrage on insensitive traffic and strict in-boundary processing on sensitive traffic.

Solution

Stratify requests by sensitivity *and* difficulty before routing:

  • Sensitive requests are forced down the open-weight path even when confidence is low; the system degrades gracefully or refuses rather than escalating.
  • Insensitive easy requests go to a small open-weight model.
  • Insensitive hard requests escalate to the hosted frontier model.

The router enforces the sensitivity classification before any model call.
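
The three routing cases above can be sketched as a small Python router. This is a minimal illustration under stated assumptions, not a reference implementation: the names `route_request`, `Reply`, `Result`, `Route`, and the `min_confidence` threshold are hypothetical, the sensitivity classifier is assumed to run inside the operator's boundary, and difficulty is approximated by a confidence score that the open-weight model is assumed to report.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Route(Enum):
    OPEN_WEIGHT = "open_weight"  # answered in-boundary by the self-hosted model
    FRONTIER = "frontier"        # escalated to the hosted frontier model
    REFUSED = "refused"          # sensitive and low confidence: stays in-boundary


@dataclass
class Reply:
    text: str
    confidence: float  # assumed to lie in [0, 1]; how it is produced is out of scope


@dataclass
class Result:
    route: Route
    text: str


# Hypothetical model interface: takes a prompt, returns a Reply.
Model = Callable[[str], Reply]


def route_request(
    prompt: str,
    is_sensitive: Callable[[str], bool],
    open_weight: Model,
    frontier: Model,
    min_confidence: float = 0.7,  # illustrative threshold, tune per deployment
) -> Result:
    # Classify sensitivity before any model call, so the decision cannot be
    # skipped or overridden by a later escalation step.
    sensitive = is_sensitive(prompt)

    # Every request tries the in-boundary open-weight model first.
    local = open_weight(prompt)
    if local.confidence >= min_confidence:
        return Result(Route.OPEN_WEIGHT, local.text)

    # Low confidence on a sensitive request: refuse instead of escalating.
    # A gentler variant could return local.text flagged as low confidence.
    if sensitive:
        return Result(Route.REFUSED, "Cannot answer this confidently inside the boundary.")

    # Low confidence on an insensitive request: escalation is permitted.
    return Result(Route.FRONTIER, frontier(prompt).text)
```

The property to check in any real implementation is that the frontier client is unreachable from the sensitive branch. Here that holds because the only `frontier(...)` call sits after the `if sensitive` return.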

When to use

  • Sensitive requests must stay inside an operator-controlled boundary even when borderline.
  • Insensitive easy requests can be served cheaply by a small open-weight model.
  • Insensitive hard requests can be safely escalated to a hosted frontier model.
