Safety & Control

Input/Output Guardrails

Validate inputs before they reach the model and outputs before they reach the user.

Problem

Asking the model itself to police what flows in and out fails by construction: the model is the very surface being defended, and the same generation that might leak a secret is also the one being asked to refuse to leak it. A clever attacker only needs to find one phrasing that flips the model's behaviour. Without a layer outside the model that runs deterministic checks on both the input and the output path, the team is left trusting the model to be its own gatekeeper, which it provably cannot do under adversarial pressure.

Solution

Place validators on input (regex, classifier, allowlist) and output (schema, toxicity classifier, secret-redaction) paths. Compose validators per use case. On failure, exception or fallback response. Hub of pre-built validators is reusable across products.

When to use

  • User inputs may carry malicious or out-of-policy content the model should not act on.
  • Model outputs may carry PII, secrets, or unsafe content that must not reach users.
  • Validators (regex, classifier, schema, redactor) can be composed per use case.

Open the full interactive page

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Related