Input/Output Guardrails
Validate inputs before they reach the model and outputs before they reach the user.
Problem
Asking the model itself to police what flows in and out fails by construction: the model is the very surface being defended, and the same generation that might leak a secret is also the one being asked to refuse to leak it. A clever attacker only needs to find one phrasing that flips the model's behaviour. Without a layer outside the model that runs deterministic checks on both the input and the output path, the team is left trusting the model to be its own gatekeeper, which it provably cannot do under adversarial pressure.
Solution
Place validators on input (regex, classifier, allowlist) and output (schema, toxicity classifier, secret-redaction) paths. Compose validators per use case. On failure, exception or fallback response. Hub of pre-built validators is reusable across products.
When to use
- User inputs may carry malicious or out-of-policy content the model should not act on.
- Model outputs may carry PII, secrets, or unsafe content that must not reach users.
- Validators (regex, classifier, schema, redactor) can be composed per use case.
Open the full interactive page →
Diagram, neighbourhood map, code examples, related patterns and full provenance.
Related
- Code-Switching-Aware Agent
- Computer Use
- Dual LLM Pattern
- Lethal Trifecta Threat Model
- PII Redaction
- Prompt Injection Defense
- Refusal
- Sandbox Isolation
- Secrets Handling
- Session Isolation
- Structured Output
- Tool Output Poisoning Defense
- Tool Output Trusted Verbatim
- Proactive Goal Creator
- Policy-as-Code Gate
- Typed Refusal Codes
- Authorized Tool Misuse
- Multimodal Guardrails
- Context Minimization
- Supervisor-Plus-Gate
- Agent Middleware Chain