NVIDIA NeMo Guardrails
Type: full-code · Vendor: NVIDIA · Language: Python · License: Apache-2.0 · Status: active · Status in practice: mature · First released: 2023
NeMo Guardrails adds programmable rails around an LLM-based conversational application so that inputs and outputs can be checked, rewritten, or rejected outside the model's own discretion.
Description. NeMo Guardrails is an open-source Python toolkit from NVIDIA for adding rule-based controls to LLM applications. Developers express rails in Colang, a flow language, and the runtime applies them at fixed stages: input rails on the incoming user message, dialog rails that steer the conversation, and output rails on the generated response. Each rail can reject, alter, or rephrase the text it inspects, so safety and topic checks run as deterministic code around the model rather than as model behavior.
Agent loop shape. A wrap-around guard pipeline: user input passes through input rails, then dialog rails select a flow and the LLM generates a response, then output rails inspect and may reject or rewrite that response before it returns to the user. Rails are deterministic Colang subflows that bracket each model call rather than a free agent loop.
Primary use cases
- topic and scope control for domain-specific assistants
- input sanitization and PII masking on user messages
- output filtering and response rewriting
- refusal of disallowed subjects
- enforcing predefined conversational flows
Key concepts
- Colang → deterministic-control-flow-not-prompt (docs) — A modeling language for conversational flows in which developers write rails and dialog flows as subflows that the runtime executes deterministically around the LLM.
- Input / dialog / output rails → input-output-guardrails (docs) — The three rail stages: input rails check the user message before the model, dialog rails steer which flow runs, and output rails check the generated response before it returns to the user.
- Self-check rails → prompt-injection-defense (docs) — Built-in rails that issue a separate LLM query to moderate input or output, including jailbreak/prompt-injection detection, before content is accepted.
Patterns this full-code implements —
- ★Supervisor-Plus-Gate
Output rails run programmable deterministic checks on the LLM's generated output and can reject it (blocking delivery) or alter it before it reaches the user or any downstream side-effect.
- ★★Input/Output Guardrails
Input rails inspect the incoming user message before it reaches the LLM and can reject it or alter it (masking sensitive data, rephrasing); output rails do the symmetric check on the generated respon…
- ★★Refusal
Programmable rails control LLM output to refuse disallowed subjects and respond in prescribed ways to specific user requests, keeping the assistant on its permitted topics.
- ★Scope-of-Practice Boundary Gate
Topical/dialog rails restrict the bot to its purpose and enforce code-defined Colang flows (user ask about X -> bot refuse to respond about X) that block off-topic professional advice such as investi…
- ★Prompt Injection Defense
An input rail screens the user message for exploitation attempts (prompt/code injection and jailbreak) before it reaches the LLM, and on a match returns a refusal rather than passing the crafted inpu…
- ★Enforced Advisory Disclaimer
Output rails run after generation and can modify the bot response programmatically; a Colang output-rail subflow rewrites $bot_message outside the model's discretion — the mechanism for attaching a n…
Neighbourhood
Click any neighbour to follow the lineage. Scroll to zoom, drag to pan.