Amazon Bedrock Guardrails (sensitive information filters)
Type: low-code · Vendor: AWS · Language: N/A · License: proprietary · Status: active · Status in practice: mature · First released: 2024-04-23
Amazon Bedrock Guardrails applies configurable content, topic, word, sensitive-information, and grounding filters to both user prompts and model responses in Bedrock generative AI applications.
Description. Amazon Bedrock Guardrails is a managed safety service from AWS that screens generative AI traffic on the Bedrock platform. It evaluates input prompts and model completions against configured policies, including content filters, denied topics, word filters, sensitive information filters, and contextual grounding checks. Guardrails can be invoked inline during model inference or standalone through the ApplyGuardrail API, blocking or masking content that violates a policy.
Agent loop shape. Guardrails is an inline filtering stage rather than an agent loop: each user input and each model completion passes through the configured policy filters, and content that violates a policy is blocked or masked before it reaches the model or the user.
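The inline stage described above can be sketched with the standalone ApplyGuardrail call. This is a minimal sketch, not a reference implementation: it assumes `client` is a boto3 `bedrock-runtime` client, and `screen`, `guarded_turn`, and `call_model` are hypothetical names introduced for illustration. As a simplification it treats any intervention on the input as a stop; telling a mask apart from a block would mean inspecting the response's assessments, which is omitted here.

```python
from typing import Any, Callable, Tuple

# Value of response["action"] when a configured policy fires.
INTERVENED = "GUARDRAIL_INTERVENED"


def screen(client: Any, guardrail_id: str, version: str,
           source: str, text: str) -> Tuple[bool, str]:
    """Run one guardrail pass. `source` is "INPUT" or "OUTPUT".

    Returns (intervened, text_to_use).
    """
    resp = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=version,
        source=source,
        content=[{"text": {"text": text}}],
    )
    intervened = resp.get("action") == INTERVENED
    outputs = resp.get("outputs") or []
    # On intervention, outputs carries the blocked message or the masked text.
    if intervened and outputs:
        return True, outputs[0]["text"]
    return intervened, text


def guarded_turn(client: Any, guardrail_id: str, version: str,
                 user_text: str, call_model: Callable[[str], str]) -> str:
    """Screen the prompt, call the model, then screen the completion."""
    stopped, text = screen(client, guardrail_id, version, "INPUT", user_text)
    if stopped:
        # Simplification: never forward an intervened prompt to the model.
        return text
    completion = call_model(user_text)  # hypothetical model-invocation callable
    _, safe_text = screen(client, guardrail_id, version, "OUTPUT", completion)
    return safe_text
```

In real use the client would come from `boto3.client("bedrock-runtime")`, and `call_model` would wrap whatever inference call the application makes. Passing the client in as a parameter keeps the sketch testable without AWS credentials.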
Primary use cases
- filtering harmful prompts and responses in chatbots
- redacting PII from conversation transcripts
- blocking denied topics in regulated applications
- detecting hallucinations in RAG responses
Key concepts
- Content filters (pattern: Input/Output Guardrails): Configurable-strength filters over predefined harmful categories (Hate, Insults, Sexual, Violence, Misconduct, Prompt Attack), applied to text and image prompts and responses.
- Denied topics: A set of application-defined subjects that the guardrail blocks if it detects them in user queries or model responses, even when the content is otherwise harmless.
- Contextual grounding checks: A filter that flags or blocks model responses in RAG applications when they are not grounded in the retrieved source or are irrelevant to the user's query, used to catch hallucinations.
- ApplyGuardrail API (pattern: Input/Output Guardrails): An API that runs the configured guardrail over arbitrary text without invoking a foundation model, so the same policy can screen content independently of inference.
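The concepts above map onto policy blocks in a guardrail definition. The following is a rough sketch of how such a definition might be assembled as keyword arguments for the boto3 `bedrock` client's `create_guardrail` call. The helper name `build_guardrail_config`, the default strength and threshold, and the messaging strings are illustrative assumptions, not recommended values.

```python
from typing import Any, Dict

# Harm categories screened on both prompts and responses.
HARM_CATEGORIES = ["HATE", "INSULTS", "SEXUAL", "VIOLENCE", "MISCONDUCT"]


def build_guardrail_config(name: str, denied_topics: Dict[str, str],
                           strength: str = "HIGH",
                           threshold: float = 0.75) -> Dict[str, Any]:
    """Assemble create_guardrail keyword arguments (hypothetical helper).

    denied_topics maps a topic name to a short natural-language definition.
    """
    filters = [
        {"type": cat, "inputStrength": strength, "outputStrength": strength}
        for cat in HARM_CATEGORIES
    ]
    # Prompt Attack screens user input only, so its output strength is NONE.
    filters.append(
        {"type": "PROMPT_ATTACK", "inputStrength": strength, "outputStrength": "NONE"}
    )
    return {
        "name": name,
        "blockedInputMessaging": "Sorry, I can't help with that request.",
        "blockedOutputsMessaging": "Sorry, I can't share that response.",
        "contentPolicyConfig": {"filtersConfig": filters},
        "topicPolicyConfig": {
            "topicsConfig": [
                {"name": topic, "definition": definition, "type": "DENY"}
                for topic, definition in denied_topics.items()
            ]
        },
        "contextualGroundingPolicyConfig": {
            "filtersConfig": [
                {"type": "GROUNDING", "threshold": threshold},
                {"type": "RELEVANCE", "threshold": threshold},
            ]
        },
    }


config = build_guardrail_config(
    "demo-guardrail",
    {"Investment advice": "Recommendations about buying or selling financial assets."},
)
```

With credentials in place, the dictionary could be passed as `boto3.client("bedrock").create_guardrail(**config)`. Building it as plain data first makes the policy easy to review and version alongside application code.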
Patterns this low-code tool implements
- ★★Input/Output Guardrails
A single guardrail evaluates both the input prompts and the model completions against the defined filters, so the same policy governs what reaches the model and what reaches the user.
- ★Multimodal Guardrails
Applies the same content-filter policy categories to image content as to text, blocking harmful multimodal content across hate, insults, sexual, violence, misconduct, and prompt-attack categories.
- ★★PII Redaction
Sensitive information filters detect PII in input prompts and model responses and either block the content or mask/anonymize it by replacing each entity with its PII type.
- ★Prompt Injection Defense
A dedicated Prompt Attack content-filter category screens user input prompts for prompt-injection and jailbreak attempts and blocks them before they reach the model, so injected instructions in user-…
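For the PII Redaction pattern in the list above, the choice between blocking and masking is made per entity type. The sketch below builds the sensitive-information policy block under that assumption. `pii_policy` is a hypothetical helper, and the entity types chosen are only examples.

```python
from typing import Dict, Iterable, List


def pii_policy(mask: Iterable[str] = (),
               block: Iterable[str] = ()) -> Dict[str, List[Dict[str, str]]]:
    """Build a sensitive-information policy block (hypothetical helper).

    mask:  entity types to anonymize, i.e. replace with their PII type.
    block: entity types whose presence blocks the whole message.
    """
    entities = [{"type": t, "action": "ANONYMIZE"} for t in mask]
    entities += [{"type": t, "action": "BLOCK"} for t in block]
    return {"piiEntitiesConfig": entities}


policy = pii_policy(
    mask=["EMAIL", "PHONE"],
    block=["US_SOCIAL_SECURITY_NUMBER"],
)
```

The resulting dictionary would be supplied as the `sensitiveInformationPolicyConfig` argument when creating or updating a guardrail. Masking suits transcripts that should stay readable, while blocking is the stricter choice for identifiers that should never pass through.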