Amazon Bedrock Guardrails (sensitive information filters)

Type: low-code · Vendor: AWS · Language: N/A · License: proprietary · Status: active · Status in practice: mature · First released: 2024-04-23

Links: homepage docs

Amazon Bedrock Guardrails applies configurable content, topic, word, sensitive-information, and grounding filters to both user prompts and model responses in Bedrock generative AI applications.

Description. Amazon Bedrock Guardrails is a managed safety service from AWS that screens generative AI traffic on the Bedrock platform. It evaluates input prompts and model completions against configured policies, including content filters, denied topics, word filters, sensitive information filters, and contextual grounding checks. Guardrails can be invoked inline during model inference or standalone through the ApplyGuardrail API, blocking or masking content that violates a policy.
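The standalone path mentioned above can be sketched as follows. This is a minimal illustration, assuming a boto3 `bedrock-runtime` client is passed in (for example `boto3.client("bedrock-runtime")`) and that the request and response fields match the public ApplyGuardrail API reference; the helper name `screen_text` and the guardrail ID and version values are hypothetical.

```python
from typing import Any


def screen_text(client: Any, guardrail_id: str, guardrail_version: str,
                text: str, source: str = "INPUT") -> dict:
    """Run a guardrail over text without invoking a foundation model.

    `client` is assumed to be a boto3 "bedrock-runtime" client.
    `source` says whether the text is a user prompt or a model response.
    """
    if source not in ("INPUT", "OUTPUT"):
        raise ValueError("source must be 'INPUT' or 'OUTPUT'")

    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source=source,
        content=[{"text": {"text": text}}],
    )

    # "GUARDRAIL_INTERVENED" means a policy matched and the text was
    # blocked or masked; "NONE" means it passed unchanged.
    intervened = response.get("action") == "GUARDRAIL_INTERVENED"
    outputs = response.get("outputs", [])

    # On intervention, the first output is assumed to carry the configured
    # blocked message or the masked version of the text.
    safe_text = outputs[0]["text"] if intervened and outputs else text
    return {"intervened": intervened, "text": safe_text}
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to exercise with a stub and leaves region and credential handling to the caller.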

Agent loop shape. Guardrails is an inline filtering stage rather than an agent loop: each user input and each model completion passes through the configured policy filters, and content that violates a policy is blocked or masked before it reaches the model or the user.
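The shape of that filtering stage can be sketched in a few lines. This is a conceptual illustration only, not Bedrock's implementation: `check` and `model` are hypothetical stand-in callables (in practice `check` might wrap an ApplyGuardrail call), and `BLOCKED_MESSAGE` is an assumed placeholder for the guardrail's configured blocked response.

```python
from typing import Callable, Tuple

# A check takes (text, source) and returns (allowed, text).
# allowed is False when the text is blocked; the returned text may be masked.
Check = Callable[[str, str], Tuple[bool, str]]

BLOCKED_MESSAGE = "Sorry, I can't help with that."  # hypothetical wording


def guarded_turn(user_input: str, check: Check,
                 model: Callable[[str], str]) -> str:
    """One request/response turn with filtering on both sides of the model."""
    # Stage 1: screen the prompt before it reaches the model.
    allowed, prompt = check(user_input, "INPUT")
    if not allowed:
        return BLOCKED_MESSAGE  # the model is never called

    # Stage 2: the model only sees the (possibly masked) prompt.
    completion = model(prompt)

    # Stage 3: screen the completion before it reaches the user.
    allowed, reply = check(completion, "OUTPUT")
    return reply if allowed else BLOCKED_MESSAGE
```

There is no loop or planning step here: each turn is a single pass through the input filter, the model, and the output filter.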

Primary use cases

filtering harmful prompts and responses in chatbots
redacting PII from conversation transcripts
blocking denied topics in regulated applications
detecting hallucinations in RAG responses
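For the last use case, a grounding check needs three pieces of text: the retrieved source, the user's query, and the model's answer. The sketch below builds an ApplyGuardrail request that tags each piece with a qualifier. It assumes the qualifier names `grounding_source`, `query`, and `guard_content` from the public API reference, which should be verified against the current documentation; the function name and all argument values are hypothetical.

```python
def build_grounding_request(guardrail_id: str, guardrail_version: str,
                            source_passage: str, user_query: str,
                            model_answer: str) -> dict:
    """Build keyword arguments for an ApplyGuardrail grounding check.

    The guardrail is assumed to have contextual grounding checks enabled.
    """
    return {
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": guardrail_version,
        # Grounding is evaluated on model responses, so the source is OUTPUT.
        "source": "OUTPUT",
        "content": [
            # The retrieved passage the answer should be grounded in.
            {"text": {"text": source_passage, "qualifiers": ["grounding_source"]}},
            # The user's question, used for the relevance check.
            {"text": {"text": user_query, "qualifiers": ["query"]}},
            # The model's answer, which is the content being guarded.
            {"text": {"text": model_answer, "qualifiers": ["guard_content"]}},
        ],
    }
```

With a boto3 `bedrock-runtime` client, the result could be passed as `client.apply_guardrail(**request)`; an `action` of `GUARDRAIL_INTERVENED` in the response would then indicate that the answer fell below a configured grounding or relevance threshold.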

flowchart TD
    fw["Amazon Bedrock Guardrails (sensitive information filters)"]
    fw --> p1["Input/Output Guardrails<br/>(core)"]
    fw --> p2["Multimodal Guardrails<br/>(first-class)"]
    fw --> p3["PII Redaction<br/>(first-class)"]
    fw --> p4["Prompt Injection Defense<br/>(first-class)"]

Key concepts

Content filters → input-output-guardrails (docs) — Filters with configurable strength over predefined harmful categories (Hate, Insults, Sexual, Violence, Misconduct, Prompt Attack), applied to text and image prompts and responses.
Denied topics (docs) — A set of application-defined subjects that the guardrail blocks if it detects them in user queries or model responses, even when the content is otherwise harmless.
Contextual grounding checks (docs) — A filter that flags or blocks model responses in RAG applications when they are not grounded in the retrieved source or are irrelevant to the user's query, used to catch hallucinations.
ApplyGuardrail API → input-output-guardrails (docs) — An API that runs the configured guardrail over arbitrary text without invoking a foundation model, so the same policy can screen content independently of inference.
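The policies above are configured together when a guardrail is created. The sketch below assembles keyword arguments in the shape expected by the boto3 `bedrock` client's `create_guardrail` call, as understood from the public API reference; field names should be checked against the current documentation. The guardrail name, topic, messages, PII choices, and threshold values are all hypothetical examples.

```python
def build_guardrail_config(name: str) -> dict:
    """Assemble example create_guardrail arguments covering four policy types."""
    # Content filters: one entry per predefined category, each with a strength.
    categories = ["HATE", "INSULTS", "SEXUAL", "VIOLENCE", "MISCONDUCT"]
    content_filters = [
        {"type": c, "inputStrength": "HIGH", "outputStrength": "HIGH"}
        for c in categories
    ]
    # Prompt-attack detection applies to inputs only, so output is NONE.
    content_filters.append(
        {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"}
    )

    return {
        "name": name,
        "blockedInputMessaging": "Sorry, I can't help with that request.",
        "blockedOutputsMessaging": "Sorry, I can't share that response.",
        "contentPolicyConfig": {"filtersConfig": content_filters},
        # Denied topics: application-defined subjects described in plain language.
        "topicPolicyConfig": {"topicsConfig": [{
            "name": "Investment advice",
            "definition": "Recommendations about buying or selling specific financial products.",
            "type": "DENY",
        }]},
        # Sensitive information filters: mask some entity types, block others.
        "sensitiveInformationPolicyConfig": {"piiEntitiesConfig": [
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
        ]},
        # Contextual grounding: responses scoring below a threshold are flagged.
        "contextualGroundingPolicyConfig": {"filtersConfig": [
            {"type": "GROUNDING", "threshold": 0.75},
            {"type": "RELEVANCE", "threshold": 0.75},
        ]},
    }
```

A caller with AWS credentials could then run `boto3.client("bedrock").create_guardrail(**build_guardrail_config("demo-guardrail"))`. Building the configuration as a plain dictionary first makes it straightforward to review or version-control before anything is created.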

