Framework · Model-Vendor Agents

Meta Llama Guard 3

Llama Guard 3 is a fine-tuned Llama 3.1 8B safety classifier that labels prompts and responses as safe or unsafe against a fixed hazard taxonomy.

Description

Meta Llama Guard 3 is an open-weight content-safety classifier from Meta, built by fine-tuning the Llama-3.1-8B pretrained model for safety classification. It takes a prompt or a model response and generates a safe/unsafe label plus, when unsafe, the violated hazard categories. The classification can be applied to both LLM inputs and LLM outputs, giving the host application a signal to refuse out-of-policy content.

Solution

Llama Guard is a single-pass classifier rather than an agent loop: it takes a conversation prompt or response and emits a safe/unsafe label with violated categories, and the calling application decides whether to allow, block, or refuse based on that output.

Primary use cases

classifying prompts and responses as safe or unsafe
input and output content moderation for LLM apps
emitting machine-readable hazard category codes
gating tool-call and code-interpreter abuse

Open the full interactive page →

Diagram, neighbourhood map, code examples, related patterns and full provenance.