Meta Llama Guard 3
Llama Guard 3 is a fine-tuned Llama 3.1 8B safety classifier that labels prompts and responses as safe or unsafe against a fixed hazard taxonomy.
Description
Meta Llama Guard 3 is an open-weight content-safety classifier from Meta, built by fine-tuning the pretrained Llama-3.1-8B model for safety classification. Given a prompt or a model response, it generates text stating whether the content is safe or unsafe and, when unsafe, which hazard categories it violates. The classifier can be applied to both LLM inputs and LLM outputs, giving the host application a signal for refusing out-of-policy content.
Solution
Llama Guard is a single-pass classifier rather than an agent loop. It takes a conversation prompt or response and emits a safe/unsafe label, with the violated categories when the label is unsafe. The model itself enforces nothing: the calling application decides whether to allow, block, or refuse based on that output.
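The host-side step can be sketched as follows: parse the classifier's text output, then map it to a decision. This is a minimal sketch, not a definitive implementation. It assumes the output is a first line reading "safe" or "unsafe", followed when unsafe by a line of comma-separated category codes such as "S1,S10". That format and the names Verdict, parse_verdict, and decide are illustrative assumptions, not part of any Llama Guard API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Verdict:
    """Parsed classifier result (hypothetical container type)."""
    safe: bool
    categories: Tuple[str, ...] = ()


def parse_verdict(raw: str) -> Verdict:
    """Parse text such as 'safe' or 'unsafe\\nS1,S10' (assumed output format)."""
    # Drop blank lines and surrounding whitespace before inspecting the label.
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty classifier output")
    label = lines[0].lower()
    if label == "safe":
        return Verdict(safe=True)
    if label == "unsafe":
        # Category codes, if present, are assumed to sit on the second line.
        codes: Tuple[str, ...] = ()
        if len(lines) > 1:
            codes = tuple(c.strip() for c in lines[1].split(",") if c.strip())
        return Verdict(safe=False, categories=codes)
    raise ValueError(f"unrecognized label: {lines[0]!r}")


def decide(verdict: Verdict) -> str:
    """Hypothetical host policy: block anything the classifier marks unsafe."""
    return "allow" if verdict.safe else "block"
```

Raising on an unrecognized label, rather than defaulting to "safe", is a deliberate choice in this sketch: a malformed classifier output then fails closed instead of silently allowing content.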
Primary use cases
- classifying prompts and responses as safe or unsafe
- input and output content moderation for LLM apps
- emitting machine-readable hazard category codes
- gating tool calls and code-interpreter use to catch abuse
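The input and output moderation use case above might look like the following gate, which checks the prompt before generation and the response after it. This is a sketch under stated assumptions: generate stands in for the host LLM, is_safe(role, text) stands in for a call to the safety classifier, and REFUSAL is a placeholder message. All three are hypothetical and would be supplied by the application.

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that."  # hypothetical refusal message


def moderated_reply(
    user_prompt: str,
    generate: Callable[[str], str],
    is_safe: Callable[[str, str], bool],
) -> str:
    """Run input moderation, generation, then output moderation.

    `generate` and `is_safe` are placeholders injected by the caller; neither
    is a real library function.
    """
    # Input moderation: an unsafe prompt never reaches the host LLM.
    if not is_safe("user", user_prompt):
        return REFUSAL
    response = generate(user_prompt)
    # Output moderation: an unsafe response is withheld from the user.
    if not is_safe("assistant", response):
        return REFUSAL
    return response
```

Passing the classifier and the generator in as callables keeps the gate independent of any particular serving stack, so the same wrapper can be exercised with simple stubs in tests.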