Framework · Model-Vendor Agents

Meta Llama Guard 3

Llama Guard 3 is a fine-tuned Llama 3.1 8B safety classifier that labels prompts and responses as safe or unsafe against a fixed hazard taxonomy.

Description

Meta Llama Guard 3 is an open-weight content-safety classifier from Meta, built by fine-tuning the Llama-3.1-8B pretrained model for safety classification. It takes a prompt or a model response and generates a safe/unsafe label plus, when unsafe, the violated hazard categories. The classification can be applied to both LLM inputs and LLM outputs, giving the host application a signal to refuse out-of-policy content.

Solution

Llama Guard 3 is a single-pass classifier rather than an agent loop: it reads a conversation (a user prompt, optionally followed by a model response) and emits a safe/unsafe label, plus the violated categories when the label is unsafe. The calling application decides whether to allow, block, or refuse based on that output.
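As a rough illustration of how a host application might consume that output, the sketch below parses a raw classifier string into a structured verdict. It assumes the text format described in the published model card: a first line reading `safe` or `unsafe`, followed for unsafe content by a line of comma-separated category codes such as `S1,S10`. The names `Verdict`, `parse_verdict`, and `decide` are hypothetical helpers, not part of any Meta library.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    """Structured form of one classifier output (hypothetical helper type)."""
    safe: bool
    categories: list = field(default_factory=list)


def parse_verdict(raw: str) -> Verdict:
    """Parse raw classifier text into a Verdict.

    Assumed format: first non-empty line is 'safe' or 'unsafe'; for unsafe
    content the next line holds comma-separated category codes.
    """
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty classifier output")

    label = lines[0].lower()
    if label == "safe":
        return Verdict(safe=True)
    if label == "unsafe":
        codes = []
        if len(lines) > 1:
            codes = [c.strip() for c in lines[1].split(",") if c.strip()]
        return Verdict(safe=False, categories=codes)

    # Anything else is treated as an error rather than silently allowed.
    raise ValueError(f"unrecognised label: {lines[0]!r}")


def decide(verdict: Verdict) -> str:
    """Simplest possible policy: allow safe content, block everything else."""
    return "allow" if verdict.safe else "block"
```

Raising on unrecognised output, rather than defaulting to "allow", is a deliberate fail-closed choice; an application with different risk tolerance might handle that case differently.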

Primary use cases

  • classifying prompts and responses as safe or unsafe
  • input and output content moderation for LLM apps
  • emitting machine-readable hazard category codes
  • gating tool-call and code-interpreter abuse
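The input and output moderation use case can be sketched as a thin wrapper around a model call. This is a minimal illustration, not Meta's reference integration: `moderated_reply`, `generate`, `classify`, and `REFUSAL` are hypothetical names, and `classify` stands in for whatever function actually runs Llama Guard 3 and returns its raw text output (assumed here to begin with `safe` or `unsafe`).

```python
from typing import Callable, Optional

# Hypothetical signature: classify(prompt, response_or_None) -> raw label text.
Classifier = Callable[[str, Optional[str]], str]
Generator = Callable[[str], str]

REFUSAL = "Sorry, I can't help with that."


def is_unsafe(raw: str) -> bool:
    """True when the classifier output's first line reads 'unsafe' (assumed format)."""
    stripped = raw.strip()
    if not stripped:
        return False
    return stripped.splitlines()[0].strip().lower() == "unsafe"


def moderated_reply(prompt: str, generate: Generator, classify: Classifier) -> str:
    """Check the prompt, call the model, then check the response."""
    # Input moderation: classify the user prompt on its own.
    if is_unsafe(classify(prompt, None)):
        return REFUSAL

    response = generate(prompt)

    # Output moderation: classify the response in the context of the prompt.
    if is_unsafe(classify(prompt, response)):
        return REFUSAL

    return response
```

Passing the classifier in as a callable keeps the gating logic independent of how the model is served, so the same wrapper can be exercised with a stub in tests.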
