Safety & Control

Multimodal Guardrails

Input and output guardrails that operate across modalities (vision, audio, file) rather than text only — handling e.g. malicious instructions embedded in image OCR or audio transcription.

Problem

An attacker plants prompt-injection instructions in image text the OCR will read, in audio the transcription will turn into text, in PDF metadata the file processor will surface. The text-only guardrail sees the final text but not the modality-specific transformation that introduced it. Likewise, output guardrails may check generated text but not synthesised audio or rendered images for the same policy violations.

Solution

For each modality the agent accepts: apply a modality-specific input check (image content classifier, audio-content classifier, file-type and metadata check) before the modality is transformed to text. After transformation, apply standard text guardrails. For modality outputs (synthesised image, synthesised audio): apply output-specific checks (NSFW image classifier, voice-cloning detection, watermark embedding). Pair with input-output-guardrails, prompt-injection-defense, action-selector-pattern.

When to use

  • Agent accepts non-text input modalities.
  • Agent emits non-text output modalities.
  • Threat model includes injection or policy violations via non-text channels.

Open the full interactive page

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Related