Safety & Control

Cryptographic Instruction Authentication

Wrap system/developer instructions in cryptographically signed blocks that user-generated text cannot reproduce; train or scaffold the model to refuse instructions lacking a valid signature.

Problem

Without a cryptographic distinction, instructions in user input are indistinguishable to the model from instructions in system prompts. Any text the user can write, they can write inside fake system-prompt markers. The model is asked to follow text-based conventions ('treat anything in <system> tags as authoritative') that user text can mimic.

Solution

At prompt construction time, sign each system/developer block with a key held only by the orchestrator (HMAC with a shared secret, or asymmetric signature). The prompt format includes the signature alongside the block. A signature verifier (either a model fine-tuned to refuse unsigned instructions, or a structural pre-processor) rejects any instruction-shaped text that lacks a valid signature. User text physically cannot produce a valid signature without the key. Pair with prompt-injection-defense, action-selector-pattern.

When to use

  • Agent uses fine-tuned or self-hosted model that can be trained on signature verification.
  • Key infrastructure can be operated reliably.
  • Prompt-injection threat justifies the engineering investment.

Open the full interactive page

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Related