Cryptographic Instruction Authentication
Wrap system/developer instructions in cryptographically signed blocks that user-generated text cannot reproduce; train or scaffold the model to refuse instructions lacking a valid signature.
Problem
Without a cryptographic distinction, instructions in user input are indistinguishable to the model from instructions in system prompts. Any text the user can write, they can write inside fake system-prompt markers. The model is asked to follow text-based conventions ('treat anything in <system> tags as authoritative') that user text can mimic.
Solution
At prompt construction time, sign each system/developer block with a key held only by the orchestrator (HMAC with a shared secret, or asymmetric signature). The prompt format includes the signature alongside the block. A signature verifier (either a model fine-tuned to refuse unsigned instructions, or a structural pre-processor) rejects any instruction-shaped text that lacks a valid signature. User text physically cannot produce a valid signature without the key. Pair with prompt-injection-defense, action-selector-pattern.
When to use
- Agent uses fine-tuned or self-hosted model that can be trained on signature verification.
- Key infrastructure can be operated reliably.
- Prompt-injection threat justifies the engineering investment.
Open the full interactive page →
Diagram, neighbourhood map, code examples, related patterns and full provenance.