Constitutional Charter
also known as Immutable Constitution, Negative Constraints, Robot Laws
Define rules the agent reads every turn but cannot modify, encoding inviolable boundaries.
This pattern helps complete certain larger patterns —
- used-bySelf-Modification Diff Gate·— Gate the agent's edits to its own code or rules through a separate critic persona that reviews the diff before it lands.
- used-byRefusal★★— Explicitly refuse requests that fall outside the agent's scope, capability, or policy boundaries.
Context
A team runs an agent that has access to its own configuration — system prompts, memory files, tool definitions — and is expected to refine those over time as it learns. Some constraints, though, are non-negotiable: never give medical dosage advice, never reveal another customer's data, never spend more than a certain amount without approval. Those constraints need to survive jailbreak attempts, accidental self-edits, and the slow drift of long-running self-modification.
Problem
If the agent has write access to its own rules, then any successful jailbreak prompt or any sufficiently confused turn can simply rewrite the rules and the inviolable constraints stop being inviolable. Telling the model in prose that certain rules are immutable does not enforce immutability — the model is the very thing being asked to police itself, and it can be talked out of any prose instruction. A naive design either accepts that the agent's values are fluid (and trusts the model not to drift) or refuses to give the agent any self-modification ability at all.
Forces
- Charter authors must encode hard constraints without paralysing the agent.
- Read-only at the tool layer is enforceable; read-only by exhortation is not.
- Charters age; updating requires human action.
Example
A consumer-facing agent has a system prompt with rules like 'never give medical dosage advice' and 'never reveal customer PII'. A jailbreak prompt convinces the agent to rewrite its own instructions and the rules dissolve. The team extracts those rules into a Constitutional Charter: a separate, read-only document the agent re-reads each turn but cannot edit, and the surrounding harness rejects any reasoning that contradicts it. The agent can be coaxed into many things but no longer into editing its own values.
Diagram
Solution
Therefore:
A charter file is read into context every turn (or every tick). The tool layer enforces read-only on it; the agent has no write tool that can touch it. Updates go through an explicit operator path. Charters typically express constraints in negative form ('the agent shall not...').
What this pattern forbids. The agent cannot write the charter; updates require explicit operator action outside the agent loop.
And the patterns that stand alongside it, or against it —
- complementsQuorum on Mutation·— Require multiple consecutive ticks (or runs) to agree before a mutation to durable state lands.
- alternative-toPrompt Bloat✕— Anti-pattern: every bug fix adds a sentence to the system prompt; nothing is ever removed.
- complementsSovereign Inference Stack★— Run the entire agent stack (model weights, inference, tool layer, vector stores, logs) inside a jurisdictional and operational boundary the operator controls, so no request, prompt, or output crosses into a third-party API.
- composes-withWorld-Model Separation★— Maintain an explicit, surprise-updated model of the environment (humans, repos, services, capabilities) in a separate file from the agent's self-model, so the two cannot be confused or co-mutated by reflection.
- alternative-toPolicy-as-Code Gate★— Evaluate every proposed agent action against externally-managed machine-readable policies before dispatch, so compliance authorship lives outside the prompt and outside the agent code.
- complementsPersonality Variant Overlay·— Let one agent speak in several named voices that overlay the base identity rather than replacing it, so the agent can shift register without losing identity continuity or splitting into separate personas.
Neighbourhood
Click any neighbour to follow the language. Scroll to zoom, drag to pan.