Policy-as-Code Gate

also known as OPA Action Gate, Compiled Governance, Policy-as-Prompt, Rego-Gated Agent, External Policy Engine

Evaluate every proposed agent action against externally-managed machine-readable policies before dispatch, so compliance authorship lives outside the prompt and outside the agent code.

This pattern helps complete certain larger patterns —

used-by Rigor Relocation ★: Relocate verification rigor from the model loop to surrounding scaffolding (evals, judges, decision logs, policy gates) so failures are caught by the wrapper rather than the agent.
used-by Scope-of-Practice Boundary Gate ★: Block requests and responses that perform license-gated professional activities unless a licensed human is in the loop, enforcing the boundary in code outside the reasoning loop.
used-by Reversibility-Aware Action Filter ·: Insert a standing filter between the policy and the environment that estimates each proposed action's reversibility and re-samples the policy until a reversible action is chosen.

Context

A team runs an agent in a regulated or compliance-sensitive domain (banking, insurance, the public sector, critical infrastructure) where the set of permitted actions is determined by policy documents that compliance, legal, or security functions own and update. The agent has a non-trivial action surface (transfers, account changes, external API calls of varying risk), and the rules over that surface change more often than the agent code. The people who write the rules are not the same people who write the prompts or deploy the agent.

Problem

When the governance rules live inside the system prompt or are hard-coded in the agent, every policy change becomes a prompt edit followed by a redeploy, and the compliance officers responsible for the rules cannot read, audit, or change them without going through engineering. Natural-language rules embedded in the prompt also have no signed version, no machine-evaluable contract with the action that actually fired, and no independent audit trail an auditor can replay. Without an external, machine-readable policy surface, compliance and engineering are bound to the same release cycle and the rules become unauditable.

Forces

Compliance officers must own the rules, but they do not write prompts and do not deploy agent code.
Policies change faster than agent prompts and on a different release cadence than model weights.
Natural-language rules embedded in the prompt are not independently auditable and have no signed version.
A machine-evaluable policy engine must be deterministic and fast enough to sit on the hot path of every tool call.
Policy documents are often authored in prose; manually translating them to code is a bottleneck and a source of drift.

Example

A bank deploys an agent that can move money, open accounts, and call external KYC services. The compliance team writes its rules in Rego in a separately versioned policy repository, including jurisdiction-by-jurisdiction holds, sanctions checks, and threshold-based human-approval requirements. Before any tool call, the agent serialises the proposed action and sends it to an OPA sidecar. OPA returns allow with obligations (require dual approval, mask the customer name in the downstream call), and the agent honours those obligations on dispatch. When a regulator asks why a particular transfer was permitted, the audit log replays the action against the exact policy hash that was active at that moment.
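The obligation handling in this example can be sketched roughly as follows. This is a minimal Python illustration, not the interface of OPA or of any real bank system: the obligation types `require_dual_approval` and `mask_field`, the dictionary shapes, and the helper `apply_obligations` are all hypothetical names chosen for the sketch.

```python
from copy import deepcopy


class ObligationNotMet(Exception):
    """Raised when an allow verdict carries an obligation that cannot be satisfied."""


def apply_obligations(action, obligations, approvals):
    """Return a copy of `action` with every obligation applied, or raise.

    The obligation types handled here are hypothetical examples.
    """
    prepared = deepcopy(action)  # never mutate the proposal that was evaluated
    for ob in obligations:
        if ob["type"] == "require_dual_approval":
            # Two distinct approvers must have signed off before dispatch.
            if len(set(approvals)) < 2:
                raise ObligationNotMet("dual approval required")
        elif ob["type"] == "mask_field":
            # Redact the named argument before it reaches the downstream call.
            if ob["field"] in prepared["args"]:
                prepared["args"][ob["field"]] = "***"
        else:
            # Unknown obligations block dispatch rather than being ignored.
            raise ObligationNotMet(f"unsupported obligation: {ob['type']}")
    return prepared


transfer = {"tool": "transfer", "args": {"customer_name": "Jane Doe", "amount": 25000}}
obligations = [
    {"type": "require_dual_approval"},
    {"type": "mask_field", "field": "customer_name"},
]
prepared = apply_obligations(transfer, obligations, approvals=["alice", "bob"])
```

Treating an unrecognised obligation as a failure is a design assumption of this sketch: an agent that silently skips obligations it does not understand would turn allow-with-obligations into a plain allow.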

Diagram

sequenceDiagram
    participant CR as Compliance repo (policies as code)
    participant PDP as Policy Decision Point (OPA / Cedar)
    participant A as Agent
    participant T as Tool
    participant L as Decision log
    CR->>PDP: compile / sign / deploy policy bundle
    A->>PDP: action proposal {tool, args, caller, data fingerprints}
    PDP-->>A: allow | deny | allow-with-obligations<br/>+ policy_hash + rule_id
    alt allow
        A->>T: dispatch tool (with obligations applied)
        T-->>A: result
    else deny
        A->>A: surface rule_id to user / escalate
    end
    A->>L: {action, policy_hash, rule_id, verdict}

Solution

Therefore:

Maintain policies as code (OPA/Rego, Cedar, or equivalent) in a repository owned by compliance, optionally generated by a policy compiler that translates prose policy documents into the rule language. Before any tool dispatch, the agent emits a structured action proposal (tool, arguments, caller context, retrieved data fingerprints) to an external policy decision point. The engine returns allow, deny, or allow-with-obligations together with a policy hash and rule id. The agent dispatches the tool only on allow or allow-with-obligations, applying any obligations before dispatch; on deny it surfaces the rule id to the user or escalates. Policies are versioned, signed, and shipped through a separate pipeline from the agent. Evaluation results are logged with the policy hash so any decision can be re-checked against the exact rule version that fired.
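The gate described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `ActionProposal`, `Decision`, `Denied`, `gate`, and the in-process `toy_decide` function are hypothetical names, and `toy_decide` merely stands in for an external engine such as OPA or Cedar, which would be called over the network rather than as a local function.

```python
import hashlib
from dataclasses import dataclass, field, asdict


@dataclass
class ActionProposal:
    tool: str
    args: dict
    caller: str
    data_fingerprints: list = field(default_factory=list)


@dataclass
class Decision:
    verdict: str  # "allow", "deny", or "allow_with_obligations"
    rule_id: str
    policy_hash: str
    obligations: list = field(default_factory=list)


class Denied(Exception):
    """Carries the rule id so the caller can surface it instead of inventing a reason."""

    def __init__(self, rule_id):
        super().__init__(f"denied by rule {rule_id}")
        self.rule_id = rule_id


def no_obligations(args, obligations):
    """Default enforcer: refuse rather than silently ignore obligations."""
    if obligations:
        raise Denied("obligation.unsupported")
    return args


def gate(proposal, decide, tools, decision_log, enforce=no_obligations):
    """Ask the policy decision point first; dispatch only on an allow verdict."""
    decision = decide(proposal)
    decision_log.append({
        "action": asdict(proposal),
        "policy_hash": decision.policy_hash,
        "rule_id": decision.rule_id,
        "verdict": decision.verdict,
    })
    if decision.verdict not in ("allow", "allow_with_obligations"):
        raise Denied(decision.rule_id)
    args = enforce(proposal.args, decision.obligations)
    return tools[proposal.tool](**args)


# Placeholder bundle bytes; a real hash would cover the signed policy bundle.
POLICY_HASH = hashlib.sha256(b"toy policy bundle v1").hexdigest()


def toy_decide(proposal):
    """Stand-in for the external engine: default deny, one threshold rule."""
    if proposal.tool != "transfer":
        return Decision("deny", "default.deny", POLICY_HASH)
    if proposal.args["amount"] > 10_000:
        return Decision("deny", "transfer.over_limit", POLICY_HASH)
    return Decision("allow", "transfer.within_limit", POLICY_HASH)


log = []
tools = {"transfer": lambda amount, to: f"sent {amount} to {to}"}
small = ActionProposal("transfer", {"amount": 500, "to": "DE89"}, caller="agent-7")
result = gate(small, toy_decide, tools, log)
```

In this sketch the log entry is written before dispatch so that denied proposals are recorded too; whether to log before or after the tool call is a deployment choice the pattern itself leaves open.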

What it gives you

Compliance owns the rules in their native form; engineering owns the agent.
Policy changes ship without touching prompts or model weights.
Every allow/deny carries the hash of a signed policy version, so an auditor can replay the decision against the exact rules that applied.
Deterministic rule evaluation removes the LLM from the enforcement path.
Prose-to-code compilation reduces translation drift between policy documents and runtime checks.
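The replay property can be illustrated with a small Python sketch. It assumes, hypothetically, that every deployed policy bundle is archived under its content hash and that the evaluator is deterministic; `bundle_hash`, `evaluate`, `replay`, and the `transfer_limit` field are invented for the illustration, and signature verification is left out.

```python
import hashlib
import json


def bundle_hash(bundle):
    """Hash a canonical JSON encoding so equal bundles always hash equally."""
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def evaluate(bundle, action):
    """Toy deterministic evaluator standing in for the policy engine."""
    if action["amount"] > bundle["transfer_limit"]:
        return {"verdict": "deny", "rule_id": "transfer.over_limit"}
    return {"verdict": "allow", "rule_id": "transfer.within_limit"}


def replay(record, archive):
    """Re-evaluate a logged action against the bundle version that was active."""
    bundle = archive[record["policy_hash"]]  # KeyError if that version was never archived
    if bundle_hash(bundle) != record["policy_hash"]:
        raise ValueError("archived bundle does not match its hash")
    fresh = evaluate(bundle, record["action"])
    return (fresh["verdict"], fresh["rule_id"]) == (record["verdict"], record["rule_id"])


v1 = {"transfer_limit": 10_000}
v2 = {"transfer_limit": 5_000}  # a later, stricter version
archive = {bundle_hash(v1): v1, bundle_hash(v2): v2}

record = {
    "action": {"amount": 8_000},
    "policy_hash": bundle_hash(v1),
    "verdict": "allow",
    "rule_id": "transfer.within_limit",
}
consistent = replay(record, archive)
```

Because the record names the hash of `v1`, the replay confirms the original allow even though the stricter `v2` would deny the same transfer today.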

What it costs you

Adds a synchronous decision point to every tool call; latency and availability of the policy engine become production concerns.
Rule language (Rego, Cedar) is itself a skill the compliance team must acquire or be supported in.
Prose-to-code compilation can introduce its own translation errors; the compiled output still needs human review.
Policies that depend on free-text content (intent, tone) cannot be fully expressed as code; they must fall back on obligations that hand the judgement to a classifier.
Action proposals must serialise enough context for the policy to evaluate, which expands the agent's structured-output surface.
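One way to handle the latency and availability cost is to fail closed: if the policy engine is slow, unreachable, or returns something malformed, treat the result as a deny. The pattern text does not prescribe this, so the Python sketch below is an assumption, and `decide_fail_closed`, the `pdp.unavailable` rule id, and the three example engines are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

_POOL = ThreadPoolExecutor(max_workers=4)

# Synthetic verdict used whenever no trustworthy decision was obtained.
FAIL_CLOSED = {"verdict": "deny", "rule_id": "pdp.unavailable", "policy_hash": None}
VALID_VERDICTS = ("allow", "deny", "allow_with_obligations")


def decide_fail_closed(pdp, proposal, timeout_s=0.05):
    """Call the policy decision point with a latency budget; deny on any failure."""
    future = _POOL.submit(pdp, proposal)
    try:
        decision = future.result(timeout=timeout_s)
    except Exception as exc:  # timeout, connection error, or a bug in the client
        # A timed-out call keeps running in its worker thread; its result is discarded.
        return {**FAIL_CLOSED, "error": type(exc).__name__}
    if not isinstance(decision, dict) or decision.get("verdict") not in VALID_VERDICTS:
        return {**FAIL_CLOSED, "error": "malformed decision"}
    return decision


def healthy(proposal):
    return {"verdict": "allow", "rule_id": "transfer.within_limit", "policy_hash": "abc123"}


def slow(proposal):
    time.sleep(0.5)  # exceeds the latency budget used in the example below
    return {"verdict": "allow", "rule_id": "transfer.within_limit", "policy_hash": "abc123"}


def broken(proposal):
    raise ConnectionError("sidecar unreachable")


fast_result = decide_fail_closed(healthy, {"tool": "transfer"}, timeout_s=1.0)
slow_result = decide_fail_closed(slow, {"tool": "transfer"}, timeout_s=0.05)
broken_result = decide_fail_closed(broken, {"tool": "transfer"}, timeout_s=1.0)
```

Failing closed trades availability for safety: an outage of the policy engine halts every governed tool call, which is why the engine's uptime becomes a production concern in its own right.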

What this pattern forbids. The LLM must not dispatch any governed tool call without first obtaining an allow verdict from the external policy engine, must not modify or paraphrase rule content at runtime, and must surface the rule id behind any deny rather than synthesising its own explanation.

The smaller patterns that complete this one —

generalises Policy-Gated Agent Action (KRITIS) ★: Each agent action passes through a policy gate (NIS2, EU AI Act, BSI rules) and is tagged with Run ID + Model Digest + Policy Hash for WORM-audit reconstruction.

And the patterns that stand alongside it, or against it —

alternative-to Constitutional Charter ★: Define rules the agent reads every turn but cannot modify, encoding inviolable boundaries.
complements Input/Output Guardrails ★★: Validate inputs before they reach the model and outputs before they reach the user.
complements Human-in-the-Loop ★★: Require explicit human approval at defined points before the agent performs an action.
complements Refusal ★★: Explicitly refuse requests that fall outside the agent's scope, capability, or policy boundaries.
complements Visual Workflow Graph ★★: Express agentic logic as a visual graph of typed nodes connected on a canvas with Start and End nodes so non-coding stakeholders can read and edit the flow.
complements Typed Refusal Codes ★: Define a single source of truth for machine-readable refusal codes across all guard surfaces, so refusals can be triaged mechanically rather than by string-grepping ad-hoc human-readable messages.
complements LLM as Periphery ·: Invert the typical LLM-in-the-middle architecture: a deterministic state machine and event store form the core; the LLM is restricted to edge tasks (input interpretation and output synthesis only).
complements Simulate Before Actuate ★: Before issuing an irreversible action, run a deterministic simulation that computes pre-conditions, invariants, and expected deltas; require a verifier, automated or human, to green-light the simulated outcome before the real command is sent.
complements Hybrid Symbolic-Neural Routing ★: Per query, route between a symbolic path (rule engine, knowledge graph) and a neural path (LLM), using the LLM for interpretation and the symbolic layer for exact constraints.
complements Control-Flow Integrity ★: Treat the agent's planned step sequence as a trusted control-flow graph that tool outputs, retrieved content, and user-supplied data cannot redirect at runtime.
complements Stochastic-Deterministic Boundary (SDB) ★: Formalize the seam between an LLM proposal and a system action as a four-part contract (proposer, verifier, commit step, reject signal) so the contract itself, not the agent's good intent, gates side-effects.
complements Supervisor-Plus-Gate ★: Supervisor controller that validates and gates LLM outputs against deterministic checks before they commit to side-effects.
complements Tool Over-Broad Scope ✕: Anti-pattern: grant the agent tools scoped so broadly that a single hallucinated argument can escalate into a privilege incident.
complements Decision Context Maps ★: Before any consequential decision, require the agent to gather a declared set of contextual inputs (resource availability, schedules, downstream dependencies) into a 'context map' the decision must cite.
alternative-to Context Gap (Security) ✕: Agents faithfully follow explicit security rules but miss the broader implications; they log access correctly without flagging the unusual pattern a human expert would catch immediately.
complements Priority Matrix (Conflict Resolution) ★: Pre-define how the agent must resolve specific classes of goal conflicts via a human-authored lookup table, transforming the agent from a decision-maker (where it fails on competing objectives) into a decision-implementer.
composes-with Agent Middleware Chain ★: Wrap every model call, tool call, and memory access in a composable pre/execute/post interceptor pipeline so cross-cutting concerns attach without touching agent or orchestrator code.
composes-with Multi-Principal Welfare Aggregation ·: When an agent serves multiple humans with conflicting preferences, declare the aggregation rule explicitly rather than letting it be implicit in the prompt or fine-tune.
composes-with Cost-Aware Action Delegation ★: Classify every agent action by risk/cost and route each tier to a different approval policy, bounding the autonomy surface per-action instead of by one global flag.
complements Agentic Golden Path ★: Constrain an agent to the platform's curated golden path of living, machine-readable standards and check for drift as it works, so its output is compliant by construction rather than corrected later.
complements Tenant-Scoped Tool Binding ★: Bind every tool call and retrieval to the active tenant in code at the execution layer, so a multi-tenant agent can never be talked into reading or writing another tenant's data.
complements Risk-Tiered Action Autonomy ★: Set an agent's permitted action class by the financial materiality of the action, letting it read and draft freely while requiring a different human principal to release material postings, payments, or filings.
alternative-to Formal-Proof Compliance Gate ·: Require every agent-proposed action to ship a machine-checked proof that it satisfies the binding regulatory invariants, and reject deterministically any action whose proof does not check.
complements Change-Freeze-Aware Action Gate ★: Check every mutating agent action against an active deploy-freeze or maintenance calendar and block it or force explicit human re-authorisation while a freeze covering its scope is in effect.
complements Production Failure Triage Loop ★: Sort every production agent failure into a small fixed taxonomy and bind each class to a set remediation path, so fixes are dispatched mechanically and the monitor-to-fix loop stays fast enough to gate scaling.