Sovereign Inference Stack

also known as On-Premise Agent Stack, Data-Residency Agent Architecture, Sovereign AI

Run the entire agent stack (model weights, inference, tool layer, vector stores, logs) inside a jurisdictional and operational boundary the operator controls, so no request, prompt, or output crosses into a third-party API.

Context

An operator in public administration, banking, defence, health, or critical infrastructure needs to deploy an agent under a policy or legal regime that forbids sending the prompts, tool inputs, or outputs to a foreign-cloud large-language-model provider. Concrete drivers include the EU AI Act for high-risk systems, the German BSI C5 cloud-security framework, the EU NIS2 directive, and sectoral data-protection rules covering medical or financial data. The operator must be able to demonstrate that no in-scope data crosses the boundary they control.

Problem

A hosted-API agent sends every prompt, every tool input, and every output to a third party — that is the architecture. Contractual assurances from the provider do not satisfy regulators who require the data to stay inside a specific jurisdiction and under the operator's own keys. At the same time, the frontier hosted models offer the best capability per dollar, and self-hosting demands GPU capital expenditure and machine-learning operations skill the operator may not have. Without a deliberate stack where every load-bearing component sits inside the operator-controlled boundary, the team has to choose between being non-compliant and not shipping at all.

Forces

Frontier hosted models offer the best capability per dollar.
Regulators forbid data egress for protected categories.
Self-hosting demands GPU capex and MLOps competence the operator may lack.
Sovereign deployments must still reach acceptable model quality to be useful.

Example

A bank wants an internal coding assistant but legal flatly forbids any source-code or prompt leaving the bank's controlled boundary, regardless of vendor contractual language. The team picks a permissively-licensed open-weights model, runs inference in their own datacentre, places the vector store and trace logs inside the same boundary, and holds the keys themselves. No request, prompt, or output ever crosses to a third-party API; the assistant ships under regulator review.

Diagram

flowchart TB subgraph Boundary[Operator-controlled boundary] Inf[On-prem inference] Tools[Tool gateway] Vec[(Vector store)] Logs[(Audit log)] Eval[Eval harness] end U[User UI] --> Inf Inf --> Tools Tools --> Vec Inf --> Logs Inf -.never crosses.-x Ext[Third-party API] Inf --> U

Solution

Therefore:

Choose models with permissive weights or commercial sovereign licensing. Run inference on-prem or in a jurisdictionally controlled cloud region with the operator holding the keys. Place all auxiliary services (vector store, tool gateway, audit log, evaluation harness) inside the same boundary. Document the boundary as part of the system's compliance posture (model card, data-flow diagram). Treat the boundary as load-bearing: any new tool or model call has to be reviewed for boundary impact before merge.

What this pattern forbids. No prompt, tool input, tool output, or memory entry may leave the operator-controlled boundary; agent components that require a third-party hosted call are forbidden by construction.

The smaller patterns that complete this one —

usesLineage Tracking★★— Track which prompt version, model version, and data sources produced each agent output.

And the patterns that stand alongside it, or against it —

complementsSession Isolation★★— Keep one user's session state and memory unreachable from another user's agent.
complementsSecrets Handling★— Ensure the model never receives secrets in plaintext; tools resolve credentials from references at runtime.
complementsConstitutional Charter★— Define rules the agent reads every turn but cannot modify, encoding inviolable boundaries.
complementsOpen-Weight Cascade★— Build a multi-model cascade where lower tiers are open-weight, self-hostable models that run inside the operator's boundary, and only escalations cross to a hosted frontier model — giving cost arbitrage *and* sovereignty.
complementsVendor Lock-In✕— Anti-pattern: couple application code directly to one model provider's SDK, request shape, and proprietary features so that switching providers requires rewriting application code rather than swapping an adapter.
alternative-toShadow AI✕— Anti-pattern: leave the corporate LLM offering so restrictive, slow, or narrow that employees bypass it with personal accounts and unapproved agent tools, creating data leakage and ungoverned tool calls that security cannot see.
complementsCompliance-Certified Launch Gate★— Require an external regulator to certify the generative service against a published content-safety standard before it may serve the public, forcing the standard's controls into the build as a re-certifiable artifact.

Neighbourhood

Click any neighbour to follow the language. Scroll to zoom, drag to pan.

Used in recipes

Sovereign / Regulated Deployment
core

Used in frameworks

References

Provenance

Source: patterns/sovereign-inference-stack.md on GitHub · commit 4fa1213 · view history
Added to catalog: 2026-04-30
Last updated: 2026-05-21
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.