VIII · Safety & Control · Emerging

Sovereign Inference Stack

also known as On-Premise Agent Stack, Data-Residency Agent Architecture, Sovereign AI

Run the entire agent stack (model weights, inference, tool layer, vector stores, logs) inside a jurisdictional and operational boundary the operator controls, so no request, prompt, or output crosses into a third-party API.

Context

An operator in public administration, banking, defence, health, or critical infrastructure needs to deploy an agent under a policy or legal regime that forbids sending the prompts, tool inputs, or outputs to a foreign-cloud large-language-model provider. Concrete drivers include the EU AI Act for high-risk systems, the German BSI C5 cloud-security framework, the EU NIS2 directive, and sectoral data-protection rules covering medical or financial data. The operator must be able to demonstrate that no in-scope data crosses the boundary they control.

Problem

A hosted-API agent sends every prompt, every tool input, and every output to a third party; that is inherent to the architecture. Contractual assurances from the provider do not satisfy regulators who require the data to stay inside a specific jurisdiction and under the operator's own keys. At the same time, frontier hosted models offer the best capability per dollar, and self-hosting demands GPU capital expenditure and machine-learning operations skill the operator may not have. Without a deliberate stack where every load-bearing component sits inside the operator-controlled boundary, the team has to choose between being non-compliant and not shipping at all.

Forces

  • Frontier hosted models offer the best capability per dollar.
  • Regulators forbid data egress for protected categories.
  • Self-hosting demands GPU capex and MLOps competence the operator may lack.
  • Sovereign deployments must still reach acceptable model quality to be useful.

Example

A bank wants an internal coding assistant, but its legal team flatly forbids any source code or prompt from leaving the bank's controlled boundary, regardless of vendor contractual language. The team picks a permissively licensed open-weights model, runs inference in its own datacentre, places the vector store and trace logs inside the same boundary, and holds the keys itself. No request, prompt, or output ever crosses to a third-party API, and the assistant ships under regulator review.

Solution

Therefore:

Choose models whose weights are available under a permissive open licence or a commercial sovereign licence. Run inference on-prem or in a jurisdictionally controlled cloud region with the operator holding the keys. Place all auxiliary services (vector store, tool gateway, audit log, evaluation harness) inside the same boundary. Document the boundary as part of the system's compliance posture (model card, data-flow diagram). Treat the boundary as load-bearing: any new tool or model call has to be reviewed for boundary impact before merge.
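Part of the pre-merge boundary review can be mechanised. The sketch below is one possible approach, not a prescribed implementation: it assumes the stack is described by a hypothetical manifest of `Component` entries (name, role, host), and it approximates "inside the boundary" with two hypothetical allowlists, the operator's private address ranges and an internal DNS zone the operator controls. A CI job could fail the merge when `ci_check` returns non-zero and attach the `data_flow_report` text to the compliance documentation. A DNS-suffix match only shows that a host is named in the operator's zone, not where it physically runs, so this supplements human review rather than replacing it.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical boundary definition: the operator's private address ranges
# and a DNS zone the operator controls. Replace with the real values.
BOUNDARY_NETWORKS = (
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
)
BOUNDARY_DNS_SUFFIXES = (".corp.internal",)


@dataclass(frozen=True)
class Component:
    """One load-bearing part of the agent stack (hypothetical manifest entry)."""
    name: str
    role: str  # e.g. "model", "vector_store", "tool_gateway", "audit_log"
    host: str


def inside_boundary(host: str) -> bool:
    """True if host is an in-range IP literal or a name in the operator's zone."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: fall back to the DNS-suffix allowlist.
        return host.lower().endswith(BOUNDARY_DNS_SUFFIXES)
    return any(addr in net for net in BOUNDARY_NETWORKS)


def boundary_violations(stack: list[Component]) -> list[Component]:
    """Components whose host falls outside the declared boundary."""
    return [c for c in stack if not inside_boundary(c.host)]


def data_flow_report(stack: list[Component]) -> str:
    """Plain-text data-flow listing for the compliance documentation."""
    rows = []
    for c in sorted(stack, key=lambda c: (c.role, c.name)):
        status = "inside" if inside_boundary(c.host) else "OUTSIDE"
        rows.append(f"{c.role:<13} {c.name:<16} {c.host:<26} {status}")
    return "\n".join(rows)


def ci_check(stack: list[Component]) -> int:
    """Print the report; return 0 if every component is inside, else 1."""
    print(data_flow_report(stack))
    return 1 if boundary_violations(stack) else 0


# Example manifest; the last entry is a hosted call a change tried to add.
STACK = [
    Component("llm-inference", "model", "inference.corp.internal"),
    Component("embeddings-db", "vector_store", "10.20.0.15"),
    Component("tool-gateway", "tool_gateway", "tools.corp.internal"),
    Component("audit-log", "audit_log", "192.168.4.2"),
    Component("summariser", "model", "api.example-llm.com"),
]
```

A CI job would call `sys.exit(ci_check(STACK))`; with the example manifest the check returns 1, because `summariser` points at a hosted API outside the declared ranges and zone.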

What this pattern forbids. No prompt, tool input, tool output, or memory entry may leave the operator-controlled boundary; agent components that require a third-party hosted call are forbidden by construction.
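Forbidding third-party calls "by construction" can be approximated in application code by routing every outbound request through a single guard that refuses out-of-boundary hosts before anything is sent. The sketch below is illustrative only: `ALLOWED_HOSTS`, `BoundaryViolation`, `check_egress`, `guarded` and `call_tool` are hypothetical names, and the stub `call_tool` stands in for whatever HTTP client the stack actually uses. An in-process check covers only code that goes through the wrapper, so network-level egress controls at the boundary are still needed.

```python
import functools
from urllib.parse import urlsplit

# Hypothetical allowlist; in practice generated from the reviewed boundary manifest.
ALLOWED_HOSTS = frozenset({
    "inference.corp.internal",
    "tools.corp.internal",
    "vectors.corp.internal",
})


class BoundaryViolation(Exception):
    """Raised before any request is sent to a host outside the boundary."""


def check_egress(url: str) -> str:
    # urlsplit().hostname is lower-cased and excludes port and user-info, so
    # "https://inference.corp.internal@evil.example/" yields "evil.example".
    host = urlsplit(url).hostname
    if host is None or host not in ALLOWED_HOSTS:
        raise BoundaryViolation(f"refusing egress to {host!r}")
    return url


def guarded(send):
    """Wrap a send function so every call is checked against the allowlist first."""
    @functools.wraps(send)
    def wrapper(url, *args, **kwargs):
        check_egress(url)
        return send(url, *args, **kwargs)
    return wrapper


@guarded
def call_tool(url: str, payload: dict) -> dict:
    # Stub standing in for the real in-boundary HTTP call.
    return {"url": url, "payload": payload}
```

With this wrapper, `call_tool("https://api.example-llm.com/v1/chat", {...})` raises `BoundaryViolation` before the stub body runs, while in-boundary URLs pass through unchanged.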

The smaller patterns that complete this one —

  • uses Lineage Tracking ★★: Track which prompt version, model version, and data sources produced each agent output.
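Because this pattern uses Lineage Tracking, the lineage records themselves belong in the in-boundary audit log. A minimal sketch, assuming a JSON-lines log file and a hypothetical schema (`prompt_version`, `model_version`, `inference_host`, `data_sources`, `output_sha256`, `timestamp`). Recording the inference host lets an auditor check after the fact that each output was produced inside the boundary; storing a SHA-256 digest of the output instead of its text is a design choice made here, not a requirement of either pattern.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LineageRecord:
    """What produced one agent output (hypothetical schema)."""
    prompt_version: str
    model_version: str
    inference_host: str
    data_sources: tuple[str, ...]
    output_sha256: str
    timestamp: float


def make_record(prompt_version: str, model_version: str, inference_host: str,
                data_sources: Iterable[str], output: str,
                now: Optional[float] = None) -> LineageRecord:
    """Build a record, hashing the output so the log holds a digest, not the text."""
    return LineageRecord(
        prompt_version=prompt_version,
        model_version=model_version,
        inference_host=inference_host,
        data_sources=tuple(data_sources),
        output_sha256=hashlib.sha256(output.encode("utf-8")).hexdigest(),
        timestamp=time.time() if now is None else now,
    )


def to_log_line(record: LineageRecord) -> str:
    """Serialise one record as a single JSON line."""
    return json.dumps(asdict(record), sort_keys=True)


def append_record(log_path: str, record: LineageRecord) -> None:
    # The log file is assumed to live on storage inside the operator's boundary.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(to_log_line(record) + "\n")
```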

And the patterns that stand alongside it, or against it —

  • complements Session Isolation ★★: Keep one user's session state and memory unreachable from another user's agent.
  • complements Secrets Handling: Ensure the model never receives secrets in plaintext; tools resolve credentials from references at runtime.
  • complements Constitutional Charter: Define rules the agent reads every turn but cannot modify, encoding inviolable boundaries.
  • complements Open-Weight Cascade: Build a multi-model cascade where lower tiers are open-weight, self-hostable models that run inside the operator's boundary, and only escalations cross to a hosted frontier model, giving cost arbitrage *and* sovereignty.
  • complements Vendor Lock-In (anti-pattern): Couple application code directly to one model provider's SDK, request shape, and proprietary features so that switching providers requires rewriting application code rather than swapping an adapter.
  • alternative-to Shadow AI (anti-pattern): Leave the corporate model offering so restrictive, slow, or narrow that employees bypass it with personal accounts and unapproved agent tools, creating data leakage and ungoverned tool calls that security cannot see.
