Sandbox Escape Monitoring

Also known as: Sandbox Telemetry, Boundary Violation Alerts

Treat sandbox boundary violations as telemetry; alert on syscalls, network egress, or filesystem writes outside expected scope.

Context

A team runs an agent that executes generated code or manipulates files on behalf of users, inside an isolation boundary such as a container, microVM, or syscall-filtered sandbox. The boundary is designed to confine what the agent can read, write, and reach over the network. Real-world sandboxes have known escape vectors and zero-day vulnerabilities; isolation is necessary but not by itself sufficient.

Problem

Treating the sandbox as a pure prevention mechanism means a successful escape, or even repeated escape attempts, can happen without anyone seeing them. A blocked network egress, an unexpected syscall, or a write outside the working directory fails or succeeds silently, with no alert either way. The team is left to choose between assuming the sandbox is impenetrable, which it is not, and learning about boundary violations from the downstream damage they cause.

Forces

Telemetry granularity vs cost.
False positives on legitimate boundary-pushing operations.
Egress patterns evolve faster than allowlists.

Example

A code-execution agent runs user-emitted Python in a container that should have no network access. One day a contractor's prompt-injected payload triggers an outbound DNS request; with sandbox isolation alone, the egress would have been blocked and would have failed silently. With escape monitoring, the unexpected syscall and the blocked egress both stream to safety telemetry, an alert fires within seconds, and the team locks the offending tenant before any further attempts.

Solution

Therefore:

Instrument the sandbox: log every syscall outside the allowed set, every network egress not on the allowlist, and every filesystem write outside the working directory. Stream these events to safety telemetry. Alert on threshold breaches. Pair with a kill switch for automatic halt on confirmed escape.
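The instrumentation described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the event shape (`SandboxEvent`), the policy values (`ALLOWED_SYSCALLS`, `EGRESS_ALLOWLIST`, `WORKDIR`), and the `EscapeMonitor` class are all hypothetical names invented for this example, and it assumes some lower layer (for instance a syscall filter or network proxy) already produces events in this form.

```python
import posixpath
from collections import deque
from dataclasses import dataclass
from typing import Callable

# Hypothetical policy values, chosen only for illustration.
ALLOWED_SYSCALLS = {"read", "write", "openat", "close", "exit_group"}
EGRESS_ALLOWLIST = {("pypi.org", 443)}
WORKDIR = "/workspace"


@dataclass(frozen=True)
class SandboxEvent:
    tenant: str
    kind: str         # "syscall", "egress", or "fs_write"
    detail: str       # syscall name, "host:port", or absolute path
    timestamp: float  # seconds, from any monotonic source


def is_violation(event: SandboxEvent) -> bool:
    """Return True when the event falls outside the expected scope."""
    if event.kind == "syscall":
        return event.detail not in ALLOWED_SYSCALLS
    if event.kind == "egress":
        host, _, port = event.detail.rpartition(":")
        # Malformed destinations are treated as violations.
        return not port.isdigit() or (host, int(port)) not in EGRESS_ALLOWLIST
    if event.kind == "fs_write":
        # Collapses ".." segments; does NOT resolve symlinks.
        path = posixpath.normpath(event.detail)
        return not (path == WORKDIR or path.startswith(WORKDIR + "/"))
    return True  # unknown event kinds are treated as violations


class EscapeMonitor:
    """Logs every violation and alerts when a tenant crosses a threshold."""

    def __init__(self, threshold: int, window_seconds: float,
                 on_alert: Callable[[str, list], None]) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.on_alert = on_alert  # e.g. page on-call or invoke a kill switch
        self.log: list = []       # every violation, whether or not it alerts
        self._recent: dict = {}   # tenant -> deque of recent violations

    def observe(self, event: SandboxEvent) -> bool:
        if not is_violation(event):
            return False
        self.log.append(event)
        recent = self._recent.setdefault(event.tenant, deque())
        recent.append(event)
        # Drop violations that have aged out of the sliding window.
        while event.timestamp - recent[0].timestamp > self.window_seconds:
            recent.popleft()
        if len(recent) >= self.threshold:
            self.on_alert(event.tenant, list(recent))
            recent.clear()
        return True
```

In this sketch the `on_alert` callback is the point where a team might wire in paging or a kill switch. The threshold and window are per tenant, which is one way to trade alert noise against detection delay; other groupings are equally plausible.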

What this pattern forbids. Sandbox events outside the allowed set must be logged and inspectable; silent boundary violations are forbidden.
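One way to make violations "logged and inspectable" is an append-only, line-delimited JSON record that can be filtered later. The sketch below is an assumption about what such a record might look like; the field names and the `record_violation` and `violations_for` helpers are hypothetical and not part of any particular telemetry product.

```python
import io
import json
from typing import IO, Iterator


def record_violation(sink: IO[str], tenant: str, kind: str,
                     detail: str, timestamp: float) -> None:
    """Append one violation as a JSON line; write errors propagate to the caller."""
    record = {"tenant": tenant, "kind": kind,
              "detail": detail, "timestamp": timestamp}
    sink.write(json.dumps(record, sort_keys=True) + "\n")
    sink.flush()  # do not leave violations sitting in a buffer


def violations_for(source: IO[str], tenant: str) -> Iterator[dict]:
    """Yield every recorded violation for one tenant, in write order."""
    for line in source:
        if not line.strip():
            continue
        record = json.loads(line)
        if record["tenant"] == tenant:
            yield record


if __name__ == "__main__":
    # In-memory stand-in for a real log file or telemetry stream.
    buffer = io.StringIO()
    record_violation(buffer, "tenant-a", "egress", "evil.example:53", 1.0)
    buffer.seek(0)
    for entry in violations_for(buffer, "tenant-a"):
        print(entry)
```

Letting write errors propagate, rather than swallowing them, keeps a failing log sink from turning into the silent boundary violation the pattern forbids.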

The smaller patterns that complete this one:

uses Provenance Ledger ★★: Log every agent decision and state change with enough metadata to explain or reverse it later.

And the patterns that stand alongside it, or against it:

complements Sandbox Isolation ★★: Run agent-emitted code or actions in a contained environment with restricted filesystem, network, and process privileges.
composes-with Kill Switch ★: Provide an out-of-band control plane to halt running agent instances without redeploy.
complements Adversary-Indistinguishability Blind Spot ✕ (anti-pattern): Rely on behavioral-anomaly detection calibrated to irregular human behaviour, so an autonomous adversary acting with legitimate credentials, standard protocols, and superhuman consistency is less anomalous than a human and slips past unseen.

Used in frameworks

Noma Security
supported · 4 patterns · Enterprise Platforms · ★ emerging
Identifies and blocks malicious tool calls, poisoned MCP servers, and unauthorized function executions.

References

OWASP Top 10 for LLM Applications (spec)

Provenance

Source: patterns/sandbox-escape-monitoring.md on GitHub · commit 4fa1213
Added to catalog: 2026-04-30
Last updated: 2026-05-21
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.