Safety & Control

Code-Then-Execute with Dataflow Analysis

Have the agent emit code in a sandbox DSL whose values are statically tagged trusted/tainted via dataflow analysis before execution, enabling per-value policy enforcement.

Problem

Without provenance tracking, the executor cannot distinguish trusted values (the agent's plan, user goal) from tainted values (fetched content that could be attacker-controlled). The same `exec(code)` runs both. A prompt injection in fetched content can produce code that, e.g., reads secrets from env and embeds them in an outbound URL — and the sandbox cannot reject it because it cannot tell the URL is tainted.

Solution

Define a sandbox DSL (subset of Python/TS or a custom Pyret-style language) where every value carries a provenance tag (TRUSTED, TAINTED, MIXED). The runtime performs static dataflow analysis on each agent-generated program before execution: if a TAINTED value reaches a sink declared sensitive (network egress, env reads, file writes outside scratch dir), reject the program. Pair with sandbox-isolation, action-selector-pattern.

When to use

  • Agent generates code that processes untrusted content alongside sensitive values (secrets, PII).
  • Static analysis can be performed in tens of ms per program.
  • Engineering team can maintain a sandbox DSL.

Open the full interactive page

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Related