Agent-Generated Code RCE
Anti-pattern: let the agent author and execute code in its sandbox without distinguishing legitimate task code from injection-induced code.
Problem
An attacker who can plant instructions in any reachable input — a document the agent processes, a tool result it reads — can elicit malicious code from the agent. The agent generates and executes it through the same path as legitimate code. Result: data exfiltration, reverse shells, sandbox escape, all initiated by the agent itself. The audit log shows agent-authored code running under agent identity; classical RCE detection sees nothing exotic.
Solution
Don't run agent-authored code with the same trust regardless of origin. Use sandbox-isolation with no outbound network unless allow-listed. Separate planning (which can be informed by untrusted input) from execution (which should not be). For high-risk inputs, require human-in-the-loop confirmation before execute. Pair with prompt-injection-defense.
When to use
- Never. Cite when reviewing code-execution-capable agents.
- Sandbox with no outbound unless allow-listed; track input provenance to execute calls.
- Require human confirmation before executing code that originated from untrusted input.
Open the full interactive page →
Diagram, neighbourhood map, code examples, related patterns and full provenance.