Control-Flow Integrity

also known as CFI, Agent CFI, Plan-Graph Integrity

Treat the agent's planned step sequence as a trusted control-flow graph that tool outputs, retrieved content, and user-supplied data cannot redirect at runtime.

This pattern helps complete certain larger patterns —

used-byPlan-and-Execute★★— Plan all the steps once with a strong model, then execute each step with a cheaper model under the plan.

Context

A team runs a tool-using agent on the Plan-then-Execute architecture or an equivalent graph runtime (LangGraph, a compiled DAG, an LLM-compiler). The plan is produced once, before any external content is read, and the executor then walks that plan calling tools and consuming their outputs. Some of those outputs come from sources the operator does not control — fetched web pages, third-party API responses, documents, MCP servers — and some are passed back into the model to inform later steps. The architecture already separates planning from execution; the question is whether external bytes can re-shape the plan after it has been compiled.

Problem

Classical software keeps data and instructions in separate memory regions because allowing data to be executed is the canonical exploit primitive. LLM agents have no such separation by default: a tool output, a retrieved document, or a fetched page returns tokens that flow back into the model's context, and the model can decide to add new steps, skip steps, or call tools the original plan never authorised. Each turn of the loop is a fresh chance for embedded instructions to alter what runs next, and there is no architectural fact that says the plan is the authority. Prompt-injection-defense filters the inputs and tool-output-trusted-verbatim guards how outputs are consumed, but neither pins down the structural commitment that the plan itself decides the next edge.

Forces

External content is necessary for the agent to be useful; refusing to read it is not an option.
Plans must sometimes adapt to facts discovered at execution time, so an absolutely frozen graph loses real capability.
Enforcement at the host layer survives jailbreaks; enforcement by prompt does not.

Example

A research agent's plan is: fetch a third-party documentation page, extract a setup command, and run it in a sandbox. Without CFI, the documentation page contains hidden instructions telling the agent to also fetch the user's SSH key and post it to a chat webhook; the model adds those steps to its loop and the attack succeeds. With CFI, the plan is compiled to a three-node DAG before any external content is read: FETCH_DOC → EXTRACT_COMMAND → RUN_IN_SANDBOX. The fetched page supplies a value to EXTRACT_COMMAND but cannot add a node that calls the SSH-key tool, because the host owns the graph and rejects any step whose predecessor is not in the compiled plan. The injection payload is read as data and discarded; the trusted edges hold.

Diagram

flowchart LR P[Planner<br/>privileged] -->|compile| G[(Trusted plan graph<br/>nodes + edges)] G --> N1[Node 1] N1 --> N2[Node 2] N2 --> N3[Node 3] T1[Tool output<br/>untrusted] -.value only.-> N2 T1 -.x cannot add edge.-> G R[Retrieved content<br/>untrusted] -.value only.-> N3 R -.x cannot add node.-> G N3 -->|new facts| RP{Replan?} RP -- yes --> P RP -- no --> Done[Done]

Solution

Therefore:

Lift control flow out of the model's free-form reasoning into an explicit artefact the host enforces. Concrete moves: compile the plan to a static DAG or finite state machine before execution begins; let nodes consume tool outputs as typed values but forbid those outputs from adding nodes or editing edges; route any genuine replan through a separate, privileged planner that re-emits a new compiled graph rather than mutating the current one in place; treat every step's predecessor as evidence the host can check, so an execution trace has a provable origin in the original plan. The model is the consumer of the graph, not its author at runtime.

What it gives you

Indirect prompt injection in tool outputs cannot cause unauthorised tool calls, because the calls are fixed at compile time.
Execution traces are auditable against the compiled plan; every step has a verifiable predecessor.
The trust boundary is enforced by the orchestrator, not by guardrail prose, so it survives clever payloads.
Composes cleanly with dual-LLM and simulate-before-actuate as complementary layers.

What it costs you

Static plans cannot react to genuinely new information without a privileged replan hop, which adds latency and cost.
Compiling a plan up front requires the planner to anticipate branches; over-broad graphs become brittle.
Does not defend against injection that targets the planner itself, or against poisoned tool outputs consumed verbatim within a legitimate node.
Tooling investment is non-trivial: capability tagging, graph compilation, and runtime checks must all exist.

What this pattern forbids. Tool outputs and retrieved content may supply values to graph nodes but may not add nodes, edit edges, or otherwise alter the compiled plan; any change to the graph requires a privileged replan that produces a new compiled artefact.

The smaller patterns that complete this one —

usesSpec-Driven Loop★— Run the same prompt against a fixed spec in a deterministic outer loop until the spec is satisfied.
usesLLMCompiler·— Take ReWOO's plan-as-DAG and run independent steps in parallel through a task-fetching dispatcher.

And the patterns that stand alongside it, or against it —

complementsPrompt Injection Defense★— Tag user-supplied or tool-supplied content as untrusted and refuse to follow instructions found inside it.
complementsTool Output Poisoning Defense★— Treat tool output as untrusted content and apply instruction-stripping plus per-tool trust labels.
complementsTool Output Trusted Verbatim✕— Anti-pattern: trust whatever tools return without validation, schema enforcement, or trust labels.
complementsDual LLM Pattern★— Split agent work between a privileged model that holds tool access and a quarantined model that reads untrusted content, exchanging only opaque references between them.
composes-withSimulate Before Actuate★— Before issuing an irreversible action, run a deterministic simulation that computes pre-conditions, invariants, and expected deltas; require a verifier — automated or human — to green-light the simulated outcome before the real command is sent.
complementsPolicy-as-Code Gate★— Evaluate every proposed agent action against externally-managed machine-readable policies before dispatch, so compliance authorship lives outside the prompt and outside the agent code.
complementsLethal Trifecta Threat Model★— Block prompt-injection-driven exfiltration by ensuring no single agent execution path holds all three of: access to private data, exposure to untrusted content, and an outbound communication channel.
complementsAction Selector Pattern★— Eliminate the feedback channel from tool outputs back into the agent's reasoning step by having the agent select actions from a fixed catalog rather than free-form generation over tool output.
complementsCryptographic Instruction Authentication·— Wrap system/developer instructions in cryptographically signed blocks that user-generated text cannot reproduce; train or scaffold the model to refuse instructions lacking a valid signature.

Neighbourhood

Click any neighbour to follow the language. Scroll to zoom, drag to pan.

Used in frameworks

LangGraph
supported31 patternsOrchestration Frameworks★★ mature
Stateful graph fixes edges at compile time; node outputs cannot rewire the graph at runtime. Cited as a CFI-style defence in Del Rosario et al. (2025).

References

Provenance

Source: patterns/control-flow-integrity.md on GitHub · commit 69cc3f6 · view history
Added to catalog: 2026-05-22
Last updated: 2026-05-22
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.