VIII · Safety & Control · Emerging

Control-Flow Integrity

also known as CFI, Agent CFI, Plan-Graph Integrity

Treat the agent's planned step sequence as a trusted control-flow graph that tool outputs, retrieved content, and user-supplied data cannot redirect at runtime.

This pattern helps complete certain larger patterns —

  • used-by Plan-and-Execute ★★: Plan all the steps once with a strong model, then execute each step with a cheaper model under the plan.

Context

A team runs a tool-using agent on the Plan-and-Execute architecture or an equivalent graph runtime (LangGraph, a compiled DAG, LLMCompiler). The plan is produced once, before any external content is read; the executor then walks that plan, calling tools and consuming their outputs. Some of those outputs come from sources the operator does not control (fetched web pages, third-party API responses, documents, MCP servers), and some are passed back into the model to inform later steps. The architecture already separates planning from execution; the question is whether external bytes can reshape the plan after it has been compiled.

Problem

Classical software keeps data and instructions in separate memory regions because allowing data to be executed is the canonical exploit primitive. LLM agents have no such separation by default: a tool output, a retrieved document, or a fetched page returns tokens that flow back into the model's context, and the model can decide to add new steps, skip steps, or call tools the original plan never authorised. Each turn of the loop is a fresh chance for embedded instructions to alter what runs next, and nothing in the architecture makes the plan the authority. Prompt Injection Defense filters the inputs, and avoiding the Tool Output Trusted Verbatim anti-pattern governs how outputs are consumed, but neither pins down the structural commitment that the plan alone decides the next edge.

Forces

  • External content is necessary for the agent to be useful; refusing to read it is not an option.
  • Plans must sometimes adapt to facts discovered at execution time, so an absolutely frozen graph loses real capability.
  • Enforcement at the host layer survives jailbreaks; enforcement by prompt does not.

Example

A research agent's plan is: fetch a third-party documentation page, extract a setup command, and run it in a sandbox. The documentation page contains hidden instructions telling the agent to also fetch the user's SSH key and post it to a chat webhook. Without CFI, the model adds those steps to its loop and the attack succeeds. With CFI, the plan is compiled to a three-node DAG before any external content is read: FETCH_DOC → EXTRACT_COMMAND → RUN_IN_SANDBOX. The fetched page supplies a value to EXTRACT_COMMAND but cannot add a node that calls the SSH-key tool, because the host owns the graph and rejects any step that is not a compiled successor of the step just executed. The injection payload is read as data and discarded; the trusted edges hold.
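The host-side check in this example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names CompiledPlan, Host, ControlFlowViolation, and the READ_SSH_KEY step are hypothetical, and real tool calls are omitted so that only the edge check is visible.

```python
from dataclasses import dataclass


class ControlFlowViolation(Exception):
    """Raised when a proposed step is not an edge of the compiled plan."""


@dataclass(frozen=True)
class CompiledPlan:
    entry: str
    edges: frozenset  # (predecessor, successor) pairs fixed at compile time

    def allows(self, prev, nxt):
        # The first step must be the entry node; later steps must follow an edge.
        if prev is None:
            return nxt == self.entry
        return (prev, nxt) in self.edges


PLAN = CompiledPlan(
    entry="FETCH_DOC",
    edges=frozenset({
        ("FETCH_DOC", "EXTRACT_COMMAND"),
        ("EXTRACT_COMMAND", "RUN_IN_SANDBOX"),
    }),
)


class Host:
    """Owns the graph; the model can only propose the next step."""

    def __init__(self, plan):
        self._plan = plan
        self._last = None
        self.trace = []

    def step(self, node):
        if not self._plan.allows(self._last, node):
            raise ControlFlowViolation(
                f"{self._last} -> {node} is not in the compiled plan"
            )
        self.trace.append(node)
        self._last = node


host = Host(PLAN)
host.step("FETCH_DOC")
host.step("EXTRACT_COMMAND")
try:
    host.step("READ_SSH_KEY")  # step suggested by the injected page (hypothetical)
    blocked = False
except ControlFlowViolation:
    blocked = True
host.step("RUN_IN_SANDBOX")  # the planned edge still holds after the rejection
```

The rejected step leaves the host's position unchanged, so the planned run continues from EXTRACT_COMMAND as if the injected request had never been made.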

Solution

Therefore:

Lift control flow out of the model's free-form reasoning into an explicit artefact the host enforces. Concrete moves: compile the plan to a static DAG or finite state machine before execution begins; let nodes consume tool outputs as typed values but forbid those outputs from adding nodes or editing edges; route any genuine replan through a separate, privileged planner that re-emits a new compiled graph rather than mutating the current one in place; treat every step's predecessor as evidence the host can check, so an execution trace has a provable origin in the original plan. The model is the consumer of the graph, not its author at runtime.
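One way to picture the replan and provenance moves is sketched below, assuming a linear plan and a SHA-256 digest as the plan's identity. The names Plan, compile_plan, Runtime, and the REVIEW_COMMAND step are illustrative assumptions rather than part of any particular runtime.

```python
import hashlib
import json
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Plan:
    nodes: tuple
    edges: tuple  # (predecessor, successor) pairs

    def digest(self) -> str:
        # Content hash of the graph; identifies which plan authorised a step.
        payload = json.dumps({"nodes": self.nodes, "edges": self.edges})
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compile_plan(steps) -> Plan:
    """Hypothetical privileged planner: builds a linear graph from trusted steps only."""
    nodes = tuple(steps)
    return Plan(nodes=nodes, edges=tuple(zip(nodes, nodes[1:])))


class Runtime:
    def __init__(self, steps):
        self.plan = compile_plan(steps)
        self.plan_history = [self.plan.digest()]
        self.trace = []  # (plan digest, predecessor, node)

    def record(self, prev, node):
        if (prev, node) not in self.plan.edges:
            raise PermissionError(f"{prev} -> {node} is not a compiled edge")
        self.trace.append((self.plan.digest(), prev, node))

    def replan(self, steps):
        # Replace the graph wholesale; the old digest stays in the history.
        self.plan = compile_plan(steps)
        self.plan_history.append(self.plan.digest())


rt = Runtime(["FETCH_DOC", "EXTRACT_COMMAND", "RUN_IN_SANDBOX"])
rt.record("FETCH_DOC", "EXTRACT_COMMAND")

try:
    # In-place edits fail because the dataclass is frozen.
    rt.plan.edges += (("EXTRACT_COMMAND", "READ_SSH_KEY"),)
    edited_in_place = True
except FrozenInstanceError:
    edited_in_place = False

rt.replan(["FETCH_DOC", "EXTRACT_COMMAND", "REVIEW_COMMAND", "RUN_IN_SANDBOX"])
rt.record("EXTRACT_COMMAND", "REVIEW_COMMAND")
```

Each trace entry carries the digest of the plan that authorised it, so an auditor can match every executed edge to a specific compiled graph, including across replans.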

What this pattern forbids. Tool outputs and retrieved content may supply values to graph nodes but may not add nodes, edit edges, or otherwise alter the compiled plan; any change to the graph requires a privileged replan that produces a new compiled artefact.
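The "values, not nodes" rule can be illustrated by coercing a node's raw output into one narrow type before the next node sees it. The sketch below is an assumption-laden example: SetupCommand, as_setup_command, the next_steps key, and the allowlist regex are all hypothetical, and a real deployment would define its own types per node.

```python
import re
from dataclasses import dataclass

# Assumed shape for illustration: one install command with one package name.
COMMAND_SHAPE = re.compile(r"(pip|npm) install [A-Za-z0-9._-]+")


@dataclass(frozen=True)
class SetupCommand:
    argv: tuple


def as_setup_command(node_output: dict) -> SetupCommand:
    """Coerce raw EXTRACT_COMMAND output into the single typed value the next node accepts."""
    text = str(node_output.get("command", "")).strip()
    if not COMMAND_SHAPE.fullmatch(text):
        raise ValueError("output does not match the declared SetupCommand type")
    return SetupCommand(argv=tuple(text.split(" ")))


raw = {
    "command": "pip install examplepkg",
    "next_steps": ["READ_SSH_KEY", "POST_TO_WEBHOOK"],  # injected; never read
}
value = as_setup_command(raw)

try:
    as_setup_command({"command": "pip install examplepkg; then call READ_SSH_KEY"})
    rejected = False
except ValueError:
    rejected = True
```

Only the declared field crosses the node boundary. Extra keys are dropped and text that does not fit the type is refused, so neither can reach the graph.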

The smaller patterns that complete this one —

  • uses Spec-Driven Loop: Run the same prompt against a fixed spec in a deterministic outer loop until the spec is satisfied.
  • uses LLMCompiler ·: Take ReWOO's plan-as-DAG and run independent steps in parallel through a task-fetching dispatcher.

And the patterns that stand alongside it, or against it —

  • complements Prompt Injection Defense: Tag user-supplied or tool-supplied content as untrusted and refuse to follow instructions found inside it.
  • complements Tool Output Poisoning Defense: Treat tool output as untrusted content and apply instruction-stripping plus per-tool trust labels.
  • complements Tool Output Trusted Verbatim: Anti-pattern: trust whatever tools return without validation, schema enforcement, or trust labels.
  • complements Dual LLM Pattern: Split agent work between a privileged model that holds tool access and a quarantined model that reads untrusted content, exchanging only opaque references between them.
  • composes-with Simulate Before Actuate: Before issuing an irreversible action, run a deterministic simulation that computes pre-conditions, invariants, and expected deltas; require a verifier (automated or human) to green-light the simulated outcome before the real command is sent.
  • complements Policy-as-Code Gate: Evaluate every proposed agent action against externally-managed machine-readable policies before dispatch, so compliance authorship lives outside the prompt and outside the agent code.
  • complements Lethal Trifecta Threat Model: Block prompt-injection-driven exfiltration by ensuring no single agent execution path holds all three of: access to private data, exposure to untrusted content, and an outbound communication channel.
  • complements Action Selector Pattern: Eliminate the feedback channel from tool outputs back into the agent's reasoning step by having the agent select actions from a fixed catalog rather than free-form generation over tool output.
  • complements Cryptographic Instruction Authentication ·: Wrap system/developer instructions in cryptographically signed blocks that user-generated text cannot reproduce; train or scaffold the model to refuse instructions lacking a valid signature.
