VIII · Safety & Control · Emerging

Simulate Before Actuate

also known as Dry-Run Harness, Simulate-Then-Commit, Pre-Action Simulation Gate

Before issuing an irreversible action, run a deterministic simulation that computes pre-conditions, invariants, and expected deltas; require a verifier — automated or human — to green-light the simulated outcome before the real command is sent.

Context

An agent has tools that take irreversible actions: filesystem writes, database mutations, infrastructure changes, browser actions on a live site, payments, emails. The cost of a wrong action is high. The agent itself is non-deterministic and occasionally proposes plausible-looking actions that are wrong in subtle ways: it deletes the wrong key, sends to the wrong recipient, or mutates the wrong row.

Problem

Letting the agent commit irreversible actions on the strength of a single proposal exposes the system to silent damage that is hard to roll back. Pure human-in-the-loop is too slow for the volume; pure trust-the-agent is too dangerous. Recent practitioner write-ups (Joakim Vivas' '17 agentic architectures' survey), the arXiv 'Architectures for Building Agentic AI' chapter and the 'Deterministic Pre-Action Authorization' preprint converge on a deterministic simulation step. Run the proposed action against a digital twin, a sandbox replay or a dry-run flag. Compute the resulting state and the diff, and require sign-off on the diff before committing.
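Here is a minimal sketch of that simulate, diff and sign-off loop. It assumes the target state fits in a Python dict and that the proposed action is a deterministic function of that state. A deep copy stands in for the digital twin. The names `simulate`, `diff`, `commit_if_approved` and `MISSING` are hypothetical and do not come from the cited sources.

```python
import copy
from typing import Any, Callable

State = dict[str, Any]


class _Missing:
    """Marker for 'key absent', so a deleted key is distinguishable from a None value."""

    def __repr__(self) -> str:
        return "<missing>"


MISSING = _Missing()


def simulate(live: State, action: Callable[[State], object]) -> State:
    """Apply the proposed action to a deep copy (a toy digital twin); `live` is not touched."""
    twin = copy.deepcopy(live)
    action(twin)
    return twin


def diff(before: State, after: State) -> dict[str, tuple[Any, Any]]:
    """Map every added, removed, or changed key to its (old, new) value."""
    return {
        key: (before.get(key, MISSING), after.get(key, MISSING))
        for key in sorted(before.keys() | after.keys())
        if before.get(key, MISSING) != after.get(key, MISSING)
    }


def commit_if_approved(
    live: State,
    action: Callable[[State], object],
    approve: Callable[[dict[str, tuple[Any, Any]]], bool],
) -> bool:
    """Simulate, hand the diff to the verifier, and only then apply the action for real."""
    delta = diff(live, simulate(live, action))
    if not approve(delta):
        return False
    action(live)  # assumes the action is deterministic, so the real run matches the twin
    return True


# Usage: the agent proposes deleting a config key.
live = {"cfg/a": 1, "cfg/b": 2}


def drop_b(state: State) -> None:
    state.pop("cfg/b")


print(diff(live, simulate(live, drop_b)))  # {'cfg/b': (2, <missing>)}
```

The verifier only ever sees the diff, never the agent's reasoning. It can therefore be a plain rule (for example "no keys removed") or a person reviewing the same diff.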

Forces

  • Irreversible actions deserve more scrutiny than reversible ones, but the agent's proposals do not mark which is which.
  • Full human-in-the-loop is too slow at production volume; a deterministic verifier can scale.
  • A simulation has to be faithful enough that 'passes the sim' implies 'safe in reality' — otherwise the gate is theatre.
  • Some action surfaces have no simulator (external APIs without sandboxes, partner systems); the pattern then degrades to dry-run flags, schema validation, or HITL.

Example

A devops agent receives a request to clean up unused Kubernetes resources. It proposes 'kubectl delete pod app-prod-7d3'. The wrapper intercepts the call, runs it with --dry-run=server and derives the simulated diff: 'will delete 1 pod; the owning ReplicaSet will recreate it, so Deployment app-prod stays at 3 replicas; Service unaffected'. The verifier then checks three invariants:
the target namespace is in the agent's allowed scope;
the deletion count is under the cap;
no protected label matches.
All are green, so the real call goes out. On a different invocation the agent proposes deleting a pod in kube-system. The flow is the same, but the verifier rejects it (namespace not in allowed scope), and the agent gets an error back and replans.
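The verifier half of this example can be sketched in Python. Everything named here is an assumption for illustration. The `DeletePlan` type, the allowed namespaces, the deletion cap and the protected label pairs are all hypothetical, and the labels are assumed to have been fetched beforehand. The server-side dry run serves only as a pass/fail check; it is not parsed into the diff described in the example. The only real CLI surface used is `kubectl delete ... --dry-run=server`.

```python
import subprocess
from dataclasses import dataclass, field

# Hypothetical policy for this agent; real values would come from configuration.
ALLOWED_NAMESPACES = {"app-prod", "app-staging"}
MAX_DELETES = 3
PROTECTED_LABELS = {("tier", "database"), ("protect", "true")}


@dataclass
class DeletePlan:
    kind: str  # e.g. "pod"
    namespace: str
    names: list[str]
    # name -> labels; assumed to be fetched beforehand (for example with `kubectl get -o json`)
    labels: dict[str, dict[str, str]] = field(default_factory=dict)

    def command(self, dry_run: bool) -> list[str]:
        cmd = ["kubectl", "delete", self.kind, *self.names, "-n", self.namespace]
        return (cmd + ["--dry-run=server"]) if dry_run else cmd


def verify(plan: DeletePlan) -> list[str]:
    """Return the violated invariants; an empty list means green."""
    violations = []
    if plan.namespace not in ALLOWED_NAMESPACES:
        violations.append(f"namespace {plan.namespace!r} not in allowed scope")
    if len(plan.names) > MAX_DELETES:
        violations.append(f"{len(plan.names)} deletions exceed cap of {MAX_DELETES}")
    for name, labels in plan.labels.items():
        hits = PROTECTED_LABELS & set(labels.items())
        if hits:
            violations.append(f"{name} carries protected label(s) {sorted(hits)}")
    return violations


def gated_delete(plan: DeletePlan) -> str:
    """simulate -> verify -> execute; raises so the agent sees an error and replans."""
    # Server-side dry run: the API server validates the request but does not persist it.
    subprocess.run(plan.command(dry_run=True), capture_output=True, text=True, check=True)
    violations = verify(plan)
    if violations:
        raise PermissionError("; ".join(violations))
    done = subprocess.run(plan.command(dry_run=False), capture_output=True, text=True, check=True)
    return done.stdout
```

Keeping `verify` a pure function of the plan means it can be unit-tested without a cluster. The kube-system case from the example returns exactly one violation.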


Solution

Therefore:

Decompose the action surface. For each irreversible tool, define a faithful simulator: a digital twin, a sandbox replay, a dry-run mode, a DOM snapshot for web, or transactional rollback for DBs. Wrap the tool so every call runs simulation → verifier → execute. The verifier is automated where the invariants can be encoded (no destructive deletes without an explicit flag, no over-budget transfers) and falls back to human-in-the-loop where they cannot. Where no simulator exists, refuse to call the tool without HITL approval.
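For the database case, transactional rollback as a simulator can be sketched with Python's built-in `sqlite3` module. The wrapper does five things:

1. Run the statement inside a transaction.
2. Count the rows it changed.
3. Roll back.
4. Check the count against an encoded invariant.
5. Only then run it for real.

The row cap, the DML-only rule and the function names are hypothetical. The sketch assumes a connection opened with `isolation_level=None`, so that BEGIN, ROLLBACK and COMMIT are issued explicitly.

```python
import sqlite3

MAX_ROWS_CHANGED = 10  # hypothetical invariant: cap on rows one agent statement may change
DML_PREFIXES = ("INSERT", "UPDATE", "DELETE")


def _run_and_count(conn: sqlite3.Connection, sql: str, params: tuple) -> int:
    """Execute inside an already-open transaction; return how many rows it changed."""
    before = conn.total_changes
    conn.execute(sql, params)
    return conn.total_changes - before


def _rollback_if_open(conn: sqlite3.Connection) -> None:
    if conn.in_transaction:  # some SQLite errors have already rolled the transaction back
        conn.execute("ROLLBACK")


def simulate(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> int:
    """Dry run: apply the statement in a transaction, measure the delta, always roll back."""
    conn.execute("BEGIN")
    try:
        return _run_and_count(conn, sql, params)
    finally:
        _rollback_if_open(conn)


def gated_execute(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> int:
    """simulation -> verifier -> execute. Expects a connection opened with isolation_level=None."""
    # Verifier, part 1: the row-count invariant only understands DML, so refuse everything else.
    if not sql.lstrip().upper().startswith(DML_PREFIXES):
        raise PermissionError("refused: only INSERT/UPDATE/DELETE can be simulated here")
    predicted = simulate(conn, sql, params)
    # Verifier, part 2: the encoded invariant.
    if predicted > MAX_ROWS_CHANGED:
        raise PermissionError(f"refused: would change {predicted} rows (cap {MAX_ROWS_CHANGED})")
    conn.execute("BEGIN")
    try:
        actual = _run_and_count(conn, sql, params)
        if actual != predicted:  # state drifted since the simulation, so its verdict no longer holds
            raise RuntimeError(f"simulated {predicted} rows, real run changed {actual}")
    except BaseException:
        _rollback_if_open(conn)
        raise
    conn.execute("COMMIT")
    return actual
```

The real run re-counts inside its own transaction and refuses to commit on a mismatch. This is one cheap guard against the simulation going stale between the check and the commit. It does not make the simulator faithful for statements it cannot measure, which is why non-DML is refused outright.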

What this pattern forbids. It forbids the agent from invoking irreversible tools directly. Every such call must pass through the simulator-plus-verifier gate, so the LLM's tool-call freedom is conditional on the gate's approval.
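One way to make the prohibition structural rather than a prompt instruction is to keep the raw tools inside a gate object and hand the agent only its `call` method. The sketch is hypothetical throughout: `ToolGate`, `GatedTool`, `Verdict` and the `ask_human` hook are illustrative names, not part of any library. It handles three cases:

- A tool with a simulator and a verifier goes through simulate, verify, execute.
- A verifier can escalate to a human.
- A tool with no simulator always needs human approval, and the call is refused when no approver is configured.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional

Args = dict[str, Any]


class Verdict(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ESCALATE = "escalate"  # the invariant cannot be encoded; ask a human


class GateRefused(Exception):
    """Surfaced to the agent as a tool error so it can replan."""


@dataclass(frozen=True)
class GatedTool:
    execute: Callable[[Args], Any]
    simulate: Optional[Callable[[Args], Any]] = None  # None: no simulator exists for this surface
    verify: Optional[Callable[[Args, Any], Verdict]] = None


class ToolGate:
    """Holds the raw irreversible tools; the agent is only ever handed `call`."""

    def __init__(self, ask_human: Optional[Callable[[str, Args, Any], bool]] = None):
        self._tools: dict[str, GatedTool] = {}
        self._ask_human = ask_human

    def register(self, name: str, tool: GatedTool) -> None:
        self._tools[name] = tool

    def call(self, name: str, args: Args) -> Any:
        tool = self._tools[name]
        if tool.simulate is None or tool.verify is None:
            outcome, verdict = None, Verdict.ESCALATE  # no simulator: HITL is the only gate
        else:
            outcome = tool.simulate(args)
            verdict = tool.verify(args, outcome)
        if verdict is Verdict.ESCALATE:
            approved = self._ask_human is not None and self._ask_human(name, args, outcome)
            verdict = Verdict.APPROVE if approved else Verdict.REJECT
        if verdict is not Verdict.APPROVE:
            raise GateRefused(f"{name} refused by the gate for args {args!r}")
        return tool.execute(args)
```

Because refusal is raised as an ordinary tool error, the agent's loop treats a rejected action the same way as any failed call and replans. This matches the kube-system case in the example.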

The smaller patterns that complete this one —

  • uses Sandbox Isolation (★★): Run agent-emitted code or actions in a contained environment with restricted filesystem, network, and process privileges.
  • generalises Dry-Run Harness: Simulate planned actions (and their projected side effects) without committing them, surfacing a reviewable diff before any commit.
  • generalises Mental-Model-In-The-Loop Simulator (·): Run candidate multi-step strategies inside an internal simulator of the environment before committing in the real world. Broader than Simulate Before Actuate, which gates a single action.

And the patterns that stand alongside it, or against it —

  • complements Human-in-the-Loop (★★): Require explicit human approval at defined points before the agent performs an action.
  • complements World Model as Tool (·): Let a planning agent invoke a generative world model as a tool to roll out hypothetical futures before committing to an action, treating the world model as a callable simulator rather than a training target.
  • complements Approval Queue (★★): Queue agent-proposed actions for asynchronous human review while the agent continues other work.
  • alternative-to Compensating Action (★★): Pair every irreversible-looking agent action with a compensating action that can undo or counteract it.
  • complements Policy-as-Code Gate: Evaluate every proposed agent action against externally-managed machine-readable policies before dispatch, so compliance authorship lives outside the prompt and outside the agent code.
  • complements Kill Switch: Provide an out-of-band control plane to halt running agent instances without redeploy.
  • complements Blind Grader with Isolated Context: Run an evaluator in a separately-allocated context window with access only to the artifact and the rubric, never the producing agent's reasoning trace, so the grader cannot be primed by the producer's framing.
  • composes-with Control-Flow Integrity: Treat the agent's planned step sequence as a trusted control-flow graph that tool outputs, retrieved content, and user-supplied data cannot redirect at runtime.
