Safety & Control

Dry-Run Harness

Simulate planned actions (and their projected side effects) without committing them, surfacing a reviewable diff before any commit.

Problem

Reviewing actions one at a time strips away context: humans need to see the projected end state, not isolated steps. Naive simulate-before-actuate runs only the next action in dry-run, so a reviewer cannot evaluate the aggregate effect of a multi-step plan. This pattern differs from simulate-before-actuate by presenting the full set of candidate side effects as a single reviewable artifact.

Solution

Build a tool wrapper that supports a dry-run mode: every action returns its projected side effect (the SQL it would run, the API call it would make, the file diff it would write) without committing anything. The agent runs the plan end-to-end in dry-run, and the resulting collection of projected side effects is presented to a human as a unified diff (or change list). The human approves, edits, or rejects the plan as a whole. Only on approval are the actions committed for real. Pair with approval-queue, simulate-before-actuate, and human-in-the-loop.
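The flow above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the names DryRunHarness, ProjectedEffect, set_key, and the in-memory store are all hypothetical stand-ins for real tools and a real backing system, and only approve/reject is modelled (plan editing is left out).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ProjectedEffect:
    """One side effect an action would have: described, not yet performed."""
    tool: str
    description: str             # e.g. SQL text, an API request, or a file diff
    commit: Callable[[], None]   # deferred real action, run only after approval


@dataclass
class DryRunHarness:
    """Collects projected effects while the agent runs its whole plan."""
    effects: List[ProjectedEffect] = field(default_factory=list)

    def record(self, tool: str, description: str,
               commit: Callable[[], None]) -> None:
        self.effects.append(ProjectedEffect(tool, description, commit))

    def change_list(self) -> str:
        """Render every projected effect as one reviewable artifact."""
        return "\n".join(
            f"{i}. [{e.tool}] {e.description}"
            for i, e in enumerate(self.effects, 1)
        )

    def commit_all(self, approved: bool) -> int:
        """Run the deferred actions only if the plan as a whole was approved."""
        if not approved:
            return 0
        for effect in self.effects:
            effect.commit()
        return len(self.effects)


# Hypothetical in-memory system standing in for a database or filesystem.
store: Dict[str, str] = {}


def set_key(harness: DryRunHarness, key: str, value: str) -> None:
    """Dry-run-aware tool: describes the write and defers the real one."""
    old: Optional[str] = store.get(key)
    harness.record(
        "kv.set",
        f"{key}: {old!r} -> {value!r}",
        lambda: store.__setitem__(key, value),
    )


# The agent runs its plan end-to-end; nothing is written yet.
harness = DryRunHarness()
set_key(harness, "plan", "pro")
set_key(harness, "seats", "5")

print(harness.change_list())      # what the human reviewer would see
assert store == {}                # still untouched before approval

harness.commit_all(approved=True)  # only now do the writes happen
```

One simplification to be aware of: each description is computed against the real state at record time, so a later step in the same plan does not see the projected result of an earlier one. A fuller harness would likely need an overlay of pending changes so that dependent steps are simulated consistently.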

When to use

  • Multi-step plans whose aggregate effect needs human review.
  • Tools that support (or can be wrapped to support) a dry-run mode.
  • A review latency budget that allows for a plan-then-approve cycle.

