VIII · Safety & ControlEmerging

Interruptible Agent Execution

also known as Pause/Resume/Cancel Control Surface, User-Interruptible Agent

Treat pause, resume, and cancel as a first-class control surface on every long-running agent so users can halt expensive or off-track trajectories mid-task while state is preserved for resumption.

Context

An agent runs for minutes, hours, or longer on a single user task — a deep-research loop, a code-agent session, an autonomous browser flow. The user is watching it work and forms a judgment mid-run: it has gone off-track, it is burning tokens unnecessarily, or the task is no longer wanted. The user expects to stop it like any other long-running application — pause and inspect, cancel cleanly, or resume after a check.

Problem

Most agent runtimes only expose 'start' and (sometimes) a brutal kill. Pause is not implemented, so the user must wait for the agent to finish or kill the process. Cancel loses any partial work and any chance to run compensating actions. Resume is impossible because nothing snapshotted state. Without an interruption surface, autonomous loops produce a binary 'let it finish or lose everything' experience that destroys user trust in long-running agents.

Forces

  • Pause must propagate to the model call and the tool call, not just the orchestrator loop.
  • Resume must restore state without re-doing the in-flight tool call.
  • Cancel must run compensating actions on in-flight side effects.
  • All three must be exposed in the UX, not hidden as ops-only controls.

Example

A research agent has spent 12 minutes browsing sources and is starting to repeat searches. The user clicks Pause. The runtime snapshots state at the next step boundary and stops further calls. The user reviews the work-in-progress notes, decides the agent had enough material 8 minutes ago, and clicks Resume with an instruction to summarise and stop rather than search further. The agent picks up from the snapshot and finishes.

Diagram

Solution

Therefore:

Build the runtime so each step boundary is a snapshot point: state is durable across pause/resume. Pause stops further model and tool calls without killing the process. Resume rehydrates from the snapshot. Cancel runs compensating actions on in-flight side effects (mark drafts as discarded, release locks, end provider sessions) before tearing down. Expose all three as visible UX, not hidden APIs. Distinct from a kill-switch, which is an operator-level emergency halt.

What this pattern forbids. A long-running agent must not expose only 'start' and 'kill'; pause, resume, and cancel are first-class controls and state is preserved across them.

The smaller patterns that complete this one —

  • usesAgent Resumption★★Persist agent execution state so a long-running run survives restarts, deploys, or user disconnects.
  • usesDurable Workflow SnapshotCapture workflow execution state as a snapshot in a pluggable storage provider so a paused run can resume across deployments, process restarts, and host crashes.
  • usesCompensating Action★★Pair every irreversible-looking agent action with a compensating action that can undo or counteract it.

And the patterns that stand alongside it, or against it —

  • complementsKill SwitchProvide an out-of-band control plane to halt running agent instances without redeploy.
  • complementsInterrupt-Resumable Thought·Preserve multi-step reasoning across interrupts by supporting paused-and-resumed thought frames so a new message handles cleanly without clobbering in-flight work.
  • composes-withComposable Termination ConditionsExpress agent stop criteria as small single-purpose conditions composed with AND/OR into one explicit termination contract instead of ad-hoc loop guards.
  • complementsApproval Queue★★Queue agent-proposed actions for asynchronous human review while the agent continues other work.

Neighbourhood

Click any neighbour to follow the language. Scroll to zoom, drag to pan.