VIII · Safety & Control · Mature ★★

Human-in-the-Loop

also known as HITL, Approval Gate, Confirmation Step, Risky Action Gate, Destructive Action Confirmation, Ask Before Risky Action

Require explicit human approval at defined points before the agent performs an action.

This pattern helps complete certain larger patterns:

  • used-by · [crawl-walk-run-automation-gating]
  • used-by · Progressive Delegation · Stage the human-to-agent handoff over time: the agent starts producing drafts a human always reviews; its autonomy expands action-by-action as measured trust accrues.
  • used-by · Cost-Aware Action Delegation · Classify every agent action by risk/cost and route each tier to a different approval policy, bounding the autonomy surface per-action instead of by one global flag.

Context

A team runs an agent that can take consequential actions on the user's behalf: moving money, deleting files, sending public messages, deploying code, changing production configuration. The agent is correct most of the time, but the cost of being wrong on certain action classes (an irreversible payment, a public broadcast, a destructive write) is much higher than the cost of pausing for a human to confirm. Some of those action classes also carry regulatory weight: the operator must be able to show that a human approved the step.

Problem

If the agent acts fully autonomously across all action classes, then any moment of model overconfidence becomes a real-world incident: a typo-squatted vendor gets paid, the wrong customer gets emailed, the production database loses a table. If the agent gates every action behind human approval, users get approval-fatigued, start clicking through prompts without reading them, and the gating stops protecting anyone. Without a way to single out the small set of action classes that genuinely warrant a pause, the team has to choose between unsafe autonomy and unusable friction.
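Singling out the action classes that warrant a pause can be as small as a lookup. The sketch below is illustrative only: the `ActionClass` names and the contents of `GATED_CLASSES` are assumptions drawn from the examples in the text, not a prescribed taxonomy.

```python
from enum import Enum


class ActionClass(Enum):
    # Hypothetical classes; a real deployment defines its own.
    READ = "read"
    DRAFT = "draft"
    PAYMENT = "payment"
    PUBLIC_MESSAGE = "public_message"
    DESTRUCTIVE_WRITE = "destructive_write"


# Assumed policy: only the high-cost-of-error classes pause for a human.
# Everything else runs without a prompt, which keeps approval requests rare.
GATED_CLASSES = frozenset({
    ActionClass.PAYMENT,
    ActionClass.PUBLIC_MESSAGE,
    ActionClass.DESTRUCTIVE_WRITE,
})


def requires_approval(action_class: ActionClass) -> bool:
    """Return True when the action class must wait for a human decision."""
    return action_class in GATED_CLASSES


if __name__ == "__main__":
    for cls in ActionClass:
        print(cls.value, "gated" if requires_approval(cls) else "autonomous")
```

Keeping the gated set short and explicit is what lets the remaining prompts stay meaningful to the person answering them.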

Forces

  • Where to place the gate trades latency and friction for safety.
  • Approval fatigue: too many gates train users to click through.
  • Asynchronous approval stalls the loop.

Example

A finance ops agent automates supplier payments end to end. After an incident where it paid $42k to a typo-squatted vendor domain, the team installs human-in-the-loop at the payment-execution boundary: the agent prepares the full payment proposal, surfaces vendor name, amount, IBAN, and the source invoice, then pauses for an explicit approve or reject from the on-call operator. A reject sends the proposal back for replanning. The decision and the operator id are logged. Auto-payments resume, but the bad-vendor class of incident stops.
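The proposal surfaced to the operator might be modelled roughly as follows. This is a sketch under assumptions: the `PaymentProposal` type, its field names, and `render_for_operator` are hypothetical, and the vendor, IBAN, and invoice values are placeholders rather than data from the incident.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PaymentProposal:
    """The fields the example says the operator must see before deciding."""
    vendor_name: str
    amount: Decimal
    iban: str
    source_invoice: str


def render_for_operator(p: PaymentProposal) -> str:
    """Format the proposal as plain text for the approval prompt."""
    return "\n".join([
        "Payment awaiting approval",
        f"  Vendor:  {p.vendor_name}",
        f"  Amount:  {p.amount:,.2f}",
        f"  IBAN:    {p.iban}",
        f"  Invoice: {p.source_invoice}",
    ])


if __name__ == "__main__":
    proposal = PaymentProposal(
        vendor_name="Acme Supplies Ltd",        # placeholder vendor
        amount=Decimal("42000"),
        iban="GB82 WEST 1234 5698 7654 32",     # placeholder IBAN
        source_invoice="INV-0001",              # placeholder invoice id
    )
    print(render_for_operator(proposal))
```

Showing the vendor name and IBAN side by side with the source invoice is what gives the operator a chance to spot a look-alike vendor before money moves.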

Solution

Therefore:

Identify the boundary. Pause the loop. Surface the proposed action with enough context for the human to decide. Require an explicit approve/reject. Resume on approve; abort or replan on reject. Log the decision.

What this pattern forbids. The defined action class cannot proceed without an affirmative approval signal.
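The steps above, together with the rule that nothing proceeds without an affirmative signal, can be sketched as a single wrapper. All names here (`Decision`, `Rejected`, `gated_execute`, the `ask_human` callback, the in-memory `audit_log` list) are assumptions for illustration; a real system would plug in its own approval channel and durable log.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Decision:
    approved: bool
    operator_id: str
    reason: str = ""


class Rejected(Exception):
    """Raised when approval is absent; the caller may abort or replan."""


def gated_execute(
    action_name: str,
    context: dict,
    execute: Callable[[], Any],
    ask_human: Callable[[str, dict], Optional[Decision]],
    audit_log: list,
) -> Any:
    # Pause the loop: block until the human answers (None means no answer).
    decision = ask_human(action_name, context)
    # Only an explicit approve counts; silence or a reject never proceeds.
    approved = decision is not None and decision.approved is True
    # Log the decision whether or not the action goes ahead.
    audit_log.append({
        "ts": time.time(),
        "action": action_name,
        "context": context,
        "approved": approved,
        "operator_id": decision.operator_id if decision else None,
    })
    if not approved:
        raise Rejected(decision.reason if decision else "no approval signal")
    return execute()


if __name__ == "__main__":
    log: list = []
    approve = lambda name, ctx: Decision(True, "op-1")
    print(gated_execute("send_payment", {"amount": "10.00"}, lambda: "sent", approve, log))
    print(log[0]["approved"], log[0]["operator_id"])
```

Treating a missing answer the same as a reject is a design choice in this sketch that follows from the rule above: the default is not to act.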

The smaller patterns that complete this one:

  • generalises · Cost Gating ★★ · Block actions whose expected cost exceeds a threshold without explicit user (or operator) acknowledgement.
  • generalises · Approval Queue ★★ · Queue agent-proposed actions for asynchronous human review while the agent continues other work.
  • generalises · Disambiguation ★★ · Have the agent ask a clarifying question before acting on an ambiguous request.
  • generalises · Synchronous Execution-Plan Confirmation · Agent synchronously emits its full execution plan for user confirmation before any side-effect step, and provides asynchronous operation recordings for post-hoc review.
  • generalises · Human Reflection · Reflection loop that explicitly collects human feedback (not approval) on agent plans to improve them, distinct from approval gates where the human only says yes/no.
  • generalises · Two Human Touchpoints · Place exactly two human-in-the-loop checkpoints in agentic pipelines: one at content selection and one at final review before publication.

And the patterns that stand alongside it, or against it:

  • complements · Step Budget ★★ · Cap the number of tool calls or loop iterations the agent is allowed within a single request.
  • complements · Compensating Action ★★ · Pair every irreversible-looking agent action with a compensating action that can undo or counteract it.
  • alternative-to · Conversation Handoff to Human ★★ · Transfer the entire conversation thread from agent to human operator, with state transfer and return primitive.
  • alternative-to · Communicative Dehallucination · When an instructed agent would have to invent missing context to comply, have it reverse roles and ask the instructor for the missing detail before answering.
  • complements · Policy-as-Code Gate · Evaluate every proposed agent action against externally-managed machine-readable policies before dispatch, so compliance authorship lives outside the prompt and outside the agent code.
  • complements · Simulate Before Actuate · Before issuing an irreversible action, run a deterministic simulation that computes pre-conditions, invariants, and expected deltas; require a verifier, automated or human, to green-light the simulated outcome before the real command is sent.
  • complements · Socratic Questioning Agent · Drive the agent toward its goal by asking the user a sequence of strategic, open-ended questions that surface the user's own latent knowledge, goal, or context, rather than producing an answer directly.
  • complements · Dry-Run Harness · Simulate planned actions (and their projected side effects) without committing them, surfacing a reviewable diff before any commit.
  • complements · Pipeline Triad Pattern · Staff each pipeline stage with a triad (Creator generates an artifact, Critic finds flaws, Arbiter makes a binding PASS/FAIL/PARTIAL decision), with four explicit human gates between stages.
  • complements · Context Gap (Security) · Agents faithfully follow explicit security rules but miss the broader implications: they log access correctly without flagging the unusual pattern a human expert would catch immediately.
  • complements · Constrained Adaptability · Agents recalculate within declared tools and rules like a GPS rerouting, but cannot creatively transcend those boundaries to invent new approaches the way humans do.
  • complements · Priority Matrix (Conflict Resolution) · Pre-define how the agent must resolve specific classes of goal conflicts via a human-authored lookup table, transforming the agent from a decision-maker (where it fails on competing objectives) into a decision-implementer.
  • complements · Confidence-Checking Workflow · Always ask the agent, for each part of its output, to state its confidence and identify which parts need human verification, like triaging a junior analyst's work.
  • complements · Autonomy Slider · Expose agent autonomy as a continuous adjustable parameter so the same codebase can span scripted assistant to fully autonomous worker without re-architecting.
  • complements · Corrigible Off-Switch Incentive · Design the agent so being shut down or overridden by a human carries positive expected value, because the human's intervention is itself evidence the current objective is mis-specified.
  • complements · Generative UI · Let the agent decide which interface components to render at runtime and stream them to the frontend over a typed protocol, so the surface follows the agent's output instead of being hardcoded.
