VIII · Safety & Control · Emerging

Cost-Aware Action Delegation

also known as Risk-Tiered Action Approval, Per-Action Autonomy

Classify every agent action by risk/cost and route each tier to a different approval policy, bounding the autonomy surface per-action instead of by one global flag.

Context

An agent has access to a mixed action surface: reading a file, calling a search API, sending an email, modifying a CRM record, refunding an order, terminating a cloud resource. A single 'auto-approve everything' flag treats sending an email the same as refunding $10,000. A single 'require approval for everything' flag turns the agent into a typing-assist tool.

Problem

Without per-action risk tiering, the autonomy decision collapses to one global switch. Either the agent acts on dangerous things without checking, or it asks before every read. Approval fatigue kills the second mode within a week; trust incidents kill the first. The team has no vocabulary for 'this action is fine to do unsupervised, this one needs to confirm with the user, this one needs to escalate to a human reviewer'.

Forces

  • Risk varies by action type and sometimes by parameter value (refund $5 vs refund $5000).
  • Approval fatigue dominates if every action requires confirmation.
  • Trust incidents dominate if no action requires confirmation.
  • Risk tiers must be a small enumeration that humans can reason about.

Example

A customer-ops agent has 30 actions. `search_orders` is low (auto). `update_shipping_address` is medium (confirm with the requesting customer rep). `refund_order` is parameter-conditional: refunds under $100 are medium, refunds of $100 to $1,000 require manager sign-off, and refunds over $1,000 require both manager and finance approval. The agent's reasoning never gates the action; the runtime classifier does.
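The tiering in this example can be sketched as a small classifier. This is a minimal Python sketch under stated assumptions, not a reference implementation: the `Approval` enum, the `classify` function, the `amount` parameter name, and the fail-closed default for unknown actions are all hypothetical; only the three action names and the dollar thresholds come from the example.

```python
from enum import Enum


class Approval(Enum):
    """Approval policies named in the example (enum names are illustrative)."""
    AUTO = "auto"                                # low: execute without asking
    CONFIRM_REP = "confirm_rep"                  # medium: requesting rep confirms
    MANAGER = "manager"                          # manager sign-off
    MANAGER_AND_FINANCE = "manager_and_finance"  # manager and finance approval


def classify(action: str, params: dict) -> Approval:
    """Return the approval policy for an action, given its parameters."""
    if action == "search_orders":
        return Approval.AUTO
    if action == "update_shipping_address":
        return Approval.CONFIRM_REP
    if action == "refund_order":
        amount = params["amount"]  # assumed parameter name, in dollars
        if amount < 100:
            return Approval.CONFIRM_REP
        if amount <= 1000:
            return Approval.MANAGER
        return Approval.MANAGER_AND_FINANCE
    # Assumption: an action nobody has classified gets the strictest policy.
    return Approval.MANAGER_AND_FINANCE
```

The classifier looks only at the action name and its parameters, never at the agent's stated justification, which is what keeps the gate independent of the agent's reasoning.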

Solution

Therefore:

Tag every action with a risk tier (low / medium / high, or a richer scheme). Map each tier to an approval policy: low → auto-execute, medium → confirm with the user, high → require a human reviewer with explicit sign-off. The tier can be conditional on parameters (refund > $1000 → high). The agent's action surface is the union of permitted (tier, policy) pairs; the runtime enforces the policy independently of the agent's reasoning. Make the classifier itself reviewable: actions and their tiers are configuration, not prompt content.

What this pattern forbids. An agent must not execute an action without consulting its risk tier; the approval policy for that tier must complete before the action proceeds.
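One way to read this rule is as a single runtime chokepoint that every action passes through. The sketch below is illustrative only: `Tier`, `TIER_RULES`, `tier_of`, `gated_execute`, `ApprovalDenied`, and the shape of the `handlers` and `approvers` mappings are hypothetical names, and treating unclassified actions as high risk is an assumption rather than something the pattern prescribes.

```python
from enum import Enum
from typing import Any, Callable


class Tier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class ApprovalDenied(Exception):
    """Raised when the approval policy for an action's tier does not pass."""


# Tier rules live in configuration, not in the prompt (entries are examples).
TIER_RULES: dict[str, Callable[[dict], Tier]] = {
    "read_file": lambda p: Tier.LOW,
    "send_email": lambda p: Tier.MEDIUM,
    "refund_order": lambda p: Tier.HIGH if p["amount"] > 1000 else Tier.MEDIUM,
}


def tier_of(action: str, params: dict) -> Tier:
    rule = TIER_RULES.get(action)
    # Assumption: unclassified actions are treated as high risk.
    return rule(params) if rule else Tier.HIGH


def gated_execute(
    action: str,
    params: dict,
    handlers: dict[str, Callable[..., Any]],
    approvers: dict[Tier, Callable[[str, dict], bool]],
) -> Any:
    """Consult the tier, complete its approval policy, then run the action."""
    tier = tier_of(action, params)
    if tier is not Tier.LOW:
        # MEDIUM might map to "confirm with the user", HIGH to "reviewer sign-off".
        if not approvers[tier](action, params):
            raise ApprovalDenied(f"{action} ({tier.value}) was not approved")
    return handlers[action](**params)
```

The agent never calls a handler directly; it can only propose `(action, params)` pairs to `gated_execute`, so no action runs before its tier's approval policy has completed.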

The smaller patterns that complete this one —

  • uses: Approval Queue ★★. Queue agent-proposed actions for asynchronous human review while the agent continues other work.
  • uses: Human-in-the-Loop ★★. Require explicit human approval at defined points before the agent performs an action.

And the patterns that stand alongside it, or against it —

  • composes-with: Policy-as-Code Gate. Evaluate every proposed agent action against externally-managed machine-readable policies before dispatch, so compliance authorship lives outside the prompt and outside the agent code.
  • composes-with: [crawl-walk-run-automation-gating]
  • complements: Autonomy Slider. Expose agent autonomy as a continuous adjustable parameter so the same codebase can span scripted assistant to fully autonomous worker without re-architecting.
  • complements: Two Human Touchpoints. Place exactly two human-in-the-loop checkpoints in agentic pipelines: one at content selection and one at final review before publication.
  • alternative-to: Agent Privilege Escalation. Anti-pattern: let an agent's effective permissions be the union of its own identity, the identities of its tools, and the identities of the services those tools call.
  • composes-with: Progressive Delegation. Stage the human-to-agent handoff over time: the agent starts producing drafts a human always reviews; its autonomy expands action-by-action as measured trust accrues.
