Safety & Control

Cost-Aware Action Delegation

Classify every agent action by risk/cost and route each tier to a different approval policy, bounding the autonomy surface per-action instead of by one global flag.

Problem

Without per-action risk tiering, the autonomy decision collapses to one global switch. Either the agent acts on dangerous things without checking, or it asks before every read. Approval fatigue kills the second mode within a week; trust incidents kill the first. The team has no vocabulary for 'this action is fine to do unsupervised, this one needs to confirm with the user, this one needs to escalate to a human reviewer'.

Solution

Tag every action with a risk tier (low / medium / high, or a richer scheme). Map each tier to an approval policy: low → auto-execute, medium → confirm with the user, high → require explicit sign-off from a human reviewer. A tier can depend on the action's parameters (for example, a refund over $1000 → high). The agent's action surface is the union of permitted (tier, policy) pairs; the runtime enforces the policy independently of the agent's reasoning. Make the classifier itself reviewable: actions and their tiers are configuration, not prompt content.
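
One way the tier-to-policy routing could look, as a minimal Python sketch rather than a reference implementation. Every name here (Tier, Policy, ACTION_TIERS, classify, execute, the action names, and the two approval callbacks) is hypothetical. The $1000 refund threshold comes from the example above; treating smaller refunds as medium, and rejecting actions that are missing from the table, are assumptions.

```python
from enum import Enum
from typing import Any, Callable, Dict, Union


class Tier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Policy(Enum):
    AUTO_EXECUTE = "auto_execute"
    CONFIRM_WITH_USER = "confirm_with_user"
    HUMAN_REVIEW = "human_review"


# Tier -> approval policy, as described in the text.
TIER_POLICY: Dict[Tier, Policy] = {
    Tier.LOW: Policy.AUTO_EXECUTE,
    Tier.MEDIUM: Policy.CONFIRM_WITH_USER,
    Tier.HIGH: Policy.HUMAN_REVIEW,
}

Params = Dict[str, Any]
TierRule = Union[Tier, Callable[[Params], Tier]]

# Reviewable configuration: a fixed tier, or a rule over the parameters.
# Action names are illustrative only.
ACTION_TIERS: Dict[str, TierRule] = {
    "read_order": Tier.LOW,
    "update_shipping_address": Tier.MEDIUM,
    # Assumption: refunds of $1000 or less are medium.
    "issue_refund": lambda p: Tier.HIGH if p["amount"] > 1000 else Tier.MEDIUM,
}


def classify(action: str, params: Params) -> Tier:
    """Resolve the tier for one concrete call, parameters included."""
    if action not in ACTION_TIERS:
        # Assumption: unlisted actions are outside the permitted surface.
        raise PermissionError(f"action not in tier table: {action}")
    rule = ACTION_TIERS[action]
    return rule if isinstance(rule, Tier) else rule(params)


def execute(
    action: str,
    params: Params,
    handler: Callable[[Params], Any],
    confirm_with_user: Callable[[str, Params], bool],
    request_review: Callable[[str, Params], bool],
) -> Dict[str, Any]:
    """Runtime gate: the policy is applied here, not inside the agent."""
    tier = classify(action, params)
    policy = TIER_POLICY[tier]
    if policy is Policy.CONFIRM_WITH_USER and not confirm_with_user(action, params):
        return {"status": "declined", "tier": tier.value}
    if policy is Policy.HUMAN_REVIEW and not request_review(action, params):
        return {"status": "rejected", "tier": tier.value}
    return {"status": "executed", "tier": tier.value, "result": handler(params)}
```

Because the gate sits between the agent and the handler, an agent that talks itself into a large refund still cannot run it without the reviewer callback returning True.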

When to use

  • The agent's action surface spans actions of materially different blast radius.
  • Operators need an audit trail of what risk class each executed action was in.
  • Some actions are parameter-conditional and would be misclassified by a single tier per action.
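
The audit-trail and parameter-conditional cases above can be combined in one record per executed action. The following is a rough Python sketch under stated assumptions: AuditRecord, log_action, the field names, the approver id format, and the in-memory list used as a log are all hypothetical, and the refund rule treats amounts of $1000 or less as medium.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str
    action: str
    params: Dict[str, Any]
    tier: str         # tier resolved for this specific call
    policy: str       # approval policy that tier mapped to
    approved_by: str  # hypothetical convention: "auto", "user:<id>", "reviewer:<id>"


POLICY_FOR_TIER = {
    "low": "auto_execute",
    "medium": "confirm_with_user",
    "high": "human_review",
}


def refund_tier(params: Dict[str, Any]) -> str:
    # Parameter-conditional tier: one fixed tier per action would
    # misclassify either small or large refunds.
    return "high" if params["amount"] > 1000 else "medium"


def log_action(
    log: List[str], action: str, params: Dict[str, Any], approved_by: str
) -> AuditRecord:
    """Append one JSON line recording the tier the action executed under."""
    tier = refund_tier(params) if action == "issue_refund" else "low"
    record = AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        action=action,
        params=params,
        tier=tier,
        policy=POLICY_FOR_TIER[tier],
        approved_by=approved_by,
    )
    log.append(json.dumps(asdict(record), sort_keys=True))
    return record


audit_log: List[str] = []
log_action(audit_log, "issue_refund", {"amount": 50}, "user:42")
log_action(audit_log, "issue_refund", {"amount": 5000}, "reviewer:7")
```

The two calls share an action name but land in different tiers, and the log keeps the tier that applied at execution time instead of leaving an operator to reconstruct it later.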
