VIII · Safety & Control · Mature ★★

Cost Gating

also known as Budget Cap, Cost-Aware Approval

Block actions whose expected cost exceeds a threshold without explicit user (or operator) acknowledgement.

This pattern helps complete certain larger patterns —

  • specialises · Human-in-the-Loop ★★ · Require explicit human approval at defined points before the agent performs an action.
  • used-by · Composable Termination Conditions · Express agent stop criteria as small single-purpose conditions composed with AND/OR into one explicit termination contract instead of ad-hoc loop guards.

Context

A team runs an agent whose individual steps cost real money — large-context model calls billed by the token, paid third-party APIs, retrieval against an expensive vector store. A single user request can fan out into hundreds of such calls, and the bill arrives at the end of the month rather than at the moment of the action. Users have no way to see the cost building up while the agent works.

Problem

If the agent just executes whatever steps it judges useful, an over-eager research task can quietly burn through a hundred-euro budget on a question that should have cost one euro, and the user only finds out when the invoice arrives. If the agent asks for permission on every paid call, users learn to click through the prompts and the gating becomes theatre. Without a forecast of cost and a meaningful threshold, the team must choose between surprise bills and approval fatigue.

Forces

  • Estimating cost up front requires a model of what will happen.
  • Confirmation-fatigue: too many approvals train users to ignore them.
  • Budgets exist at multiple horizons (per call, per session, per month), and an action can fit within one while breaching another.

Example

An autonomous research agent is asked to 'thoroughly investigate' a niche market and quietly fans out into hundreds of web searches plus a few large-context summarisations, ringing up forty euros before producing a draft. The team adds Cost Gating: any step whose forecast cost (token volume × model rate) exceeds two euros prompts the user with the estimate, and any cumulative spend over twenty euros pauses the run for explicit acknowledgement. Surprise bills stop showing up.
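The arithmetic in this example can be sketched as follows. The two thresholds come from the example itself; the function names, the per-1,000-token rates, and the choice to compare the projected total (spend so far plus forecast) against the cumulative limit are assumptions made for illustration, not details of the team's implementation.

```python
# Thresholds taken from the example above; everything else is illustrative.
PER_STEP_LIMIT_EUR = 2.00
CUMULATIVE_LIMIT_EUR = 20.00


def forecast_cost_eur(input_tokens, output_tokens, in_rate_per_1k, out_rate_per_1k):
    """Forecast = token volume x model rate, split by input and output tokens."""
    return (input_tokens / 1000) * in_rate_per_1k + (output_tokens / 1000) * out_rate_per_1k


def gate_decision(forecast_eur, spent_eur):
    """Return what the gate would do for one step (hypothetical labels)."""
    # Assumption: gate on the projected total, so the run pauses before
    # the cumulative limit is crossed rather than after.
    if spent_eur + forecast_eur > CUMULATIVE_LIMIT_EUR:
        return "pause_run"
    if forecast_eur > PER_STEP_LIMIT_EUR:
        return "ask_user"
    return "proceed"


# Hypothetical rates: 0.01 EUR per 1k input tokens, 0.03 EUR per 1k output tokens.
summarise = forecast_cost_eur(200_000, 4_000, in_rate_per_1k=0.01, out_rate_per_1k=0.03)
print(f"{summarise:.2f}", gate_decision(summarise, spent_eur=0.0))  # 2.12 ask_user
```

A cheap web search would fall under both limits and proceed silently, which is what keeps the number of prompts low enough for users to keep reading them.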

Solution

Therefore:

Estimate cost before invoking the expensive action. If the estimate exceeds the threshold, surface it to the user (or operator) and require explicit approval. Track running totals against per-session and per-period budgets.
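One possible shape for such a gate is sketched below, as a minimal illustration rather than a reference implementation. `CostGate`, `ApprovalDenied`, the `approve` callback, and the convention that an action returns its result together with its actual cost are all hypothetical names and choices introduced here.

```python
from dataclasses import dataclass
from typing import Callable


class ApprovalDenied(Exception):
    """Raised when a gated action is not acknowledged (hypothetical name)."""


@dataclass
class CostGate:
    step_threshold: float           # per-action limit
    session_budget: float           # per-session limit
    period_budget: float            # per-period limit, e.g. per month
    approve: Callable[[str], bool]  # asks the user or operator; True means approved
    session_spent: float = 0.0
    period_spent: float = 0.0

    def _reasons(self, estimate: float) -> list:
        """Collect every limit the estimated action would exceed."""
        reasons = []
        if estimate > self.step_threshold:
            reasons.append(f"step estimate {estimate:.2f} exceeds {self.step_threshold:.2f}")
        if self.session_spent + estimate > self.session_budget:
            reasons.append("session budget would be exceeded")
        if self.period_spent + estimate > self.period_budget:
            reasons.append("period budget would be exceeded")
        return reasons

    def run(self, estimate: float, action: Callable[[], tuple]):
        """Estimate first, ask if needed, then run and record the actual cost."""
        reasons = self._reasons(estimate)
        if reasons and not self.approve("; ".join(reasons)):
            raise ApprovalDenied("; ".join(reasons))
        result, actual_cost = action()  # assumed to return (result, actual cost)
        self.session_spent += actual_cost
        self.period_spent += actual_cost
        return result
```

The running totals are updated with the actual cost rather than the estimate, so forecasting errors do not accumulate across a session. Actions under every limit never reach `approve`, which keeps the prompts rare enough to stay meaningful.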

What this pattern forbids. Actions exceeding the threshold cannot run without explicit acknowledgement.

And the patterns that stand alongside it, or against it —

  • complements · Step Budget ★★ · Cap the number of tool calls or loop iterations the agent is allowed within a single request.
  • complements · Multi-Model Routing ★★ · Send each request to the cheapest model that can handle it well.
  • complements · Prompt Caching ★★ · Order prompts so the unchanging prefix can be cached by the provider, cutting per-call cost and latency.
  • complements · Extended Thinking ★★ · Spend a configurable budget of internal reasoning tokens before producing a user-visible answer.
  • complements · Cost Observability ★★ · Surface per-request, per-user, and per-feature cost and token consumption to operators in near-real-time.
  • complements · Rate Limiting ★★ · Cap the number of requests, tokens, or tool calls per user (or session) within a time window.
  • alternative-to · Unbounded Subagent Spawn · Anti-pattern: a supervisor or orchestrator spawns sub-agents that can themselves spawn sub-agents without a global cap.
  • alternative-to · Token-Economy Blindness · Anti-pattern: operate multi-agent loops with no per-run token budget or alarm, allowing recursive loops to silently accumulate $10k+ in undetected costs.
  • complements · Realtime API When Batchable · Anti-pattern: use the realtime/synchronous model API for workloads whose latency budget would permit batching, paying 2–10× the unit cost for no user-visible benefit.
  • complements · Missing max_tokens Cap · Anti-pattern: call the model without an explicit max_tokens (or equivalent) so a single call can drain the run's budget on a runaway generation.
  • complements · Agent-Initiated Payment · Give an agent a bounded wallet so it can settle a payment mid-request to unlock a resource, answering a payment-required challenge with a verifiable proof, instead of routing every purchase through a human.
