Cost Gating

also known as Budget Cap, Cost-Aware Approval

Block actions whose expected cost exceeds a threshold without explicit user (or operator) acknowledgement.

This pattern helps complete certain larger patterns:

specialises Human-in-the-Loop ★★ - Require explicit human approval at defined points before the agent performs an action.
used-by Composable Termination Conditions ★ - Express agent stop criteria as small single-purpose conditions composed with AND/OR into one explicit termination contract instead of ad-hoc loop guards.

Context

A team runs an agent whose individual steps cost real money — large-context model calls billed by the token, paid third-party APIs, retrieval against an expensive vector store. A single user request can fan out into hundreds of such calls, and the bill arrives at the end of the month rather than at the moment of the action. Users have no way to see the cost building up while the agent works.

Problem

If the agent just executes whatever steps it judges useful, an over-eager research task can quietly burn through a hundred-euro budget on a question that should have cost one euro, and the user only finds out when the invoice arrives. If the agent asks for permission on every paid call, users learn to click through the prompts and the gating becomes theatre. Without a forecast of cost and a meaningful threshold, the team must choose between surprise bills and approval fatigue.

Forces

Estimating cost up front requires a model of what will happen.
Confirmation-fatigue: too many approvals train users to ignore them.
Budgets apply at multiple horizons (per call, per session, per month), so a single threshold cannot cover them all.

Example

An autonomous research agent is asked to 'thoroughly investigate' a niche market and quietly fans out into hundreds of web searches plus a few large-context summarisations, ringing up forty euros before producing a draft. The team adds Cost Gating: any step whose forecast cost (token volume × model rate) exceeds two euros prompts the user with the estimate, and any cumulative spend over twenty euros pauses the run for explicit acknowledgement. Surprise bills stop showing up.
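The two thresholds in this example can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the team's actual implementation: the names `forecast_cost` and `gate_decision`, the token counts, and the rate of 0.01 EUR per 1,000 tokens are hypothetical. Only the two-euro and twenty-euro limits come from the example. The sketch also assumes the cumulative check runs before the step, against projected spend.

```python
from decimal import Decimal

# Limits taken from the example above.
PER_STEP_LIMIT_EUR = Decimal("2.00")
CUMULATIVE_LIMIT_EUR = Decimal("20.00")


def forecast_cost(tokens: int, rate_per_1k_eur: Decimal) -> Decimal:
    """Forecast cost as token volume x model rate (the rate is hypothetical)."""
    return (Decimal(tokens) / Decimal(1000)) * rate_per_1k_eur


def gate_decision(step_cost: Decimal, spent_so_far: Decimal) -> str:
    """Return 'execute', 'prompt' (per-step limit) or 'pause' (cumulative limit)."""
    # Assumption: compare projected cumulative spend, so the run pauses
    # before the step that would cross the limit, not after it.
    if spent_so_far + step_cost > CUMULATIVE_LIMIT_EUR:
        return "pause"
    if step_cost > PER_STEP_LIMIT_EUR:
        return "prompt"
    return "execute"


# Hypothetical rate: 0.01 EUR per 1,000 tokens.
rate = Decimal("0.01")
small_step = forecast_cost(50_000, rate)    # 0.50 EUR
large_step = forecast_cost(300_000, rate)   # 3.00 EUR

print(gate_decision(small_step, Decimal("0")))      # execute
print(gate_decision(large_step, Decimal("0")))      # prompt
print(gate_decision(small_step, Decimal("19.80")))  # pause
```

Using `Decimal` rather than floats keeps the comparison against the limits exact, which matters when a step lands right on a threshold.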

Diagram

flowchart TD
    A[Action proposed] --> E[Estimate cost]
    E --> C{Cost > threshold?}
    C -- no --> X[Execute]
    C -- yes --> AP[Surface to user/operator]
    AP --> AR{Approved?}
    AR -- yes --> X
    AR -- no --> B[Block]
    X --> T[Track running totals]

Solution

Therefore:

Estimate cost before invoking the expensive action. If the estimate exceeds the threshold, surface it to the user (or operator) and require explicit approval. Track running totals against per-session and per-period budgets.

What this pattern forbids. An action whose estimated cost exceeds the threshold cannot run without explicit acknowledgement.
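One way to wire the solution together is a wrapper that every paid action must pass through. The sketch below is illustrative only: `CostGate`, `ActionBlocked`, and the `approve` callback are hypothetical names, not part of any library. It also makes two assumptions the pattern text does not state: a projected breach of the session or period budget is treated like an over-threshold estimate, and the running totals are updated with the estimate rather than the billed cost.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, TypeVar

T = TypeVar("T")


class ActionBlocked(Exception):
    """Raised when an over-limit action is not approved."""


@dataclass
class CostGate:
    threshold: Decimal        # per-action estimate that triggers approval
    session_budget: Decimal   # cap on spend within one session
    period_budget: Decimal    # cap on spend within one billing period
    approve: Callable[[str, Decimal], bool]  # hypothetical approval hook
    session_spent: Decimal = Decimal("0")
    period_spent: Decimal = Decimal("0")

    def run(self, name: str, estimate: Decimal, action: Callable[[], T]) -> T:
        """Estimate first; require approval if any limit would be exceeded."""
        over_limit = (
            estimate > self.threshold
            or self.session_spent + estimate > self.session_budget
            or self.period_spent + estimate > self.period_budget
        )
        if over_limit and not self.approve(name, estimate):
            raise ActionBlocked(f"{name}: estimated {estimate:.2f} not approved")
        result = action()
        # Track running totals. A real system would record the actual
        # billed cost here instead of the estimate.
        self.session_spent += estimate
        self.period_spent += estimate
        return result


# Usage: an approver that denies every over-limit request.
gate = CostGate(
    threshold=Decimal("2"),
    session_budget=Decimal("20"),
    period_budget=Decimal("200"),
    approve=lambda name, cost: False,
)
print(gate.run("web_search", Decimal("0.10"), lambda: "ok"))  # runs unprompted
try:
    gate.run("large_summary", Decimal("5"), lambda: "never runs")
except ActionBlocked as exc:
    print(f"blocked: {exc}")
```

Because the action is passed in as a callable, nothing expensive executes until the gate has made its decision, which is what enforces the rule above.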

And the patterns that stand alongside it, or against it:

complements Step Budget ★★ - Cap the number of tool calls or loop iterations the agent is allowed within a single request.
complements Multi-Model Routing ★★ - Send each request to the cheapest model that can handle it well.
complements Prompt Caching ★★ - Order prompts so the unchanging prefix can be cached by the provider, cutting per-call cost and latency.
complements Extended Thinking ★★ - Spend a configurable budget of internal reasoning tokens before producing a user-visible answer.
complements Cost Observability ★★ - Surface per-request, per-user, and per-feature cost and token consumption to operators in near-real-time.
complements Rate Limiting ★★ - Cap the number of requests, tokens, or tool calls per user (or session) within a time window.
alternative-to Unbounded Subagent Spawn ✕ - Anti-pattern: a supervisor or orchestrator spawns sub-agents that can themselves spawn sub-agents without a global cap.
alternative-to Token-Economy Blindness ✕ - Anti-pattern: operate multi-agent loops with no per-run token budget or alarm, allowing recursive loops to silently accumulate $10k+ in undetected costs.
complements Realtime API When Batchable ✕ - Anti-pattern: use the realtime/synchronous model API for workloads whose latency budget would permit batching, paying 2–10× the unit cost for no user-visible benefit.
complements Missing max_tokens Cap ✕ - Anti-pattern: call the model without an explicit max_tokens (or equivalent) so a single call can drain the run's budget on a runaway generation.
complements Agent-Initiated Payment ★ - Give an agent a bounded wallet so it can settle a payment mid-request to unlock a resource (answering a payment-required challenge with a verifiable proof) instead of routing every purchase through a human.
complements Velocity-and-Magnitude Governor ★ - Hard-code per-unit-time caps on the financial magnitude of agent actions, and on any deviation beyond a statistical threshold force a downgrade from human-on-the-loop to human-in-the-loop.

Used in recipes

Safety Hardening
hardening

Used in frameworks

Sparrot
first-class · 75 patterns · Domain Agents · experimental
Premium-model access is gated behind an explicit, time-boxed (≤10 min) written grant; without an active grant the router stays on cheap models, and grant + revoke + each routing d…

Provenance

Source: patterns/cost-gating.md on GitHub · commit 4fa1213
Added to catalog: 2026-04-30
Last updated: 2026-05-26
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.