Safety & Control

Rate Limiting

Cap the number of requests, tokens, or tool calls per user (or session) within a time window.

Problem

Without per-identity limits, a single caller can drain the month's token budget in a few hours, hit downstream provider rate limits and starve every other user, or simply run up an unbounded bill the operator did not authorise. Imposing one global cap is too blunt — it punishes everyone for one bad actor — and trusting users to behave reasonably has never worked at scale. The team is forced to choose between generous limits that hurt cost and tight limits that hurt legitimate users.

Solution

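Define limits per identity at multiple horizons (per minute, per hour, per day), and track usage with token-bucket or sliding-window counters. Enforce the limits at both the API gateway and inside the agent loop, so that tool calls and tokens spent within a single request are counted too. When a limit is hit, tell the user clearly which limit it was and when they can retry.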
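
As a rough illustration of per-identity limits at several horizons, the sketch below keeps one token bucket per identity per horizon and admits a request only if every bucket can cover its cost. The names `TokenBucket`, `MultiHorizonLimiter` and `check`, the in-memory dictionary, and the example capacities are all assumptions made for illustration; a real deployment would most likely keep the counters in a shared store so that every gateway instance sees the same state.

```python
import time


class TokenBucket:
    """Holds up to `capacity` units and refills continuously over `window_seconds`."""

    def __init__(self, capacity, window_seconds, now):
        self.capacity = float(capacity)
        self.rate = capacity / window_seconds  # units restored per second
        self.tokens = float(capacity)          # start full
        self.updated_at = now

    def refill(self, now):
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now

    def seconds_until(self, cost, now):
        """Seconds to wait before `cost` units are available (0.0 if available now).

        A cost larger than the bucket capacity can never be satisfied;
        this sketch does not handle that case specially.
        """
        self.refill(now)
        if cost <= self.tokens:
            return 0.0
        return (cost - self.tokens) / self.rate


class MultiHorizonLimiter:
    """Hypothetical limiter: one bucket per identity per horizon."""

    def __init__(self, limits, clock=time.monotonic):
        self.limits = limits    # e.g. {"minute": (capacity, window_seconds)}
        self.clock = clock
        self.buckets = {}       # identity -> {horizon name: TokenBucket}

    def check(self, identity, cost=1):
        """Return (allowed, blocking_horizon, retry_after_seconds)."""
        now = self.clock()
        if identity not in self.buckets:
            self.buckets[identity] = {
                name: TokenBucket(capacity, window, now)
                for name, (capacity, window) in self.limits.items()
            }
        buckets = self.buckets[identity]
        waits = {name: b.seconds_until(cost, now) for name, b in buckets.items()}
        blocked = {name: wait for name, wait in waits.items() if wait > 0}
        if blocked:
            # Report the horizon that will take longest to clear.
            horizon = max(blocked, key=blocked.get)
            return False, horizon, blocked[horizon]
        # Charge every horizon only when all of them can pay.
        for bucket in buckets.values():
            bucket.tokens -= cost
        return True, None, 0.0
```

The same `check` call could be made once at the gateway with `cost=1` per request and again inside the agent loop with `cost` set to the tokens or tool calls about to be spent. Returning the blocking horizon and the wait time gives the caller what it needs to explain the refusal to the user.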

When to use

  • A single user or compromised account could otherwise bankrupt the product or starve others.
  • Limits per identity can be enforced at API gateway and inside the agent loop.
  • Limit hits can be surfaced to users in a clear, actionable way.
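
The last two conditions can be sketched together: a sliding-window counter checked before each tool call inside the agent loop, which stops the run with a message the user can act on. Everything here is hypothetical, including `SlidingWindowLimiter`, `run_agent_steps`, and the wording of the message; the sketch assumes tool calls are plain Python callables and that state lives in process memory.

```python
import math
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most `max_calls` per identity within any `window_seconds` span."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        if max_calls < 1:
            raise ValueError("max_calls must be at least 1")
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self.calls = defaultdict(deque)  # identity -> timestamps of recent calls

    def try_acquire(self, identity):
        """Return 0.0 if the call is allowed, else seconds until a slot frees up."""
        now = self.clock()
        stamps = self.calls[identity]
        # Drop timestamps that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) < self.max_calls:
            stamps.append(now)
            return 0.0
        return self.window_seconds - (now - stamps[0])


def limit_message(what, retry_after):
    """Build a user-facing message that names the limit and the wait."""
    seconds = math.ceil(retry_after)
    return f"You have reached the limit for {what}. Try again in {seconds} seconds."


def run_agent_steps(identity, tool_calls, limiter):
    """Run tool calls in order; stop with a message when the limit is hit."""
    results = []
    for call in tool_calls:
        retry_after = limiter.try_acquire(identity)
        if retry_after > 0:
            return results, limit_message("tool calls", retry_after)
        results.append(call())
    return results, None
```

Returning the partial results alongside the message lets the caller show the user what was completed before the limit was reached, rather than failing the whole run silently.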
