Missing max_tokens Cap
also known as Unbounded Output Cap, No Output Budget
Anti-pattern: call the model without an explicit max_tokens (or equivalent) so a single call can drain the run's budget on a runaway generation.
Context
An agent calls a model that supports a max_tokens parameter (or the SDK exposes one). The call site omits the parameter or sets it to the model's max, on the reasoning that 'the agent wants full answers'.
Problem
A single runaway generation (the model rambling, repeating itself, or producing filler) can consume the model's full output allowance in one call, and that one call then dominates the cost of the run. Worse, a slow generation blocks the agent thread for tens of seconds. This is distinct from step-budget (which caps total agent steps) and cost-gating (which caps total spend): this is the per-call output cap.
Forces
- max_tokens defaults vary per SDK; some require explicit setting.
- Engineers underestimate how much a single call can over-produce when the prompt is even slightly off.
- Capping output too aggressively truncates legitimate answers.
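One way to take SDK defaults out of the picture is to make the cap a required argument at the call site. The following is a minimal sketch, not any particular SDK's API: `capped_call`, `HARD_CEILING`, and the `send` callable are hypothetical names standing in for whatever function actually issues the request.

```python
from typing import Callable

# Hypothetical upper bound for any single call; tune per deployment.
HARD_CEILING = 4096


def capped_call(send: Callable[..., str], prompt: str, *, max_tokens: int) -> str:
    """Forward a request only if the caller chose an explicit, sane output cap.

    `send` stands in for the SDK function that issues the request (assumption).
    `max_tokens` is keyword-only and has no default, so omitting it raises
    TypeError instead of silently falling back to an SDK default.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if max_tokens > HARD_CEILING:
        raise ValueError(f"max_tokens={max_tokens} exceeds ceiling {HARD_CEILING}")
    return send(prompt=prompt, max_tokens=max_tokens)
```

The ceiling addresses the "set it to the model's max" variant of the anti-pattern, while the lower bound on legitimate answers is left to each call site to choose.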
Example
A summarization agent calls the model without max_tokens. A malformed prompt makes the model produce a 50,000-token rambling answer. That one request costs more than the previous day's traffic. The problem is discovered when the model gateway flags the call as anomalous.
Solution
Therefore:
Set max_tokens per call site based on output schema. For structured-output schemas, derive the cap from the schema. For prose, use task-class defaults. Alert on cap-hit rate as a quality signal (it indicates undersized cap OR runaway generation). Pair with structured-output and step-budget.
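The steps above can be sketched roughly as follows. All names and numbers here are illustrative assumptions: the per-field token estimates, the overhead and margin values, the task-class defaults, and the 5% alert threshold would need to be calibrated against real traffic. Many SDKs report a stop reason that indicates truncation at the length limit; this sketch approximates a cap hit by comparing token counts instead.

```python
import math
from dataclasses import dataclass

# Hypothetical task-class defaults for prose outputs, in tokens.
PROSE_DEFAULTS = {"summary": 512, "explanation": 1024, "draft": 2048}


def cap_from_schema(field_token_estimates: dict[str, int],
                    overhead_per_field: int = 8,
                    margin: float = 1.25) -> int:
    """Estimate a cap for structured output from per-field token estimates.

    overhead_per_field covers keys, quotes, and punctuation (assumed value);
    margin leaves headroom so legitimate answers are not truncated.
    """
    base = sum(v + overhead_per_field for v in field_token_estimates.values())
    return math.ceil(base * margin)


@dataclass
class CapHitMonitor:
    """Tracks how often calls end at the cap, as a quality signal."""
    threshold: float = 0.05  # assumed alert threshold
    calls: int = 0
    hits: int = 0

    def record(self, output_tokens: int, max_tokens: int) -> None:
        self.calls += 1
        if output_tokens >= max_tokens:
            self.hits += 1

    @property
    def rate(self) -> float:
        return self.hits / self.calls if self.calls else 0.0

    def should_alert(self) -> bool:
        # A high rate means the cap is undersized OR generations are running away.
        return self.rate > self.threshold


cap = cap_from_schema({"title": 20, "summary": 200, "tags": 30})
```

A monitor like this would typically be kept per call site, since a cap-hit rate averaged across unrelated task classes hides which cap is wrong.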
What this pattern forbids. Nothing useful: as an anti-pattern it imposes no constraint of its own. The constraint it is missing is a per-call output cap matched to the expected output shape.
And the patterns that stand alongside it, or against it:
- complements Step Budget ★★: Cap the number of tool calls or loop iterations the agent is allowed within a single request.
- complements Cost Gating ★★: Block actions whose expected cost exceeds a threshold without explicit user (or operator) acknowledgement.
- complements Structured Output ★★: Constrain the model's output to conform to a JSON Schema (or similar typed shape).
- complements Token-Economy Blindness ✕: Anti-pattern: operate multi-agent loops with no per-run token budget or alarm, allowing recursive loops to silently accumulate $10k+ in undetected costs.
- complements Unbounded Loop ✕: Anti-pattern: run the agent loop without a step budget and let model self-termination decide.