XIV · Anti-Patterns · Anti-pattern

Missing max_tokens Cap

also known as Unbounded Output Cap, No Output Budget

Anti-pattern: call the model without an explicit max_tokens (or equivalent) so a single call can drain the run's budget on a runaway generation.

Context

An agent calls a model whose API or SDK exposes a max_tokens parameter (or an equivalent). The call site omits the parameter or sets it to the model's maximum, on the reasoning that 'the agent wants full answers'.

Problem

A single degenerate generation (the model rambling, repeating itself, or producing filler) can run all the way to the model's output limit on one call, and that one call then dominates the cost of the run. Worse, a slow generation locks up the agent thread for tens of seconds. This is distinct from step-budget, which caps total agent steps, and cost-gating, which caps total spend: the control missing here is the per-call output cap.

Forces

  • max_tokens defaults vary per SDK; some require explicit setting.
  • Engineers underestimate how much a single call can over-produce when the prompt is even slightly off.
  • Capping output too aggressively truncates legitimate answers.

Example

A summarization agent calls the model without max_tokens. A malformed prompt makes the model produce a rambling 50,000-token answer. That one request costs more than the previous day's traffic. The problem is discovered when the model gateway flags the call as anomalous.

Solution

Therefore:

Set max_tokens per call site based on the expected output shape. For structured-output schemas, derive the cap from the schema. For prose, use task-class defaults. Alert on the cap-hit rate as a quality signal: a rising rate indicates either an undersized cap or runaway generation. Pair with structured-output and step-budget.
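A minimal sketch of the approach described here, not a definitive implementation. Every name in it (PROSE_CAPS, cap_from_schema, CapHitMonitor) is hypothetical, as are the numbers: the per-field allowance, the overhead, the task-class caps, and the 5% alert threshold are illustrative assumptions to be replaced with values measured on your own workload. The caller is assumed to determine whether a call hit its cap from whatever stop or finish reason its SDK reports.

```python
from dataclasses import dataclass

# Hypothetical task-class defaults for prose output (assumed values).
PROSE_CAPS = {"summary": 400, "answer": 800, "report": 2000}


def cap_from_schema(schema: dict, tokens_per_field: int = 20, overhead: int = 50) -> int:
    """Rough cap for a flat JSON Schema: fixed overhead plus a per-field allowance.

    The allowance figures are assumptions; nested schemas would need a
    recursive walk, which this sketch does not attempt.
    """
    fields = schema.get("properties", {})
    return overhead + tokens_per_field * len(fields)


@dataclass
class CapHitMonitor:
    """Counts calls that stopped because they reached max_tokens."""

    alert_threshold: float = 0.05  # assumed: alert above a 5% cap-hit rate
    calls: int = 0
    cap_hits: int = 0

    def record(self, hit_cap: bool) -> None:
        # hit_cap comes from the caller, based on the SDK's stop/finish reason.
        self.calls += 1
        if hit_cap:
            self.cap_hits += 1

    @property
    def hit_rate(self) -> float:
        return self.cap_hits / self.calls if self.calls else 0.0

    def should_alert(self) -> bool:
        # A high rate means the cap is undersized OR generations are running away.
        return self.hit_rate > self.alert_threshold
```

Keeping one monitor per call site, rather than one for the whole agent, makes it easier to tell an undersized cap on one task class from a prompt regression that affects everything.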

What this pattern forbids. Nothing useful: as an anti-pattern it imposes no constraint. The constraint it is missing is a per-call output cap matched to the expected output shape.
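One way to put the per-call output cap back in place is a thin wrapper that refuses to make a call unless the call site states a cap. The sketch below assumes a hypothetical call_model callable that accepts prompt and max_tokens keyword arguments; capped_call, MissingOutputCap, and the hard_ceiling value are likewise illustrative names and numbers, not part of any real SDK.

```python
from typing import Callable, Optional


class MissingOutputCap(ValueError):
    """Raised when a call site does not pass an explicit output cap."""


def capped_call(
    call_model: Callable[..., str],  # hypothetical: takes prompt= and max_tokens=
    prompt: str,
    *,
    max_tokens: Optional[int] = None,
    hard_ceiling: int = 4096,  # assumed ceiling; set it per deployment
) -> str:
    """Forward the call only if the call site supplied a sane, explicit cap."""
    if max_tokens is None:
        # Never fall back to the SDK default, which varies between SDKs.
        raise MissingOutputCap("pass an explicit max_tokens for this call site")
    if not 0 < max_tokens <= hard_ceiling:
        raise ValueError(f"max_tokens must be in 1..{hard_ceiling}, got {max_tokens}")
    return call_model(prompt=prompt, max_tokens=max_tokens)
```

Failing loudly at the call site, instead of silently substituting a default, keeps each cap a deliberate choice tied to the expected output shape.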

And the patterns that stand alongside it, or against it —

  • complements: Step Budget ★★. Cap the number of tool calls or loop iterations the agent is allowed within a single request.
  • complements: Cost Gating ★★. Block actions whose expected cost exceeds a threshold without explicit user (or operator) acknowledgement.
  • complements: Structured Output ★★. Constrain the model's output to conform to a JSON Schema (or similar typed shape).
  • complements: Token-Economy Blindness. Anti-pattern: operate multi-agent loops with no per-run token budget or alarm, allowing recursive loops to silently accumulate $10k+ in undetected costs.
  • complements: Unbounded Loop. Anti-pattern: run the agent loop without a step budget and let model self-termination decide.
