Anti-Patterns

Missing max_tokens Cap

Anti-pattern: call the model without an explicit max_tokens (or equivalent) so a single call can drain the run's budget on a runaway generation.

Problem

A single hallucinated loop in the output (the model rambling, repeating, or generating filler) consumes the full context budget on one call. This dominates the run cost. Worse, a slow generation locks up the agent thread for tens of seconds. Distinct from step-budget (which caps total agent steps) and cost-gating (which caps total spend) — this is the per-call output cap.

Solution

Set max_tokens per call site based on output schema. For structured-output schemas, derive the cap from the schema. For prose, use task-class defaults. Alert on cap-hit rate as a quality signal (it indicates undersized cap OR runaway generation). Pair with structured-output and step-budget.

When to use

  • Never. Cite when reviewing model call sites.
  • Set explicit max_tokens per call matched to expected output.
  • Alert on cap-hit rate as a quality signal.

Open the full interactive page

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Related