Anti-Patterns

Realtime API When Batchable

Anti-pattern: use the realtime/synchronous model API for workloads whose latency budget would permit batching, paying 2–10× the unit cost for no user-visible benefit.

Problem

Realtime API pricing is 2–10× the batch tier on every major provider. For workloads where latency could be 1h or 24h, this is pure overspend. The team often is not aware the batch API exists, or rejected it early as 'complex'. Cost shows up as a flat line in the bill: '$N per million tokens' instead of 'half of $N per million tokens'.

Solution

Identify model calls whose results are consumed asynchronously. Submit them via the provider's batch API (50% cheaper at OpenAI, similar at Anthropic). Poll or webhook for completion. Reserve realtime for genuinely user-facing or sub-minute-latency workloads. Track 'realtime calls without realtime latency requirement' as a metric in cost-observability.

When to use

  • Never. Cite when reviewing async jobs that use realtime APIs.
  • Classify workloads by latency budget and route to batch where the budget allows.
  • Track 'realtime calls without realtime SLA need' as a cost-waste metric.

Open the full interactive page

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Related