X · Governance & Observability · Mature ★★

Cost Observability

also known as Token Telemetry, Cost Dashboard

Surface per-request, per-user, and per-feature cost and token consumption to operators in near-real-time.

Context

A team is running an agent product in production that calls one or more paid model providers and a set of paid tools. Spend depends on which feature the user touched, which model was routed to, how long the conversation got, and how many tool calls the agent decided to make. Operators need to know in close to real time where the money is going, not weeks later when the invoice arrives.

Problem

Without per-feature, per-route, per-model attribution, an aggregate dashboard only shows that total tokens went up. A single bad routing decision, a chatty new prompt, or a runaway loop in one feature can multiply the bill for that feature ten times while the global average barely twitches. The team is forced to choose between learning about the problem from the monthly billing statement or building ad-hoc spreadsheets every time a number looks off.

Forces

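  • The telemetry schema must capture which feature, which model, and which user on every call; a tag that was never recorded cannot be recovered later.
  • Real-time aggregation surfaces regressions sooner; daily aggregation is simpler to build and run.
  • Per-user attribution exposes individual usage, so user identifiers need anonymising before they reach a dashboard.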
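
The first and third forces can be made concrete with a small event schema. The sketch below is one possible shape, not a prescribed one: the `CostEvent` fields, the `pseudonymise` helper, and the `PSEUDONYM_KEY` secret are all assumptions introduced for illustration. It uses a keyed hash (HMAC-SHA256) so that events from the same user still group together without the raw identifier being stored.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical secret; in practice it would be loaded from a secret manager.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"


def pseudonymise(user_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Return a stable keyed hash of the user id (truncated for readability)."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


@dataclass(frozen=True)
class CostEvent:
    """One model or tool call, tagged for attribution (assumed field set)."""
    ts: float            # Unix timestamp of the call
    feature: str         # product feature that triggered the call
    route: str           # routing decision that picked the model
    model: str           # model id as reported by the provider
    user_key: str        # pseudonymised user, never the raw id
    input_tokens: int
    output_tokens: int
    cost_usd: float


def make_event(ts, feature, route, model, user_id,
               input_tokens, output_tokens, cost_usd) -> CostEvent:
    """Build an event, pseudonymising the user id before it is stored."""
    return CostEvent(ts, feature, route, model, pseudonymise(user_id),
                     input_tokens, output_tokens, cost_usd)
```

Truncating the digest keeps dashboard labels short at the cost of a slightly higher collision chance; whether a keyed hash is sufficient for a given privacy regime is a separate question this sketch does not settle.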

Example

An ops team notices the monthly LLM bill has tripled but can't say which feature drove it — the dashboard only shows total tokens. By the time billing arrives the runaway feature has been live for weeks. They add Cost Observability: every request is tagged with feature, user, and tenant, and per-feature spend rolls up in near-real-time. Within an hour of a regression the team can see which feature now costs ten times what it did yesterday.
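
The comparison in this example, a feature costing ten times what it did yesterday, can be sketched as a day-over-day ratio check on per-feature totals. The function names, the `ratio` threshold, and the `min_usd` floor are assumptions chosen for illustration; a production detector would likely use a longer baseline.

```python
from collections import defaultdict


def spend_by_feature(events):
    """Sum cost per feature. Each event is a (feature, cost_usd) pair."""
    totals = defaultdict(float)
    for feature, cost_usd in events:
        totals[feature] += cost_usd
    return dict(totals)


def regressions(today, yesterday, ratio=5.0, min_usd=1.0):
    """Return features whose spend today is at least `ratio` times yesterday's.

    Features below `min_usd` are ignored so that tiny absolute amounts
    do not trigger noise. A feature with no baseline is always flagged.
    """
    flagged = {}
    for feature, cost in today.items():
        if cost < min_usd:
            continue
        baseline = yesterday.get(feature, 0.0)
        if baseline == 0.0 or cost / baseline >= ratio:
            flagged[feature] = (baseline, cost)
    return flagged


yesterday = {"summarise": 40.0, "search": 120.0}
today = spend_by_feature([
    ("summarise", 200.0), ("summarise", 200.0), ("search", 125.0),
])
print(regressions(today, yesterday))  # only "summarise" is flagged
```

With aggregate-only telemetry the same day would show total spend rising from 160 to 525, which hides the fact that one feature accounts for nearly all of the increase.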

Solution

Therefore:

Tag every model and tool call with feature, route, user (anonymised), and model id. Stream to a telemetry store. Build dashboards by feature, by model, by tier, by hour. Set alerts on anomalies. Pair with cost-gating for prevention.
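
A minimal sketch of the tag, store, and roll-up steps follows. Everything named here is an assumption for illustration: the `PRICE_PER_MILLION` table holds made-up prices rather than any provider's rate card, and `TelemetryStore` is an in-memory stand-in for whatever streaming store a team actually uses.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical USD prices per million tokens; real values come from the provider.
PRICE_PER_MILLION = {
    "small-model": {"input": 0.5, "output": 1.5},
    "large-model": {"input": 5.0, "output": 15.0},
}


def call_cost(model, input_tokens, output_tokens):
    """Cost of one call in USD, priced separately for input and output tokens."""
    price = PRICE_PER_MILLION[model]
    return (input_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000


class TelemetryStore:
    """In-memory stand-in for a telemetry store."""

    def __init__(self):
        self.events = []

    def record(self, ts, feature, route, model, user_key,
               input_tokens, output_tokens):
        """Append one tagged call, deriving its hour bucket and cost."""
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
            "%Y-%m-%dT%H:00Z")
        self.events.append({
            "hour": hour, "feature": feature, "route": route,
            "model": model, "user_key": user_key,
            "tokens": input_tokens + output_tokens,
            "cost_usd": call_cost(model, input_tokens, output_tokens),
        })

    def rollup(self, *dims):
        """Sum cost by the given tag names, e.g. rollup("hour", "feature")."""
        totals = defaultdict(float)
        for event in self.events:
            totals[tuple(event[d] for d in dims)] += event["cost_usd"]
        return dict(totals)
```

Because every event carries all tags, each dashboard view is the same roll-up with different dimensions: `store.rollup("feature")`, `store.rollup("model")`, or `store.rollup("hour", "feature")`. Keys are tuples, so a single-dimension roll-up is keyed by one-element tuples such as `("chat",)`.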

What this pattern forbids. Calls without telemetry tags fall into an 'unattributed' bucket; some internal gateways enforce tag-or-reject.
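
Both behaviours, the 'unattributed' bucket and tag-or-reject, can be sketched as one check with a strictness switch. The `REQUIRED_TAGS` tuple, the `attribute` function, and the `MissingTagsError` type are hypothetical names, not part of any particular gateway.

```python
REQUIRED_TAGS = ("feature", "route", "user_key", "model")


class MissingTagsError(ValueError):
    """Raised in strict mode when a call arrives without required tags."""


def attribute(tags, strict=False):
    """Validate telemetry tags on a call before it is forwarded.

    Lenient mode fills each missing tag with 'unattributed' so the spend
    is still counted; strict mode rejects the call outright.
    """
    missing = [name for name in REQUIRED_TAGS if not tags.get(name)]
    if not missing:
        return dict(tags)
    if strict:
        raise MissingTagsError("missing telemetry tags: " + ", ".join(missing))
    filled = dict(tags)
    for name in missing:
        filled[name] = "unattributed"
    return filled
```

A team might start in lenient mode, watch the size of the 'unattributed' bucket on the dashboard, and switch the gateway to strict once that bucket is close to empty.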

And the patterns that stand alongside it, or against it —

  • complements: Cost Gating ★★. Block actions whose expected cost exceeds a threshold without explicit user (or operator) acknowledgement.
  • complements: Lineage Tracking ★★. Track which prompt version, model version, and data sources produced each agent output.
  • alternative-to: Demo-to-Production Cliff. Anti-pattern: ship a demo-validated agent straight into production without a frozen eval, cost ceiling, loop-detector, or named oncall, then act surprised when accuracy drops and cost runs away.
  • alternative-to: Token-Economy Blindness. Anti-pattern: operate multi-agent loops with no per-run token budget or alarm, allowing recursive loops to silently accumulate $10k+ in undetected costs.
  • complements: Realtime API When Batchable. Anti-pattern: use the realtime/synchronous model API for workloads whose latency budget would permit batching, paying 2–10× the unit cost for no user-visible benefit.
  • complements: Top-Tier Model For Everything (Cost). Anti-pattern: route every request through the highest-tier model regardless of difficulty, treating cost as a model-choice problem instead of a routing one.
