Reasoning

Extended Thinking

Spend a configurable budget of internal reasoning tokens before producing a user-visible answer.

Problem

If the team relies on prompt-based chain-of-thought, the reasoning ends up mixed into the user-visible response, and the same prompt has to drive both easy and hard tasks. They have no clean control to say 'spend more compute on this one' without rewriting the prompt for that request, and the visible reasoning pollutes downstream turns by leaving long traces in the conversation. They need a way to dial up internal reasoning effort per request while keeping the response itself focused, and they need to be able to monitor how many reasoning tokens each request actually consumed.

Solution

Use the provider's reasoning-mode API (OpenAI o-series reasoning effort, Anthropic Claude extended thinking budget_tokens, Gemini thinking budget). Set budget per request based on task difficulty (cheap for routing, expensive for hard reasoning). Monitor reasoning-token consumption.

When to use

  • The provider exposes a reasoning-budget API and you want to tune effort per request.
  • Some tasks (routing, classification) need cheap reasoning and others (hard problems) need expensive reasoning.
  • Internal opaque reasoning that the user does not see is acceptable for the deployment.

Open the full interactive page

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Related