Test-Time Compute Scaling
Allocate more inference-time compute (samples, search, deeper thinking) instead of scaling parameters to improve quality.
Problem
A single-pass call to even a strong model under-uses the compute available at inference time. The team knows several inference-time techniques exist (drawing many samples and picking the best, voting across many samples, searching over reasoning trees, allocating more internal reasoning tokens), but each technique shines on a different kind of task. Without a deliberate policy for how to spend inference budget per task class, the team leaves easy quality gains on the table and overspends on items that would not have benefited.
Solution
Pick the inference-time technique that fits the task: best-of-N when a verifier can score candidates, self-consistency when independently sampled answers can be compared and voted on, tree search for combinatorial tasks, and extended thinking for sequential reasoning. Compose techniques where they complement each other, and tune the compute budget per task class.
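As a rough illustration of two of these techniques and of a per-task-class budget, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Sampler` and `Verifier` callables stand in for a model call and a scoring function, and the task-class names and sample counts in `POLICY` are placeholders, not recommended values. Tree search and extended thinking are left out.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical interfaces: a sampler draws one candidate answer for a prompt,
# a verifier scores a candidate for that prompt (higher is better).
Sampler = Callable[[str], str]
Verifier = Callable[[str, str], float]


def best_of_n(prompt: str, sample: Sampler, score: Verifier, n: int) -> str:
    """Draw n candidates and return the one the verifier scores highest."""
    candidates = [sample(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))


def self_consistency(prompt: str, sample: Sampler, n: int) -> str:
    """Draw n candidates and return the most frequent answer (majority vote)."""
    votes = Counter(sample(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]


@dataclass(frozen=True)
class Budget:
    technique: str   # "best_of_n", "self_consistency", or "single_pass"
    n_samples: int   # how many model calls this task class may spend


# Illustrative policy only: class names and budgets are made up for the sketch.
POLICY: Dict[str, Budget] = {
    "code_with_tests": Budget("best_of_n", 8),
    "short_answer_math": Budget("self_consistency", 16),
    "simple_lookup": Budget("single_pass", 1),
}


def answer(task_class: str, prompt: str, sample: Sampler, score: Verifier) -> str:
    """Route a prompt to the technique and budget configured for its task class."""
    budget = POLICY.get(task_class, Budget("single_pass", 1))
    if budget.technique == "best_of_n":
        return best_of_n(prompt, sample, score, budget.n_samples)
    if budget.technique == "self_consistency":
        return self_consistency(prompt, sample, budget.n_samples)
    return sample(prompt)  # unknown or cheap classes get a single pass
```

Keeping the policy as a plain table makes the budget per task class easy to inspect and tune. Unknown task classes fall back to a single pass, so no extra compute is spent on them by default.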
When to use
- Parameter scaling has saturated and inference-time techniques deliver further lift.
- The task is amenable to a known technique (best-of-N, self-consistency, tree search, extended thinking).
- Compute budget at inference time is available and worth spending for quality.