Reasoning
How the model thinks before acting.
17 patterns in this book.
When to reach for each
01. Chain of Thought
Elicit multi-step reasoning by prompting the model to produce intermediate steps before its final answer.
Best for: The task requires multi-step reasoning that single-shot answers fail at.
Tradeoff: Single linear trace; no branching or self-correction.
Watch for: The task is direct lookup or pattern completion where reasoning steps add no quality.
02. Extended Thinking
Spend a configurable budget of internal reasoning tokens before producing a user-visible answer.
Best for: The provider exposes a reasoning-budget API and you want to tune effort per request.
Tradeoff: Cost spikes with budget.
Watch for: Static prompt-based chain-of-thought already meets quality and cost targets.
03. Chain of Verification
Reduce hallucination by drafting an answer, generating independent verification questions, answering them in isolation, and revising.
Best for: The model hallucinates claims when it self-verifies in the same context as the draft.
Tradeoff: 4x baseline cost.
Watch for: The task has no factual claims to verify (pure stylistic or generative tasks).
04. Self-Ask
Have the model emit explicit follow-up sub-questions, answer them (optionally via search), then compose the final answer.
Best for: The task is multi-hop and the model knows each hop in isolation.
Tradeoff: Latency: N sub-question calls per question.
Watch for: Single-hop questions where decomposition adds latency without lift.
05. Test-Time Compute Scaling
Allocate more inference-time compute (samples, search, deeper thinking) instead of scaling parameters to improve quality.
Best for: Parameter scaling has saturated and inference-time techniques deliver further lift.
Tradeoff: Latency-sensitive use cases cannot afford much.
Watch for: Latency or cost budgets cannot absorb extra inference-time compute.
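As one illustration of the entries above, the Chain of Verification loop (draft, plan checks, answer them in isolation, revise) can be sketched in a few lines. This is a minimal sketch, not a reference implementation: `LLM` is a hypothetical prompt-in, completion-out callable standing in for whatever client you use, and the prompt wording is invented for illustration.

```python
from typing import Callable, List, Tuple

# Assumption: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def chain_of_verification(question: str, llm: LLM) -> str:
    """Draft, plan verification questions, answer them in isolation, revise."""
    # 1. Draft an initial answer.
    draft = llm(f"Answer the question.\nQuestion: {question}")

    # 2. Ask for verification questions, one per line.
    plan = llm(
        "List fact-checking questions, one per line, for this draft.\n"
        f"Question: {question}\nDraft: {draft}"
    )
    checks: List[str] = [ln.strip() for ln in plan.splitlines() if ln.strip()]

    # 3. Answer each check in a fresh prompt. The draft is deliberately left
    #    out so the model cannot simply repeat its earlier claim.
    findings: List[Tuple[str, str]] = [
        (q, llm(f"Answer concisely.\nQuestion: {q}")) for q in checks
    ]

    # 4. Revise the draft against the independently obtained answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in findings)
    return llm(
        "Revise the draft so it agrees with the verified facts.\n"
        f"Question: {question}\nDraft: {draft}\nVerified facts:\n{evidence}"
    )
```

With N verification questions this sketch makes 3 + N model calls per user question, which is where the extra cost of the pattern comes from.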
All patterns in this book
Chain of Thought
Elicit multi-step reasoning by prompting the model to produce intermediate steps before its final answer.
Extended Thinking
Spend a configurable budget of internal reasoning tokens before producing a user-visible answer.
Chain of Verification
Reduce hallucination by drafting an answer, generating independent verification questions, answering them in isolation, and revising.
Self-Ask
Have the model emit explicit follow-up sub-questions, answer them (optionally via search), then compose the final answer.
Test-Time Compute Scaling
Allocate more inference-time compute (samples, search, deeper thinking) instead of scaling parameters to improve quality.
Zero-Shot Chain-of-Thought
Elicit step-by-step reasoning with a single trigger phrase rather than few-shot exemplars.
Adaptive Compute Allocation
Allocate inference-time compute (thinking tokens, samples, depth, model size) per query based on input difficulty, rather than using a fixed budget across all queries.
Generate-and-Test Strategy
Generate multiple candidate solutions in parallel, then systematically test each against declared constraints rather than committing to the first plausible one — adapted from Langley & Simon's cognit…
Large Reasoning Model (LRM) Paradigm
Route reasoning-heavy tasks to a reasoning-tuned model that trades inference time for deliberation, rather than to a fast LLM that is prone to premature closure.
Least-to-Most Prompting
Decompose a hard problem into an ordered list of easier subproblems, then solve them sequentially with each answer feeding the next.
ReST-EM
Iterate generate → reward-filter → fine-tune to bootstrap reasoning capabilities without human-labelled data.
Socratic Questioning Agent
Drive the agent toward its goal by asking the user a sequence of strategic, open-ended questions that surface the user's own latent knowledge, goal, or context — rather than producing an answer direc…
STaR Bootstrapping
Bootstrap a model's reasoning by training it on its own correct chain-of-thought outputs.
Tree of Thoughts
Search over a tree of partial reasoning states with explicit lookahead, evaluation, and backtracking.
Graph of Thoughts
Model reasoning as an arbitrary DAG so thoughts can be merged, refined, and aggregated across branches.
Latent-Space Reasoning
Let the model reason in continuous hidden-state space instead of decoding each step to text, feeding the last hidden state back as the next input embedding, so one latent step can hold several contin…
Recursive Language Model
Treat an over-long prompt as an environment the model navigates by code, letting it partition and recursively call itself over snippets, so it answers over inputs far larger than its context window.
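To make the search-based entries in this list more concrete, here is a minimal sketch of the Tree of Thoughts idea (lookahead, evaluation, backtracking) as a depth-first search. Everything named here is an assumption for illustration: in a real system `expand` and `score` would typically be model calls, whereas this toy uses plain functions over tuples of integers so the control flow can be run and inspected.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")


def tree_of_thoughts(
    state: S,
    expand: Callable[[S], Sequence[S]],   # propose next partial states
    score: Callable[[S], float],          # evaluate how promising a state is
    is_solution: Callable[[S], bool],
    max_depth: int,
    breadth: int = 3,
    prune_below: float = 0.0,
) -> Optional[S]:
    """Depth-first search over partial reasoning states with pruning."""
    if is_solution(state):
        return state
    if max_depth == 0:
        return None
    # Lookahead and evaluation: keep only the best few children.
    children = sorted(expand(state), key=score, reverse=True)[:breadth]
    for child in children:
        if score(child) < prune_below:
            continue  # evaluated as a dead end, do not descend
        found = tree_of_thoughts(
            child, expand, score, is_solution, max_depth - 1, breadth, prune_below
        )
        if found is not None:
            return found
        # Otherwise backtrack and try the next sibling.
    return None


# Toy problem (hypothetical): pick steps of 3 or 5 that sum to exactly 11.
TARGET = 11
State = Tuple[int, ...]


def expand(state: State) -> Sequence[State]:
    return [state + (3,), state + (5,)]


def score(state: State) -> float:
    total = sum(state)
    return -1.0 if total > TARGET else total / TARGET


def is_solution(state: State) -> bool:
    return sum(state) == TARGET


solution = tree_of_thoughts((), expand, score, is_solution, max_depth=4)
# solution == (5, 3, 3): the greedy branch (5, 5) overshoots and is abandoned.
```

The search first follows the highest-scoring branch (5, 5), finds that both of its children are pruned, and backtracks to (5, 3) before reaching a solution. A single linear reasoning trace has no equivalent of that backtracking step.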