Book I

Reasoning

How the model thinks before acting.

17 patterns in this book. · Updated

When to reach for each

01. Chain of Thought
Elicit multi-step reasoning by prompting the model to produce intermediate steps before its final answer.
Best for: The task requires multi-step reasoning that single-shot answers fail at.
Tradeoff: A single linear trace, with no branching or self-correction.
Watch for: The task is a direct lookup or pattern completion, where reasoning steps add no quality.

02. Extended Thinking
Spend a configurable budget of internal reasoning tokens before producing a user-visible answer.
Best for: The provider exposes a reasoning-budget API and you want to tune effort per request.
Tradeoff: Cost rises with the thinking budget.
Watch for: Static, prompt-based chain of thought already meets quality and cost targets.

03. Chain of Verification
Reduce hallucination by drafting an answer, generating independent verification questions, answering them in isolation, and revising the draft.
Best for: The model hallucinates claims when it self-verifies in the same context as the draft.
Tradeoff: 4x the baseline cost.
Watch for: The task has no factual claims to verify (purely stylistic or generative tasks).

04. Self-Ask
Have the model emit explicit follow-up sub-questions, answer them (optionally via search), then compose the final answer.
Best for: The task is multi-hop and the model knows each hop in isolation.
Tradeoff: Added latency from N sub-question calls per question.
Watch for: Single-hop questions, where decomposition adds latency without improving quality.

05. Test-Time Compute Scaling
Allocate more inference-time compute (samples, search, deeper thinking) instead of scaling parameters to improve quality.
Best for: Parameter scaling has saturated and inference-time techniques deliver further gains.
Tradeoff: Extra samples and search add latency, which latency-sensitive use cases can afford little of.
Watch for: Latency or cost budgets cannot absorb extra inference-time compute.

All patterns in this book

Chain of Thought

Elicit multi-step reasoning by prompting the model to produce intermediate steps before its final answer.
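A minimal sketch of the prompt side of chain of thought, assuming a plain text-completion interface. "Let's think step by step" is the familiar zero-shot cue; the `Answer:` line convention and the helper names `cot_prompt` and `extract_answer` are hypothetical choices that make the final answer easy to grade while the intermediate steps stay available for inspection.

```python
import re
from typing import Optional

def cot_prompt(question: str) -> str:
    # Zero-shot chain-of-thought cue plus a fixed, parseable final line (an assumed convention).
    return (f"{question}\n"
            "Let's think step by step. End with a final line of the form 'Answer: <answer>'.")

def extract_answer(completion: str) -> Optional[str]:
    # Keep the reasoning trace for inspection, but grade only the last 'Answer:' line.
    matches = re.findall(r"^Answer:\s*(.+)$", completion, flags=re.MULTILINE)
    return matches[-1].strip() if matches else None

# A hand-written completion standing in for model output.
sample = ("Roger starts with 5 balls.\n"
          "2 cans of 3 balls each is 6 balls.\n"
          "5 + 6 = 11.\n"
          "Answer: 11")
print(extract_answer(sample))
```

Parsing only the last answer line lets the model write as many intermediate steps as it needs without those steps leaking into the graded output.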

Extended Thinking

Spend a configurable budget of internal reasoning tokens before producing a user-visible answer.

Chain of Verification

Reduce hallucination by drafting an answer, generating independent verification questions, answering them in isolation, and revising.
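The four steps can be sketched as below, assuming a hypothetical `llm(prompt) -> str` callable; the prompt wording and the canned `fake_llm` are illustrative only. The property that matters is step 3: each verification question is answered in a fresh prompt that never contains the draft.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out

def chain_of_verification(question: str, llm: LLM) -> str:
    # 1. Draft a baseline answer.
    draft = llm(f"Answer the question.\nQ: {question}")
    # 2. Plan verification questions that probe the draft's factual claims.
    plan = llm("List fact-checking questions for this answer, one per line.\n"
               f"Q: {question}\nA: {draft}")
    checks = [line.strip() for line in plan.splitlines() if line.strip()]
    # 3. Answer each check in a prompt that omits the draft,
    #    so the model cannot simply repeat its earlier claim.
    facts = [(q, llm(f"Answer concisely.\nQ: {q}")) for q in checks]
    # 4. Revise the draft so it agrees with the independently answered checks.
    notes = "\n".join(f"- {q} {a}" for q, a in facts)
    return llm("Revise the draft to agree with the verified facts.\n"
               f"Q: {question}\nDraft: {draft}\nVerified facts:\n{notes}")

def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in model with canned replies, for demonstration only.
    if prompt.startswith("Answer the question."):
        return "The capital of Australia is Sydney."
    if prompt.startswith("List fact-checking questions"):
        return "What is the capital of Australia?"
    if prompt.startswith("Answer concisely."):
        return "Canberra"
    return "The capital of Australia is Canberra."

print(chain_of_verification("What is the capital of Australia?", fake_llm))
```

With one verification question this makes four model calls for a single question, which is where the cost multiple comes from.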

Self-Ask

Have the model emit explicit follow-up sub-questions, answer them (optionally via search), then compose the final answer.
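A sketch of the Self-Ask control loop, assuming hypothetical `llm(transcript) -> str` and `search(question) -> str` callables. The scaffold strings follow the Self-Ask style ("Follow up:", "Intermediate answer:", "So the final answer is:") but should be treated as assumptions; the stub model and the `KB` lookup exist only so the loop runs.

```python
from typing import Callable

FOLLOW_UP = "Follow up:"
FINAL = "So the final answer is:"

def self_ask(question: str, llm: Callable[[str], str],
             search: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"Question: {question}\nAre follow up questions needed here: Yes.\n"
    for _ in range(max_steps):
        step = llm(transcript).strip()
        if step.startswith(FINAL):
            return step[len(FINAL):].strip()
        if not step.startswith(FOLLOW_UP):
            raise ValueError(f"unexpected model output: {step!r}")
        sub_question = step[len(FOLLOW_UP):].strip()
        # Answer the sub-question outside the model (here: a lookup) and append it.
        transcript += f"{step}\nIntermediate answer: {search(sub_question)}\n"
    raise RuntimeError("no final answer within max_steps")

# Hypothetical stand-ins for a search tool and a model.
KB = {"Which country is the Eiffel Tower in?": "France",
      "What is the capital of France?": "Paris"}

def fake_llm(transcript: str) -> str:
    hops = transcript.count("Intermediate answer:")
    if hops == 0:
        return "Follow up: Which country is the Eiffel Tower in?"
    if hops == 1:
        return "Follow up: What is the capital of France?"
    return "So the final answer is: Paris"

print(self_ask("What is the capital of the country the Eiffel Tower is in?",
               fake_llm, KB.__getitem__))
```

Each hop costs one model call plus one search call, so a two-hop question here takes three model calls.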

Test-Time Compute Scaling

Allocate more inference-time compute (samples, search, deeper thinking) instead of scaling parameters to improve quality.
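One common way to spend extra inference-time compute is to sample several answers and take a majority vote (self-consistency). The sketch below assumes a simulated model that is right with probability 0.6; `noisy_solver` and its answers are hypothetical, and real gains depend on how independent the samples are.

```python
import random
from collections import Counter
from typing import Callable

def majority_vote(sample: Callable[[], str], n: int) -> str:
    """Spend n samples on one question and return the most common final answer."""
    return Counter(sample() for _ in range(n)).most_common(1)[0][0]

def noisy_solver(rng: random.Random, p_correct: float = 0.6) -> Callable[[], str]:
    # Hypothetical model: right ("42") with probability p_correct, else a random wrong answer.
    def sample() -> str:
        return "42" if rng.random() < p_correct else rng.choice(["41", "43", "24"])
    return sample

def accuracy(n: int, trials: int = 500, seed: int = 0) -> float:
    solver = noisy_solver(random.Random(seed))
    return sum(majority_vote(solver, n) == "42" for _ in range(trials)) / trials

for n in (1, 5, 15):
    print(f"{n:>2} samples per question -> accuracy {accuracy(n):.2f}")
```

Accuracy climbs with `n` because the wrong answers split their votes, while cost and latency grow linearly with `n`.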

Adaptive Compute Allocation

Allocate inference-time compute (thinking tokens, samples, depth, model size) per query based on input difficulty, rather than using a fixed budget across all queries.
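A minimal sketch of per-query budgeting, assuming a cheap difficulty estimate is available. The keyword heuristic in `estimate_difficulty`, the `Budget` fields, and the tier thresholds are all hypothetical placeholders for a learned router or a small model's self-rating, and would need tuning on a real eval set.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Budget:
    thinking_tokens: int
    samples: int

# Hypothetical tiers: (upper bound on difficulty, budget to spend).
TIERS = [
    (0.3, Budget(thinking_tokens=0, samples=1)),
    (0.7, Budget(thinking_tokens=2_000, samples=3)),
    (1.01, Budget(thinking_tokens=16_000, samples=8)),
]

def estimate_difficulty(query: str) -> float:
    """Toy difficulty score in [0, 1] from length and a few 'hard task' keywords."""
    hard_markers = ("prove", "optimize", "step by step", "derive")
    score = min(len(query) / 500, 0.5)
    score += 0.5 * any(m in query.lower() for m in hard_markers)
    return min(score, 1.0)

def allocate(query: str) -> Budget:
    d = estimate_difficulty(query)
    for threshold, budget in TIERS:
        if d < threshold:
            return budget
    return TIERS[-1][1]

print(allocate("What is 2+2?"))
print(allocate("Prove that there are infinitely many primes."))
```

The design choice is that the estimator must be much cheaper than the compute it saves; otherwise a fixed budget is simpler.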

Generate-and-Test Strategy

Generate multiple candidate solutions in parallel, then systematically test each against declared constraints rather than committing to the first plausible one — adapted from Langley & Simon's cognit…
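The generate-then-test split can be sketched with declared constraints as named predicates. Here the candidate batch is a hard-coded stand-in for model proposals (hypothetical meeting start hours), and every candidate is checked against every constraint, so the caller sees all survivors plus the reason each rejected one failed.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Constraint = Callable[[int], bool]

def generate_and_test(candidates: Iterable[int],
                      constraints: Dict[str, Constraint]
                      ) -> Tuple[List[int], Dict[int, List[str]]]:
    survivors: List[int] = []
    rejected: Dict[int, List[str]] = {}
    for c in candidates:
        # Test every candidate against every declared constraint.
        failed = [name for name, ok in constraints.items() if not ok(c)]
        if failed:
            rejected[c] = failed
        else:
            survivors.append(c)
    return survivors, rejected

BUSY = {10, 13}  # hypothetical busy hours
constraints = {
    "within_working_hours": lambda h: 9 <= h and h + 1 <= 17,  # one-hour meeting, 9 to 17
    "not_busy": lambda h: h not in BUSY,
}
proposals = [8, 10, 13, 15, 17]  # stand-in for a batch of model-proposed start hours
survivors, rejected = generate_and_test(proposals, constraints)
print(survivors, rejected)
```

Keeping the failure reasons is useful when the next generation round should be told why earlier candidates were rejected.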

Large Reasoning Model (LRM) Paradigm

Route reasoning-heavy tasks to a reasoning-tuned model that trades inference time for deliberation, rather than to a fast LLM prone to premature closure (settling on an answer before reasoning it through).

Least-to-Most Prompting

Decompose a hard problem into an ordered list of easier subproblems, then solve them sequentially with each answer feeding the next.
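The two stages can be sketched as follows, assuming hypothetical `decompose` and `solve` callables that would wrap model calls. The stubs below answer a small word problem so the loop runs; what matters is that each `solve` call receives every earlier subproblem and its answer.

```python
from typing import Callable, List, Tuple

QA = List[Tuple[str, str]]

def least_to_most(problem: str,
                  decompose: Callable[[str], List[str]],
                  solve: Callable[[str, QA, str], str]) -> Tuple[str, QA]:
    # Stage 1: reduce the problem to ordered, easier subproblems ending with the original question.
    subproblems = decompose(problem) or [problem]
    # Stage 2: solve in order, passing a snapshot of all earlier Q/A pairs to each call.
    solved: QA = []
    for sub in subproblems:
        solved.append((sub, solve(problem, list(solved), sub)))
    return solved[-1][1], solved

PROBLEM = ("Elsa has 5 apples. Anna has 2 more apples than Elsa. "
           "How many apples do they have together?")

def fake_decompose(problem: str) -> List[str]:
    return ["How many apples does Anna have?",
            "How many apples do they have together?"]

def fake_solve(problem: str, solved: QA, sub: str) -> str:
    # Stand-in for a model call; the second step reads Anna's count from the context.
    if sub == "How many apples does Anna have?":
        return str(5 + 2)
    anna = int(dict(solved)["How many apples does Anna have?"])
    return str(5 + anna)

answer, trace = least_to_most(PROBLEM, fake_decompose, fake_solve)
print(answer, trace)
```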

ReST-EM

Iterate a generate → reward-filter → fine-tune loop to bootstrap reasoning capabilities without human-labelled data.
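A toy sketch of the loop under strong simplifying assumptions: the "model" is one categorical distribution over candidate answers per problem, the reward is a binary check of the final answer, and "fine-tuning" is replaced by refitting each distribution to the kept samples (with add-alpha smoothing standing in for the base model). All names and data are hypothetical; only the E-step/M-step structure carries over to the real method.

```python
import random
from collections import Counter
from typing import Callable, Dict, List

Policy = Dict[str, Dict[str, float]]

def rest_em(candidates: Dict[str, List[str]],
            reward: Callable[[str, str], bool],
            iterations: int = 3, samples: int = 32,
            alpha: float = 0.5, seed: int = 0) -> Policy:
    rng = random.Random(seed)
    # Start from a uniform "base model" over each problem's candidate answers.
    policy = {p: {a: 1 / len(c) for a in c} for p, c in candidates.items()}
    for _ in range(iterations):
        # E-step (generate + reward-filter): sample answers, keep only reward-1 ones.
        kept = {}
        for p, dist in policy.items():
            answers, weights = zip(*dist.items())
            drawn = rng.choices(answers, weights=weights, k=samples)
            kept[p] = [a for a in drawn if reward(p, a)]
        # M-step (stand-in for fine-tuning): smoothed maximum-likelihood refit on kept samples.
        for p, c in candidates.items():
            counts = Counter(kept[p])
            total = len(kept[p]) + alpha * len(c)
            policy[p] = {a: (counts[a] + alpha) / total for a in c}
    return policy

ANSWERS = {"2+3": "5", "4*5": "20"}  # used only by the binary reward check
CANDIDATES = {"2+3": ["5", "6", "7", "9"], "4*5": ["9", "20", "25", "45"]}
policy = rest_em(CANDIDATES, lambda p, a: a == ANSWERS[p])
print({p: round(dist[ANSWERS[p]], 2) for p, dist in policy.items()})
```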

Socratic Questioning Agent

Drive the agent toward its goal by asking the user a sequence of strategic, open-ended questions that surface the user's own latent knowledge, goal, or context — rather than producing an answer direc…

STaR Bootstrapping

Bootstrap a model's reasoning by training it on its own correct chain-of-thought outputs.

Tree of Thoughts

Search over a tree of partial reasoning states with explicit lookahead, evaluation, and backtracking.
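A depth-first variant can be sketched as below. In practice `propose` and `evaluate` would be model prompts; here they are hypothetical deterministic stand-ins for a toy puzzle (reach 10 from 1 using "+3" and "*2"), which is enough to show lookahead scoring, pruning, and backtracking.

```python
import math
from typing import Callable, List, Optional, Tuple

def tot_dfs(state: int,
            propose: Callable[[int], List[int]],
            evaluate: Callable[[int], float],
            is_goal: Callable[[int], bool],
            max_depth: int,
            path: Tuple[int, ...] = ()) -> Optional[List[int]]:
    path = path + (state,)
    if is_goal(state):
        return list(path)
    if len(path) - 1 >= max_depth:  # number of steps taken so far
        return None
    # Lookahead: score every proposed next thought before expanding any of them.
    scored = [(evaluate(child), child) for child in propose(state)]
    for value, child in sorted(scored, key=lambda vc: vc[0], reverse=True):
        if value == -math.inf:
            continue  # the evaluator judged this branch impossible, so prune it
        found = tot_dfs(child, propose, evaluate, is_goal, max_depth, path)
        if found is not None:
            return found
        # Otherwise backtrack and try the next-best sibling.
    return None

TARGET = 10
propose = lambda n: [n + 3, n * 2]
evaluate = lambda n: -math.inf if n > TARGET else -(TARGET - n)
is_goal = lambda n: n == TARGET

print(tot_dfs(1, propose, evaluate, is_goal, max_depth=4))
```

The search first follows the highest-scored branch 1 → 4 → 8, finds both children of 8 pruned, backtracks, and succeeds through 7.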

Graph of Thoughts

Model reasoning as an arbitrary DAG so thoughts can be merged, refined, and aggregated across branches.
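A sketch of the split and aggregate transformations on a toy sorting task. Each `Node` records its parents, and `aggregate` produces a node with two parents, a merge that a tree cannot express. The `solve` step uses Python's `sorted` where a real system would call a model, and refinement is omitted; all names are hypothetical.

```python
from dataclasses import dataclass, field
from heapq import merge
from typing import List, Tuple

@dataclass
class Node:
    op: str
    content: Tuple[int, ...]
    parents: List["Node"] = field(default_factory=list)

def split(root: Node, k: int) -> List[Node]:
    # One thought fans out into several independent sub-thoughts.
    c = root.content
    return [Node("split", c[i:i + k], [root]) for i in range(0, len(c), k)]

def solve(node: Node) -> Node:
    # Stand-in for a model call that sorts a short list.
    return Node("solve", tuple(sorted(node.content)), [node])

def aggregate(a: Node, b: Node) -> Node:
    # Two parent thoughts merge into one child thought.
    return Node("aggregate", tuple(merge(a.content, b.content)), [a, b])

def graph_of_thoughts_sort(xs: List[int], k: int = 4) -> Node:
    root = Node("input", tuple(xs))
    frontier = [solve(n) for n in split(root, k)]
    if not frontier:
        return root
    while len(frontier) > 1:
        nxt = [aggregate(a, b) for a, b in zip(frontier[0::2], frontier[1::2])]
        if len(frontier) % 2:
            nxt.append(frontier[-1])  # an odd thought carries over to the next round
        frontier = nxt
    return frontier[0]

result = graph_of_thoughts_sort([5, 3, 8, 1, 9, 2, 7, 4])
print(result.content, result.op, len(result.parents))
```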

Latent-Space Reasoning

Let the model reason in continuous hidden-state space instead of decoding each step to text, feeding the last hidden state back as the next input embedding, so one latent step can hold several contin…

Recursive Language Model

Treat an over-long prompt as an environment the model navigates by code, letting it partition and recursively call itself over snippets, so it answers over inputs far larger than its context window.
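A simplified sketch of the recursion, assuming a hypothetical `llm(query, context)` with a small context window. A real recursive language model decides how to partition by writing code itself; this sketch replaces that with a fixed halve-and-recurse policy, then combines the two partial answers with one more call. The stub model asserts that it is never shown more than `LIMIT` characters.

```python
from typing import Callable, List

LIMIT = 200  # hypothetical context window of the stub model, in characters

def recursive_lm(query: str, lines: List[str],
                 llm: Callable[[str, str], str], limit: int = LIMIT) -> str:
    text = "\n".join(lines)
    # Base case: the input fits (or cannot be split further), so call the model directly.
    if len(text) <= limit or len(lines) <= 1:
        return llm(query, text)
    # Recursive case: partition the input and call ourselves on each half.
    mid = len(lines) // 2
    left = recursive_lm(query, lines[:mid], llm, limit)
    right = recursive_lm(query, lines[mid:], llm, limit)
    # Combine the two short partial answers with one more model call.
    return llm(f"COMBINE: {query}", f"{left}\n{right}")

def fake_llm(query: str, context: str) -> str:
    # Stand-in model that fails loudly if it is shown more than LIMIT characters.
    assert len(context) <= LIMIT, "context window exceeded"
    if query.startswith("COMBINE:"):
        return str(sum(int(x) for x in context.split()))
    return str(sum(line.startswith("ERROR") for line in context.splitlines()))

log = ["ERROR disk full" if i % 3 == 0 else "INFO ok" for i in range(300)]
print(len("\n".join(log)), recursive_lm("How many ERROR lines?", log, fake_llm))
```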