I · Reasoning · Emerging

Large Reasoning Model (LRM) Paradigm

also known as LRM, Reasoning-Tuned Model, Inference-Time Reasoning

Route reasoning-heavy tasks to a reasoning-tuned model that trades inference time for deliberation, rather than to a fast LLM prone to premature closure.

Context

A task involves interconnected constraints, multi-step deduction, math, or formal reasoning. Standard LLMs (GPT-4o-class) respond fast but make systematic errors on constraint-heavy problems, because next-token prediction biases them toward fluency over correctness. Reasoning-tuned models (the o1 family, DeepSeek R1, Gemini Thinking) are slower but more methodical.

Problem

Routing every task to a fast LLM means constraint-heavy tasks fail in characteristic ways (premature-closure, false-confidence-syndrome). Routing everything to an LRM is slow and expensive for easy tasks. The team needs an explicit rule for deciding which tasks go to which model.

Forces

  • LRM latency is typically 10–100× that of a fast LLM and often reaches minutes per request.
  • LRM cost per token is higher than that of a fast LLM.
  • Some tasks genuinely need fast response; LRM is unacceptable there.
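
The third force can be made explicit as a latency budget that overrides the reasoning classification. The sketch below is illustrative only: `ModelProfile`, `pick_model`, and the model names are hypothetical, and the latency figures are borrowed from the Example section rather than measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    typical_latency_s: float  # assumed typical end-to-end latency, in seconds


# Hypothetical profiles; latencies taken from the Example (200 ms and 90 s).
FAST = ModelProfile("fast-llm", 0.2)
LRM = ModelProfile("reasoning-model", 90.0)


def pick_model(needs_reasoning: bool, latency_budget_s: float) -> ModelProfile:
    """Use the LRM only when the task needs it AND the caller can wait for it."""
    if needs_reasoning and LRM.typical_latency_s <= latency_budget_s:
        return LRM
    return FAST
```

With these assumed numbers, a reasoning task with a 2-second budget still goes to the fast model. A real system would probably flag that case for review, since the fast model may fail on it.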

Example

A financial-analysis agent handles two query types: "What was Apple's Q3 revenue?" (simple lookup) and "Given these 12 covenants, can this acquisition close?" (multi-constraint reasoning). The router sends the first to GPT-4o-mini (200 ms, $0.0001). The second goes to o1 (90 s, $0.40), which methodically tests each covenant against the term sheet. Both succeed at their task class, and routing keeps cost bounded.
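
To see what "keeps cost bounded" means in numbers, the per-query figures above can be combined with a traffic mix. The 5% share of covenant-style queries below is a hypothetical assumption, not a figure from this pattern, and `blended` is an illustrative helper.

```python
# Per-query figures from the example above.
FAST_COST, FAST_LATENCY_S = 0.0001, 0.2
LRM_COST, LRM_LATENCY_S = 0.40, 90.0


def blended(lrm_share: float, queries: int = 1000):
    """Total cost and mean latency when `lrm_share` of queries go to the LRM."""
    lrm_q = round(queries * lrm_share)
    fast_q = queries - lrm_q
    cost = lrm_q * LRM_COST + fast_q * FAST_COST
    mean_latency = (lrm_q * LRM_LATENCY_S + fast_q * FAST_LATENCY_S) / queries
    return cost, mean_latency


routed_cost, routed_latency = blended(0.05)   # hypothetical: 5% need the LRM
all_lrm_cost, all_lrm_latency = blended(1.0)  # no router: everything to the LRM
```

Under that assumed mix, 1,000 queries cost about $20.10 with routing versus $400 with everything sent to the LRM, and mean latency drops from 90 s to about 4.7 s.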

Solution

Therefore:

Build a router that classifies tasks: simple lookups / generation → LLM; multi-step math, formal reasoning, interconnected-constraint problems → LRM. Track per-class success rate to refine routing. Pair with complexity-based-routing, multi-model-routing, test-time-compute-scaling, generate-and-test-strategy, golden-rule-simpler-is-better (don't overuse LRM).
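
A minimal sketch of such a router follows, assuming a keyword heuristic stands in for the classifier. The cue list, the class labels, and the model names are all hypothetical; a production classifier would more likely be a trained model or a cheap LLM call.

```python
import re
from collections import defaultdict

# Hypothetical cue list used only for illustration.
REASONING_CUES = re.compile(
    r"\b(prove|derive|covenants?|constraints?|satisf(?:y|ies)|optimi[sz]e|given these)\b",
    re.IGNORECASE,
)

ROUTES = {"simple": "fast-llm", "reasoning": "lrm"}  # class -> model tier


def classify(task: str) -> str:
    """Label a task 'reasoning' if it contains any cue, else 'simple'."""
    return "reasoning" if REASONING_CUES.search(task) else "simple"


class Router:
    def __init__(self):
        # class -> [successes, total], filled in as outcomes are reported
        self._stats = defaultdict(lambda: [0, 0])

    def route(self, task: str):
        task_class = classify(task)
        return task_class, ROUTES[task_class]

    def record(self, task_class: str, success: bool) -> None:
        stats = self._stats[task_class]
        stats[0] += int(success)
        stats[1] += 1

    def success_rate(self, task_class: str):
        """Observed success rate for a class, or None if nothing was recorded."""
        ok, total = self._stats.get(task_class, (0, 0))
        return ok / total if total else None
```

A falling `success_rate("simple")` would suggest that hard tasks are leaking into the fast tier and that the classifier needs tightening. A consistently high rate on the `reasoning` class may justify testing whether some of those tasks can be downgraded.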

What this pattern forbids. Sending a task to the LRM unless it has been classified as constraint-heavy or multi-step-reasoning, and making routing decisions that are not logged and reviewed.
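
One way to enforce this constraint is a guard at dispatch time that downgrades ineligible LRM requests and writes an audit record for every decision. This is a sketch under assumptions: `dispatch`, the record fields, and the model names are hypothetical, and an in-memory list stands in for a real log sink.

```python
import json
import time

# Task classes allowed to reach the LRM, as named in the constraint above.
LRM_ELIGIBLE = {"constraint-heavy", "multi-step-reasoning"}


def dispatch(task_id: str, task_class: str, requested_model: str, log: list) -> str:
    """Downgrade ineligible LRM requests and append a JSON audit record."""
    allowed = requested_model != "lrm" or task_class in LRM_ELIGIBLE
    model = requested_model if allowed else "fast-llm"
    log.append(json.dumps({
        "ts": time.time(),
        "task_id": task_id,
        "task_class": task_class,
        "requested": requested_model,
        "dispatched": model,
        "overridden": not allowed,
    }))
    return model
```

Records with `overridden` set to true are a natural starting point for the review step, since they mark places where a caller and the classifier disagreed.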

And the patterns that stand alongside it, or against it —

  • complements · Complexity-Based Routing · Estimate a request's difficulty up front and bind it to the cheapest model tier that can answer well, using an explicit complexity classifier as the routing key.
  • complements · Multi-Model Routing ★★ · Send each request to the cheapest model that can handle it well.
  • complements · Test-Time Compute Scaling ★★ · Allocate more inference-time compute (samples, search, deeper thinking) instead of scaling parameters to improve quality.
  • complements · Extended Thinking ★★ · Spend a configurable budget of internal reasoning tokens before producing a user-visible answer.
  • complements · Generate-and-Test Strategy · Generate multiple candidate solutions in parallel, then systematically test each against declared constraints rather than committing to the first plausible one; adapted from Langley & Simon's cognitive-science research on human expert problem-solving.
  • alternative-to · Context Fragmentation · Anti-pattern: the LLM cannot hold multiple interconnected constraints in mind simultaneously the way human working memory can; it processes each constraint locally and loses the cross-constraint view.
  • alternative-to · Premature Closure · The LLM commits to a confident answer before processing all constraints, characteristic of constraint-heavy tasks where it fills in plausible answers fast and gets cross-constraint interactions wrong.
  • complements · Test-Time Memorization (Titans) · Memory module that learns at inference time by incorporating recent inputs into its parameters during the session rather than relying solely on pre-trained weights.
