Large Reasoning Model (LRM) Paradigm

also known as LRM, Reasoning-Tuned Model, Inference-Time Reasoning

Route reasoning-heavy tasks to a reasoning-tuned model that trades inference time for deliberation, rather than to a fast LLM that exhibits premature-closure.

Context

A task involves interconnected constraints, multi-step deduction, math, or formal reasoning. Standard LLMs (GPT-4o-class) respond fast but make systematic errors on constraint-heavy problems because next-token prediction biases toward fluency over correctness. Reasoning-tuned models (o1 family, DeepSeek R1, Gemini Thinking) are slower but more methodical.

Problem

Routing every task to a fast LLM means constraint-heavy tasks fail in characteristic ways (premature-closure, false-confidence-syndrome). Routing everything to an LRM is slow and expensive for easy tasks. The team needs a routing decision.

Forces

LRM latency is 10–100× that of a fast LLM, often reaching minutes.
LRM cost per token is higher than LLM cost.
Some tasks genuinely need a fast response; an LRM is unacceptable there.
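
The latency force can be made explicit as a filter on which tiers a request may use at all. The following is a minimal sketch, not a definitive implementation: `TierProfile`, `allowed_tiers`, and the tier names are hypothetical, and the latency and cost figures are illustrative values borrowed from the Example section rather than measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierProfile:
    """Hypothetical description of one model tier."""
    name: str
    typical_latency_s: float   # illustrative, not measured
    cost_per_call_usd: float   # illustrative, not measured


# Assumed figures, taken from the pattern's own example (200 ms vs 90 s).
FAST_LLM = TierProfile("fast-llm", typical_latency_s=0.2, cost_per_call_usd=0.0001)
LRM = TierProfile("lrm", typical_latency_s=90.0, cost_per_call_usd=0.40)


def allowed_tiers(latency_budget_s: float) -> list[str]:
    """Return the tiers whose typical latency fits inside the caller's budget."""
    return [
        tier.name
        for tier in (FAST_LLM, LRM)
        if tier.typical_latency_s <= latency_budget_s
    ]
```

A request with a 2-second budget would only ever see the fast tier under these assumptions, regardless of how reasoning-heavy it looks; whether to answer it quickly or reject it is a product decision the sketch leaves open.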

Example

A financial-analysis agent handles two query types: "what was Apple's Q3 revenue?" (a simple lookup) and "given these 12 covenants, can this acquisition close?" (multi-constraint reasoning). The router sends the first to GPT-4o-mini (200 ms, $0.0001) and the second to o1 (90 s, $0.40), which methodically tests each covenant against the term sheet. Both succeed at their task class, and routing keeps cost bounded.
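
To see why routing keeps cost bounded, the per-call prices in this example can be turned into a rough bill. The sketch below assumes a hypothetical workload mix (the share of reasoning-heavy queries is not given in the example), and the function names are illustrative.

```python
# Per-call prices from the example above; everything else is an assumption.
FAST_LLM_COST_USD = 0.0001
LRM_COST_USD = 0.40


def routed_cost(total_queries: int, reasoning_share: float) -> float:
    """Cost when only the reasoning-heavy share of queries goes to the LRM."""
    reasoning_queries = round(total_queries * reasoning_share)
    simple_queries = total_queries - reasoning_queries
    return simple_queries * FAST_LLM_COST_USD + reasoning_queries * LRM_COST_USD


def all_lrm_cost(total_queries: int) -> float:
    """Cost when every query is sent to the LRM."""
    return total_queries * LRM_COST_USD
```

Under the assumed mix of 1,000 queries with 5% needing reasoning, the routed bill is about $20, against $400 for sending everything to the LRM.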

Diagram

flowchart TD
    Req[Task] --> Class[Classify reasoning load]
    Class -->|simple| LLM[Fast LLM]
    Class -->|constraint-heavy / multi-step| LRM[Large Reasoning Model]
    LLM --> Out[Response]
    LRM --> Out

Solution

Therefore:

Build a router that classifies tasks: simple lookups / generation → LLM; multi-step math, formal reasoning, interconnected-constraint problems → LRM. Track per-class success rate to refine routing. Pair with complexity-based-routing, multi-model-routing, test-time-compute-scaling, generate-and-test-strategy, golden-rule-simpler-is-better (don't overuse LRM).
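
A minimal sketch of such a router follows. It assumes a keyword heuristic as the classifier, which is a stand-in only: a real deployment would more likely use a trained classifier or a cheap model call. `REASONING_HINTS`, `classify`, `Router`, and the tier names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical keyword hints; a stand-in for a real complexity classifier.
REASONING_HINTS = ("covenant", "constraint", "prove", "derive", "satisfy")

# Task class -> model tier, mirroring the routing rule in the text.
ROUTES = {"simple": "fast-llm", "constraint-heavy": "lrm"}


def classify(task: str) -> str:
    """Label a task as 'constraint-heavy' if any hint word appears in it."""
    text = task.lower()
    return "constraint-heavy" if any(h in text for h in REASONING_HINTS) else "simple"


class Router:
    def __init__(self) -> None:
        # task class -> [successes, total attempts]
        self._outcomes = defaultdict(lambda: [0, 0])

    def route(self, task: str) -> tuple[str, str]:
        """Return (task class, model tier) for a task."""
        task_class = classify(task)
        return task_class, ROUTES[task_class]

    def record(self, task_class: str, success: bool) -> None:
        """Record whether the routed model succeeded on a task of this class."""
        stats = self._outcomes[task_class]
        stats[0] += int(success)
        stats[1] += 1

    def success_rate(self, task_class: str):
        """Per-class success rate, or None if nothing has been recorded yet."""
        successes, total = self._outcomes.get(task_class, (0, 0))
        return successes / total if total else None
```

A persistently low success rate for the "simple" class would suggest the classifier is under-routing to the LRM; how success is judged (human review, automated checks) is left open here.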

What this pattern forbids. Sending a task to the LRM unless it has been classified as constraint-heavy or multi-step reasoning, and making routing decisions that are not logged and reviewed.
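
One way to make this rule checkable is to log each routing decision as a structured record and scan the log for LRM calls outside the permitted classes. This is a sketch under assumptions: the record fields, class labels, and function names are hypothetical, and the review step itself (who reads the log, how often) is not modelled.

```python
import json
from dataclasses import asdict, dataclass

# Assumed labels for the task classes that may use the LRM.
LRM_ALLOWED_CLASSES = {"constraint-heavy", "multi-step-reasoning"}


@dataclass(frozen=True)
class RoutingDecision:
    """Hypothetical log record for one routing decision."""
    task_id: str
    task_class: str
    model_tier: str


def to_log_line(decision: RoutingDecision) -> str:
    """Serialize a decision as one JSON line for an append-only log."""
    return json.dumps(asdict(decision), sort_keys=True)


def violations(decisions: list[RoutingDecision]) -> list[RoutingDecision]:
    """Return decisions that sent a non-permitted task class to the LRM."""
    return [
        d for d in decisions
        if d.model_tier == "lrm" and d.task_class not in LRM_ALLOWED_CLASSES
    ]
```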

And the patterns that stand alongside it, or against it:

complements Complexity-Based Routing ★: Estimate a request's difficulty up front and bind it to the cheapest model tier that can answer well, using an explicit complexity classifier as the routing key.
complements Multi-Model Routing ★★: Send each request to the cheapest model that can handle it well.
complements Test-Time Compute Scaling ★★: Allocate more inference-time compute (samples, search, deeper thinking) instead of scaling parameters to improve quality.
complements Extended Thinking ★★: Spend a configurable budget of internal reasoning tokens before producing a user-visible answer.
complements Generate-and-Test Strategy ★: Generate multiple candidate solutions in parallel, then systematically test each against declared constraints rather than committing to the first plausible one; adapted from Langley & Simon's cognitive-science research on human expert problem-solving.
alternative-to Context Fragmentation ✕: An anti-pattern in which the LLM cannot hold multiple interconnected constraints in mind simultaneously the way human working memory can; it processes each constraint locally and loses the cross-constraint view.
alternative-to Premature Closure ✕: The LLM commits to a confident answer before processing all constraints, characteristic of constraint-heavy tasks where it fills in plausible answers fast and gets cross-constraint interactions wrong.
complements Test-Time Memorization (Titans) ·: A memory module that learns at inference time by incorporating recent inputs into its parameters during the session rather than relying solely on pre-trained weights.

Provenance

Source: patterns/large-reasoning-model-paradigm.md on GitHub · commit 4002557
Added to catalog: 2026-05-23
Last updated: 2026-05-23
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.