I · Reasoning · Experimental

Recursive Language Model

also known as RLM, Prompt-as-Environment Recursion, Recursive Inference

Treat an over-long prompt as an environment the model navigates by code, letting it partition and recursively call itself over snippets, so it answers over inputs far larger than its context window.

Context

A team needs an agent to reason over an input far larger than the model's context window — a huge codebase, a long transcript corpus, thousands of retrieved chunks. Stuffing everything into one prompt either does not fit or degrades sharply as the input grows. The team has to decide how the model can work over the whole input without being limited by what fits in a single call.

Problem

Truncation and naive chunking drop information the answer may depend on, and even when a long input fits, model accuracy falls as the prompt grows. Fixed map-reduce scaffolds impose one decomposition the model cannot adapt: they split the input the same way regardless of the question and lose cross-chunk structure. Compaction and summarization throw away detail before the model has decided what matters. The team needs the model itself to decide how to break the input down and to look only at the parts each sub-question needs.

Forces

  • The input is larger than the context window, so not all of it can be in one call.
  • Model accuracy degrades as the prompt grows, even within the window.
  • A fixed decomposition (map-reduce, summarize) cannot adapt to the question.
  • The model should look only at the snippets a sub-question actually needs.
  • Recursion and sub-calls add latency and cost that must stay comparable to alternatives.

Example

An agent must answer a question that depends on details scattered across a five-million-token log archive, far beyond its context window. Instead of truncating, it loads the archive into a code-interpreter variable, writes code to grep for the relevant sessions, partitions them, and recursively calls itself on each partition, then combines the findings. Only the snippets each sub-question needs ever enter a model call, and the agent answers over the whole archive at a cost comparable to a long-context scaffold.
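The grep-and-partition step in this example might look roughly like the following code written inside the interpreter. This is a minimal Python sketch under stated assumptions, not part of any particular RLM implementation. The log format (every line tagged `session=<id>`), the helper names `grep_sessions` and `partition`, and the use of a character budget as a crude stand-in for tokens are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical log format: every relevant line carries a "session=<id>" tag.
# Lines without a session tag are ignored in this sketch.
SESSION_RE = re.compile(r"session=(\S+)")


def grep_sessions(archive: str, pattern: str) -> dict[str, list[str]]:
    """Group lines by session id, keeping only sessions with a line matching `pattern`."""
    needle = re.compile(pattern)
    by_session: dict[str, list[str]] = defaultdict(list)
    for line in archive.splitlines():
        match = SESSION_RE.search(line)
        if match:
            by_session[match.group(1)].append(line)
    # Keep whole sessions, not just matching lines, so context within a session survives.
    return {sid: lines for sid, lines in by_session.items()
            if any(needle.search(line) for line in lines)}


def partition(sessions: dict[str, list[str]], budget_chars: int) -> list[str]:
    """Pack whole sessions into snippets, starting a new one when budget_chars would be exceeded.

    A single session longer than budget_chars still becomes one oversized snippet;
    a fuller version would split it further or recurse on it.
    """
    snippets: list[str] = []
    current = ""
    for lines in sessions.values():
        block = "\n".join(lines) + "\n"
        if current and len(current) + len(block) > budget_chars:
            snippets.append(current)
            current = ""
        current += block
    if current:
        snippets.append(current)
    return snippets


if __name__ == "__main__":
    archive = (
        "session=a1 login ok\n"
        "session=b2 login ok\n"
        "session=a1 payment timeout\n"
        "session=b2 logout\n"
    )
    relevant = grep_sessions(archive, r"timeout")
    print(list(relevant))
    print(partition(relevant, budget_chars=1_000))
```

Each resulting snippet would then be handed to a recursive sub-call, and the sub-call results combined into the final answer.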


Solution

Therefore:

Place the long input in an environment the model can manipulate programmatically, for example a variable in a code interpreter, instead of pasting it into the prompt. The root model writes code to peek at, search, and partition the input, and spawns recursive calls to itself or a smaller sub-model over the snippets it selects, combining their results. Because the model decides at runtime how to grep, slice, and recurse, the decomposition adapts to the question, and only the relevant snippets ever enter any single call. As a result, inputs orders of magnitude larger than the context window can be handled at a cost comparable to long-context scaffolds.
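The recursive loop described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes a hypothetical `Model` interface (a callable from prompt text to completion text), and the names `rlm`, `sub_call`, `window` and `max_depth` are illustrative. The root model sees only the query, the input length, and a short peek; the code it writes runs with the full input bound to a `context` variable and may call `sub_call` on any snippet it selects. A scripted root and a counting leaf stand in for real models so the sketch runs offline.

```python
import re
from typing import Callable

# Hypothetical model interface: prompt text in, completion text out.
Model = Callable[[str], str]


def rlm(query: str, context: str, root: Model, leaf: Model,
        window: int = 2_000, depth: int = 0, max_depth: int = 3) -> str:
    """Answer `query` over `context` without ever pasting all of `context` into a prompt."""
    # Base case: the snippet is small enough to go into a single call.
    if len(context) <= window:
        return leaf(f"{query}\n---\n{context}")
    # Bounded recursion: refuse to decompose past max_depth.
    if depth >= max_depth:
        raise RecursionError(f"depth bound {max_depth} reached with {len(context)} chars left")

    def sub_call(sub_query: str, snippet: str) -> str:
        # Recursive call over a snippet the root model selected.
        return rlm(sub_query, snippet, root, leaf, window, depth + 1, max_depth)

    # The root prompt carries the query, metadata, and a short peek, never the full input.
    prompt = (
        f"Query: {query}\n"
        f"`context` is a str of {len(context)} chars; it starts with {context[:200]!r}\n"
        "Write Python that uses `context`, `re`, and `sub_call(query, snippet)` "
        "and stores the final result in `answer`."
    )
    code = root(prompt)
    # The full input lives only in the interpreter namespace. Sandbox this exec in practice.
    namespace = {"context": context, "re": re, "sub_call": sub_call}
    exec(code, namespace)
    return str(namespace["answer"])


# Scripted stand-in for the code a root model might write: grep, partition, recurse, combine.
SCRIPTED_ROOT_CODE = r'''
hits = [ln for ln in context.splitlines() if re.search(r"ERROR", ln)]
parts = ["\n".join(hits[i:i + 50]) for i in range(0, len(hits), 50)]
findings = [sub_call("Count the errors", p) for p in parts]
answer = " | ".join(findings)
'''


def scripted_root(prompt: str) -> str:
    return SCRIPTED_ROOT_CODE


def counting_leaf(prompt: str) -> str:
    # Stand-in sub-model: counts ERROR occurrences in the prompt it was given.
    return f"{prompt.count('ERROR')} errors"


if __name__ == "__main__":
    log = "\n".join(f"line {i} ERROR x" if i % 10 == 0 else f"line {i} ok" for i in range(1000))
    print(rlm("How many errors?", log, scripted_root, counting_leaf))
```

With a real model in place of `scripted_root`, the grep pattern, partition size, and sub-queries would differ per question, which is what lets the decomposition adapt. Running model-written code through `exec` is only reasonable inside a sandboxed interpreter.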

What this pattern forbids. The full input must not be forced into a single context window; the model may load only the snippets it selects from the prompt environment, and recursion depth must be bounded.
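These constraints can be made checkable by routing every model call through a guard. The following is a Python sketch under stated assumptions: characters serve as a crude proxy for tokens, and `GuardedModel`, `BudgetExceeded`, `window_chars`, `max_calls` and `max_depth` are hypothetical names. The call cap is one possible way to keep the cost of recursion comparable to alternatives; it is not part of the pattern's definition.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Iterator

# Hypothetical model interface: prompt text in, completion text out.
Model = Callable[[str], str]


class BudgetExceeded(RuntimeError):
    """Raised when a call would break one of the pattern's constraints."""


@dataclass
class GuardedModel:
    model: Model
    window_chars: int  # crude character stand-in for the context window
    max_calls: int     # caps total calls so recursion cost stays bounded
    max_depth: int     # recursion depth bound
    calls: int = 0
    depth: int = 0

    def __call__(self, prompt: str) -> str:
        # No prompt may exceed the window, so the full input can never be pasted in.
        if len(prompt) > self.window_chars:
            raise BudgetExceeded(
                f"prompt of {len(prompt)} chars exceeds window of {self.window_chars}")
        if self.calls >= self.max_calls:
            raise BudgetExceeded(f"call budget of {self.max_calls} exhausted")
        self.calls += 1
        return self.model(prompt)

    @contextmanager
    def deeper(self) -> Iterator[None]:
        # Wrap each recursive step; refuses to go past max_depth.
        if self.depth >= self.max_depth:
            raise BudgetExceeded(f"depth bound {self.max_depth} reached")
        self.depth += 1
        try:
            yield
        finally:
            self.depth -= 1


if __name__ == "__main__":
    guard = GuardedModel(model=lambda p: "ok", window_chars=100, max_calls=3, max_depth=1)
    print(guard("short prompt"))
    with guard.deeper():
        print(guard("a sub-call on a selected snippet"))
    try:
        guard("x" * 500)  # pasting an over-long input is rejected
    except BudgetExceeded as err:
        print(err)
```

A recursive loop would send every prompt through `guard(...)` and wrap each sub-call in `with guard.deeper():`, so a violation surfaces as an exception instead of a silently oversized or runaway call.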

And the patterns that stand alongside it, or against it —

  • alternative to: LLM Map-Reduce Isolation. Process each untrusted document in its own sealed sub-agent and merge only structured outputs, so an injection in one document cannot steer the processing of others.
  • complements: Code Execution (★★). Let the model emit code, run it in a sandbox, and treat the run as the answer instead of trusting the model to compute in its head.
