I · Reasoning · Experimental

Latent-Space Reasoning

also known as Continuous-Thought Reasoning, Coconut, Latent Chain-of-Thought

Let the model reason in continuous hidden-state space instead of decoding each step to text, feeding the last hidden state back as the next input embedding, so one latent step can hold several continuations.

Context

A team is building an agent that must do hard multi-step reasoning — planning that needs backtracking, logical deduction with dead ends. The standard approach is chain-of-thought: the model writes its reasoning out as text tokens, step by step. The team has to decide whether reasoning must happen in natural language at all, given that most of those tokens exist for fluent text rather than for the computation itself.

Problem

Forcing every reasoning step through natural-language tokens spends most of the compute on producing coherent words rather than on the few decisions that matter. It also makes the model commit to one continuation at each step: once a token is emitted, the path is chosen. Tasks that need to keep several options open and backtrack are penalised, because token-by-token decoding cannot represent 'either of these next steps' in a single state. The language channel, shaped for human readers rather than for search, becomes a bottleneck on reasoning.

Forces

  • Most reasoning tokens serve the fluency of the text, not the computation the task needs.
  • Decoding to a token forces the model to commit to one continuation per step.
  • Tasks needing backtracking benefit from keeping several next steps open.
  • A hidden state can encode a distribution over continuations, which a single token cannot.
  • Reasoning that never becomes text is far harder to inspect and supervise.
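
The tension between committing to a token and keeping options open can be made concrete with a toy sketch. Everything here is hypothetical: the three-step vocabulary, the two-dimensional embeddings, and the logits are invented for illustration. The probability-weighted blend is only an analogy for a state that holds several continuations; the pattern itself feeds back the model's hidden state, not a mixture of token embeddings.

```python
import math

# Hypothetical toy vocabulary: three candidate next steps with 2-d embeddings.
EMBEDDINGS = {
    "try_A": [1.0, 0.0],
    "try_B": [0.0, 1.0],
    "give_up": [-1.0, -1.0],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def commit_to_token(logits):
    """Discrete decoding: keep only the highest-scoring step, discard the rest."""
    names = list(EMBEDDINGS)
    best = max(range(len(names)), key=lambda i: logits[i])
    return names[best], EMBEDDINGS[names[best]]


def soft_state(logits):
    """Continuous stand-in: blend all step embeddings by their probabilities."""
    probs = softmax(logits)
    blended = [0.0, 0.0]
    for p, vec in zip(probs, EMBEDDINGS.values()):
        for d in range(len(blended)):
            blended[d] += p * vec[d]
    return probs, blended


# Two steps score almost equally; the third is clearly bad.
logits = [2.0, 1.9, -3.0]
token, token_vec = commit_to_token(logits)
probs, blended = soft_state(logits)
```

In this sketch `token_vec` carries no trace of `try_B`, although it scored almost as high as `try_A`, while `blended` keeps substantial weight on both directions. The same property is what makes the continuous state harder to inspect: it does not correspond to any single readable step.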

Example

An agent solves a logic puzzle that requires trying a branch, hitting a contradiction, and backing up. With text chain-of-thought it commits to one branch per emitted token and struggles to backtrack. With latent-space reasoning it carries its reasoning as continuous hidden states, each of which can hold several candidate next moves at once, exploring breadth-first before decoding only the final solution to text — reaching the answer with fewer thinking tokens.

Solution

Therefore:

Instead of decoding each reasoning step into a word token and re-encoding it, take the model's last hidden state as the reasoning state — a 'continuous thought' — and feed it directly back as the next input embedding. The model reasons through a sequence of these latent states and only decodes to text when it produces the final answer. Because a continuous state is not collapsed onto one token, it can encode several alternative next steps at once, letting the model explore breadth-first and defer commitment, which helps on tasks that require backtracking. Training mixes latent steps into the reasoning trace so the model learns to use them.
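
The feedback loop described above can be sketched with a toy model. This is a minimal illustration under stated assumptions, not the implementation from any particular system: `ToyModel`, its random weights, the three-word answer vocabulary, and `latent_reason` are all hypothetical stand-ins, and a small recurrent update replaces a real transformer and its attention cache. The one property it aims to show is that the hidden size equals the embedding size, so a hidden state can be passed straight back in as the next input without being decoded.

```python
import math
import random

DIM = 4  # hidden size == embedding size, so a state can be fed back as input
VOCAB = ["yes", "no", "unknown"]  # hypothetical final-answer vocabulary


def make_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class ToyModel:
    """Hypothetical stand-in for a language model with untrained random weights."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_in = make_matrix(DIM, DIM, rng)
        self.w_state = make_matrix(DIM, DIM, rng)
        self.w_out = make_matrix(len(VOCAB), DIM, rng)

    def step(self, input_embedding, state):
        """One forward step: combine the input with the carried state."""
        a = matvec(self.w_in, input_embedding)
        b = matvec(self.w_state, state)
        return [math.tanh(x + y) for x, y in zip(a, b)]

    def decode(self, hidden):
        """Collapse a hidden state onto a single output token."""
        logits = matvec(self.w_out, hidden)
        return VOCAB[max(range(len(VOCAB)), key=lambda i: logits[i])]


def latent_reason(model, question_embedding, num_latent_steps):
    """Run latent steps, decoding to text only once at the end."""
    hidden = model.step(question_embedding, [0.0] * DIM)
    trace = [hidden]
    for _ in range(num_latent_steps):
        # Key move: the last hidden state becomes the next input embedding.
        # No token is sampled, so nothing is collapsed onto one continuation.
        hidden = model.step(hidden, hidden)
        trace.append(hidden)
    return model.decode(hidden), trace


model = ToyModel(seed=0)
answer, trace = latent_reason(model, [1.0, 0.0, -1.0, 0.5], num_latent_steps=3)
```

Here `trace` is a list of raw float vectors and `answer` is the only text produced. With untrained weights the answer is meaningless; the point is the data flow, in which `decode` is called exactly once.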

What this pattern forbids. Intermediate reasoning is not decoded to text; the model may emit tokens only for the final answer, and the continuous reasoning state cannot be read back as a natural-language trace.

And the patterns that stand alongside it, or against it —

  • alternative-to: Chain of Thought ★★. Elicit multi-step reasoning by prompting the model to produce intermediate steps before its final answer.
  • complements: Tree of Thoughts. Search over a tree of partial reasoning states with explicit lookahead, evaluation, and backtracking.
