Planning & Control Flow

Language Agent Tree Search

Lift the agent loop into a search tree with a learned value function and backtracking.

Problem

Single-chain agent loops such as ReAct (the reason-act-observe loop) and Plan-and-Execute commit to one chain of thought from the first step. When that chain settles on the wrong framing of the problem, these loops cannot backtrack cheaply: they either thrash inside the wrong framing or restart from scratch. Self-consistency (sample many answers and take a majority vote) helps on one-shot tasks but does not help an agent that has to interleave tool calls with reasoning. What is needed is a way to explore alternative trajectories while still spending most of the compute on the branches that are paying off.

Solution

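Apply Monte Carlo Tree Search (MCTS) to the agent loop. Each node is a partial trajectory, and each search iteration runs four steps. Selection walks down from the root, choosing among children by UCT (Upper Confidence bounds applied to Trees), which trades a branch's average value against how rarely it has been tried. Expansion samples candidate next thoughts/actions from the selected node. Evaluation scores the new partial trajectories with the value function. Backpropagation pushes those scores back up to the root, updating every ancestor's value estimate. Because a failing branch's value estimate drops, selection shifts to other branches, so the agent backtracks instead of committing.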
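The four steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the reference implementation: a toy arithmetic task stands in for a real agent, and `propose`, `step`, `value` and `is_goal` are hypothetical stand-ins for sampling next thoughts/actions from a model, executing a tool call and observing the result, the value function, and a success check. `tree_search`, `max_depth`, `iterations` and `c` are illustrative names. Selection uses the usual UCB1-style UCT score, mean value + c·sqrt(ln N_parent / N_child). Dead ends are backpropagated with value 0, which is what lets the search back off a failing branch.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """One partial trajectory: the actions taken so far and the resulting state."""
    state: Any
    actions: Tuple[str, ...] = ()
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def uct(self, c: float) -> float:
        # Mean value (exploitation) plus an exploration bonus that grows with
        # the parent's visits and shrinks with this node's own visits.
        mean = self.value_sum / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def backpropagate(node: Optional[Node], value: float) -> None:
    # Add the value to every node on the path back up to the root.
    while node is not None:
        node.visits += 1
        node.value_sum += value
        node = node.parent


def tree_search(root_state: Any,
                propose: Callable[[Any], List[str]],
                step: Callable[[Any, str], Any],
                value: Callable[[Any], float],
                is_goal: Callable[[Any], bool],
                max_depth: int = 4,
                iterations: int = 200,
                c: float = 1.4) -> Optional[Tuple[str, ...]]:
    root = Node(state=root_state)
    for _ in range(iterations):
        # 1. Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: ch.uct(c))
        # A leaf with no depth budget or no candidate actions is a dead end.
        # Scoring it 0 drags down its ancestors' means, so later selections
        # back off this branch instead of committing to it.
        candidates = propose(node.state) if len(node.actions) < max_depth else []
        if not candidates:
            backpropagate(node, 0.0)
            continue
        # 2. Expansion: one child per sampled next action.
        for action in candidates:
            child = Node(step(node.state, action), node.actions + (action,), node)
            node.children.append(child)
            if is_goal(child.state):
                return child.actions
            # 3. Evaluation and 4. backpropagation of the child's score.
            backpropagate(child, value(child.state))
    return None


# Toy stand-ins (hypothetical): reach 10 from 1 in at most 4 steps using "+1" or "*2".
TARGET = 10


def propose(n: int) -> List[str]:
    return ["+1", "*2"]


def step(n: int, action: str) -> int:
    return n + 1 if action == "+1" else n * 2


def value(n: int) -> float:
    # Heuristic: states closer to the target score higher, in (0, 1].
    return 1.0 / (1.0 + abs(TARGET - n))


def is_goal(n: int) -> bool:
    return n == TARGET


print(tree_search(1, propose, step, value, is_goal))
```

On this toy task a greedy single chain that always takes the higher-valued next state goes 1 → 2 → 4 → 8 → 9 and runs out of steps. The search also tries the 8 branch first, but the other branches keep their exploration bonus, so it goes on to find 1 → 2 → 4 → 5 → 10. The sketch stores each node's state directly, which assumes the environment can be snapshotted; with side-effecting tools a real agent would more likely replay actions from the root or run them in a sandbox.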

When to use

Single-chain agent loops commit too early on ambiguous problems.
A learned or heuristic value function can score partial trajectories.
Backtracking from failing branches is worth the search overhead.
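The value function is what lets the search spend compute on promising branches. Below is a sketch of a heuristic scorer for partial trajectories, assuming each step is captured in a hypothetical `Step` record and that `judge` is any callable returning a score in [0, 1] (for example a model prompted to rate how likely the trajectory is to succeed; here it is a stub). `heuristic_value`, `error_penalty` and `repeat_penalty` are illustrative names, and the penalty weights are arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Step:
    """One reason-act-observe step (hypothetical record shape)."""
    thought: str
    action: str
    observation: str
    tool_error: bool = False


def heuristic_value(trajectory: Sequence[Step],
                    judge: Callable[[Sequence[Step]], float],
                    error_penalty: float = 0.2,
                    repeat_penalty: float = 0.1) -> float:
    """Score a partial trajectory in [0, 1] (illustrative heuristic only)."""
    score = judge(trajectory)
    # Penalise failed tool calls.
    score -= error_penalty * sum(s.tool_error for s in trajectory)
    # Penalise repeated actions: a cheap signal that the agent is thrashing.
    actions = [s.action for s in trajectory]
    score -= repeat_penalty * (len(actions) - len(set(actions)))
    # Clamp so the exploitation term in UCT stays in a fixed range.
    return max(0.0, min(1.0, score))


def judge(trajectory: Sequence[Step]) -> float:
    return 0.7  # stub in place of a learned scorer or model call


steady = [Step("look up the docs", "search[api]", "found page"),
          Step("read the page", "open[page]", "signature shown")]
thrashing = [Step("look up the docs", "search[api]", "no results", tool_error=True),
             Step("try again", "search[api]", "no results", tool_error=True)]
print(heuristic_value(steady, judge), heuristic_value(thrashing, judge))
```

With the stub judge, the steady trajectory keeps its 0.7 while the thrashing one drops to roughly 0.2, so UCT would steer selection away from it.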
