Adaptive Branching Tree Search
also known as AB-MCTS, 適応的分岐モンテカルロ木探索, TreeQuest, Multi-LLM AB-MCTS
At each node of an inference-time search tree, use Thompson sampling to decide whether to deepen an existing answer or branch a fresh attempt, optionally choosing per-node which underlying LLM to invoke.
This pattern helps complete certain larger patterns —
- specialisesLanguage Agent Tree Search·— Lift the agent loop into a search tree with a learned value function and backtracking.
- specialisesTest-Time Compute Scaling★★— Allocate more inference-time compute (samples, search, deeper thinking) instead of scaling parameters to improve quality.
Context
A team is using a large language model to attack problems whose outputs can be scored — running code against tests, checking a math answer, or grading an abstract-reasoning puzzle. They have a fixed budget of model calls to spend at inference time and want to spend it better than a flat sampling pass would. Several models with different strengths may be available at once, and the controller can choose which to call at each step.
Problem
Existing inference-time search schemes commit to a fixed shape. Monte Carlo Tree Search over language-model rollouts uses a fixed branching factor and treats every node the same; tree-of-thoughts expands at a fixed width; best-of-N is flat and never refines anything. None of these adapt the trade-off between trying more fresh attempts and refining a promising one based on what the scores are actually telling the controller, and none can pick a different model for a hard node. On difficult problems this leaves a lot of compute on payoff-poor branches.
Forces
- Width (more fresh attempts) and depth (refining existing ones) compete for the same budget.
- The right width/depth balance differs per node and is not known in advance.
- Multiple LLMs have complementary failure modes; picking the right one per node is itself a search axis.
- Thompson sampling is principled but adds bookkeeping over plain MCTS.
- Inference-time compute is expensive; wasted rollouts hurt directly.
Example
A team tackles ARC-AGI-2 puzzles with three different LLMs. They drop the puzzles into TreeQuest, which builds a search tree where each node decides via Thompson sampling whether to refine the current candidate program, generate a fresh sibling, and which of the three models to use. After a fixed compute budget the tree has concentrated rollouts on the branches that scored well — and on the model that turned out to handle that puzzle family best — producing higher pass rates than flat best-of-N at the same cost.
Diagram
Solution
Therefore:
Each node in the search tree maintains posterior estimates over the value of its possible actions. Actions are: refine the current candidate (deepen), generate a fresh sibling (branch), and — in the multi-LLM variant — which model to call. At each step the controller draws a Thompson sample from the per-action posterior and picks the highest sampled value; the resulting rollout's score updates the posterior. Over many rollouts the tree concentrates compute on the branches and models that are paying off. The score function must be either verifiable (compiler, test, oracle) or a trusted evaluator. The framework runs until a budget or success threshold is hit.
What this pattern forbids. The controller must update posteriors from observed rollout scores before drawing the next sample; node expansion must not exceed the declared budget; the agent itself cannot bypass the Thompson sample to pick a favoured branch directly.
The smaller patterns that complete this one —
- generalisesBest-of-N Sampling★— Sample N candidate outputs and select the highest-ranked by a reward model or scorer.
And the patterns that stand alongside it, or against it —
- alternative-toTree of Thoughts★— Search over a tree of partial reasoning states with explicit lookahead, evaluation, and backtracking.
- complementsSelf-Consistency★★— Sample the same question multiple times at non-zero temperature and aggregate by majority or judge to mitigate hallucination.
- complementsMulti-Path Plan Generator★★— Generate multiple candidate next-steps at each plan node enabling later selection — the planning generator pattern paired with tree-of-thoughts / LATS-style search.
Neighbourhood
Click any neighbour to follow the language. Scroll to zoom, drag to pan.