Vectorless Reasoning-Based Retrieval

also known as Reasoning-Based RAG, Tree-Search Retrieval, Table-of-Contents Retrieval, Vectorless RAG

Retrieve by having the model reason its way down a document's own table-of-contents tree to the relevant sections, instead of embedding chunks and ranking them by vector similarity.

This pattern helps complete certain larger patterns —

used-by Agentic RAG ★★: Replace static retrieve-then-generate with autonomous agents that plan, choose sources, retrieve iteratively, reflect, and re-query.

Context

A team answers questions over long, structured professional documents — financial filings, contracts, regulatory manuals, technical specifications — where the source already carries a clear hierarchy of parts, sections, and subsections. The standard retrieval-augmented pipeline splits each document into fixed-size chunks, embeds them, and at query time returns the chunks whose embeddings sit closest to the query in vector space. On these documents that pipeline keeps surfacing passages that look similar to the question but are not the ones that answer it, and chunk boundaries cut tables, clauses, and definitions in half.

Problem

Vector similarity is a proxy for relevance, and on long professional documents the proxy breaks down: the passage that repeats the query's words is often not the passage that answers it, while the passage that does answer it may share little surface vocabulary with the query. Fixed-size chunking compounds the mismatch by severing the structure the document relied on to make sense: a number is separated from the line item it belongs to, a clause from the term it defines. The retrieved context is also opaque. The system returns vector hits with no account of why this span was chosen over another, so an analyst cannot audit the retrieval and the generator inherits whatever the embedding happened to rank highest.

Forces

Surface similarity and topical relevance diverge on dense, jargon-heavy documents, yet similarity is what embeddings measure.
Chunking is needed to fit an embedding window but destroys the document's own structure.
Reasoning over structure can be more accurate on these documents but spends an LLM call per navigation step, where a vector lookup is a single cheap nearest-neighbour query.
Retrieval that cannot be explained cannot be audited, which matters most in the regulated domains where these documents live.

When to use

Documents are long and carry a clear, reliable hierarchy of parts, sections, and subsections worth navigating.
The domain is one where vocabulary overlap misleads similarity search — finance, law, regulatory, technical manuals.
Retrieval must be auditable, with each result pointing to a named page and section.
Keeping spans intact — tables, clauses, definitions — matters more than embedding-window economy.

When not to use

The corpus is large and unstructured and queries are broad lookups that similarity search already serves well.
Documents are flat or have no usable section structure for the model to navigate.
Per-query latency and cost budgets cannot absorb an LLM call per navigation step.
Queries are corpus-wide sensemaking that needs cross-document aggregation rather than within-document navigation.

Example

An analyst asks a question over a 200-page annual report — what was the year-over-year change in operating cash flow, and what did management attribute it to? A chunk-and-embed pipeline returns several passages that mention operating cash flow but not the one tying the figure to management's explanation, because that explanatory paragraph shares little vocabulary with the question. The team switches to a vectorless approach: the report is parsed into its own table-of-contents tree, and the model reads the section titles, descends into the cash-flow statement and then the management-discussion section, and returns those two sections with their page numbers. The answer is grounded in named sections the analyst can open and check, and no chunk boundary split the figure from its surrounding line items.

Diagram

flowchart TD
    D[Long structured document] --> P[Parse into section tree with node summaries]
    P --> T[Tree index - no vectors, no chunks]
    Q[Query] --> NAV{Navigator reasons over node summaries}
    T --> NAV
    NAV -- choose branch --> NAV
    NAV -- relevant leaves reached --> L[Read intact leaf sections]
    L --> R[Return sections with page and section refs]
    R --> G[Generator answers with citations]

Solution

Therefore:

At index time, parse the document into a tree that mirrors its natural structure (parts, sections, subsections) and write a short summary at each node, keeping the leaf text intact rather than splitting it into fixed-size chunks. No embeddings are computed and no vector store is built. At query time, present the model with the tree as a table of contents and have it judge which branch is most likely to hold the answer, descend into that node, and repeat: a tree search in which the model, not a similarity score, decides each step. The walk ends at the leaf sections the model judges relevant, and retrieval returns those sections together with their page and section identifiers, so every result is traceable to a named location in the source. Compose with a generator that reads the returned sections, and with Citation Attribution, since the page and section references are already in hand.
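
The index-and-walk procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `Node` shape, the `Chooser` signature, and `scripted_chooser` are hypothetical names introduced here, and the scripted chooser is a stub standing in for the model's branch judgement so the sketch runs without an LLM. The titles and page numbers in the sample tree are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One node of the section tree (hypothetical shape, not a library type)."""
    title: str
    summary: str
    page: int
    text: str = ""  # intact section text; filled in on leaves only
    children: List["Node"] = field(default_factory=list)


# Given the query and a node's children, return the indices worth descending
# into. In a real system this is the LLM call; here it is any callable.
Chooser = Callable[[str, List[Node]], List[int]]


def navigate(node: Node, query: str, choose: Chooser) -> List[Node]:
    """Top-down walk: one chooser call per internal node visited."""
    if not node.children:
        return [node]  # reached a leaf: return the section whole
    leaves: List[Node] = []
    for i in choose(query, node.children):
        leaves.extend(navigate(node.children[i], query, choose))
    return leaves


def scripted_chooser(wanted: set) -> Chooser:
    """Stub standing in for the model: follows a fixed set of titles."""
    def choose(query: str, children: List[Node]) -> List[int]:
        return [i for i, child in enumerate(children) if child.title in wanted]
    return choose


# Illustrative tree; titles and page numbers are made up.
report = Node("Annual report", "Whole filing", 1, children=[
    Node("Financial statements", "Audited statements", 80, children=[
        Node("Balance sheet", "Assets and liabilities", 82, text="(section text)"),
        Node("Cash flow statement", "Operating, investing, financing flows", 88,
             text="(section text)"),
    ]),
    Node("Management discussion", "Management's explanation of results", 40, children=[
        Node("Liquidity", "Why cash flow changed year over year", 47,
             text="(section text)"),
    ]),
])

choose = scripted_chooser({"Financial statements", "Cash flow statement",
                           "Management discussion", "Liquidity"})
hits = navigate(report, "What changed in operating cash flow, and why?", choose)
for leaf in hits:
    print(f"{leaf.title} (p. {leaf.page})")
```

Swapping `scripted_chooser` for a function that shows the model the child titles and summaries and parses its answer into indices would turn the stub into the reasoning step the pattern describes. Everything else in the walk stays the same.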

What it gives you

Retrieval follows the document's own structure, so spans stay whole and a result is a named section rather than an arbitrary window.
Every retrieval is traceable to a page and section, which makes the step auditable and feeds citations directly.
There is no embedding model, vector store, or chunking pipeline to build, tune, or keep in sync as the corpus changes.
Relevance is a reasoning judgement, so a section that answers the query in words the query itself does not use is still reachable.
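
The traceability benefit can be made concrete with a small sketch that records the path a walk took and formats it as a reference. It assumes a tree held as nested dicts and a fixed list of branch titles standing in for the model's choices. `walk_with_trail` and `cite` are hypothetical helper names, and the page numbers are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical tree as nested dicts: title, page, children.
tree = {
    "title": "Annual report", "page": 1, "children": [
        {"title": "Financial statements", "page": 80, "children": [
            {"title": "Cash flow statement", "page": 88, "children": []},
        ]},
        {"title": "Management discussion", "page": 40, "children": []},
    ],
}


def walk_with_trail(node: Dict, picks: List[str]) -> Tuple[Dict, List[str]]:
    """Follow a fixed list of branch titles (a stand-in for the model's
    choices) and record each step so the walk can be audited afterwards."""
    trail = [node["title"]]
    for title in picks:
        node = next(c for c in node["children"] if c["title"] == title)
        trail.append(title)
    return node, trail


def cite(node: Dict, trail: List[str]) -> str:
    """Format a reference that names the section path and its page."""
    return f"{' > '.join(trail)}, p. {node['page']}"


leaf, trail = walk_with_trail(tree, ["Financial statements", "Cash flow statement"])
print(cite(leaf, trail))
# prints: Annual report > Financial statements > Cash flow statement, p. 88
```

Because the trail is a by-product of the walk itself, no separate lookup is needed to say where a returned section came from.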

What it costs you

Each navigation step is an LLM call, so retrieval latency and cost grow with tree depth instead of staying at the cost of a single nearest-neighbour lookup.
A wrong branch choice high in the tree is unrecoverable for that walk — the same failure mode as any top-down routing.
The approach assumes the document has a usable hierarchy; flat or poorly structured sources give the model little to navigate.
It targets retrieval within structured documents and does not address corpus-wide retrieval across many unstructured sources, where similarity search still earns its place.
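
A rough way to size the first cost is to count calls. The sketch below assumes one LLM call per internal node visited and a walk that follows the same number of branches at every level. `navigation_calls` is a hypothetical helper, and real walks will vary by document and query.

```python
def navigation_calls(depth: int, branches_followed: int = 1) -> int:
    """Estimate LLM calls for one walk, assuming one call per internal node
    visited, `branches_followed` children kept at each node, and leaves
    sitting `depth` levels below the root."""
    return sum(branches_followed ** level for level in range(depth))


print(navigation_calls(4))      # single path through four levels: 4 calls
print(navigation_calls(4, 2))   # two branches kept per level: 1 + 2 + 4 + 8 = 15 calls
```

Following more than one branch per level guards against a wrong early choice, but under these assumptions the call count grows geometrically, which is the trade-off the list above describes.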

What this pattern forbids. Retrieval may only return sections the model reaches by reasoning down the document tree; a passage the walk never descends into is not retrievable, and there is no similarity-ranked fallback over the whole corpus.

And the patterns that stand alongside it, or against it —

alternative-to Naive RAG ★★: Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
alternative-to Hierarchical Retrieval ★★: Route a query through a multi-level cascade (coarse source or index selection, then per-source narrower retrieval, then chunk-level) so each retrieval decision is pushed to the cheapest tier that can answer it.
alternative-to GraphRAG ★: Build an LLM-extracted entity-and-relation knowledge graph plus hierarchical community summaries, then answer global queries via map-reduce over those summaries.
complements Citation Attribution ★★: Track and surface, alongside a RAG-grounded answer, which retrieved chunks supported which claims, so the binding between answer span and source survives all the way to the user.
complements Table-Augmented Generation ·: Answer a natural-language question over a database in three stages: synthesise an executable query, run it against the data layer with model calls embedded in execution, then generate the answer from the result.