IV · Retrieval & RAG · Experimental

Vectorless Reasoning-Based Retrieval

also known as Reasoning-Based RAG, Tree-Search Retrieval, Table-of-Contents Retrieval, Vectorless RAG

Retrieve by having the model reason its way down a document's own table-of-contents tree to the relevant sections, instead of embedding chunks and ranking them by vector similarity.

This pattern helps complete certain larger patterns —

  • used by: Agentic RAG ★★ · Replace static retrieve-then-generate with autonomous agents that plan, choose sources, retrieve iteratively, reflect, and re-query.

Context

A team answers questions over long, structured professional documents — financial filings, contracts, regulatory manuals, technical specifications — where the source already carries a clear hierarchy of parts, sections, and subsections. The standard retrieval-augmented pipeline splits each document into fixed-size chunks, embeds them, and at query time returns the chunks whose embeddings sit closest to the query in vector space. On these documents that pipeline keeps surfacing passages that look similar to the question but are not the ones that answer it, and chunk boundaries cut tables, clauses, and definitions in half.

Problem

Vector similarity is a proxy for relevance, and on long professional documents the proxy breaks down: the passage that repeats the query's words is often not the passage that answers it, while the passage that does answer it shares little surface vocabulary. Fixed-size chunking compounds the mismatch by severing the structure the document relied on to make sense — a number is separated from the line item it belongs to, a clause from the term it defines. The retrieved context is also opaque: the system returns vector hits with no account of why this span and not another, so an analyst cannot audit the retrieval and the generator inherits whatever the embedding happened to rank highest.
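The sketch below shows how fixed-size chunking severs a line item from its figure. It uses a made-up excerpt: the text and the figure 412 are hypothetical and do not come from any filing. `fixed_size_chunks` cuts the text every `size` characters. `structural_chunks` is a hypothetical helper that splits only at known heading lines.

```python
# Hypothetical three-section excerpt; the figure 412 is invented for illustration.
DOC = (
    "Cash flow statement\n"
    "Net cash provided by operating activities: 412\n"
    "Management discussion\n"
    "Operating cash flow rose on lower working capital.\n"
)

def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text every `size` characters, ignoring the document's structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def structural_chunks(text: str, headings: set[str]) -> list[str]:
    """Start a new chunk only at a known heading line, keeping each section whole."""
    sections: list[list[str]] = []
    for line in text.splitlines():
        if line in headings or not sections:
            sections.append([])
        sections[-1].append(line)
    return ["\n".join(section) for section in sections]

for chunk in fixed_size_chunks(DOC, 50):
    print(repr(chunk))
for section in structural_chunks(DOC, {"Cash flow statement", "Management discussion"}):
    print(repr(section))
```

With `size=50`, the operating-activities line straddles the boundary between the first two chunks. As a result, no chunk holds the label together with its number. The heading-based split keeps each section intact.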

Forces

  • Surface similarity and topical relevance diverge on dense, jargon-heavy documents, yet similarity is what embeddings measure.
  • Chunking is needed to fit an embedding window but destroys the document's own structure.
  • Reasoning over structure is more accurate but spends an LLM call per navigation step, where a vector lookup is a single cheap nearest-neighbour query.
  • Retrieval that cannot be explained cannot be audited, which matters most in the regulated domains where these documents live.
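The cost force can be put in rough numbers under simplifying assumptions:

- The tree is complete, with a fixed branching factor b.
- The model commits to a single child at each level.

Under these assumptions, a greedy walk makes one model call per level. That is about ⌈log_b N⌉ calls for N leaf sections, against one nearest-neighbour query per question in a vector pipeline. The hypothetical helper below computes that count with integer arithmetic. Letting the model explore several branches per level multiplies the count.

```python
def descent_calls(num_leaves: int, branching: int) -> int:
    """Model calls for a greedy single-path walk down a complete tree.

    Assumes one call per level (the model picks one child at each internal
    node) and exactly `branching` children per internal node.
    """
    if num_leaves < 1 or branching < 2:
        raise ValueError("need num_leaves >= 1 and branching >= 2")
    calls, capacity = 0, 1
    # Each extra level multiplies the number of reachable leaves by `branching`.
    while capacity < num_leaves:
        capacity *= branching
        calls += 1
    return calls

print(descent_calls(500, 10))  # levels needed to reach 500 leaves, 10 per node
```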

Example

An analyst asks a question over a 200-page annual report — what was the year-over-year change in operating cash flow, and what did management attribute it to? A chunk-and-embed pipeline returns several passages that mention operating cash flow but not the one tying the figure to management's explanation, because that explanatory paragraph shares little vocabulary with the question. The team switches to a vectorless approach: the report is parsed into its own table-of-contents tree, and the model reads the section titles, descends into the cash-flow statement and then the management-discussion section, and returns those two sections with their page numbers. The answer is grounded in named sections the analyst can open and check, and no chunk boundary split the figure from its surrounding line items.


Solution

Therefore:

At index time, parse the document into a tree that mirrors its natural structure (parts, sections, subsections) and write a short summary at each node, keeping the leaf text intact rather than splitting it into fixed-size chunks. No embeddings are computed and no vector store is built. At query time, present the model with the tree as a table of contents, have it judge which branch is most likely to hold the answer, descend into that node, and repeat: a tree search in which the model, not a similarity score, decides each step. The walk ends at the leaf sections the model judges relevant, and retrieval returns those sections together with their page and section identifiers, so every result is traceable to a named location in the source. Compose with a generator that reads the returned sections, and with Citation Attribution, since the page and section references are already in hand.
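The index-time half might look like the sketch below. It assumes the document's outline is already available as (level, id, title, page, text) records in reading order, for example from a PDF's bookmarks or a layout parser. That input format is an assumption; the pattern does not name a parser.

How `build_tree` works:

- It nests the records with a stack.
- It keeps each section's text whole.
- It fills summaries bottom-up through a `summarize` callable. In a real system this would be an LLM call; `naive_summary` is only a placeholder.

All names, ids, and page numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Node:
    node_id: str              # hypothetical section id scheme, e.g. "1.2"
    title: str
    page: int
    text: str = ""            # full section text, kept intact (never re-chunked)
    summary: str = ""
    children: list["Node"] = field(default_factory=list)

def build_tree(records: list[tuple[int, str, str, int, str]],
               summarize: Callable[[Node], str]) -> Node:
    """Nest (level, node_id, title, page, text) records, given in reading order.

    Level 1 is a top-level part. No embeddings are computed; each node only
    receives a short summary.
    """
    root = Node(node_id="", title="ROOT", page=0)
    stack: list[tuple[int, Node]] = [(0, root)]
    for level, node_id, title, page, text in records:
        if level < 1:
            raise ValueError("levels start at 1")
        node = Node(node_id, title, page, text)
        # Pop back to the nearest open ancestor with a shallower level.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1].children.append(node)
        stack.append((level, node))
    _summarize_all(root, summarize)
    return root

def _summarize_all(node: Node, summarize: Callable[[Node], str]) -> None:
    # Post-order, so a parent's summary can draw on its children's summaries.
    for child in node.children:
        _summarize_all(child, summarize)
    node.summary = summarize(node)

def naive_summary(node: Node) -> str:
    # Placeholder for an LLM summarization call (hypothetical).
    if node.children:
        return "Covers: " + ", ".join(c.title for c in node.children)
    return node.text[:80]

tree = build_tree([
    (1, "1", "Financial statements", 40, ""),
    (2, "1.1", "Cash flow statement", 44, "Net cash provided by operating activities ..."),
    (1, "2", "Management discussion", 60, ""),
    (2, "2.1", "Liquidity", 63, "Operating cash flow changed because ..."),
], naive_summary)
```

Summarizing bottom-up means each interior node's summary can describe what lies beneath it. That gives the query-time walk something to reason over at every level.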

What this pattern forbids. Retrieval may only return sections the model reaches by reasoning down the document tree; a passage the walk never descends into is not retrievable, and there is no similarity-ranked fallback over the whole corpus.
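The query-time walk, with this rule built in, can be sketched as follows. `choose` stands in for the model call. It receives the question and one level of the table of contents as (title, summary) pairs, and returns the indices it judges relevant. How that prompt is written and its answer parsed is left open, and every name here is hypothetical.

The walk has two properties that enforce the rule:

- It returns only leaves it actually descended to, each tagged with section id and page for citation.
- It returns an empty list when the model picks nothing or the call budget runs out, rather than falling back to similarity search.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence

@dataclass
class Node:
    node_id: str
    title: str
    page: int
    summary: str = ""
    children: list["Node"] = field(default_factory=list)

# Hypothetical model interface: (question, [(title, summary), ...]) -> indices.
Chooser = Callable[[str, Sequence[tuple[str, str]]], list[int]]

def tree_search(root: Node, question: str, choose: Chooser,
                max_calls: int = 20) -> list[tuple[str, str, int]]:
    """Return (node_id, title, page) for leaves the model reasons its way to."""
    results: list[tuple[str, str, int]] = []
    frontier = [root]
    calls = 0
    while frontier:
        node = frontier.pop()
        if not node.children:
            results.append((node.node_id, node.title, node.page))
            continue
        if calls >= max_calls:
            break  # budget exhausted; stop rather than fall back to similarity search
        calls += 1
        picked = choose(question, [(c.title, c.summary) for c in node.children])
        # Push in reverse so children are visited in document order;
        # silently drop any out-of-range index the model emits.
        for i in sorted(set(picked), reverse=True):
            if 0 <= i < len(node.children):
                frontier.append(node.children[i])
    return results  # may be empty: nothing outside the walk is retrievable

def scripted_choose(question: str, toc: Sequence[tuple[str, str]]) -> list[int]:
    # Stand-in for an LLM call, scripted for this demo only.
    wanted = {"Financial statements", "Cash flow statement",
              "Management discussion", "Liquidity"}
    return [i for i, (title, _) in enumerate(toc) if title in wanted]

root = Node("", "Annual report", 0, children=[
    Node("1", "Financial statements", 40, children=[
        Node("1.1", "Balance sheet", 42), Node("1.2", "Cash flow statement", 44)]),
    Node("2", "Management discussion", 60, children=[
        Node("2.1", "Risk factors", 61), Node("2.2", "Liquidity", 63)]),
    Node("3", "Notes", 80),
])
hits = tree_search(root, "Why did operating cash flow change?", scripted_choose)
print(hits)
```

Because each hit already carries its section id and page, handing the results to a citation step needs no extra lookup.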

And the patterns that stand alongside it, or against it —

  • alternative to: Naive RAG ★★ · Condition the generator on top-k chunks retrieved from an external dense index, so knowledge lives outside the model's parameters.
  • alternative to: Hierarchical Retrieval ★★ · Route a query through a multi-level cascade (coarse source or index selection, then narrower per-source retrieval, then chunk-level retrieval) so each retrieval decision is pushed to the cheapest tier that can answer it.
  • alternative to: GraphRAG · Build an LLM-extracted entity-and-relation knowledge graph plus hierarchical community summaries, then answer global queries via map-reduce over those summaries.
  • complements: Citation Attribution ★★ · Track and surface, alongside a RAG-grounded answer, which retrieved chunks supported which claims, so the binding between answer span and source survives all the way to the user.
