Hierarchical Retrieval
Route a query through a multi-level cascade (coarse source or index selection, then narrower per-source retrieval, then chunk-level retrieval) so that each retrieval decision is made at the cheapest tier that can answer it.
Problem
Flat retrieval over a single union index pays the cost of searching everything for every question, even when most sources are irrelevant. Fanning out to every retriever in parallel is even worse: latency is gated by the slowest retriever, cost multiplies with the number of sources, and the downstream reranker has to filter noise from sources the query never needed. At the same time, retrieving at one fixed granularity (always paragraphs, or always full documents) mismatches part of the query mix: some questions want a corpus-level answer and some want a single span. The team needs a way to spend retrieval budget in proportion to how much routing the query actually requires.
Solution
Index the corpus hierarchically: a parser builds parent-child relationships (document → section → chunk, or topic-cluster → document → chunk) and stores every level. At query time, a top-level router picks the source or sub-index that matches the query (by classifier, by embedding similarity to source summaries, or by an LLM call). The selected source then runs its own retriever, which may itself be a further router or a coarse-to-fine descent (retrieve summaries, then retrieve the children of the top-ranked summaries). The chunk-level retriever returns the final candidates. The pattern composes with cross-encoder reranking on the final candidate set and with hybrid search inside each leaf retriever.
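The cascade described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the `Source` and `Section` classes, the `hierarchical_retrieve` function, and the sample data are all hypothetical, and `embed` is a toy bag-of-words stand-in for a real embedding model. The router here uses similarity to source summaries, which is only one of the three routing options mentioned; reranking and hybrid search are left out.

```python
from collections import Counter
from dataclasses import dataclass, field
import math
import re


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Section:
    summary: str       # parent-level text used for the coarse step
    chunks: list[str]  # child-level text returned as final candidates


@dataclass
class Source:
    name: str
    summary: str       # what the top-level router matches against
    sections: list[Section] = field(default_factory=list)


def top_k(query, items, key, k):
    """Rank items by similarity between the query and key(item); keep k."""
    q = embed(query)
    ranked = sorted(items, key=lambda it: cosine(q, embed(key(it))), reverse=True)
    return ranked[:k]


def hierarchical_retrieve(query, sources, n_sources=1, n_sections=1, n_chunks=2):
    candidates = []
    # Tier 1: route to the source(s) whose summary best matches the query.
    for source in top_k(query, sources, lambda s: s.summary, n_sources):
        # Tier 2: coarse-to-fine descent over section summaries.
        for section in top_k(query, source.sections, lambda s: s.summary, n_sections):
            # Collect only the children of the top-ranked summaries.
            candidates.extend((source.name, chunk) for chunk in section.chunks)
    # Tier 3: chunk-level ranking over the narrowed candidate set.
    return top_k(query, candidates, lambda c: c[1], n_chunks)


SOURCES = [
    Source("billing_docs", "invoices payments refunds billing", [
        Section("refunds and chargebacks", [
            "Refunds are issued to the original payment method.",
            "Chargebacks are disputed through the card network.",
        ]),
        Section("invoices and receipts", [
            "Invoices are generated on the first day of each month.",
        ]),
    ]),
    Source("api_docs", "api authentication endpoints rate limits", [
        Section("authentication tokens", ["API tokens expire after a fixed period."]),
        Section("rate limits", ["Rate limits are applied per token."]),
    ]),
]

if __name__ == "__main__":
    for name, chunk in hierarchical_retrieve("how are refunds issued", SOURCES):
        print(name, "|", chunk)
```

With the sample data, the query about refunds never touches `api_docs`: the router discards it at the first tier, so only one section's chunks are scored at the last tier. Raising `n_sources` or `n_sections` trades cost for recall when the router is uncertain.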
When to use
- Several distinct corpora or indexes exist and only a subset is relevant to any given query.
- Documents have meaningful parent-child structure (chapter, section, paragraph) worth preserving in the index.
- Query mix spans different granularities: some questions are summary-level, some are span-level.
- Retrieval cost or latency makes querying every source in parallel impractical.