Hybrid Search
also known as BM25 + Dense, Lexical + Semantic Retrieval
Combine sparse lexical retrieval (BM25) with dense vector retrieval and fuse the results.
This pattern helps complete certain larger patterns —
- specialises Naive RAG ★★: Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
Context
A team is running a retrieval pipeline over a corpus where the user queries fall into two very different shapes. Some queries are short and exact, hinging on matching specific identifiers, product codes, person names, or technical terms verbatim. Other queries are longer and rely on semantic similarity between paraphrased ideas, where the surface vocabulary may differ between query and source. A single retrieval method serves only one of these well.
Problem
Dense vector retrieval handles paraphrase and semantic similarity but misses queries that hinge on an exact identifier the embedding has flattened away. Sparse keyword retrieval — BM25 and similar lexical methods — handles exact terms but misses paraphrased queries whose vocabulary does not overlap with the source text. Picking either method alone means leaving recall on the table for whichever query shape was not chosen, and no downstream re-ranking stage can rescue a chunk that was never retrieved in the first place.
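The lexical half of this failure can be seen in a few lines. The following is a minimal sketch, not a production scorer: it assumes whitespace tokenisation, the common BM25 defaults k1=1.5 and b=0.75, and a two-document toy corpus invented for illustration. An exact identifier scores on the document that contains it, while a paraphrased query with no shared vocabulary scores zero everywhere.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: naive lowercase whitespace tokenisation, so "stripe-api-key" stays one token.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document (Okapi BM25 with a non-negative IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no lexical overlap means no contribution
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy corpus.
docs = [
    "rotate the stripe-api-key before deploying",
    "payment processor authentication uses a signed token",
]
exact = bm25_scores("stripe-api-key", docs)
paraphrase = bm25_scores("how do we log in to our billing provider", docs)
```

Here `exact` is positive only for the first document, and `paraphrase` is zero for both, even though the second document is about the same idea. A dense retriever has the opposite profile, which is why neither alone covers both query shapes.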
Forces
- Score fusion (RRF, weighted sum, learned) is a design choice.
- Two indexes mean two pipelines to maintain.
- Tuning fusion weights is empirical and corpus-specific.
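To make the weighted-sum option concrete, here is a minimal sketch that assumes min-max normalisation per retriever and a single mixing weight `alpha`. The function names, the scores, and the default `alpha=0.5` are illustrative assumptions, not values from any particular library.

```python
def min_max_normalise(scores):
    """Rescale a {doc_id: score} map to [0, 1] so the two retrievers become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as equally strong.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def weighted_fusion(sparse_scores, dense_scores, alpha=0.5):
    """Blend normalised scores; alpha weights the sparse side, 1 - alpha the dense side."""
    sparse_n = min_max_normalise(sparse_scores)
    dense_n = min_max_normalise(dense_scores)
    fused = {
        doc_id: alpha * sparse_n.get(doc_id, 0.0) + (1 - alpha) * dense_n.get(doc_id, 0.0)
        for doc_id in set(sparse_n) | set(dense_n)
    }
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical raw scores: BM25 is unbounded, cosine similarity is not.
sparse = {"a": 12.4, "b": 7.1, "c": 3.0}
dense = {"b": 0.82, "c": 0.79, "d": 0.40}
ranking = weighted_fusion(sparse, dense, alpha=0.5)
order = [doc_id for doc_id, _ in ranking]
```

Document "b" ranks first because both retrievers return it with a reasonable score. The value of `alpha` is exactly the knob that has to be tuned empirically per corpus.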
Example
A coding assistant searches its codebase for 'how do we authenticate with Stripe?'. Pure semantic search misses files that mention 'stripe-api-key' verbatim; pure keyword search misses files that talk about 'payment processor authentication'. Hybrid search runs both at once: a keyword scorer catches the exact tokens, an embedding scorer catches the conceptual matches, and a fusion step blends the two ranked lists.
Solution
Therefore:
Index the corpus twice: a BM25 index for sparse lexical matching and a dense embedding index for semantic matching. At query time, retrieve the top-k from each index and fuse the two lists with Reciprocal Rank Fusion (RRF) or score-normalised weighted aggregation. Pass the fused top-N forward (typically into a reranker). Do not combine raw scores directly: BM25 and dense scores live on incompatible scales, so use rank-based fusion or normalise the scores before weighting them.
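The rank-based fusion step can be sketched as follows. This is a minimal illustration of Reciprocal Rank Fusion, where each document's fused score is the sum over lists of 1 / (k + rank). The constant k=60 is a commonly used default, and the file names are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60, top_n=None):
    """Fuse ranked lists of doc ids using only ranks, never raw scores."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused if top_n is None else fused[:top_n]

# Hypothetical top-k lists from the two indexes, best hit first.
bm25_top = ["stripe_keys.py", "config.yaml", "billing.py"]
dense_top = ["payment_auth.md", "stripe_keys.py", "billing.py"]
fused = reciprocal_rank_fusion([bm25_top, dense_top], top_n=3)
# fused == ["stripe_keys.py", "billing.py", "payment_auth.md"]
```

Documents that both retrievers return rise to the top, while a document found by only one retriever still survives into the fused list. Because only ranks are used, no score normalisation is needed.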
What this pattern forbids. Passing the sparse top-k or the dense top-k alone to downstream stages; the retrieval set is always the fusion of both.
And the patterns that stand alongside it, or against it —
- composes-with Cross-Encoder Reranking ★★: After cheap bi-encoder or BM25 retrieval, rescore top-N candidates with a cross-encoder that jointly attends over (query, candidate).
- composes-with Contextual Retrieval ★: Prepend a short LLM-generated description to each chunk before embedding so the chunk carries its situating context.
- composes-with Query Rewriting ★★: Use an LLM to generate several alternative formulations of the user's query, retrieve documents for each, and rank-fuse the results so recall does not depend on one phrasing.
- composes-with Modular RAG ★: Decompose RAG into a typed three-layer hierarchy of Module Types, Modules, and Operators so the pipeline (routing, scheduling, fusion, retrieval, post-retrieval, generation) can be rearranged per query rather than running a fixed linear retrieve-then-generate.
- composes-with Hierarchical Retrieval ★★: Route a query through a multi-level cascade (coarse source or index selection, then per-source narrower retrieval, then chunk-level) so each retrieval decision is pushed to the cheapest tier that can answer it.