Retrieval & RAG

Query Rewriting

Use an LLM to generate several alternative formulations of the user's query, retrieve documents for each, and rank-fuse the results so recall does not depend on one phrasing.

Problem

A single query embedding samples only one point in the semantic space and retrieves only the chunks closest to that point. Relevant chunks expressed in different vocabulary, at a different specificity level, or framed as a different sub-question are missed entirely, and no downstream reranker can rescue a chunk that was never retrieved. The user's first phrasing is a noisy estimator of intent, and recall is bottlenecked by how well that one phrasing aligns with how the answer chunks were written.

Solution

At query time, prompt an LLM to produce N reformulations of the user's query (typically 3–5) covering paraphrase, decomposition into sub-questions, and specificity shifts. Retrieve top-k chunks for each variant in parallel. Fuse the result lists with Reciprocal Rank Fusion or a deduplicated union, then pass the fused top-N forward to the generator or to a downstream reranker. The original query is included as one of the variants so the system never does worse than a single-query baseline.

When to use

Users under-specify or phrase queries in vocabulary that does not echo the corpus.
Queries are compound and naturally decompose into sub-questions answered by different chunks.
Single-query retrieval shows recall@k below the level a reranker could rescue.
Latency budget can absorb one LLM call plus N parallel retrievals.

Open the full interactive page →

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Problem

Solution

When to use

Related