HyDE
also known as Hypothetical Document Embeddings
Have the LLM write a hypothetical answer document, embed it, and use it as the retrieval query.
This pattern helps complete certain larger patterns —
- specialises Naive RAG ★★: Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
Context
A team is using dense vector retrieval to find documents that match user queries, but the queries are short and underspecified — often a few words — while the passages in the corpus are long, well-formed, and written in a different style. The team also does not have labelled query-document relevance pairs that would let them train a query encoder to bridge the asymmetry.
Problem
Short queries embed far from long-form passages in the dense vector space because their length and style differ so much from the source text. Without supervised relevance pairs, the team cannot fine-tune a query encoder to close this gap, and zero-shot dense retrieval recall on short queries stays poor. They need a way to translate the user's short query into something that lives in the same neighbourhood of the embedding space as the target passages, using only the resources they already have on hand.
Forces
- A hypothetical document that drifts off the topic pulls retrieval toward the wrong passages.
- Every query costs an extra LLM call, which adds latency and expense.
- Reranking is often added downstream to recover from off-topic hallucinations.
Example
A documentation-search agent for a developer platform keeps missing relevant pages because users type three-word queries like 'rate limit auth' while the docs are written in long prose. The team adds HyDE: the LLM first drafts a hypothetical answer paragraph to the query, that paragraph is embedded, and retrieval runs against the answer-shaped embedding instead of the bare query. Recall on short queries jumps without changing the index, the encoder, or the docs.
Solution
Therefore:
On query: prompt the LLM to draft a hypothetical answer to the query. Embed the hypothetical answer. Retrieve top-k by similarity to that embedding (not the original query). Pass the retrieved chunks into normal RAG.
What this pattern forbids. Querying the index with the embedding of the raw user query; retrieval uses the hypothetical answer's embedding instead.
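The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: `embed` is a toy bag-of-words stand-in for a real dense encoder, `fake_llm` is a hypothetical stub that returns a canned paragraph where a real system would call an LLM, and `CORPUS` is invented sample text. Only the shape of the pipeline (draft, embed the draft, retrieve by the draft's embedding) reflects the pattern.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a dense encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_retrieve(
    query: str,
    corpus: List[str],
    draft_answer: Callable[[str], str],
    k: int = 2,
) -> List[str]:
    """Draft a hypothetical answer, embed it, and retrieve top-k by its embedding."""
    hypothetical = draft_answer(query)   # step 1: LLM drafts an answer
    q_vec = embed(hypothetical)          # step 2: embed the draft, not the query
    ranked = sorted(corpus, key=lambda doc: cosine(q_vec, embed(doc)), reverse=True)
    return ranked[:k]                    # step 3: top-k chunks go on to normal RAG


def fake_llm(query: str) -> str:
    """Hypothetical stub: a real system would prompt an LLM with the query here."""
    return (
        "Authenticated requests are throttled per API token. When a client exceeds "
        "the allowed number of requests per minute the server responds with HTTP 429."
    )


# Invented sample corpus for illustration only.
CORPUS = [
    "Each API token may send a limited number of requests per minute. Clients that "
    "exceed the quota receive an HTTP 429 response until the window resets.",
    "To rotate your profile picture, open account settings and upload a new image.",
    "Webhooks deliver event notifications to a URL that you configure in the dashboard.",
]

top = hyde_retrieve("rate limit auth", CORPUS, fake_llm, k=1)
```

In this toy setup the bare query 'rate limit auth' shares no terms with any passage, while the drafted paragraph overlaps heavily with the rate-limiting passage. A real dense encoder behaves differently from term counts, so treat the effect as an analogy for the length and style gap rather than a measurement.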
And the patterns that stand alongside it, or against it —
- composes-with Cross-Encoder Reranking ★★: After cheap bi-encoder or BM25 retrieval, rescore top-N candidates with a cross-encoder that jointly attends over (query, candidate).
- alternative-to Query Rewriting ★★: Use an LLM to generate several alternative formulations of the user's query, retrieve documents for each, and rank-fuse the results so recall does not depend on one phrasing.