Retrieval & RAG

HyDE

Have the LLM write a hypothetical answer document, embed it, and use that embedding as the retrieval query.

Problem

Short queries embed far from long-form passages in the dense vector space because their length and style differ so much from the source text. Without supervised relevance pairs, a team cannot fine-tune a query encoder to close this gap, so zero-shot dense retrieval recall on short queries stays poor. What is needed is a way to translate the user's short query into something that lives in the same neighbourhood of the embedding space as the target passages, using only resources already on hand.

Solution

At query time: prompt the LLM to draft a hypothetical answer to the query. Embed the hypothetical answer. Retrieve the top-k chunks by similarity to that embedding (not the embedding of the original query). Pass the retrieved chunks into the normal RAG pipeline.
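The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `embed` is a toy bag-of-words stand-in for a real dense encoder, `fake_llm` is a hypothetical stub in place of an actual LLM call, and `hyde_retrieve`, `cosine`, and `CORPUS` are names invented for this example.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for a dense encoder: lowercase bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_retrieve(
    query: str,
    corpus: List[str],
    draft_answer: Callable[[str], str],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Retrieve top-k chunks by similarity to a hypothetical answer."""
    # 1. Ask the LLM to draft a hypothetical answer to the query.
    hypothetical = draft_answer(query)
    # 2. Embed the hypothetical answer, not the original query.
    query_vec = embed(hypothetical)
    # 3. Score every chunk against that embedding and keep the best k.
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def fake_llm(query: str) -> str:
    """Hypothetical stub: a real system would call an LLM here."""
    return (
        "Photosynthesis converts sunlight, water and carbon dioxide "
        "into glucose and oxygen inside chloroplasts."
    )


CORPUS = [
    "Inside chloroplasts, plants use sunlight to turn carbon dioxide "
    "and water into glucose, releasing oxygen.",
    "The stock market closed higher on Tuesday after strong earnings reports.",
    "Sourdough bread relies on wild yeast and a long fermentation.",
]

# 4. The retrieved chunks would then be passed into the normal RAG prompt.
top = hyde_retrieve("how do plants eat?", CORPUS, fake_llm, k=1)
```

In a real deployment the same encoder would embed both the hypothetical answer and the corpus chunks, and the chunk embeddings would normally be precomputed and stored in a vector index rather than recomputed per query as they are here.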

When to use

Short user queries underperform on dense retrieval against long documents.
An LLM call to draft a hypothetical answer fits the latency and cost budget.
Recall on the first stage of RAG is the current bottleneck.

