Retrieval & RAG

Contextual Retrieval

Prepend a short LLM-generated description to each chunk before embedding so the chunk carries its situating context.
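A minimal sketch of the prepending step, assuming the LLM is reachable through any callable that maps a prompt string to a completion string. The prompt wording, `build_context_prompt`, `contextualize_chunk`, and the ACME example are illustrative assumptions, not a specific provider's API. The stub generator stands in for a real model call.

```python
from typing import Callable

# Illustrative (hypothetical) prompt: parent document, then the chunk, then the instruction.
PROMPT_TEMPLATE = (
    "<document>\n{document}\n</document>\n"
    "Here is the chunk we want to situate within the whole document:\n"
    "<chunk>\n{chunk}\n</chunk>\n"
    "Give a short, succinct context that situates this chunk within the "
    "overall document, to improve search retrieval of the chunk. "
    "Answer only with the context."
)


def build_context_prompt(document: str, chunk: str) -> str:
    """Fill the template with the parent document and one chunk."""
    # str.format substitutes values verbatim, so braces inside the texts are safe.
    return PROMPT_TEMPLATE.format(document=document, chunk=chunk)


def contextualize_chunk(
    document: str, chunk: str, generate: Callable[[str], str]
) -> str:
    """Ask the LLM for a situating description and prepend it to the chunk.

    `generate` is an assumed wrapper around whatever LLM client is in use.
    The returned text is what gets embedded and BM25-indexed.
    """
    description = generate(build_context_prompt(document, chunk)).strip()
    return f"{description}\n\n{chunk}"


if __name__ == "__main__":
    document = "ACME Corp Q2 2023 SEC filing. Its revenue grew 3% over the previous quarter."
    # Stub in place of a real model call; it ignores the prompt.
    stub_llm = lambda prompt: "From ACME Corp's Q2 2023 SEC filing, on quarterly revenue."
    print(contextualize_chunk(document, "Its revenue grew 3% over the previous quarter.", stub_llm))
```

Only the prepended string is stored and indexed; the description is generated once per chunk at index time, so queries pay nothing extra.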

Problem

When a user query names an entity by its full name, but the corpus chunk that contains the answer refers to that entity only by pronoun or shorthand, the chunk's embedding lands far from the query's and vector search misses it. Lexical search fails as well, because the name never appears in the chunk. A naive chunk-and-embed pipeline therefore destroys exactly the context it most needs to preserve, and recall on otherwise easy queries collapses. Chunks need to carry enough surrounding context that their embeddings stay close to the queries that should retrieve them, without inflating the corpus so much that indexing and retrieval become unaffordable.
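A toy illustration of the failure, assuming term-count cosine similarity as a rough proxy for embedding similarity (real dense embeddings behave differently in detail). The query, chunk, and context strings are hypothetical. The bare chunk shares only "revenue" with the query; once a situating description is prepended, the entity name and period the query asks about become part of the text being compared.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: str, b: str) -> float:
    """Cosine similarity between the term-count vectors of two texts."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


query = "How much did ACME Corp revenue grow in Q2 2023?"
chunk = "Its revenue grew 3% over the previous quarter."
context = "This chunk is from ACME Corp's SEC filing on Q2 2023 performance."

plain = cosine(query, chunk)                     # only "revenue" overlaps
situated = cosine(query, context + " " + chunk)  # adds acme, corp, q2, 2023
print(f"plain={plain:.3f} situated={situated:.3f}")
```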

Solution

For each chunk, prompt an LLM with the parent document and the chunk, and ask for a short description that situates the chunk within the document. Prepend that description to the chunk, then index the prepended chunk twice: in a BM25 index (Contextual BM25) and as a dense embedding (Contextual Embeddings). At query time, search both indexes and merge the results. Adding a reranker over the merged candidates gives further gains.
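The query side can be sketched as hybrid search over already-contextualized chunks. Several pieces here are assumptions, not part of the pattern as stated: a from-scratch Okapi BM25 (k1 = 1.5, b = 0.75), character-trigram vectors as a placeholder for a real embedding model, and reciprocal rank fusion (k = 60) as one common way to merge the two result lists. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase word tokens (crude tokenizer for the sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over a fixed list of contextualized chunks."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokens(d)) for d in docs]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lens) / len(self.lens)
        n = len(docs)
        df = Counter(t for tf in self.tfs for t in tf)  # document frequency
        # "+1 inside the log" IDF variant, which never goes negative.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def scores(self, query: str) -> list[float]:
        q = tokens(query)
        out = []
        for tf, dl in zip(self.tfs, self.lens):
            norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            out.append(sum(
                self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
                for t in q if t in tf
            ))
        return out


def embed(text: str) -> Counter:
    """Placeholder for a real embedding model: character-trigram counts."""
    t = f"  {text.lower()}  "
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def ranking(scores: list[float]) -> list[int]:
    """Chunk indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Each ranking adds 1 / (k + rank) to every chunk it ranks."""
    fused: Counter = Counter()
    for r in rankings:
        for rank, idx in enumerate(r, start=1):
            fused[idx] += 1.0 / (k + rank)
    return [idx for idx, _ in fused.most_common()]


def hybrid_search(query: str, contextual_chunks: list[str], top_k: int = 3) -> list[int]:
    """Contextual BM25 plus contextual 'embeddings', fused by RRF.

    In a real system the chunk vectors are computed once at index time,
    and a reranker would reorder the fused candidates returned here.
    """
    bm25 = BM25(contextual_chunks).scores(query)
    q_vec = embed(query)
    dense = [cosine(q_vec, embed(c)) for c in contextual_chunks]
    return reciprocal_rank_fusion([ranking(bm25), ranking(dense)])[:top_k]


chunks = [
    "ACME Corp Q2 2023 SEC filing, revenue section. Its revenue grew 3% over the previous quarter.",
    "Globex Inc 2022 annual report, staffing section. Headcount was flat year over year.",
    "ACME Corp Q2 2023 SEC filing, risk factors. Supply costs may rise next year.",
]
print(hybrid_search("ACME Corp revenue growth in Q2 2023", chunks))
```

Rank fusion is used here because BM25 scores and cosine similarities live on different scales; fusing ranks avoids having to calibrate them against each other.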

When to use

Naive chunking destroys context and queries miss chunks that refer to entities by pronoun or shorthand.
An LLM pass over each chunk to produce a situating description is affordable at index time.
BM25 over prepended chunks and dense embeddings can both be wired into the retrieval stack.
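Whether the LLM pass is affordable depends mostly on document length, because each per-chunk call resends the whole parent document. A back-of-the-envelope sketch, assuming non-overlapping chunks, one call per chunk, and hypothetical defaults for prompt overhead and description length (`contextualization_tokens` is an illustrative helper, not a library function):

```python
import math


def contextualization_tokens(doc_tokens: int, chunk_tokens: int,
                             prompt_overhead: int = 100,
                             description_tokens: int = 50) -> tuple[int, int]:
    """Estimate (input, output) LLM tokens to contextualize one document.

    Assumes non-overlapping chunks and one call per chunk, each sending the
    whole document, the chunk, and a fixed prompt overhead. Treating the last
    chunk as full-size slightly overestimates the total.
    """
    n_chunks = math.ceil(doc_tokens / chunk_tokens)
    input_tokens = n_chunks * (doc_tokens + chunk_tokens + prompt_overhead)
    output_tokens = n_chunks * description_tokens
    return input_tokens, output_tokens


print(contextualization_tokens(8_000, 800))
print(contextualization_tokens(16_000, 800))
```

Under these assumptions an 8,000-token document in 800-token chunks costs 89,000 input tokens, about 11 times the document itself, and doubling the document roughly quadruples the input. Since every call repeats the same document prefix, reusing that prefix across calls (for example through provider-side prompt caching, where available) is the main cost lever.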
