Retrieval & RAG

Naive RAG

Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.

Problem

A bare language model can only draw on what is stored in its weights. Answering from parametric memory alone tends to produce plausible-sounding hallucinations, offers no source to cite, and cannot be updated without retraining. The team needs the model to pull in relevant external knowledge at query time, which means deciding how to chunk the corpus, how to index it, what to retrieve per query, and how to feed the results into the prompt. Without that retrieval machinery, the model is limited to what it learned at training time.

Solution

Chunk the corpus and embed each chunk with a dense encoder, storing the vectors in an index. At query time, embed the query into the same vector space, retrieve the top-k chunks by similarity, prepend them to the prompt, and generate. This is the simplest production RAG pipeline.
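The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `embed` function is a toy hashed bag-of-words stand-in for a real dense encoder, the chunk sizes and `DIM` are arbitrary, the index is a plain in-memory list, and the function names (`chunk`, `build_index`, `retrieve`, `build_prompt`) and sample documents are all hypothetical. The final generation call is left out because it depends on whichever model API is in use.

```python
import hashlib
import math
import re

DIM = 256  # assumed vector width for the toy encoder


def chunk(text, size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text):
    """Toy stand-in for a dense encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(docs, **chunk_kwargs):
    """Offline step: chunk every document and store (chunk_text, vector) pairs."""
    return [(c, embed(c)) for doc in docs for c in chunk(doc, **chunk_kwargs)]


def retrieve(query, index, k=3):
    """Query-time step: rank chunks by cosine similarity and keep the top k."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(query, chunks):
    """Prepend the retrieved chunks, numbered so the answer can cite them."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical corpus and query, for illustration only.
docs = [
    "The warranty on the X100 router covers hardware faults for two years.",
    "Firmware updates for the X100 router are published every quarter.",
    "Our office cafeteria serves lunch from noon until two.",
]
index = build_index(docs)
question = "How long is the X100 warranty?"
top = retrieve(question, index, k=2)
prompt = build_prompt(question, top)  # pass this string to the generator
```

In a real deployment the toy `embed` would be replaced by a trained embedding model and the list scan by a vector index, but the control flow stays the same: index offline, then embed, retrieve, and prepend at query time.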

When to use

  • Knowledge lives outside the model and must be conditioned on at query time.
  • Citations must be tied to retrieved sources, not invented from parameters.
  • A simple chunk-and-embed pipeline meets the recall and quality bar.
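
Whether the simple pipeline meets the recall bar is something that can be measured. The sketch below computes recall@k over a small labeled set. The function name `recall_at_k`, the chunk ids, and the relevance labels are all made up for illustration, and the acceptable threshold is a project decision that is not specified here.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant chunks that appear in the top-k retrieved list."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


# Hypothetical evaluation set: per query, the ranked retrieved chunk ids
# and the set of chunk ids a human marked as relevant.
runs = [
    (["c3", "c7", "c1"], {"c3"}),
    (["c2", "c9", "c4"], {"c4", "c8"}),
    (["c5", "c6", "c0"], {"c1"}),
]
mean_recall = sum(recall_at_k(r, g, k=3) for r, g in runs) / len(runs)
```

If the mean falls short of the target, that is a signal that the simple chunk-and-embed approach may not be sufficient for the corpus.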
