Book IV

Retrieval & RAG

Knowledge from outside parameters.

23 patterns in this book. · Updated 2026-06-14

Top 5 patterns in Retrieval & RAG by usage

AGENT PATTERNS · BOOK IV · RETRIEVAL & RAG

agentpatternscatalog.org

Agentic RAG
a.k.a. Iterative RAG
Replace static retrieve-then-generate with autonomous agents that plan, choose sources, retrieve iteratively, reflect, and re-query.
×53 compositions
Hybrid Search
a.k.a. BM25 + Dense · Lexical + Semantic Retrieval
Combine sparse lexical retrieval (BM25) with dense vector retrieval and fuse the results.
×11 compositions
Naive RAG
a.k.a. Retrieval-Augmented Generation · Top-K Retrieve-and-Stuff
Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
×9 compositions
Cross-Encoder Reranking
a.k.a. Reranker · Two-Stage Retrieval
After cheap bi-encoder or BM25 retrieval, rescore top-N candidates with a cross-encoder that jointly attends over (query, candidate).
×6 compositions
GraphRAG
a.k.a. Graph-Based RAG · Knowledge Graph RAG
Build an LLM-extracted entity-and-relation knowledge graph plus hierarchical community summaries, then answer global queries via map-reduce…
×5 compositions

When to reach for each

01. Agentic RAG
Replace static retrieve-then-generate with autonomous agents that plan, choose sources, retrieve iteratively, reflect, and re-query.
Best for: A single retrieve-then-generate pass is insufficient for the task's information needs.
Tradeoff: Cost and latency rise with loop iterations.
Watch for: Static one-shot RAG already meets quality targets at lower cost and latency.
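The retrieve, reflect, and re-query loop can be sketched as follows. This is a minimal illustration, not the catalog's reference implementation: `retrieve`, `is_sufficient`, and `rewrite` are hypothetical callables standing in for a retriever and two LLM calls, and `max_iters` is an assumed budget that caps the cost and latency growth named in the tradeoff.

```python
from typing import Callable, List


def agentic_rag(
    question: str,
    retrieve: Callable[[str], List[str]],            # hypothetical retriever
    is_sufficient: Callable[[str, List[str]], bool],  # hypothetical reflection step
    rewrite: Callable[[str, List[str]], str],         # hypothetical re-query step
    max_iters: int = 3,                               # assumed iteration budget
) -> List[str]:
    """Collect evidence until it is judged sufficient or the budget runs out."""
    evidence: List[str] = []
    query = question
    for _ in range(max_iters):
        # Retrieve with the current query and keep only unseen chunks.
        for chunk in retrieve(query):
            if chunk not in evidence:
                evidence.append(chunk)
        # Reflect: stop as soon as the evidence covers the question.
        if is_sufficient(question, evidence):
            break
        # Otherwise re-query with a rewritten search string.
        query = rewrite(question, evidence)
    return evidence
```

Each extra iteration costs at least one more retrieval and one more reflection call, which is why a static one-shot pass is preferable whenever it already meets the quality target.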

02. Hybrid Search
Combine sparse lexical retrieval (BM25) with dense vector retrieval and fuse the results.
Best for: Queries mix semantic intent with rare tokens (codes, IDs, proper nouns) that embeddings miss.
Tradeoff: Two indexes to keep in sync.
Watch for: The corpus is uniformly conceptual; dense alone is enough.
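The entry does not say how the two result lists are fused. One common choice is reciprocal rank fusion (RRF), sketched below under that assumption. The function name, the document IDs, and the constant `k = 60` (a conventional default) are illustrative; the inputs are assumed to be ranked ID lists already produced by a BM25 index and a dense index.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["d3", "d1", "d2"]    # hypothetical lexical ranking
dense_hits = ["d1", "d2", "d4"]   # hypothetical dense ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

RRF uses ranks only, so it sidesteps the fact that BM25 scores and cosine similarities live on different scales. Documents found by both retrievers rise above those found by one.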

03. Naive RAG
Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
Best for: Knowledge lives outside the model and must be conditioned on at query time.
Tradeoff: Chunk boundaries destroy context.
Watch for: The needed knowledge is already in a tool, database, or scoped system prompt (see naive-rag-first).
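A toy version of top-k retrieve-and-stuff looks like this. It is a sketch under loud assumptions: the bag-of-words `embed` function is a stand-in for a real dense embedding model, and the prompt template and all function names are hypothetical. Only the shape of the pipeline (embed, rank by similarity, stuff the top k chunks into the prompt) is the point.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Stand-in for a dense encoder: a sparse term-count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: List[str], k: int = 2) -> List[str]:
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: List[str], k: int = 2) -> str:
    # "Stuff" the retrieved chunks into the generator's context window.
    context = "\n".join(f"- {c}" for c in top_k(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

Because each chunk is scored in isolation, a sentence whose meaning depends on the preceding paragraph can be retrieved without it, which is the context loss the tradeoff describes.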

04. Cross-Encoder Reranking
After cheap bi-encoder or BM25 retrieval, rescore top-N candidates with a cross-encoder that jointly attends over (query, candidate).
Best for: Initial retrieval returns a noisy top-100 and accuracy of top-5 matters.
Tradeoff: Adds one cross-encoder scoring pass per candidate, so latency grows with N.
Watch for: The end-to-end latency target is under 100 ms, which a cross-encoder stage will typically exceed.
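The second stage can be sketched as a rescoring pass over the first-stage candidates. In this illustration `score_pair` is a hypothetical callable standing in for a cross-encoder forward pass over one (query, candidate) pair; the word-overlap scorer in the usage lines is a toy placeholder, not a real model.

```python
from typing import Callable, List


def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],  # hypothetical cross-encoder scorer
    top_n: int = 5,
) -> List[str]:
    """Rescore every first-stage candidate jointly with the query, keep the best."""
    # One scoring call per candidate: this is where the added latency comes from.
    scored = [(score_pair(query, c), c) for c in candidates]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:top_n]]


def overlap_score(query: str, candidate: str) -> float:
    # Toy scorer for demonstration only: count of shared words.
    return float(len(set(query.split()) & set(candidate.split())))


first_stage = ["alpha beta", "beta gamma delta", "gamma"]
best = rerank("gamma delta", first_stage, overlap_score, top_n=2)
```

Keeping the first-stage list small (for example the top 100 rather than the top 1000) is the usual lever for bounding the rerank cost, since the work is linear in the number of candidates.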

05. GraphRAG
Build an LLM-extracted entity-and-relation knowledge graph plus hierarchical community summaries, then answer global queries via map-reduce over those summaries.
Best for: Users ask global, corpus-wide questions that local chunk retrieval cannot answer.
Tradeoff: High indexing cost (orders of magnitude more LLM calls).
Watch for: Queries are narrowly local and naive RAG already serves them well.
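The query-time map-reduce step can be sketched as below, assuming the community summaries have already been built at indexing time. Everything here is illustrative: `map_fn` and `reduce_fn` are hypothetical stand-ins for LLM calls, and the relevance score with a `min_score` filter is an assumption of this sketch rather than something the entry specifies.

```python
from typing import Callable, List, Tuple

Partial = Tuple[int, str]  # (relevance score, partial answer), an assumed shape


def map_reduce_answer(
    query: str,
    summaries: List[str],
    map_fn: Callable[[str, str], Partial],       # hypothetical per-summary LLM call
    reduce_fn: Callable[[str, List[str]], str],  # hypothetical final LLM call
    min_score: int = 1,
) -> str:
    # Map: answer the query independently against each community summary.
    partials = [map_fn(query, s) for s in summaries]
    # Filter and order: drop unhelpful partial answers, most relevant first.
    kept = sorted(
        (p for p in partials if p[0] >= min_score),
        key=lambda p: p[0],
        reverse=True,
    )
    # Reduce: combine the surviving partial answers into one global answer.
    return reduce_fn(query, [text for _, text in kept])
```

The map calls are independent of one another, so they can run in parallel. The heavy cost sits in building the graph and summaries up front, not in this step.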

All patterns in this book

Agentic RAG

Hybrid Search

Naive RAG

Cross-Encoder Reranking

GraphRAG

Citation Attribution

Contextual Retrieval

Repo Map

Query Rewriting

HippoRAG

Semantic Response Cache

Streaming Feature Pipeline

CDC-Driven Vector Sync

CRAG

HyDE

Modular RAG

RAFT

Hierarchical Retrieval

Self-RAG

Table-Augmented Generation

Vectorless Reasoning-Based Retrieval

Tacit-Knowledge Elicitation Agent

Dependency-Aware Skill Retrieval