RAG Agent Patterns
Patterns for building retrieval-augmented generation agents: naive RAG, agentic RAG, hybrid search, cross-encoder reranking, contextual retrieval, HyDE, CRAG, Self-RAG, RAFT, GraphRAG, citation streaming.
A RAG agent is an LLM agent that grounds its answers in retrieved documents instead of relying on what the base model happens to remember. The patterns here describe how to do that defensibly: how to retrieve, how to rerank, how to verify, how to render citations that the user can check, and how to fall back when retrieval fails.
Naive RAG is the entry point: embed the query, fetch the nearest neighbours, and paste them into the prompt. Production RAG is everything that surrounds that step: hybrid search across vector and lexical indexes, cross-encoder reranking, contextual chunking, query rewriting (HyDE), corrective retrieval (CRAG), self-reflection on retrieval quality (Self-RAG), fine-tuning the model on retrieval-shaped tasks (RAFT), graph-aware retrieval (GraphRAG), and citations that stream as they resolve.
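The naive loop described above (embed, fetch nearest neighbours, paste into the prompt) can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and `retrieve`, `build_prompt`, and `CHUNKS` are hypothetical names introduced only for this example.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (nearest neighbours)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Paste the retrieved chunks into the prompt as numbered sources."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{numbered}\n\nQuestion: {query}"
    )

# Hypothetical corpus, for illustration only.
CHUNKS = [
    "BM25 is a lexical ranking function based on term frequency.",
    "Cross-encoders score a query and a candidate jointly.",
    "Paris is the capital of France.",
]

if __name__ == "__main__":
    question = "What is the capital of France?"
    print(build_prompt(question, retrieve(question, CHUNKS, k=1)))
```

In a real system the prompt would be sent to the generator, and `embed` would call an actual embedding model backed by a vector index. Everything else in the list of patterns is a refinement of one of these three steps.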
Field-tested patterns to start with
- Agentic RAG — Replace static retrieve-then-generate with autonomous agents that plan, choose sources, retrieve iteratively, reflect, and re-query.
- Naive RAG — Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
- Hybrid Search — Combine sparse lexical retrieval (BM25) with dense vector retrieval and fuse the results.
- Cross-Encoder Reranking — After cheap bi-encoder or BM25 retrieval, rescore top-N candidates with a cross-encoder that jointly attends over (query, candidate).
- Contextual Retrieval — Prepend a short LLM-generated description to each chunk before embedding so the chunk carries its situating context.
- HyDE — Have the LLM write a hypothetical answer document, embed it, and use it as the retrieval query.
- CRAG — Add a lightweight retrieval evaluator that grades each retrieved document and triggers corrective web search on poor retrievals.
- Self-RAG — Fine-tune the model to emit reflection tokens that decide when to retrieve, evaluate retrieved relevance, and assess generated support.
- RAFT — Train the model to ignore irrelevant retrieved documents (distractors) in a domain-specific RAG setting.
- GraphRAG — Build an LLM-extracted entity-and-relation knowledge graph plus hierarchical community summaries, then answer global queries via map-reduce over those summaries.
- Citation Streaming — Stream citations alongside generated text so the UI can render source links in place as content appears.
- Chain of Verification — Reduce hallucination by drafting an answer, generating independent verification questions, answering them in isolation, and revising.
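As one concrete instance of the Hybrid Search entry above, the fusion step can be sketched with reciprocal rank fusion (RRF). The catalog text only says the results are fused; RRF is an assumption here, chosen because it needs only ranks and no score calibration between the lexical and dense retrievers. The function name `rrf_fuse`, the constant `k = 60`, and the document IDs are illustrative.

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused scores come first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

if __name__ == "__main__":
    # Hypothetical outputs of a BM25 retriever and a dense retriever.
    lexical = ["d1", "d2", "d3"]
    dense = ["d3", "d1", "d4"]
    print(rrf_fuse([lexical, dense]))
```

Documents that both retrievers rank highly rise to the top, while documents found by only one retriever are still kept. The fused list is a natural input for a cross-encoder reranking stage.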
Recommended reading
- Retrieval & RAG — 17 patterns
- Verification & Reflection — 27 patterns
Or open the full contents for all 421 patterns in 14 books.
Related guides
- LLM Agent Design Patterns — A GoF-formal catalog of LLM agent design patterns: ReAct, tool use, plan-and-execute, reflection, step budget, and more. Each pattern decom…
- Agentic AI Architecture — How to structure agentic AI: the architectural patterns that hold an LLM-powered system together. Supervisor, orchestrator-workers, augment…
- Multi-Agent Patterns — Patterns for coordinating multiple LLM agents: supervisor, orchestrator-workers, handoff, debate, hierarchical agents, swarm, role assignme…
- AI Agent Safety Patterns — Safety patterns for LLM agents: step budget, kill switch, constitutional charter, approval queue, sandbox isolation, input/output guardrail…
About this catalog
The Agent Patterns Catalog is an open, GoF-formal reference of 421 design patterns for building LLM agents. Each pattern is decomposed in the manner of Christopher Alexander (1977) and the Gang of Four (1994). The source of truth is at github.com/agentpatternscatalog/patterns, licensed under CC BY 4.0.