Guide

RAG Agent Patterns

Patterns for building retrieval-augmented generation agents: naive RAG, agentic RAG, hybrid search, cross-encoder reranking, contextual retrieval, HyDE, CRAG, Self-RAG, RAFT, GraphRAG, citation streaming.

A RAG agent is an LLM agent that grounds its answers in retrieved documents instead of relying on what the base model happens to remember. The patterns here describe how to do that defensibly: how to retrieve, how to rerank, how to verify, how to render citations that the user can check, and how to fall back when retrieval fails.

Naive RAG is the entry point — embed the query, fetch nearest neighbours, paste into the prompt — but production RAG is everything that surrounds it: hybrid search across vector and lexical indexes, cross-encoder reranking, contextual chunking, query rewriting (HyDE), corrective retrieval (CRAG), self-reflection on retrieval quality (Self-RAG), fine-tuning the model on retrieval-shaped tasks (RAFT), graph-aware retrieval (GraphRAG), and citations that stream as they resolve.
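As a rough illustration of the naive entry point described above (embed the query, fetch nearest neighbours, paste into the prompt), here is a minimal sketch. Everything in it is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and the names `retrieve` and `build_prompt`, as well as the prompt wording, are hypothetical rather than taken from the catalog.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (nearest neighbours)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Paste the retrieved chunks into a prompt with numbered sources."""
    sources = retrieve(query, chunks, k)
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(sources, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical corpus, used only to exercise the sketch.
CHUNKS = [
    "BM25 is a lexical ranking function.",
    "Cross-encoders rescore query and candidate pairs.",
    "Paris is the capital of France.",
]
prompt = build_prompt("What is the capital of France?", CHUNKS, k=1)
```

The resulting `prompt` string would then be sent to whatever generator model is in use. The other patterns in this guide mostly change what happens before, around, or after the `retrieve` step.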

Field-tested patterns to start with

  • Agentic RAG: Replace static retrieve-then-generate with autonomous agents that plan, choose sources, retrieve iteratively, reflect, and re-query.
  • Naive RAG: Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
  • Hybrid Search: Combine sparse lexical retrieval (BM25) with dense vector retrieval and fuse the results.
  • Cross-Encoder Reranking: After cheap bi-encoder or BM25 retrieval, rescore top-N candidates with a cross-encoder that jointly attends over (query, candidate).
  • Contextual Retrieval: Prepend a short LLM-generated description to each chunk before embedding so the chunk carries its situating context.
  • HyDE: Have the LLM write a hypothetical answer document, embed it, and use it as the retrieval query.
  • CRAG: Add a lightweight retrieval evaluator that grades each retrieved document and triggers corrective web search on poor retrievals.
  • Self-RAG: Fine-tune the model to emit reflection tokens that decide when to retrieve, evaluate retrieved relevance, and assess generated support.
  • RAFT: Train the model to ignore irrelevant retrieved documents (distractors) in a domain-specific RAG setting.
  • GraphRAG: Build an LLM-extracted entity-and-relation knowledge graph plus hierarchical community summaries, then answer global queries via map-reduce over those summaries.
  • Citation Streaming: Stream citations alongside generated text so the UI can render source links in place as content appears.
  • Chain of Verification: Reduce hallucination by drafting an answer, generating independent verification questions, answering them in isolation, and revising.
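The Hybrid Search entry in the list above says to fuse the sparse and dense results but does not name a fusion method. One common option is reciprocal rank fusion (RRF), sketched below. Treat this as an assumption, not as the catalog's prescribed approach: the function name, the document IDs, and the two example rankings are hypothetical, and `k=60` is only the conventional default for the smoothing constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document ID so the output
    is deterministic.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a BM25 index and a dense vector index.
bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

RRF uses only rank positions, so it avoids having to normalise BM25 scores against cosine similarities. The fused list is also a natural candidate set to hand to a cross-encoder reranker.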

Recommended reading

Open the full contents for all 421 patterns in 14 books.

Related guides

  • LLM Agent Design Patterns: A GoF-formal catalog of LLM agent design patterns: ReAct, tool use, plan-and-execute, reflection, step budget, and more. Each pattern decom…
  • Agentic AI Architecture: How to structure agentic AI: the architectural patterns that hold an LLM-powered system together. Supervisor, orchestrator-workers, augment…
  • Multi-Agent Patterns: Patterns for coordinating multiple LLM agents: supervisor, orchestrator-workers, handoff, debate, hierarchical agents, swarm, role assignme…
  • AI Agent Safety Patterns: Safety patterns for LLM agents: step budget, kill switch, constitutional charter, approval queue, sandbox isolation, input/output guardrail…

About this catalog

The Agent Patterns Catalog is an open, GoF-formal reference of 421 design patterns for building LLM agents. Each pattern is decomposed in the manner of Christopher Alexander (1977) and the Gang of Four (1994). Source of truth at github.com/agentpatternscatalog/patterns — CC BY 4.0.
