RAG Agent Patterns

Patterns for building retrieval-augmented generation agents: naive RAG, agentic RAG, hybrid search, cross-encoder reranking, contextual retrieval, HyDE, CRAG, Self-RAG, RAFT, GraphRAG, citation streaming.

A RAG agent is an LLM agent that grounds its answers in retrieved documents instead of relying on what the base model happens to remember. The patterns here describe how to do that defensibly: how to retrieve, how to rerank, how to verify, how to render citations that the user can check, and how to fall back when retrieval fails.

Naive RAG is the entry point: embed the query, fetch the nearest neighbours, and paste them into the prompt. Production RAG is everything that surrounds that step: hybrid search across vector and lexical indexes, cross-encoder reranking, contextual retrieval, hypothetical-document queries (HyDE), corrective retrieval (CRAG), self-reflection on retrieval quality (Self-RAG), fine-tuning the model on retrieval-shaped tasks (RAFT), graph-aware retrieval (GraphRAG), and citations that stream as they resolve.
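
The three naive-RAG steps (embed, fetch nearest neighbours, paste into the prompt) can be sketched as follows. This is a minimal illustration, not a reference implementation: the bag-of-words embed function is a toy stand-in for a real embedding model, and the names embed, cosine, retrieve, build_prompt and CHUNKS are hypothetical, chosen only for this example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, retrieved: list[str]) -> str:
    """Paste the retrieved chunks into the prompt as numbered sources."""
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(retrieved, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical corpus, for illustration only.
CHUNKS = [
    "BM25 is a sparse lexical ranking function.",
    "Cross-encoders jointly attend over the query and a candidate.",
    "Paris is the capital of France.",
]

question = "What is the capital of France?"
prompt = build_prompt(question, retrieve(question, CHUNKS, k=1))
```

The resulting prompt string would then be sent to whichever generator model is in use; that call is left out because it depends on the provider.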

Field-tested patterns to start with

Agentic RAG — Replace static retrieve-then-generate with autonomous agents that plan, choose sources, retrieve iteratively, reflect, and re-query.
Naive RAG — Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
Hybrid Search — Combine sparse lexical retrieval (BM25) with dense vector retrieval and fuse the results.
Cross-Encoder Reranking — After cheap bi-encoder or BM25 retrieval, rescore top-N candidates with a cross-encoder that jointly attends over (query, candidate).
Contextual Retrieval — Prepend a short LLM-generated description to each chunk before embedding so the chunk carries its situating context.
HyDE — Have the LLM write a hypothetical answer document, embed it, and use it as the retrieval query.
CRAG — Add a lightweight retrieval evaluator that grades each retrieved document and triggers corrective web search on poor retrievals.
Self-RAG — Fine-tune the model to emit reflection tokens that decide when to retrieve, evaluate retrieved relevance, and assess generated support.
RAFT — Train the model to ignore irrelevant retrieved documents (distractors) in a domain-specific RAG setting.
GraphRAG — Build an LLM-extracted entity-and-relation knowledge graph plus hierarchical community summaries, then answer global queries via map-reduce over those summaries.
Citation Streaming — Stream citations alongside generated text so the UI can render source links in place as content appears.
Chain of Verification — Reduce hallucination by drafting an answer, generating independent verification questions, answering them in isolation, and revising.
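
As one illustration of the list above, the fusion step in Hybrid Search can be sketched with reciprocal rank fusion (RRF). RRF is an assumption here: the catalog entry only says to fuse the results, and other schemes such as weighted score blending are equally possible. The function name, the document ids, and the constant k = 60 (a commonly used default) are illustrative choices, not taken from the catalog.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document id so the output
    is deterministic.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a lexical (BM25) and a dense retriever.
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # -> ['d1', 'd3', 'd2', 'd4']
```

Because RRF uses only ranks, it sidesteps the problem that BM25 scores and cosine similarities live on different scales. The fused list is also a natural input for the cross-encoder reranking step, which rescores only the top-N candidates.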

Related guides

AI Agents Patterns — AI agents patterns: named, reusable shapes for building AI agents that reason, use tools, coordinate, and stay safe — single-agent loops an…
AI Agents Patterns Catalog — The AI agents patterns catalog: a complete, GoF-formal pattern language for AI agents across reasoning, planning, tool use, retrieval, memo…
LLM Agent Design Patterns — A GoF-formal catalog of LLM agent design patterns: ReAct, tool use, plan-and-execute, reflection, step budget, and more. Each pattern decom…
Agentic Design Patterns — A GoF-formal catalog of agentic design patterns — named, reusable shapes for building autonomous AI agents: agent loops, tool use, planning…
Agentic AI Design Patterns — Agentic AI design patterns for systems already in production — what to ship, what to observe, what to budget, what to gate. Augmented LLM,…
AI Agent Design Patterns — How to build an AI agent: the named shapes you reach for during design and implementation — reasoning (ReAct, plan-and-execute, reflection)…
Agent Design Patterns — Agent design patterns treat the agent loop as a software-engineering primitive: an observe→reason→act cycle wrapped in tools, memory, super…
Agentic Patterns — A complete pattern language for agentic systems, organised in Alexander-style books across reasoning, planning, tool use, retrieval, verifi…
Agentic AI Architecture — How to structure agentic AI: the architectural patterns that hold an LLM-powered system together. Supervisor, orchestrator-workers, augment…
Multi-Agent Patterns — Patterns for coordinating multiple LLM agents: supervisor, orchestrator-workers, handoff, debate, hierarchical agents, swarm, role assignme…
AI Agent Safety Patterns — Safety patterns for LLM agents: step budget, kill switch, constitutional charter, approval queue, sandbox isolation, input/output guardrail…

About this catalog

The Agent Patterns Catalog is an open, GoF-formal reference of 527 design patterns for building LLM agents. Each pattern is decomposed in the manner of Christopher Alexander (1977) and the Gang of Four (1994). The source of truth is at github.com/agentpatternscatalog/patterns, licensed under CC BY 4.0.
