Agentic RAG

also known as Iterative RAG

Replace static retrieve-then-generate with autonomous agents that plan, choose sources, retrieve iteratively, reflect, and re-query.

This pattern helps complete certain larger patterns —

specialisesHierarchical Retrieval★★— Route a query through a multi-level cascade — coarse source or index selection, then per-source narrower retrieval, then chunk-level — so each retrieval decision is pushed to the cheapest tier that can answer it.

Context

A team is building a retrieval-augmented system to answer user questions over a corpus, but the questions are not all of one kind. Some are multi-hop, where the answer depends on facts from two or three different documents combined. Some are ambiguous, where the question itself does not pin down what is being asked. And the corpus or the user's information need is evolving over time. A single retrieve-once, generate-once pipeline cannot serve all of these reliably.

Problem

Naive retrieval-augmented generation runs one retrieval per question and feeds the top chunks straight into the generator. It cannot decide whether retrieval is even needed for a given question, cannot choose between several available sources, cannot tell when it has gathered enough evidence to stop, and has no path to recover when the retrieval comes back with poor or irrelevant chunks. Easy questions get pointless retrieval calls, multi-hop questions get partial answers, and bad retrievals quietly corrupt the output.

Forces

Agentic loops cost more than single-shot retrieval.
Source selection requires capability descriptions.
Loop bounds must prevent runaway retrieval.

Example

A consulting agent is asked, 'Compare our 2023 and 2024 revenue by region.' Naive RAG would do one search and pass whatever it found to the model. Agentic RAG instead runs in a loop: it queries the 2023 figures, decides it also needs 2024 figures, queries those, notices the EMEA numbers are missing, queries again with a more specific phrase, then produces the comparison from a complete set.

Diagram

flowchart TD Q[Query] --> A{Agent: retrieve?} A -- no --> ANS[Answer] A -- yes --> P[Pick retriever<br/>vector / graph / web] P --> R[Retrieve evidence] R --> E{Sufficient?} E -- no, reformulate --> P E -- yes --> ANS

Solution

Therefore:

Treat retrieval as a tool. The agent decides whether to retrieve, formulates and reformulates the query, picks among multiple retrievers (vector, graph, keyword, web), evaluates retrieved evidence, and re-queries on insufficient results. Composes naturally with reflection, planning, and tool-use patterns.

What this pattern forbids. Retrieval is one tool among many; the agent decides invocation, but each retrieval is bounded by the step budget.

The smaller patterns that complete this one —

generalisesNaive RAG★★— Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
usesReAct★★— Interleave a single thought, a single tool call, and a single observation per step so the agent reasons over fresh evidence.
usesReflection★★— Have the model review its own output and produce a revised version in one or more passes.
usesTool Use★★— Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.
generalisesSelf-RAG★— Fine-tune the model to emit reflection tokens that decide when to retrieve, evaluate retrieved relevance, and assess generated support.
generalisesCRAG★— Add a lightweight retrieval evaluator that grades each retrieved document and triggers corrective web search on poor retrievals.
generalisesCo-Located Memory Surfacing·— Surface relevant persistent memories proactively when the human mentions a concrete entity the agent has prior knowledge of, so the human does not bear the burden of remembering to ask.
usesVectorless Reasoning-Based Retrieval·— Retrieve by having the model reason its way down a document's own table-of-contents tree to the relevant sections, instead of embedding chunks and ranking them by vector similarity.

And the patterns that stand alongside it, or against it —

composes-withCross-Encoder Reranking★★— After cheap bi-encoder or BM25 retrieval, rescore top-N candidates with a cross-encoder that jointly attends over (query, candidate).
alternative-toModular RAG★— Decompose RAG into a typed three-layer hierarchy of Module Types, Modules, and Operators so the pipeline (routing, scheduling, fusion, retrieval, post-retrieval, generation) can be rearranged per query rather than running a fixed linear retrieve-then-generate.
alternative-toOver-Search and Under-Search✕— Anti-pattern: let an agentic RAG system miscalibrate when to retrieve, so it either re-retrieves information already in context or skips retrieval when its parametric knowledge is stale.
complementsCDC-Driven Vector Sync★★— Treat the source-of-truth document store as the only writer; keep the vector index in sync by emitting change-data-capture events onto a queue that the feature pipeline consumes.
complementsTenant-Scoped Tool Binding★— Bind every tool call and retrieval to the active tenant in code at the execution layer, so a multi-tenant agent can never be talked into reading or writing another tenant's data.
complementsCanonical-Entity Grounding★— Require the agent to resolve every business identifier it uses — SKU, account, supplier, customer — through an authoritative lookup against the system of record, rather than emitting the identifier from the model's parametric memory.
complementsSemantic Response Cache★— Embed each query and, when its nearest cached neighbour is within a similarity threshold, return the stored answer instead of re-running the model so near-duplicate questions are answered cheaply.
alternative-toTable-Augmented Generation·— Answer a natural-language question over a database in three stages — synthesise an executable query, run it against the data layer with model calls embedded in execution, then generate the answer from the result.

Neighbourhood

Click any neighbour to follow the language. Scroll to zoom, drag to pan.

Used in recipes

Production RAG
core

Used in frameworks

Show 42 more

References

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG
paper

Provenance

Source: patterns/agentic-rag.md on GitHub · commit 4fa1213 · view history
Added to catalog: 2026-04-30
Last updated: 2026-05-22
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.