Retrieval & RAG

Modular RAG

Decompose RAG into a typed three-layer hierarchy of Module Types, Modules, and Operators so that the pipeline's stages (routing, scheduling, fusion, retrieval, post-retrieval, generation) can be rearranged per query instead of running as a fixed, linear retrieve-then-generate flow.

Problem

A fixed Naive RAG pipeline is too rigid for heterogeneous workloads: every query flows through the same retrieve-rerank-generate stages regardless of its shape, so every request pays the worst-case cost. Forking the pipeline per query type duplicates code, splits operational metrics across pipelines, and makes Modules hard to share. Without a contract between stages, swapping a reranker, adding a query rewriter, or routing between corpora means editing the pipeline orchestration directly.

Solution

Define six Module Types covering the RAG lifecycle (Indexing, Pre-Retrieval, Retrieval, Post-Retrieval, Generation, Orchestration). Within each, name concrete Modules (e.g. under Pre-Retrieval: Query Rewriting, HyDE, Decomposition). Implement each Module from typed Operators (atomic, swappable steps). At request time, an Orchestration Module assembles a pipeline by picking one Module per stage, possibly with branching, conditional routing, and fusion. Each Module exposes a typed input/output contract, so any compatible Module can be swapped in and new Modules ship without touching orchestration.
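
The hierarchy above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: RagState, OperatorModule, the toy Operators, the two-line CORPUS, and the word-count routing rule in orchestrate are all hypothetical names and choices made for the example. It shows Operators as plain functions, Modules as typed chains of Operators behind one run contract, and an orchestration step that assembles a different pipeline per query.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Protocol


@dataclass(frozen=True)
class RagState:
    """Shared payload passed between Modules (hypothetical contract)."""
    query: str
    documents: list[str] = field(default_factory=list)
    answer: str = ""


class Module(Protocol):
    """Typed contract: every Module maps a RagState to a RagState."""
    module_type: str

    def run(self, state: RagState) -> RagState: ...


Operator = Callable[[RagState], RagState]


@dataclass
class OperatorModule:
    """A Module implemented as an ordered chain of Operators."""
    module_type: str
    operators: list[Operator]

    def run(self, state: RagState) -> RagState:
        for op in self.operators:
            state = op(state)
        return state


# Toy corpus and Operators, assumed purely for illustration.
CORPUS = [
    "Modular RAG splits the pipeline into modules.",
    "Rerankers reorder retrieved passages.",
]


def lowercase_query(state: RagState) -> RagState:
    return replace(state, query=state.query.lower().strip())


def keyword_retrieve(state: RagState) -> RagState:
    terms = set(state.query.split())
    hits = [doc for doc in CORPUS if terms & set(doc.lower().split())]
    return replace(state, documents=hits)


def keep_top_one(state: RagState) -> RagState:
    return replace(state, documents=state.documents[:1])


def template_generate(state: RagState) -> RagState:
    context = " ".join(state.documents) or "no context found"
    return replace(state, answer=f"Answer to {state.query!r} based on: {context}")


PRE_RETRIEVAL = OperatorModule("pre_retrieval", [lowercase_query])
RETRIEVAL = OperatorModule("retrieval", [keyword_retrieve])
POST_RETRIEVAL = OperatorModule("post_retrieval", [keep_top_one])
GENERATION = OperatorModule("generation", [template_generate])


def orchestrate(query: str) -> list[Module]:
    """Toy routing rule: short queries skip pre- and post-retrieval."""
    if len(query.split()) <= 3:
        return [RETRIEVAL, GENERATION]
    return [PRE_RETRIEVAL, RETRIEVAL, POST_RETRIEVAL, GENERATION]


def answer(query: str) -> RagState:
    state = RagState(query=query)
    for module in orchestrate(query):
        state = module.run(state)
    return state


if __name__ == "__main__":
    print(answer("rerankers").answer)
    print(answer("How do Rerankers change the order of passages").answer)
```

Because every Module shares the same run signature, replacing keep_top_one with a real reranker, or adding a HyDE Module under Pre-Retrieval, would only change the Module definitions and the routing rule, not the loop in answer.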

When to use

  • The query mix is heterogeneous enough that one linear pipeline overpays on the easy queries.
  • Multiple RAG pipelines have started to fork while still sharing Modules informally.
  • The team wants per-query routing, fusion, or conditional branching as first-class concerns.
  • Module-level eval (recall per Module, cost per Module) is more useful than pipeline-level eval.
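
Module-level eval, as in the last bullet, can be approximated by wrapping each Module's entry point. The sketch below is an assumption-laden illustration: ModuleMetrics, timed, and recall_at_k are hypothetical helpers, wall-clock time stands in for "cost", and the retriever is a stub that returns fixed document IDs.

```python
import time
from collections import defaultdict
from typing import Callable


class ModuleMetrics:
    """Accumulates call counts and wall-clock seconds per Module name."""

    def __init__(self) -> None:
        self.calls: dict[str, int] = defaultdict(int)
        self.seconds: dict[str, float] = defaultdict(float)

    def timed(self, name: str, fn: Callable) -> Callable:
        """Wrap fn so each call is attributed to the Module called name."""
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                self.calls[name] += 1
                self.seconds[name] += time.perf_counter() - start
        return wrapper


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


metrics = ModuleMetrics()

# Stub retrieval Module (hypothetical): always returns the same ranked IDs.
retrieve = metrics.timed("retrieval", lambda query: ["d1", "d3", "d2"])

hits = retrieve("what is modular rag")
recall = recall_at_k(hits, relevant={"d1", "d2"}, k=2)  # 0.5: only d1 is in the top 2
```

Keeping the counters keyed by Module name means the same numbers remain comparable when several assembled pipelines reuse one Module.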
