RAGFlow
Type: full-code · Vendor: InfiniFlow · Language: Python, TypeScript · License: Apache-2.0 · Status: active · Status in practice: mature
Open-source RAG engine that pairs DeepDoc, a layout-aware deep document-understanding parser, with an agentic, graph-orchestrated workflow runtime, MCP support, and citations traceable to source chunks.
Description. RAGFlow (78k+ GitHub stars, Apache-2.0) is InfiniFlow's open-source RAG engine. Its defining feature is DeepDoc — a layout-aware, template-driven document parser that handles PDF, Word, Excel, slides, scanned copies, structured data, and web pages with format-specific chunking strategies rather than naive splitting. From v0.8 onward RAGFlow became agentic: a graph-based workflow editor lets users compose retrieval + reasoning + tool stages, with MCP support, GraphRAG, and (from late 2025) memory for AI agents. Citations are first-class and traceable to source chunks with visualisations.
Agent loop shape. Document ingestion runs DeepDoc parsing → template-based chunking → embedding → indexing, with optional GraphRAG knowledge-graph construction. At query time, a graph workflow built in the no-code editor defines the retrieval, rerank, reasoning, and tool stages. Each query enters the graph at the start node and traverses it, following branches and loops as the workflow dictates. The MCP server exposes retrieval to external agents as tools.
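The query-time traversal can be illustrated with a toy graph executor. This is a minimal sketch of the general idea (enter at a start node, follow branches, loop back when retrieval comes up empty), not RAGFlow's runtime or API: the node names, the `run_graph` helper, the in-memory `CORPUS`, and the keyword-overlap "retrieval" are all hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, List, Optional, Set

# Hypothetical types: a node mutates shared state and names the next node (None = stop).
State = Dict[str, Any]
Node = Callable[[State], Optional[str]]

# Toy stand-in for an index of already-chunked documents.
CORPUS: Dict[str, str] = {
    "c1": "DeepDoc parses PDF layout into structured chunks.",
    "c2": "GraphRAG builds a knowledge graph during indexing.",
    "c3": "Citations link answers back to source chunks.",
}
# Toy stand-in for a query-rewriting step.
REWRITES: Dict[str, str] = {"pdfs": "pdf", "parsed": "parses"}


def tokens(text: str) -> Set[str]:
    return set(text.lower().replace(".", "").split())


def begin(state: State) -> Optional[str]:
    state["attempts"] = 0
    return "retrieve"


def retrieve(state: State) -> Optional[str]:
    q = tokens(state["query"])
    state["hits"] = [cid for cid, text in CORPUS.items() if q & tokens(text)]
    return "check"


def check(state: State) -> Optional[str]:
    # Branch: loop back through a rewrite if nothing was found, up to two times.
    if not state["hits"] and state["attempts"] < 2:
        return "rewrite"
    return "rerank"


def rewrite(state: State) -> Optional[str]:
    state["attempts"] += 1
    words = state["query"].lower().split()
    state["query"] = " ".join(REWRITES.get(w, w) for w in words)
    return "retrieve"


def rerank(state: State) -> Optional[str]:
    q = tokens(state["query"])
    state["hits"] = sorted(
        state["hits"], key=lambda cid: len(q & tokens(CORPUS[cid])), reverse=True
    )
    return "answer"


def answer(state: State) -> Optional[str]:
    state["answer"] = [CORPUS[cid] for cid in state["hits"][:1]]
    return None


NODES: Dict[str, Node] = {
    "begin": begin, "retrieve": retrieve, "check": check,
    "rewrite": rewrite, "rerank": rerank, "answer": answer,
}


def run_graph(nodes: Dict[str, Node], state: State,
              start: str = "begin", max_steps: int = 20) -> List[str]:
    """Walk the graph from `start`, returning the ordered list of visited nodes."""
    trace: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if len(trace) >= max_steps:
            raise RuntimeError("step budget exceeded; possible runaway loop")
        trace.append(current)
        current = nodes[current](state)
    return trace


if __name__ == "__main__":
    state: State = {"query": "How are PDFs parsed"}
    print(run_graph(NODES, state))
    print(state["answer"])
```

The step budget in `run_graph` is one simple way to keep a graph that permits loops from running indefinitely; a real workflow engine would likely have its own termination rules.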
Primary use cases
- enterprise document QA over complex format mixes (PDF, Office, scans, structured data)
- agentic RAG with no-code workflow composition over retrieval + reasoning + tools
- high-fidelity citation rendering with chunk-level provenance to source
- MCP-server integration so external agents can call RAGFlow's retrieval as tools
Key concepts
- DeepDoc (docs) — Layout-aware, template-driven parser for complex document formats.
- Graph workflow editor → orchestrator-workers — No-code graph editor for composing RAG and agent stages, with branches and loops.
- Template-based chunking — Per-format chunking templates instead of naive token splits.
- Citation rendering → citation-attribution — Traceable chunk-level citations attached to answers.
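To make the last two concepts concrete, the sketch below shows how per-format chunking and chunk-level citations can fit together. It is an illustration under assumptions, not RAGFlow's implementation: the `Chunk` class, the `"prose"` and `"table"` template names, and the `[doc#index]` citation marker format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Chunk:
    doc_id: str   # which source document the chunk came from
    index: int    # position of the chunk within that document
    text: str


def chunk_paragraphs(doc_id: str, text: str) -> List[Chunk]:
    """Prose-style template: one chunk per blank-line-separated paragraph."""
    parts = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [Chunk(doc_id, i, p) for i, p in enumerate(parts)]


def chunk_rows(doc_id: str, text: str) -> List[Chunk]:
    """Table-style template: one chunk per data row, with the header repeated."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    return [Chunk(doc_id, i, f"{header}\n{row}") for i, row in enumerate(rows)]


# Hypothetical template registry: the chunker is chosen by format, not by token count.
TEMPLATES: Dict[str, Callable[[str, str], List[Chunk]]] = {
    "prose": chunk_paragraphs,
    "table": chunk_rows,
}


def chunk_document(doc_id: str, text: str, template: str) -> List[Chunk]:
    return TEMPLATES[template](doc_id, text)


def cite(answer: str, chunks: List[Chunk]) -> str:
    """Append one provenance marker per supporting chunk to the answer text."""
    markers = "".join(f"[{c.doc_id}#{c.index}]" for c in chunks)
    return f"{answer} {markers}"


if __name__ == "__main__":
    rows = chunk_document("prices.csv", "item,price\napple,1\npear,2\n", "table")
    print(cite("A pear costs 2.", [rows[1]]))
```

Because every chunk keeps its document id and position, any answer built from retrieved chunks can point back to the exact source span, which is the property that chunk-level citation rendering relies on.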
Patterns this full-code implements
- ★★ Naive RAG
- ★★ Agentic RAG
  v0.8+ explicitly agentic.
- ★★ Hybrid Search
- ★★ Cross-Encoder Reranking
- ★ GraphRAG
  GraphRAG supported as a knowledge graph option.
- ★★ Citation Attribution
  Chunk-level citations with visualisation.
- ★★ Model Context Protocol
  MCP server exposes retrieval as tools.
- ★★ Citation Streaming
- ★ Contextual Retrieval
- ★★ Query Rewriting
- ★★ Event-Driven Agent
- ★★ Orchestrator-Workers