HippoRAG
Type: full-code · Vendor: OSU NLP Group · Language: Python · License: Apache-2.0 · Status: active · Status in practice: experimental
Hippocampus-inspired RAG framework that builds a knowledge graph from documents and uses Personalized PageRank for multi-hop retrieval, replacing naive top-k vector search.
Description. HippoRAG is a research RAG framework from OSU NLP Group modeled on hippocampal indexing theory: an LLM decomposes documents into entity-relation triples that form a knowledge graph, and retrieval runs Personalized PageRank (PPR) over that graph, seeded from the entities in the question. The design targets multi-hop (associative) questions and sense-making over large corpora, where naive dense retrieval often falls short. HippoRAG 2 (the current implementation) keeps the same PPR core and adds deeper passage integration and more effective online use of an LLM. It is distributed as a Python library and as the reference implementation accompanying the NeurIPS 2024 HippoRAG paper and the ICML 2025 HippoRAG 2 paper.
Agent loop shape. Two-phase pipeline. Offline indexing: an LLM extracts OpenIE-style entity-relation triples from each document, and the triples are merged into a persistent knowledge graph whose nodes carry vector embeddings. Online retrieval: at query time the system extracts entities from the question, seeds Personalized PageRank from the matching entity nodes in the KG, and passes the passages associated with the highest-ranked nodes to an LLM reader (the rag_qa step in the Python API).
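The two phases can be illustrated with a toy sketch in pure Python. It is not HippoRAG's code: the triples are hand-written stand-ins for LLM extraction, question entities are given directly instead of being extracted, and all function names, the damping value, and the passage-scoring rule (sum of the scores of the entities a passage mentions) are assumptions made for illustration. The actual library uses additional signals that are omitted here.

```python
from collections import defaultdict

# Hand-written stand-ins for LLM-extracted triples:
# (subject, relation, object, id of the passage the triple came from).
TRIPLES = [
    ("Alice", "works at", "Acme Lab", "p1"),
    ("Acme Lab", "located in", "Columbus", "p2"),
    ("Bob", "lives in", "Toledo", "p3"),
]


def build_graph(triples):
    """Offline phase (toy): undirected entity graph plus entity -> passage ids."""
    neighbors = defaultdict(set)
    passages = defaultdict(set)
    for subj, _rel, obj, pid in triples:
        neighbors[subj].add(obj)
        neighbors[obj].add(subj)
        passages[subj].add(pid)
        passages[obj].add(pid)
    return neighbors, passages


def personalized_pagerank(neighbors, seeds, damping=0.5, iterations=50):
    """Power iteration where the random walk restarts only at the seed nodes."""
    seeds = {s for s in seeds if s in neighbors}
    if not seeds:
        raise ValueError("no seed entity is present in the graph")
    nodes = sorted(neighbors)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            # Each node passes its damped score equally to its neighbors.
            share = damping * score[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        score = nxt
    return score


def rank_passages(triples, question_entities, top_k=2):
    """Online phase (toy): seed PPR from question entities, then score passages."""
    neighbors, passages = build_graph(triples)
    node_score = personalized_pagerank(neighbors, question_entities)
    passage_score = defaultdict(float)
    for node, pids in passages.items():
        for pid in pids:
            passage_score[pid] += node_score[node]
    ranked = sorted(passage_score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pid for pid, _ in ranked[:top_k]]
```

For a question such as "In which city does Alice work?", calling rank_passages(TRIPLES, ["Alice"]) returns ["p1", "p2"]. Passage p2 never mentions Alice; it is reached through the shared "Acme Lab" node, which is the multi-hop behavior the graph walk is meant to provide.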
Primary use cases
- multi-hop (associative) question answering over a document corpus
- knowledge-graph-augmented retrieval pipelines
- sense-making over long / interconnected contexts
- research baselines comparing graph retrieval to dense top-k and to other graph-RAG systems (GraphRAG, RAPTOR, LightRAG)
Key concepts
- Hippocampal indexing theory → hippocampus-rag (docs) — Cognitive-neuroscience theory that the hippocampus stores sparse pointers (indices) into neocortical patterns, used as the design metaphor for HippoRAG's KG + retrieval split.
- OpenIE knowledge graph → graphrag (docs) — Documents are decomposed by an LLM into entity-relation triples that are merged into a persistent knowledge graph with vector embeddings on nodes (the offline indexing phase).
- Personalized PageRank retrieval → hippocampus-rag (docs) — At query time, entities extracted from the question seed Personalized PageRank over the KG; top-scoring nodes' associated passages are returned to the reader LLM.
- Neocortex / hippocampus analogy → hippocampus-rag (docs) — The framework's design metaphor: the LLM plays the neocortex (pattern completion / language) and the KG + PPR plays the hippocampus (associative pointer index).
- Associative / factual / sense-making memory (docs) — HippoRAG 2 explicitly targets three memory regimes: associative (multi-hop), factual, and sense-making (integrating large complex contexts), reporting gains over baseline RAG on each.
- HippoRAG Python API (docs) — Library surface: HippoRAG(save_dir, llm_model_name, embedding_model_name).index(docs), .retrieve(queries), .rag_qa(queries) — combined or separate retrieval-and-QA calls.
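A minimal usage sketch of the Python API named in the last bullet. The constructor arguments and the index, retrieve, and rag_qa methods come from that bullet; the import path, the model names, the save directory, and the sample documents are placeholder assumptions, and running the function requires the package to be installed and credentials for the chosen LLM and embedding backends to be configured.

```python
def run_hipporag_demo(save_dir="outputs/hipporag_demo"):
    """Index a few passages, then run retrieval and retrieval-plus-QA."""
    # Imported lazily so that defining this function does not require the package.
    # The import path is an assumption; check the project's README.
    from hipporag import HippoRAG

    docs = [
        "Alice works at Acme Lab.",
        "Acme Lab is located in Columbus.",
        "Bob lives in Toledo.",
    ]
    queries = ["In which city does Alice work?"]

    rag = HippoRAG(
        save_dir=save_dir,                         # where the index is persisted
        llm_model_name="gpt-4o-mini",              # placeholder model name
        embedding_model_name="nvidia/NV-Embed-v2", # placeholder model name
    )

    rag.index(docs)                    # offline phase: build the knowledge graph
    retrieved = rag.retrieve(queries)  # online phase: retrieval only
    answers = rag.rag_qa(queries)      # online phase: retrieval plus LLM reader
    return retrieved, answers
```

Indexing is the expensive step because it calls the LLM over every document, so it is typically run once per corpus and the saved index is reused across queries.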
Patterns this full-code implements
- ★HippoRAG
HippoRAG is the eponymous implementation of the hippocampus-inspired retrieval shape: a knowledge graph plus Personalized PageRank as the retrieval primitive, mirroring the hippocampal indexing theor…
- ★GraphRAG
HippoRAG is in the GraphRAG family (KG-augmented retrieval) but differs in retrieval primitive: Personalized PageRank over entity nodes rather than community summaries. The README explicitly position…