HippoRAG

Type: full-code  ·  Vendor: OSU NLP Group  ·  Language: Python  ·  License: Apache-2.0  ·  Status: active  ·  Status in practice: experimental

Hippocampus-inspired RAG framework that builds a knowledge graph from documents and uses Personalized PageRank for multi-hop retrieval, replacing naive top-k vector search.

Description. HippoRAG is a research RAG framework from OSU NLP Group that draws on hippocampal indexing theory: documents are decomposed into entity-relation triples that form a knowledge graph (KG), and retrieval runs Personalized PageRank (PPR) from question entities across that graph. The design targets multi-hop (associative) questions and sense-making over large corpora, where naive dense retrieval often fails. HippoRAG 2 (the current implementation) builds on the same PPR core and adds deeper passage integration and more effective online use of an LLM. It is distributed as a Python library and reference implementation accompanying the NeurIPS 2024 HippoRAG paper and the ICML 2025 HippoRAG 2 paper.

Agent loop shape. Two-phase pipeline. Offline indexing: documents are passed through an LLM to extract OpenIE-style entity-relation triples that are merged into a persistent knowledge graph with vector embeddings on nodes. Online retrieval: at query time the system extracts entities from the question, seeds Personalized PageRank from those entity nodes over the KG, and returns the top-scoring passages associated with the highest-ranked nodes for an LLM reader (the rag_qa step in the Python API).
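The two phases can be illustrated with a toy, dependency-free sketch. This is not HippoRAG's code: the triples are hand-written stand-ins for the LLM extraction step, the graph is undirected and unweighted, node embeddings are omitted, the damping value is arbitrary, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical output of the offline OpenIE step:
# (subject, relation, object, id of the passage the triple came from).
TRIPLES = [
    ("Alice", "works at", "Acme", "p1"),
    ("Acme", "is based in", "Columbus", "p2"),
    ("Bob", "lives in", "Paris", "p3"),
]


def build_graph(triples):
    """Merge triples into an undirected entity graph and record
    which passages mention each entity."""
    neighbors = defaultdict(set)
    passages = defaultdict(set)
    for subj, _rel, obj, pid in triples:
        neighbors[subj].add(obj)
        neighbors[obj].add(subj)
        passages[subj].add(pid)
        passages[obj].add(pid)
    return neighbors, passages


def personalized_pagerank(neighbors, seeds, damping=0.5, iterations=50):
    """Power iteration in which the random walk restarts at the seed
    nodes (with probability 1 - damping) instead of at a uniform node."""
    nodes = list(neighbors)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            # Each node splits its damped mass evenly among its neighbors.
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        rank = nxt
    return rank


def rank_passages(rank, passages):
    """Score each passage by summing the PPR mass of the entities it mentions."""
    scores = defaultdict(float)
    for node, pids in passages.items():
        for pid in pids:
            scores[pid] += rank[node]
    return sorted(scores, key=scores.get, reverse=True)


# Online phase for a question such as "Where is Alice's employer based?",
# assuming entity extraction has already produced the seed {"Alice"}.
neighbors, passages = build_graph(TRIPLES)
rank = personalized_pagerank(neighbors, seeds={"Alice"})
ranked = rank_passages(rank, passages)
```

In this toy graph the passage about Columbus (p2) never mentions Alice, yet it outranks the unrelated passage p3 because PPR mass flows from Alice through Acme. That is the multi-hop behaviour the pipeline is built around.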

Primary use cases

  • multi-hop (associative) question answering over a document corpus
  • knowledge-graph-augmented retrieval pipelines
  • sense-making over long / interconnected contexts
  • research baselines comparing graph retrieval to dense top-k and to other graph-RAG systems (GraphRAG, RAPTOR, LightRAG)

Key concepts

  • Hippocampal indexing theory [hippocampus-rag]: Cognitive-neuroscience theory that the hippocampus stores sparse pointers (indices) into neocortical patterns, used as the design metaphor for HippoRAG's KG + retrieval split.
  • OpenIE knowledge graph [graphrag]: Documents are decomposed by an LLM into entity-relation triples that are merged into a persistent knowledge graph with vector embeddings on nodes (the offline indexing phase).
  • Personalized PageRank retrieval [hippocampus-rag]: At query time, entities extracted from the question seed Personalized PageRank over the KG; the passages associated with the top-scoring nodes are returned to the reader LLM.
  • Neocortex / hippocampus analogy [hippocampus-rag]: The framework's design metaphor: the LLM plays the neocortex (pattern completion / language) and the KG + PPR plays the hippocampus (associative pointer index).
  • Associative / factual / sense-making memory: HippoRAG 2 explicitly targets three memory regimes: associative (multi-hop), factual, and sense-making (integrating large complex contexts), reporting gains over baseline RAG on each.
  • HippoRAG Python API: Library surface: HippoRAG(save_dir, llm_model_name, embedding_model_name).index(docs), .retrieve(queries), .rag_qa(queries). Retrieval and QA can be called separately or combined.
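The call order implied by the API entry above can be sketched against any object that exposes those three methods. The sketch below assumes only the method names listed in this entry; return types and any extra parameters are not specified here and are treated as opaque. `RagLike`, `index_then_answer`, and `EchoRag` are hypothetical names introduced for illustration, and `EchoRag` is a stand-in so the example runs without models or API keys.

```python
from typing import Any, Protocol, Sequence


class RagLike(Protocol):
    """The three methods this entry lists for the HippoRAG object."""

    def index(self, docs: Sequence[str]) -> Any: ...
    def retrieve(self, queries: Sequence[str]) -> Any: ...
    def rag_qa(self, queries: Sequence[str]) -> Any: ...


def index_then_answer(rag: RagLike, docs: Sequence[str], queries: Sequence[str]):
    """Run the offline phase once, then retrieval and QA for a batch of queries."""
    rag.index(docs)                    # offline: build the knowledge graph
    retrieved = rag.retrieve(queries)  # online: retrieval only
    answers = rag.rag_qa(queries)      # online: retrieval plus reader LLM
    return retrieved, answers


class EchoRag:
    """Hypothetical stand-in: returns every indexed doc for every query."""

    def __init__(self) -> None:
        self.docs: list[str] = []

    def index(self, docs):
        self.docs = list(docs)

    def retrieve(self, queries):
        return [list(self.docs) for _ in queries]

    def rag_qa(self, queries):
        return [f"{len(self.docs)} passages considered for: {q}" for q in queries]
```

With the real library, one would presumably construct `HippoRAG(save_dir, llm_model_name, embedding_model_name)` as listed above and pass that object in place of `EchoRag()`. Check the project's README for the exact argument and return shapes before relying on them.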

Patterns this full-code implements

Provenance

  • Last analyzed:
  • Last updated:
  • Verification status: verified