HippoRAG

Also known as Hippocampus-Indexed Retrieval, PPR-over-LLM-KG, Hippocampus-Inspired Retrieval-Augmented Generation (海马体启发的检索增强生成)

Build an LLM-extracted schemaless knowledge graph from the corpus and run Personalized PageRank seeded on the query's key concepts so multi-hop retrieval completes in a single pass.

This pattern helps complete certain larger patterns —

specialises Naive RAG ★★: Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.

Context

A team runs RAG over a corpus where the answers to many queries are spread across several documents that share entities or relations rather than vocabulary. Multi-hop questions, such as 'which Stanford professor co-authored a paper with someone now at DeepMind on RLHF?', require crossing edges in entity space, not just matching by embedding similarity. Iterative retrieve-then-reason loops work, but they pay an LLM call per hop and can lose context between hops.

Problem

Single-query dense retrieval lands in one embedding neighbourhood and cannot follow entity-mediated chains across documents. Iterative agentic retrieval reaches the answer but costs an LLM call per hop and the agent has no global view of the graph that connects passages. Community-summary approaches such as GraphRAG handle global queries via map-reduce over pre-built summaries, but their cost and latency are dominated by the summary build and they do not naturally surface a tight path between two concrete entities.

Forces

Multi-hop answers depend on entity-mediated paths that embedding similarity flattens away.
Iterative agentic retrieval costs one LLM call per hop and can drift off-topic.
Pre-building dense community summaries is expensive and must be repeated when the corpus changes.
Graph construction quality bounds retrieval quality; poor entity and relation extraction means poor recall.

Example

A research assistant gets the query 'which professor at Stanford co-authored an RLHF paper with someone now at DeepMind?' A single embedding lookup retrieves vaguely related ML papers but cannot follow the author edges. HippoRAG's offline extraction has produced a graph whose nodes include people, affiliations, and papers, connected by authorship and affiliation edges. The query's key concepts ('Stanford', 'DeepMind', 'RLHF') seed Personalized PageRank, which propagates mass along those edges and concentrates on the specific paper and its two co-author nodes. The generator answers from the passages attached to those nodes in a single retrieval pass.

Diagram

flowchart TD
    subgraph Offline
        C[Corpus] --> E[LLM triple extraction]
        E --> G[Schemaless KG hippocampal index]
    end
    Q[Query] --> K[LLM key-concept extraction]
    K --> S[Seed PPR on graph nodes]
    G --> S
    S --> P[Personalized PageRank traversal]
    P --> R[Top-N passages by PPR mass]
    R --> Gen[Generator]

Solution

Therefore:

Offline, prompt an LLM to extract (subject, predicate, object) triples from each passage and store the resulting schemaless graph alongside per-node passage pointers — this is the artificial hippocampal index. At query time, extract the query's key concepts (also via LLM), seed Personalized PageRank on the corresponding graph nodes, run PPR to propagate relevance through entity-mediated edges, and surface the top passages by aggregated PPR mass. Pass the surfaced passages forward to the generator, optionally through a reranker.
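The retrieval flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the reference implementation: the triples and query concepts are hard-coded stand-ins for the two LLM extraction steps, the people and paper names are hypothetical, query concepts are matched to nodes by exact string equality, the damping value of 0.5 is an arbitrary choice, and a passage's score is simply the sum of the PPR mass of the nodes that point to it.

```python
from collections import defaultdict

# Hypothetical output of the offline LLM extraction step:
# (subject, predicate, object, passage_id). All names are made up.
TRIPLES = [
    ("Alice Chen", "affiliated with", "Stanford", "p1"),
    ("Alice Chen", "authored", "RLHF Paper", "p2"),
    ("Bob Rao", "authored", "RLHF Paper", "p2"),
    ("RLHF Paper", "is about", "RLHF", "p2"),
    ("Bob Rao", "affiliated with", "DeepMind", "p3"),
    ("Carol Diaz", "affiliated with", "Stanford", "p4"),
    ("Carol Diaz", "authored", "Vision Paper", "p5"),
]


def build_index(triples):
    """Build an undirected schemaless graph plus per-node passage pointers."""
    adj = defaultdict(set)
    node_passages = defaultdict(set)
    for subj, _pred, obj, pid in triples:
        adj[subj].add(obj)
        adj[obj].add(subj)
        node_passages[subj].add(pid)
        node_passages[obj].add(pid)
    return dict(adj), dict(node_passages)


def personalized_pagerank(adj, seeds, damping=0.5, iterations=50):
    """Power iteration: restart on the seed nodes with probability 1 - damping."""
    seed_nodes = [s for s in seeds if s in adj]
    if not seed_nodes:
        return {n: 0.0 for n in adj}  # no graph anchor for this query
    restart = {n: 0.0 for n in adj}
    for s in seed_nodes:
        restart[s] = 1.0 / len(seed_nodes)
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in adj}
        for node, neighbours in adj.items():
            share = damping * rank[node] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        rank = nxt
    return rank


def rank_passages(rank, node_passages, top_n=3):
    """Aggregate node mass onto passages and return the top_n (id, score) pairs."""
    scores = defaultdict(float)
    for node, pids in node_passages.items():
        for pid in pids:
            scores[pid] += rank.get(node, 0.0)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


adj, node_passages = build_index(TRIPLES)
# Stand-in for LLM key-concept extraction on the query.
rank = personalized_pagerank(adj, seeds=["Stanford", "DeepMind", "RLHF"])
top = rank_passages(rank, node_passages)
print(top)
```

On this toy graph the passage holding the co-authored paper (`p2`) accumulates the most mass, because it is the only passage reachable from all three seeds within a hop or two. A production system would replace the exact-match seeding and the plain sum with whatever entity linking and passage weighting the team has validated.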

What this pattern forbids. Retrieval cannot rely on the query embedding alone; relevance is propagated through the LLM-extracted entity graph via Personalized PageRank, and passages with no graph anchor are unreachable.
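One way to make this constraint visible is a coverage audit run after extraction. The sketch below is a hypothetical helper, not something the pattern itself prescribes: it assumes triples are stored as (subject, predicate, object, passage_id) tuples and that query concepts are matched to nodes by exact string equality. It lists passages that produced no triples, and query concepts that match no node and therefore cannot seed PPR.

```python
def coverage_report(passage_ids, triples, query_concepts):
    """Report passages and query concepts that have no anchor in the graph."""
    nodes = set()
    anchored = set()
    for subj, _pred, obj, pid in triples:
        nodes.update((subj, obj))
        anchored.add(pid)
    return {
        # Passages with no extracted triple can never receive PPR mass.
        "unreachable_passages": sorted(set(passage_ids) - anchored),
        # Concepts with no matching node cannot act as PPR seeds.
        "unmatched_concepts": sorted(c for c in set(query_concepts) if c not in nodes),
    }


# Hypothetical data: passage p3 yielded no triples, and "RLHF" has no node.
triples = [
    ("Alice Chen", "affiliated with", "Stanford", "p1"),
    ("Bob Rao", "affiliated with", "DeepMind", "p2"),
]
report = coverage_report(["p1", "p2", "p3"], triples, ["Stanford", "RLHF"])
print(report)
```

How to handle the gaps such a report finds is a separate design decision outside this sketch.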

And the patterns that stand alongside it, or against it —

alternative-to GraphRAG ★: Build an LLM-extracted entity-and-relation knowledge graph plus hierarchical community summaries, then answer global queries via map-reduce over those summaries.
composes-with Cross-Encoder Reranking ★★: After cheap bi-encoder or BM25 retrieval, rescore top-N candidates with a cross-encoder that jointly attends over (query, candidate).
complements Knowledge Graph Memory ★: Persist agent memory as entities and relations in a structured graph so symbolic queries (path, neighbour, type) become possible.
alternative-to Hierarchical Retrieval ★★: Route a query through a multi-level cascade (coarse source or index selection, then per-source narrower retrieval, then chunk-level) so each retrieval decision is pushed to the cheapest tier that can answer it.