IV · Retrieval & RAG · Emerging

Repo Map

also known as Repository Map, Structural Code Context

Give the agent a compact, ranked map of the codebase's symbols and their dependencies so it orients on what matters before reading any files.

Context

A coding agent must work in a repository far larger than its context window — thousands of files, deep dependency chains, conventions spread across modules. It cannot read everything, and keyword search finds matching text but says nothing about which symbols are important or how they connect. Dumping whole files wastes the window on code the task never touches.

Problem

Without a structural overview the agent explores blindly: it greps, opens files at random, and misses the few symbols that actually govern the change, while burning context on irrelevant code. It needs a high-signal summary of the repository's structure — which symbols exist and how they depend on each other — small enough to fit the window yet ranked so the important parts survive truncation.

Forces

  • A repository never fits the context window, but keyword search is structure-blind and whole-file dumps are wasteful.
  • A static structural map costs upfront computation to build and goes stale as the agent edits the code.
  • Ranking by importance keeps the map small, but the ranking signal must be cheap to compute over a large graph.
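
One way to soften the staleness force is to refresh incrementally, re-parsing only files whose content changed. The following is a minimal sketch, assuming file contents are available as a path-to-text mapping. The names `fingerprint` and `stale_files` are hypothetical helpers invented for illustration, not part of any particular tool.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash used to detect that a file changed since the last map build."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_files(sources: dict, cache: dict):
    """Compare current file contents against cached fingerprints.

    sources: path -> current file text (assumed input shape)
    cache:   path -> fingerprint recorded when the map was last built
    Returns (paths to re-parse or drop, updated cache).
    """
    current = {path: fingerprint(text) for path, text in sources.items()}
    changed = {path for path, digest in current.items() if cache.get(path) != digest}
    removed = set(cache) - set(current)
    return changed | removed, current


# Usage: the first build treats every file as stale; later builds touch only edits.
stale, cache = stale_files({"a.py": "x = 1", "b.py": "y = 2"}, {})
stale_after_edit, cache = stale_files({"a.py": "x = 1", "b.py": "y = 3"}, cache)
```

Hashing content rather than comparing modification times is a design choice in this sketch: it ignores touches that do not change the text, at the cost of reading every file.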

Example

An agent is asked to add rate limiting to an API server with 800 source files. Instead of searching for the word 'limit', it first reads a repo map that ranks Server.handle, Router.dispatch, and Middleware.apply as the most central symbols. The map shows that requests flow through Middleware.apply, so the agent opens just that file and its two callers, makes the change, and never loads the other 795 files.

Solution

Therefore:

Parse the repository with a language-aware parser such as tree-sitter into symbols (functions, classes, methods) and the call and import edges between them. Run a centrality measure like PageRank over that graph to score each symbol's importance, optionally biased toward files the current task mentions. Render the top-ranked symbols and their signatures as a compact text map and place it in the agent's context as orientation, refreshing it as the working tree changes. The agent reads the map first, then opens only the files the map points to.
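
The steps above can be sketched end to end. This is an illustration under stated assumptions, not a reference implementation: Python's standard `ast` module stands in for tree-sitter, so it only handles Python sources; symbols are keyed by bare function name, so same-named functions in different files collide; and PageRank is a hand-written power iteration. The function names `extract_symbols`, `pagerank`, and `render_map`, and the four-file toy repository, are all hypothetical.

```python
import ast
from collections import defaultdict


def extract_symbols(sources):
    """Parse path -> text into (defs, edges).

    defs:  symbol name -> (file path, one-line signature)
    edges: caller symbol -> set of callee symbols defined in the repo
    """
    defs, calls = {}, defaultdict(set)
    for path, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            args = ", ".join(a.arg for a in node.args.args)
            defs[node.name] = (path, f"def {node.name}({args})")
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call):
                    func = inner.func
                    # foo(...) has a Name; obj.foo(...) has an Attribute.
                    name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
                    if name:
                        calls[node.name].add(name)
    # Keep only edges that point at symbols defined in the repository.
    edges = {caller: {c for c in callees if c in defs and c != caller}
             for caller, callees in calls.items()}
    return defs, edges


def pagerank(nodes, edges, bias=None, damping=0.85, iterations=50):
    """Power-iteration PageRank; `bias` weights the teleport step per symbol."""
    nodes = sorted(nodes)
    weights = {n: (bias or {}).get(n, 0.0) for n in nodes}
    total = sum(weights.values())
    teleport = {n: (weights[n] / total if total else 1.0 / len(nodes)) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = edges.get(n, ())
            if out:
                for m in out:
                    nxt[m] += damping * rank[n] / len(out)
            else:
                # Symbols that call nothing hand their score to the teleport set.
                for m in nodes:
                    nxt[m] += damping * rank[n] * teleport[m]
        rank = nxt
    return rank


def render_map(defs, rank, max_symbols=3):
    """Render the top-ranked symbols as 'file: signature' lines."""
    top = sorted(defs, key=lambda s: (-rank[s], s))[:max_symbols]
    return "\n".join(f"{defs[s][0]}: {defs[s][1]}" for s in top)


# Hypothetical toy repository: handle -> dispatch -> apply, plus an unrelated helper.
sources = {
    "server.py": "def handle(request):\n    return dispatch(request)\n",
    "router.py": "def dispatch(request):\n    return apply(request)\n",
    "middleware.py": "def apply(request):\n    return request\n",
    "util.py": "def format_date(value):\n    return str(value)\n",
}
defs, edges = extract_symbols(sources)
rank = pagerank(defs, edges)

# Optional bias toward files the task mentions (here, server.py).
task_files = {"server.py"}
bias = {s: 1.0 for s, (path, _) in defs.items() if path in task_files}
biased_rank = pagerank(defs, edges, bias=bias)

print(render_map(defs, rank, max_symbols=2))
```

In this toy graph `apply` ranks highest because every request path ends there, while the unreferenced `format_date` falls toward the bottom and would be the first thing cut when the map is truncated. A production map would likely also weight import edges and class membership, which this sketch omits.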

What this pattern forbids. The agent does not browse the repository blind; it must consult the ranked structural map first and may only open files the map surfaces as relevant, rather than dumping whole files or relying on keyword search alone.

And the patterns that stand alongside it, or against it —

  • alternative-to Hierarchical Retrieval ★★: Route a query through a multi-level cascade (coarse source or index selection, then per-source narrower retrieval, then chunk-level) so each retrieval decision is pushed to the cheapest tier that can answer it.
  • alternative-to GraphRAG: Build an LLM-extracted entity-and-relation knowledge graph plus hierarchical community summaries, then answer global queries via map-reduce over those summaries.
  • complements Filesystem as Context: Use the filesystem as the agent's externalized working memory, writing plans, notes, and large tool outputs to files, dropping them out of the live window, and re-reading on demand.
  • complements Code-as-Action Agent: Have the agent emit a code snippet as its action each step, executed in a constrained interpreter, instead of emitting JSON tool calls; tool composition becomes function nesting and control flow inside the snippet.
  • complements Context-Driven Architecture Drift (anti-pattern): Let a coding agent change a brownfield codebase guided only by the files it can see, so it silently violates the architecture conventions that live in nobody's machine-readable form.
