Over-Search and Under-Search
also known as Retrieval-Frequency Miscalibration
Anti-pattern: let an agentic RAG system miscalibrate when to retrieve, so it either re-retrieves information already in context or skips retrieval when its parametric knowledge is stale.
Context
An agent has search-as-tool wired into its loop and decides at each step whether to invoke retrieval. The decision policy is implicit: it emerges from the prompt and the model's general disposition rather than from a calibrated signal. The team measures end-to-end task accuracy and tool-call counts, but not whether each individual retrieval was warranted.
Problem
The agent re-retrieves passages it has already seen in the same context window (over-search), burning tokens and latency on duplicates, and it skips retrieval when its parametric knowledge is wrong (under-search), producing confident hallucinations. Both failures are invisible at the aggregate metric level — accuracy averages can stay flat while individual queries either pay for the same passage four times or get answered from stale weights. The HiPRAG paper measures over-search at double-digit baseline rates in standard agentic-RAG setups, with under-search rates rising under reinforcement-learning training that rewards short trajectories.
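To make the two failure rates concrete, here is a minimal sketch of how they might be computed from a per-step log. The `Step` schema and the `search_rates` helper are hypothetical, and the definitions used (over-search as the share of retrieval steps that returned nothing new, under-search as the share of non-retrieval steps judged wrong) are one reasonable reading, not necessarily the exact definitions used in the HiPRAG paper.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One logged agent step (hypothetical schema, not from the source)."""
    retrieved: bool          # did the agent call the search tool at this step?
    redundant: bool = False  # retrieval step: was everything returned already in context?
    correct: bool = True     # non-retrieval step: was the step's answer judged correct?


def search_rates(steps: list[Step]) -> tuple[float, float]:
    """Return (over_search_rate, under_search_rate) for a list of logged steps."""
    searched = [s for s in steps if s.retrieved]
    skipped = [s for s in steps if not s.retrieved]
    # Over-search: share of retrieval steps that added nothing new.
    over = sum(s.redundant for s in searched) / len(searched) if searched else 0.0
    # Under-search: share of non-retrieval steps that turned out wrong.
    under = sum(not s.correct for s in skipped) / len(skipped) if skipped else 0.0
    return over, under
```

Both rates are conditioned on the decision taken, so an agent that rarely searches can still show a high over-search rate if most of its few searches are duplicates.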
Forces
- Naive policies (always retrieve, never retrieve) are easy; calibrated policies require a learned or rule-based decision signal.
- End-to-end accuracy hides retrieval miscalibration because the agent can still arrive at correct answers via expensive or lucky paths.
- Token cost and latency from over-search compound silently; hallucinations from under-search are noticed only when a downstream check catches them.
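As an illustration of the rule-based end of that spectrum, the sketch below gates retrieval on two signals: whether an equivalent query has already been issued, and whether a confidence estimate falls below a threshold. Everything here is an assumption for illustration: the function names, the `0.7` threshold, and the idea that a usable `confidence` score is available (for example from token log-probabilities or a self-evaluation prompt).

```python
def normalise(query: str) -> str:
    """Collapse case and whitespace so near-identical queries compare equal."""
    return " ".join(query.lower().split())


def should_retrieve(
    query: str,
    confidence: float,
    seen_queries: set[str],
    threshold: float = 0.7,  # hypothetical cut-off; would need tuning per task
) -> bool:
    """Rule-based retrieval gate (illustrative, not a calibrated policy)."""
    key = normalise(query)
    if key in seen_queries:
        return False  # over-search guard: this query was already issued
    if confidence >= threshold:
        return False  # parametric answer is trusted, skip retrieval
    seen_queries.add(key)  # record the query only when retrieval actually happens
    return True
```

A gate like this is only as good as its confidence estimate; if the model is confidently wrong, the second rule reproduces under-search, which is why the rates still need to be measured.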
Example
A team ships an agentic-RAG assistant that reaches 82% accuracy on its eval. After three months of complaints about latency and cost, they instrument per-step retrieval decisions. They find the agent re-retrieves the same three policy documents on every fourth turn (over-search at 31%), and on 9% of queries it skips retrieval entirely and answers from stale parametric knowledge (under-search). They add a process-reward signal that penalises duplicate retrieval and rewards retrieval when the model's confidence is low. Over-search drops to 4%, under-search to 2%, accuracy rises to 87%, and token cost per query falls by 38%.
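The instrumentation step in this story could look something like the wrapper below, which sits around the search tool and records, for each call, how many of the returned passages the agent had already seen. This is a sketch under assumptions: `InstrumentedSearch` is a hypothetical name, and the wrapped `search_fn` is assumed to take a query string and return a list of passage IDs.

```python
from typing import Callable


class InstrumentedSearch:
    """Wraps a search tool and logs duplicate retrievals per call (illustrative)."""

    def __init__(self, search_fn: Callable[[str], list[str]]):
        self.search_fn = search_fn        # assumed: query -> list of passage IDs
        self.seen_ids: set[str] = set()   # passage IDs already returned this session
        self.log: list[dict] = []         # one record per retrieval call

    def __call__(self, query: str) -> list[str]:
        ids = self.search_fn(query)
        duplicates = [i for i in ids if i in self.seen_ids]
        self.log.append({
            "query": query,
            "returned": len(ids),
            "duplicates": len(duplicates),
            # A call counts as fully redundant when it returned nothing new.
            "fully_redundant": bool(ids) and len(duplicates) == len(ids),
        })
        self.seen_ids.update(ids)
        return ids
```

The wrapper only captures the over-search side; detecting under-search additionally needs a correctness judgement on the steps where the agent chose not to search.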
Solution
Therefore:
Don't ship agentic RAG without calibrated retrieval decisions. Adopt Agentic RAG with explicit retrieval-decision instrumentation: per-step rewards that penalise redundant retrieval and reward retrieval when parametric knowledge is insufficient. Track over-search and under-search rates as first-class evaluation metrics. Compare against Naive RAG (always retrieve) and Naive-RAG-First (RAG-by-reflex) as baselines; the goal is a calibrated agent, not a maximally agentic one.
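A per-step reward of the kind described above might be sketched as follows. The function name, the unit-sized `penalty` and `bonus`, and the boolean `knowledge_sufficient` label are all hypothetical; in practice that label would have to come from some judge or held-out check. The explicit penalty for skipping retrieval when knowledge is insufficient is an added assumption, included so that under-search is not left unscored.

```python
def step_reward(
    retrieved: bool,
    redundant: bool,
    knowledge_sufficient: bool,
    penalty: float = 1.0,  # hypothetical magnitude
    bonus: float = 1.0,    # hypothetical magnitude
) -> float:
    """Score one retrieval decision (illustrative process-reward sketch)."""
    if retrieved:
        if redundant:
            return -penalty  # over-search: the passages were already in context
        if not knowledge_sufficient:
            return bonus     # warranted retrieval: the model needed the evidence
        return 0.0           # new but unnecessary retrieval: left neutral here
    if not knowledge_sufficient:
        return -penalty      # under-search (assumed penalty, see lead-in)
    return 0.0               # correctly answered from parametric knowledge
```

These step scores would typically be combined with the outcome reward for the whole trajectory, so that the agent is not rewarded for tidy retrieval behaviour on a wrong answer.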
What this pattern forbids. Nothing useful: as an anti-pattern it imposes no constraint, and the constraint it is missing is a calibrated retrieval-decision policy with per-step measurement.
And the patterns that stand alongside it, or against it:
- alternative-to Agentic RAG ★★: Replace static retrieve-then-generate with autonomous agents that plan, choose sources, retrieve iteratively, reflect, and re-query.
- complements Naive RAG ★★: Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
- complements Naive-RAG-First ✕: Anti-pattern: reach for naive RAG before checking whether the knowledge actually needs retrieval.