Methodology · RAG Construction · emerging · verified

Tools-First, Then RAG

Also known as: check-the-knowledge-shape, anti-naive-RAG-first

Applies to: agent, rag-system, llm-app

Tags: rag-decision, knowledge-shape, anti-naive-rag

Before you build RAG (retrieval-augmented generation), check whether the knowledge really needs search at all. Data that lives in a database or behind an API is best reached with a direct query or API call. A small, stable set of text can just be pasted into the prompt. Only large piles of unstructured text actually need search over embeddings. Reaching for RAG by default wastes effort and gives fuzzy answers in cases where a direct query would have been exact.

Methodology process overview

flowchart TD
    inv[Knowledge inventory] --> classify{Per source: shape?}
    classify -- structured store/API --> tool[Author typed tool: SQL / API / search]
    classify -- small stable corpus --> inline[Inline in system prompt or scoped doc]
    classify -- large unstructured --> rag[Build RAG on residual only]
    classify -- mixed --> split[Split the source]
    split --> classify
    tool --> eval[Measure end-to-end correctness on eval set]
    inline --> eval
    rag --> eval
    eval -- retriever beats tool? --> fixtool[Fix the tool, not the retriever]
    fixtool --> tool
    eval -- correct --> out[Per-source access-mechanism assignment]

Intent. Check what shape your knowledge is in before you choose search, then pick the simplest way to reach each source.

When to apply. Use this when you start any agent or app that answers from a body of knowledge: a support bot, an internal Q&A tool, a docs helper, a policy assistant. Run it before you commit to a vector store. Don't apply it if you already know the knowledge is large and unstructured, because then RAG is the right choice from the start.

Example scenario

A two-person team at an HR-tech company is building an internal 'people-ops copilot' for HR staff. The popular move at the time was 'load every HR document into a vector store and use RAG.' They stopped and ran this methodology instead. Their knowledge inventory listed: an HR database (employees, roles, managers, pay bands), a benefits-vendor API (current choices, claim status), a Confluence space of about 4,000 pages of HR policy, a 12-page employee handbook, and a Slack archive of staff Q&A.

They sorted each one. The HR database became a SQL tool. The benefits vendor became an API tool. The 12-page handbook was pasted into the prompt (under 5k tokens). The Confluence policies were large and unstructured, so they became the candidate for RAG. The Slack archive was split: the structured tags went into a tool, and the free text stayed for RAG. They built three direct tools first (get_employee, get_benefits_elections, search_handbook_tags), and only then built a small RAG store over the Confluence subset and the Slack free text.

When they ran their test set, one kind of question ('what's my PTO balance') was being answered better by RAG than by the get_employee tool. The search was pulling up a policy page with an example calculation, and the model was copying the example numbers. So they fixed the tool by adding a PTO field, rather than tuning the search. Correctness for that kind of question jumped from 71% to 99%. The vector store ended up about one-tenth the size of the 'everything in embeddings' design, and the 80% of questions handled by direct tools took a single round-trip instead of a retrieve-rerank-generate chain.
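The tool fix in this scenario might look roughly like the sketch below: a get_employee lookup whose result carries the PTO balance as an exact field. The table layout, the column name pto_balance_hours, and the sample row are assumptions made for illustration; the scenario does not describe the team's actual schema.

```python
import sqlite3
from typing import Optional

# Hypothetical schema and sample row, for illustration only.
def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        "id INTEGER PRIMARY KEY, name TEXT, role TEXT, manager TEXT, "
        "pto_balance_hours REAL)"
    )
    conn.execute(
        "INSERT INTO employees VALUES "
        "(1, 'Ada Example', 'Analyst', 'Grace Example', 64.0)"
    )
    conn.commit()
    return conn

EMPLOYEE_FIELDS = ("id", "name", "role", "manager", "pto_balance_hours")

def get_employee(conn: sqlite3.Connection, employee_id: int) -> Optional[dict]:
    """Direct tool: exact lookup by primary key, no embeddings involved."""
    row = conn.execute(
        "SELECT id, name, role, manager, pto_balance_hours "
        "FROM employees WHERE id = ?",
        (employee_id,),  # parameterized to avoid SQL injection
    ).fetchone()
    if row is None:
        return None
    return dict(zip(EMPLOYEE_FIELDS, row))

if __name__ == "__main__":
    conn = make_demo_db()
    print(get_employee(conn, 1))
```

Once the balance is a field in the tool's result, the model has an exact value to quote and no reason to borrow numbers from a worked example in a policy page.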

Inputs

Knowledge inventory — A list of every knowledge source the agent has to reach: databases, APIs, document stores, internal wikis, web pages.
Per-source schema or sample — For each source: how it is structured (tables, endpoints, fields), how big it is, how often it changes, and a few sample records.
User query distribution — A sample of the questions real users ask, not the ones the team imagines they ask.

Outputs

Access-mechanism assignment per source — For each source: how the agent will reach it (direct call, pasted-in text, search, or a mix) and why.
Tool catalogue — The set of direct calls (SQL, API, search) the agent uses to reach structured sources.
RAG corpus boundary — The leftover unstructured text, kept as small as possible, that will go through search.
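As a rough illustration, the per-source assignment could be recorded with a small sorting function like the one below. The shape labels ("structured", "unstructured", "mixed"), the Source fields, and the token counts in the sample inventory are assumptions for this sketch; only the "under about 5k tokens" guideline comes from the text.

```python
from dataclasses import dataclass
from enum import Enum

class Access(Enum):
    TOOL = "direct call"
    INLINE = "pasted into the prompt"
    RAG = "search"
    SPLIT = "split, then sort each part again"

@dataclass
class Source:
    name: str
    shape: str          # "structured", "unstructured", or "mixed" (assumed labels)
    approx_tokens: int  # rough text size; not used for structured sources
    stable: bool = True

# Guideline from the text, treated here as a soft threshold.
INLINE_TOKEN_LIMIT = 5_000

def assign(source: Source) -> Access:
    """Pick the simplest access mechanism that fits the source's shape."""
    if source.shape == "structured":
        return Access.TOOL
    if source.shape == "mixed":
        return Access.SPLIT
    if source.stable and source.approx_tokens < INLINE_TOKEN_LIMIT:
        return Access.INLINE
    return Access.RAG

if __name__ == "__main__":
    # Token counts below are invented placeholders.
    inventory = [
        Source("HR database", "structured", 0),
        Source("Benefits-vendor API", "structured", 0),
        Source("Employee handbook", "unstructured", 4_500),
        Source("Confluence HR policy", "unstructured", 2_000_000),
        Source("Slack Q&A archive", "mixed", 800_000),
    ]
    for src in inventory:
        print(f"{src.name}: {assign(src).value}")
```

A real inventory would also record the reason for each assignment, since the output is meant to capture both how each source is reached and why.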

Steps (6)

1. Inventory the knowledge
List every knowledge source the agent has to reach. For each one, note how it is structured, how big it is, and how often it changes. Don't move on until every source is on the list.
2. Sort each source by how you reach it
A database or typed API becomes a direct call. A small, stable set of text (under about 5k tokens) gets pasted into the prompt or loaded on demand. A large, unstructured pile of text is a candidate for search. A mixed source gets split apart.
3. Build the tools first
For every structured source, write a direct call: a SQL helper, an API client, or a search endpoint. The agent calls these straight away, with no embeddings involved.
Uses: Tool Use
4. Paste in the small sets of text
Put small, stable sets of text straight into the prompt or a loaded document. There is no search round-trip and nothing gets lost.
5. Build RAG only for what's left
Whatever didn't fit the earlier steps (large, unstructured text) goes through search. By now the body of text is small and well-defined, so testing the search is easy to scope.
6. Measure correctness end to end
On your test set, compare the answers from direct calls against the answers from search for the same questions. If search wins for a question that has a structured source behind it, fix the tool rather than tuning the search.
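The end-to-end comparison in the last step can be sketched as a per-category tally that flags where search outscores the direct call on questions backed by a structured source. The record keys (category, has_structured_source, tool_correct, search_correct) and the sample rows are hypothetical; they stand in for whatever your eval harness produces.

```python
from collections import defaultdict

def flag_tool_gaps(results):
    """Return question categories where search beat the direct call.

    Only questions backed by a structured source are counted, because
    those are the cases where a win for search points at a tool defect.
    """
    totals = defaultdict(lambda: {"tool": 0, "search": 0})
    for r in results:
        if not r["has_structured_source"]:
            continue
        totals[r["category"]]["tool"] += int(r["tool_correct"])
        totals[r["category"]]["search"] += int(r["search_correct"])
    return sorted(c for c, t in totals.items() if t["search"] > t["tool"])

# Invented eval rows, for illustration only.
sample_results = [
    {"category": "pto_balance", "has_structured_source": True,
     "tool_correct": False, "search_correct": True},
    {"category": "pto_balance", "has_structured_source": True,
     "tool_correct": True, "search_correct": True},
    {"category": "manager_lookup", "has_structured_source": True,
     "tool_correct": True, "search_correct": False},
    {"category": "policy_interpretation", "has_structured_source": False,
     "tool_correct": False, "search_correct": True},
]

if __name__ == "__main__":
    for category in flag_tool_gaps(sample_results):
        print(f"Fix the tool for: {category}")
```

Categories without a structured source are skipped on purpose: search winning there is the expected outcome, not a signal to change a tool.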

Principles

The shape of the knowledge decides how you reach it. Pick the method that fits.
Direct calls return exact answers; search returns rough passages. Prefer exact answers when the source allows it.
RAG is the leftover option, not the default.
If search beats a direct call, the tool is wrong, not the rule.