Table-Augmented Generation

also known as TAG, Query-Synthesis-Execute-Generate, LLM-in-SQL Querying

Answer a natural-language question over a database in three stages — synthesise an executable query, run it against the data layer with model calls embedded in execution, then generate the answer from the result.

Context

An analyst asks a question of structured data that neither plain Text2SQL nor document retrieval can satisfy alone. Text2SQL handles only questions expressible in relational algebra, and retrieval over chunks handles only point lookups of a few records. Many real questions need both the scalable computation of the data system and the world knowledge and semantic judgement of a language model applied to the rows themselves, for example, ranking suppliers by a quality the schema never recorded.

Problem

A question such as 'which of last quarter's outage incidents were caused by a vendor misconfiguration' cannot be expressed as pure relational algebra, because deciding whether a free-text postmortem describes a vendor misconfiguration is a semantic judgement, not a column comparison. Pushing the whole table into a prompt does not scale, and a single SQL query cannot reason over the prose. The system needs to combine exact, scalable computation over the rows with per-row semantic reasoning, without loading the table into context or flattening the question into a lookup.

Forces

Relational engines compute exactly and scale to large tables, but cannot answer questions whose predicates require world knowledge or judgement over free text.
A language model can judge and reason over a row's content, but reasoning over every row in context does not scale and is costly.
Embedding the model inside query execution unlocks semantic predicates and aggregations, yet each embedded call adds latency, cost, and a new failure surface to the query plan.
Treating Text2SQL and retrieval as the only options leaves a large class of real questions unanswerable; a unified paradigm covers them but is harder to implement and evaluate.

Example

A support lead asks 'which of last quarter's outage incidents were caused by a vendor misconfiguration?'. The system writes a query that filters incidents to last quarter in the database, then calls the model on each incident's free-text postmortem to decide whether it describes a vendor misconfiguration, and finally reads the short list of matching incidents back to write the answer. The exact date filtering runs in the engine; the model only judges the prose of the rows that survive the filter.
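A minimal sketch of the three stages on this example follows. It uses SQLite, through Python's standard sqlite3 module, as the data engine. The SQL that a model would synthesise is hard-coded, and 'last quarter' is taken to be July to September 2024. The registered function is_vendor_misconfig wraps judge_vendor_misconfig, a hypothetical keyword stub standing in for the per-row model call. A real system would replace the stub with a call to its model and parse a yes/no verdict.

```python
import sqlite3

def judge_vendor_misconfig(postmortem: str) -> int:
    # Hypothetical stand-in for the per-row model call: a real system would
    # ask a language model whether the postmortem describes a vendor
    # misconfiguration. Here a keyword match returns 1 (yes) or 0 (no).
    text = postmortem.lower()
    return int("vendor" in text and "misconfig" in text)

def answer_outage_question(conn: sqlite3.Connection) -> str:
    # Stage 1, query synthesis: hard-coded here; a model would emit this SQL
    # from the question, calling the registered semantic predicate.
    sql = """
        SELECT id, title
        FROM incidents
        WHERE reported_at >= '2024-07-01' AND reported_at < '2024-10-01'
          AND is_vendor_misconfig(postmortem)
        ORDER BY id
    """
    # Stage 2, execution: the engine applies the exact date filter and the
    # embedded predicate in a single query.
    rows = conn.execute(sql).fetchall()
    # Stage 3, answer generation: a template stands in for the model.
    if not rows:
        return "No incidents last quarter were caused by a vendor misconfiguration."
    listed = ", ".join(f"#{i} {title}" for i, title in rows)
    return f"{len(rows)} incident(s) matched: {listed}."

conn = sqlite3.connect(":memory:")
conn.create_function("is_vendor_misconfig", 1, judge_vendor_misconfig)
conn.executescript("""
    CREATE TABLE incidents (id INTEGER PRIMARY KEY, title TEXT,
                            reported_at TEXT, postmortem TEXT);
    INSERT INTO incidents VALUES
      (1, 'CDN outage', '2024-08-03', 'Vendor pushed a misconfiguration to edge nodes.'),
      (2, 'DB failover', '2024-09-12', 'Disk filled up on the primary.'),
      (3, 'DNS outage', '2024-05-20', 'Vendor misconfiguration of DNS records.');
""")
print(answer_outage_question(conn))
```

Incident 3 would pass the semantic check, but the date filter in the same WHERE clause excludes it, so the answer names only incident 1.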

Diagram

flowchart TD
    Q[Natural-language question] --> S[Synthesise executable query<br/>SQL + embedded LLM calls]
    S --> E[Execute on data engine]
    E --> R[Relational work: joins, filters, aggregation]
    E --> M[Per-row LLM call for semantic predicate]
    R --> RS[Result set]
    M --> RS
    RS --> G[Generate answer grounded in result set]

Solution

Therefore:

Decompose answering into three stages over a database.

In query synthesis, the model translates the natural-language question into an executable query. This is typically SQL extended with calls back to a language model, exposed as user-defined functions or semantic operators, so a clause such as a relevance filter or a free-text classification becomes part of the query plan rather than something done afterward in a prompt.

In query execution, the data engine runs that query. Exact relational work (joins, filters, aggregation) stays in the engine, where it scales, and the embedded model calls are evaluated per row or per group only where semantic judgement is required. Reasoning is pushed into the data layer instead of the table being pulled into context.

In answer generation, the model reads the compact, computed result set and produces a response grounded in those rows.

Text2SQL is the special case in which synthesis emits pure relational algebra and execution needs no model calls. Retrieval is the special case in which the query is a point lookup of a few records. Both fall out of the same paradigm.
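The three stages can be sketched as one function with pluggable synthesis and generation steps. This is an illustration under stated assumptions, not a reference implementation. SQLite, via Python's sqlite3 module, stands in for the data engine. A dictionary of canned SQL (CANNED_SQL) stands in for the synthesising model, and render stands in for the generating model. mentions_late_delivery is a hypothetical keyword stub registered as the embedded model call. The three canned queries correspond to Text2SQL, retrieval, and a semantic predicate.

```python
import sqlite3
from typing import Callable, Sequence

def tag_answer(question: str,
               conn: sqlite3.Connection,
               synthesise: Callable[[str], str],
               generate: Callable[[str, Sequence[tuple]], str]) -> str:
    sql = synthesise(question)            # stage 1: query synthesis
    rows = conn.execute(sql).fetchall()   # stage 2: execution in the engine
    return generate(question, rows)       # stage 3: answer from the result set only

calls = {"model": 0}  # counts invocations of the embedded "model"

def mentions_late_delivery(notes: str) -> int:
    # Hypothetical stand-in for an embedded model call (keyword match).
    calls["model"] += 1
    return int("late" in notes.lower())

# Hypothetical synthesiser: canned SQL where a real system would prompt a model.
CANNED_SQL = {
    "How many suppliers are in Germany?":                # Text2SQL case
        "SELECT COUNT(*) FROM suppliers WHERE country = 'DE'",
    "What are the notes for supplier 2?":                # retrieval case
        "SELECT notes FROM suppliers WHERE id = 2",
    "Which suppliers' notes mention late deliveries?":   # semantic predicate
        "SELECT name FROM suppliers WHERE mentions_late_delivery(notes) ORDER BY name",
}

def render(question: str, rows: Sequence[tuple]) -> str:
    # Hypothetical generator: formats rows where a real system would call a model.
    return "; ".join(", ".join(str(v) for v in row) for row in rows)

conn = sqlite3.connect(":memory:")
conn.create_function("mentions_late_delivery", 1, mentions_late_delivery)
conn.executescript("""
    CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT, country TEXT, notes TEXT);
    INSERT INTO suppliers VALUES
      (1, 'Acme', 'DE', 'Shipments often arrive late.'),
      (2, 'Borealis', 'FI', 'Reliable, well packed.'),
      (3, 'Corvid', 'DE', 'Late twice this year, otherwise fine.');
""")

for q in CANNED_SQL:
    before = calls["model"]
    answer = tag_answer(q, conn, CANNED_SQL.__getitem__, render)
    print(f"{q} -> {answer} (model calls: {calls['model'] - before})")
```

The Text2SQL and point-lookup questions leave the model-call counter unchanged. Only the semantic question evaluates the embedded predicate, once for each of the three scanned rows.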

What it gives you

Questions that need semantic reasoning over rows — not just relational algebra or point lookups — become answerable within one paradigm.
Exact, scalable computation stays in the data engine while the model is invoked only on the rows where judgement is needed, avoiding loading the full table into context.
Text2SQL and retrieval become special cases, so a single system covers a wider span of questions instead of switching architectures per question type.
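Semantic judgement can also be applied per group rather than per row. This lets a question such as ranking suppliers by a quality the schema never recorded keep model calls tied to the number of groups rather than the number of rows. The sketch below again assumes SQLite through Python's sqlite3. score_responsiveness is a hypothetical stub that counts cue words where a real system would ask a model to rate each supplier's concatenated reviews. The reviews table is invented for illustration.

```python
import sqlite3

calls = {"model": 0}  # counts invocations of the embedded "model"

def score_responsiveness(reviews: str) -> float:
    # Hypothetical stand-in for a model call that rates a supplier's
    # responsiveness from free-text reviews; here: quick mentions minus slow ones.
    calls["model"] += 1
    text = reviews.lower()
    return float(text.count("quick") - text.count("slow"))

conn = sqlite3.connect(":memory:")
conn.create_function("score_responsiveness", 1, score_responsiveness)
conn.executescript("""
    CREATE TABLE reviews (supplier TEXT, body TEXT);
    INSERT INTO reviews VALUES
      ('Acme', 'Quick reply to our ticket.'),
      ('Acme', 'Quick fix, friendly staff.'),
      ('Acme', 'Slow invoice, quick delivery.'),
      ('Borealis', 'Slow to answer emails.'),
      ('Borealis', 'Slow, but thorough.'),
      ('Borealis', 'Average.'),
      ('Corvid', 'Quick answers.'),
      ('Corvid', 'Slow weekend support.'),
      ('Corvid', 'Fine overall.');
""")

# The engine groups and concatenates; the UDF scores each group's text,
# and the engine sorts by that score.
rows = conn.execute("""
    SELECT supplier, score_responsiveness(group_concat(body, ' | ')) AS score
    FROM reviews
    GROUP BY supplier
    ORDER BY score DESC, supplier
""").fetchall()
print(rows)
print("model calls:", calls["model"])
```

The engine does the grouping, concatenation, and sorting. The scoring stub is applied to each supplier's concatenated reviews rather than to each of the nine review rows.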

What it costs you

Model calls embedded in query execution add latency and cost that grow with the number of rows the semantic predicate touches.
A non-deterministic call inside a query plan makes results harder to reproduce and the plan harder to optimise than pure SQL.
Few engines natively support model calls inside execution, so the paradigm often requires a custom runtime or extension layer.
On benchmarks built around such questions, standard Text2SQL and retrieval-augmented baselines answer only a small fraction correctly, so the synthesis and execution steps need careful evaluation before being trusted.
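A back-of-the-envelope model makes the first cost concrete. It assumes each embedded call has a fixed mean latency and price, and that the runtime issues calls in parallel waves of a fixed size. The function name embedded_call_budget and every number in the usage lines are hypothetical. They are chosen only to show that shrinking the rows the semantic predicate touches, for example by applying exact filters first, shrinks latency and spend roughly in proportion.

```python
import math

def embedded_call_budget(rows_touched: int, seconds_per_call: float,
                         usd_per_call: float,
                         concurrency: int = 1) -> tuple[float, float]:
    """Estimate wall-clock seconds and dollars for one semantic predicate.

    Assumes every touched row costs one model call of fixed mean latency and
    price, and that calls run in waves of `concurrency` parallel requests.
    """
    waves = math.ceil(rows_touched / concurrency)
    return waves * seconds_per_call, rows_touched * usd_per_call

# Hypothetical numbers: predicate on the whole table vs. after a date filter.
for label, rows in [("whole table", 50_000), ("after date filter", 1_200)]:
    seconds, dollars = embedded_call_budget(rows, seconds_per_call=0.8,
                                            usd_per_call=0.0004, concurrency=16)
    print(f"{label}: {rows} calls, ~{seconds:.0f} s, ~${dollars:.2f}")
```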

What this pattern forbids. The final answer must be grounded only in the result set the data engine returns: the model may not bypass query execution, and the full table is never loaded into the prompt.
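One way to enforce this constraint structurally is to build the generation prompt from the executed result set alone, so the generator never receives a connection or table name. In this minimal sketch, build_answer_prompt and the MAX_RESULT_ROWS cap are hypothetical. The cap makes an oversized result set fail loudly rather than letting a near-full table slip into the prompt.

```python
from typing import Sequence

MAX_RESULT_ROWS = 50  # hypothetical budget; size it to the model's context window

def build_answer_prompt(question: str, columns: Sequence[str],
                        rows: Sequence[tuple]) -> str:
    """Build the answer-generation prompt from the executed result set only.

    The function takes rows, not a connection or table name, so the generator
    has no path to the raw table.
    """
    if len(rows) > MAX_RESULT_ROWS:
        # Fail loudly: refine the query rather than pass a near-full table on.
        raise ValueError(f"result set has {len(rows)} rows; "
                         f"limit is {MAX_RESULT_ROWS}")
    header = " | ".join(columns)
    body = "\n".join(" | ".join(str(v) for v in row) for row in rows)
    return ("Answer using only the rows below; if they do not contain the "
            "answer, say so.\n"
            f"Question: {question}\n{header}\n{body}")

prompt = build_answer_prompt(
    "Which outage incidents last quarter were caused by a vendor misconfiguration?",
    ["id", "title"], [(1, "CDN outage")])
print(prompt)
```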

The smaller patterns that complete this one:

generalises Naive RAG ★★: Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.

And the patterns that stand alongside it, or against it:

complements Vectorless Reasoning-Based Retrieval: Retrieve by having the model reason its way down a document's own table-of-contents tree to the relevant sections, instead of embedding chunks and ranking them by vector similarity.
alternative-to Agentic RAG ★★: Replace static retrieve-then-generate with autonomous agents that plan, choose sources, retrieve iteratively, reflect, and re-query.
complements Canonical-Entity Grounding ★: Require the agent to resolve every business identifier it uses (SKU, account, supplier, customer) through an authoritative lookup against the system of record, rather than emitting the identifier from the model's parametric memory.
alternative-to Semantic-Layer Query Guardrail ★: Route natural-language data questions through a curated semantic layer so the model selects and parameterises vetted metrics and dimensions instead of free-authoring raw SQL against production data.