Semantic-Layer Query Guardrail
also known as Semantic-Layer SQL Guardrail, Metric-Layer Query Routing, Vetted-Metric Query Guardrail
Route natural-language data questions through a curated semantic layer so the model selects and parameterises vetted metrics and dimensions instead of free-authoring raw SQL against production data.
Context
An agent answers natural-language questions over a production data warehouse — revenue last quarter, active users by region, churn for a cohort. Free-form text-to-SQL is the obvious mechanism, but in practice it rarely survives contact with a real schema: column names are cryptic, business terms are organisation-specific, and the same word means different things to different teams. A separate semantic layer — a metric store such as a dbt Semantic Layer, LookML, or Cube model — already encodes the vetted definitions of each metric and dimension and can compile a metric request into correct SQL on its own.
Problem
Natural language is ambiguous and business terminology is domain-specific, so the model does not know what a particular organisation means by revenue or active user, and a text-to-SQL system that lets the model author raw SQL will silently pick the wrong column, the wrong join, or the wrong filter and return a confidently incorrect number. Letting the model write arbitrary SQL against production also widens the action surface to every table the credential can reach. The system needs the model's language understanding to map the question onto a metric, without granting it the authority to define what that metric is or to author the query that computes it.
Forces
- Free-form text-to-SQL covers any question the schema can express, but a single misread column or join yields a wrong answer that looks authoritative and is hard to catch.
- Business definitions such as revenue or active user are organisation-specific and change over time; encoding them once in a shared layer is correct, while re-deriving them in each generated query is fragile.
- Arbitrary model-authored SQL against production widens the blast radius to every table the credential can read; a fixed catalogue of metrics narrows it.
- A semantic layer must be modelled and maintained up front, which is real work and cannot answer a question whose metric nobody has defined yet.
Example
A business-intelligence agent is asked 'what was active-user revenue in EMEA last quarter?'. Instead of writing SQL, it maps the question onto two pre-defined metrics — active_users and revenue — sets the region dimension to EMEA and the time grain to the prior quarter, and emits a structured metric request. The dbt Semantic Layer validates the request, compiles it to the correct SQL using the team's vetted definitions of active user and revenue, runs it, and returns the figures. When someone later asks for a metric nobody has modelled, the agent reports that the metric is undefined rather than improvising a query.
Diagram
Solution
Therefore:
Model the metrics, dimensions, and business definitions once in a semantic layer and expose querying it as the agent's only data tool. The agent's job shrinks from authoring SQL to selecting: it maps the natural-language question onto one or more pre-defined metrics, picks the dimensions to group by, and fills in time grain and filter parameters, emitting a structured metric request rather than a SQL string. The semantic layer validates that request against its schema and deterministically compiles it to SQL — resolving the canonical column, join path, and definition behind each metric — then executes it and returns the result. Because the definitions live in the layer, every consumer computes revenue the same way; because the agent cannot author raw SQL, it cannot reach a table outside the modelled catalogue or invent a definition. A question whose metric is not yet modelled is refused or escalated to extend the layer rather than answered by improvised SQL.
What this pattern forbids. The agent may not author or execute raw SQL against production data; it may only select from and parameterise metrics and dimensions defined in the semantic layer, and a question whose metric is not modelled must be refused or escalated rather than answered by improvised SQL.
And the patterns that stand alongside it, or against it —
- alternative-toTable-Augmented Generation·— Answer a natural-language question over a database in three stages — synthesise an executable query, run it against the data layer with model calls embedded in execution, then generate the answer from the result.
- complementsCanonical-Entity Grounding★— Require the agent to resolve every business identifier it uses — SKU, account, supplier, customer — through an authoritative lookup against the system of record, rather than emitting the identifier from the model's parametric memory.
- complementsRisk-Tiered Action Autonomy★— Set an agent's permitted action class by the financial materiality of the action, letting it read and draft freely while requiring a different human principal to release material postings, payments, or filings.
- complementsTool Discovery★— Let the agent discover available tools at runtime rather than hardcoding the tool list at agent build time.
Neighbourhood
Click any neighbour to follow the language. Scroll to zoom, drag to pan.