VIII · Safety & ControlMature★★

Refusal

also known as Decline, Out-of-Scope Response

Explicitly refuse requests that fall outside the agent's scope, capability, or policy boundaries.

Context

A team runs an agent with a defined scope — customer support for a specific product, technical help in a specific domain, internal operations for a specific team — and real users will ask it things outside that scope: medical advice from a banking agent, legal interpretation from a coding assistant, competitor comparisons from a vendor's own bot. Some of these requests are simply off-topic; others are unsafe, regulated, or beyond what the model can reliably do.

Problem

A helpful-by-default agent answers these out-of-scope questions anyway, producing plausible-sounding but unauthorised content: a stock pick from a system that has no business giving one, a dosage suggestion from a tool that is not a medical device, a confident wrong answer in a domain the model has not been validated against. Silently routing such requests through the model also strips the user of the signal that the agent has a boundary. Without an explicit, kind refusal at the named boundary, the agent drifts into territory that erodes trust and exposes the operator.

Forces

  • Over-refusal frustrates users.
  • Under-refusal lands the agent in trouble.
  • Refusal text quality matters; templated refusals feel insulting.

Example

A customer-service agent for a bank starts being asked for stock picks, legal advice, and competitor comparisons. Helpful-by-default, it answers and gets the bank into hot water. The team defines refusal triggers (regulatory boundary, out-of-scope, capability gap) and a kind, specific refusal template that names the boundary and points to a human team. Out-of-scope replies stop being plausible-sounding hallucinations and start being short, clear handoffs.

Diagram

Solution

Therefore:

Define refusal triggers (policy violation, out-of-scope, capability gap, regulatory boundary). Return a clear, kind, specific refusal that names the boundary and (when possible) suggests an alternative. Log refusals for review.

What this pattern forbids. When triggers fire, the agent must refuse rather than attempt the task.

The smaller patterns that complete this one —

And the patterns that stand alongside it, or against it —

  • complementsInput/Output Guardrails★★Validate inputs before they reach the model and outputs before they reach the user.
  • conflicts-withCode-Switching-Aware AgentTreat mixed-language input (e.g. Hinglish in Roman script) as the expected shape, and design tokenisation, language tagging, and tool routing to handle it natively without forcing the user to commit to one language.
  • complementsPolicy-as-Code GateEvaluate every proposed agent action against externally-managed machine-readable policies before dispatch, so compliance authorship lives outside the prompt and outside the agent code.
  • complementsTyped Refusal CodesDefine a single source of truth for machine-readable refusal codes across all guard surfaces, so refusals can be triaged mechanically rather than by string-grepping ad-hoc human-readable messages.
  • complementsReflexive Metacognitive Agent·Agent maintains an explicit self-model of its own capabilities, confidence and limitations, and reasons over that model when accepting / refusing / handing off tasks.

Neighbourhood

Click any neighbour to follow the language. Scroll to zoom, drag to pan.

Used in recipes

References

Provenance