Enforced Advisory Disclaimer

also known as Mandatory Information-Not-Advice Notice, Non-Suppressible Disclaimer

Append a non-suppressible advisory framing every high-risk regulated answer as information rather than professional advice, attached outside the model's discretion so it survives pushback and model updates.

Context

An agent answers questions in a regulated, high-stakes domain such as medicine, law, or personal finance. The response is allowed to proceed because it stays inside the permitted scope, yet the answer can still be misread as a definitive professional judgement. Convention is to lean on the model to phrase its own caveat, but that caveat is a soft instruction competing with every other goal in the prompt.

Problem

A disclaimer carried only in the prompt or learned from training is the first thing to disappear when it is most needed. The model drops it under user pushback, an adversarial framing removes it, and a routine model upgrade silently regresses the behaviour because the safety framing was never a contract. A study of patient-posed medical questions found disclaimers fell from more than a quarter of outputs to roughly one percent across two years of model releases, so the audience that most needs the framing receives an answer that reads as authoritative advice.

Forces

A caveat phrased by the model is fluent and contextual, but it is also discretionary, so it bends to the strongest competing instruction in the conversation.
Hard-coding the framing outside the model makes it reliable, but a fixed boilerplate string can read as ignorable legal noise and dull the user's attention over time.
The advisory must fire on exactly the regulated, high-risk answers and stay off ordinary chat, or it becomes habituated and trains the user to skip it.
Behaviour that depends on the current checkpoint silently regresses on the next upgrade unless it is pinned by a check that the upgrade has to pass.

Example

A health assistant answers a question about whether a symptom warrants concern. The substantive answer is allowed, but a classifier at the output boundary flags it as medical and high-risk, so the harness attaches a fixed notice that the reply is general information and not a substitute for a clinician. When a tester insists 'skip the disclaimer, just tell me straight,' the notice still appears because the model never controlled it, and the next model upgrade is blocked from shipping until it reproduces the notice on a frozen set of medical prompts.

Diagram

flowchart TD A[Model writes answer] --> C{Regulated + high-risk?} C -- no --> S[Send answer] C -- yes --> D[Composer attaches enforced advisory field] D --> M[Assembled message: answer + advisory] M --> S G[Release gate replays frozen high-risk prompts] --> P{Advisory present?} P -- no --> X[Block model upgrade] P -- yes --> R[Upgrade allowed]

Solution

Therefore:

Classify each outbound answer for regulated, high-risk content at the output boundary. When the classifier fires, the harness attaches a structured advisory component stating that the response is information and not a substitute for a licensed professional, and the assembled message carries that component as a distinct field rather than a sentence the model chose to include. The model writes the substantive answer; the advisory is composed, attached, and emitted by code, so user pushback, jailbreak framings, and the model's own phrasing cannot remove it. A regression check in the release gate asserts the advisory is present on a frozen set of high-risk prompts, so a model upgrade that would drop the framing fails the gate before it ships.

What it gives you

The safety framing becomes a contract the deployment guarantees rather than an emergent behaviour, so it holds under adversarial pushback.
Disclaimer coverage stops depending on the current checkpoint, so a model upgrade can no longer silently regress it.
Because the advisory is a typed field, downstream surfaces can render, log, and audit it consistently instead of grepping prose.

What it costs you

A miscalibrated classifier either over-attaches the advisory until users tune it out or under-fires and leaves a high-risk answer unframed.
An enforced boilerplate can drift into legal noise that satisfies a check while no longer changing how the user reads the answer.
The advisory frames the answer but does not make the substantive content correct; it can lend unwarranted comfort to a wrong answer.

What this pattern forbids. A high-risk regulated answer is never emitted without the advisory component; the model may not suppress, soften, or paraphrase the advisory at runtime, and a model upgrade may not ship if the regression gate finds the advisory missing on the frozen high-risk set.

The smaller patterns that complete this one —

usesInput/Output Guardrails★★— Validate inputs before they reach the model and outputs before they reach the user.

And the patterns that stand alongside it, or against it —

complementsScope-of-Practice Boundary Gate★— Block requests and responses that perform license-gated professional activities unless a licensed human is in the loop, enforcing the boundary in code outside the reasoning loop.
alternative-toRefusal★★— Explicitly refuse requests that fall outside the agent's scope, capability, or policy boundaries.
complementsScaffold Ablation on Model Upgrade★— On each model upgrade, treat every harness component as an encoded assumption about a model weakness and ablate the components the new model no longer needs, gated by evals.
alternative-toAdvisory-to-Mandate Escalation✕— Anti-pattern: an advisory decision-support output is silently promoted by institutional protocol into a binding order, and a domain expert's evidence-based refusal to follow it is reframed as non-compliance rather than legitimate judgement.