Safety & Control

Enforced Advisory Disclaimer

Append a non-suppressible advisory that frames every high-risk regulated answer as information rather than professional advice. The advisory is attached outside the model's discretion, so it survives pushback and model updates.

Problem

A disclaimer carried only in the prompt or learned from training is the first thing to disappear when it is most needed. The model drops it under user pushback, an adversarial framing removes it, and a routine model upgrade silently regresses the behaviour because the safety framing was never enforced as a contract. A study of patient-posed medical questions found disclaimers fell from more than a quarter of outputs to roughly one percent across two years of model releases. The audience that most needs the framing therefore receives an answer that reads as authoritative advice.

Solution

Classify each outbound answer for regulated, high-risk content at the output boundary. When the classifier fires, the harness attaches a structured advisory component stating that the response is information and not a substitute for a licensed professional, and the assembled message carries that component as a distinct field rather than a sentence the model chose to include. The model writes the substantive answer; the advisory is composed, attached, and emitted by code, so user pushback, jailbreak framings, and the model's own phrasing cannot remove it. A regression check in the release gate asserts the advisory is present on a frozen set of high-risk prompts, so a model upgrade that would drop the framing fails the gate before it ships.
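
The assembly step described above might be sketched as follows. This is a minimal illustration, not a reference implementation: the names `Advisory`, `AssembledMessage`, `assemble`, `keyword_classifier`, and the wording in `ADVISORY_TEXT` are all hypothetical, and the keyword matcher merely stands in for whatever high-risk classifier a real deployment would use.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical wording; a real deployment would take this from compliance review.
ADVISORY_TEXT = (
    "This response is general information and not a substitute for "
    "advice from a licensed professional."
)


@dataclass(frozen=True)
class Advisory:
    domain: str  # e.g. "health", "legal", "finance"
    text: str


@dataclass(frozen=True)
class AssembledMessage:
    body: str                     # written by the model
    advisory: Optional[Advisory]  # attached by the harness, never by the model


def assemble(
    model_answer: str,
    classify: Callable[[str], Optional[str]],
) -> AssembledMessage:
    """Attach the advisory in code whenever the classifier flags the answer."""
    domain = classify(model_answer)
    advisory = Advisory(domain, ADVISORY_TEXT) if domain is not None else None
    return AssembledMessage(body=model_answer, advisory=advisory)


def keyword_classifier(answer: str) -> Optional[str]:
    """Toy stand-in for a real high-risk classifier (assumption)."""
    keywords = {"dosage": "health", "lawsuit": "legal", "portfolio": "finance"}
    lowered = answer.lower()
    for word, domain in keywords.items():
        if word in lowered:
            return domain
    return None
```

Because the advisory lives in its own field and is populated only by `assemble`, nothing the model writes in `body` (including a claim that no disclaimer is needed) changes whether the field is present.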

When to use

Answers in a regulated, high-risk domain (health, legal, finance) are permitted to proceed but can be misread as definitive professional advice.
The safety framing must hold under user pushback, adversarial prompts, and across model upgrades rather than depending on the current checkpoint.
A high-risk classification signal exists or can be built at the output boundary to scope where the advisory fires.
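
The requirement that the framing hold across model upgrades corresponds to the release-gate regression check described in the Solution. A rough sketch, assuming a hypothetical `respond` callable that runs the candidate model plus harness and returns the assembled message as a mapping with an `advisory` key (the function names and message shape are illustrative assumptions):

```python
from typing import Callable, Iterable, List, Mapping


def missing_advisories(
    respond: Callable[[str], Mapping[str, object]],
    frozen_prompts: Iterable[str],
) -> List[str]:
    """Return the frozen prompts whose assembled message lacks an advisory."""
    return [p for p in frozen_prompts if not respond(p).get("advisory")]


def release_gate(
    respond: Callable[[str], Mapping[str, object]],
    frozen_prompts: Iterable[str],
) -> bool:
    """Pass only if every frozen high-risk prompt yields an advisory."""
    return not missing_advisories(respond, frozen_prompts)
```

Returning the failing prompts, rather than only a boolean, lets the gate report which high-risk cases regressed when a candidate release is blocked.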
