VIII · Safety & ControlMature★★

Mandatory Red-Flag Escalation

also known as Unconditional Escalation Trigger, Red-Flag Short-Circuit

Maintain a deterministic set of high-risk triggers so that on any match the agent immediately aborts its workflow and hands off to a human, without weighing whether to escalate.

Context

An agent runs a conversational or task workflow where a small fraction of inputs signal a life-, safety-, or money-critical situation: a clinical triage line, a crisis-support chat, a fraud or safety hotline, an industrial-control assistant. The workflow normally proceeds turn by turn, gathering context and acting, but a few signals mean that continuing the normal flow at all is the wrong move and a human must take over at once.

Problem

When a high-stakes signal appears, an agent left to its own judgement tends to keep doing what it was doing: it asks one more clarifying question, finishes the current step, or scores the situation against a soft threshold before deciding to escalate. Each of those extra turns is a chance to mishandle an emergency, and a model that treats escalation as one option among many will sometimes rationalise staying in the loop. The system needs a guarantee that a matched red flag stops normal handling immediately, not a tendency that usually does.

Forces

  • Speed of handoff competes with completeness of context: aborting instantly is safest, yet a warm handoff still needs the situation captured so far.
  • A deterministic trigger set is auditable and hard to talk past, but it must be tuned or it over-triages and trains operators to ignore it.
  • Letting the model decide when to escalate is flexible but unreliable under pressure; hard-coding the abort is rigid but dependable.
  • Red-flag rules drift as protocols change, so the trigger set is a maintained artifact rather than a one-time list.

Example

A telehealth triage voice agent is collecting symptoms when the caller says their father is not breathing. The phrase matches a red-flag trigger, so the agent abandons the symptom questionnaire mid-question, stops all further autonomous handling, and warm-transfers the call to an on-call nurse with the transcript and the fields gathered so far attached, rather than finishing the form first.

Diagram

Solution

Therefore:

Keep red-flag triggers as a maintained, deterministic rule set — keyword and phrase matches, classifier thresholds, or structured-field conditions — evaluated against every turn before the normal workflow runs. The check sits outside the model's discretion: a match fires regardless of what the agent was planning. On a match the runtime short-circuits the remaining workflow, stops any further autonomous handling, and performs a warm handoff that passes the human operator the transcript and any structured context already collected. The model never votes on whether to escalate; its only role after a match is to relay the brief, fixed escalation message while control transfers. The trigger set is versioned and reviewed against the underlying protocol so it stays neither too noisy nor too narrow.

What this pattern forbids. On a matched red flag the agent must abort the workflow and hand off to a human immediately; it may not continue autonomous handling, ask further questions, or treat escalation as one option to be weighed against continuing.

The smaller patterns that complete this one —

And the patterns that stand alongside it, or against it —

  • complementsHuman-in-the-Loop★★Require explicit human approval at defined points before the agent performs an action.
  • complementsRisk-Tiered Action AutonomySet an agent's permitted action class by the financial materiality of the action, letting it read and draft freely while requiring a different human principal to release material postings, payments, or filings.
  • complementsInput/Output Guardrails★★Validate inputs before they reach the model and outputs before they reach the user.
  • complementsSLA-Aware Triage ScoringOrder the work queue by a single fused score that blends each ticket's time-to-SLA-breach, the requester's entitlement tier, and sentiment trajectory, and surface items predicted to breach before they do.

Neighbourhood

Click any neighbour to follow the language. Scroll to zoom, drag to pan.