VIII · Safety & Control · Experimental

Calibrated Help-Gate via Conformal Prediction

also known as Conformal Help-Gate, KnowNo

Use conformal prediction to form a calibrated set of candidate actions and have the agent ask a human for help only when that set is not a singleton, giving a statistical task-completion guarantee.

This pattern helps complete certain larger patterns —

  • specialises Disambiguation ★★: Have the agent ask a clarifying question before acting on an ambiguous request.

Context

An agent, often an embodied or tool-using one, must decide at each step whether it is sure enough to act or should stop and ask a human. Self-reported confidence is often poorly calibrated (a model may report ninety percent confidence and still be wrong a third of the time), so a fixed confidence threshold either asks for help too often, killing autonomy, or too rarely, letting the agent act on tasks it cannot complete.

Problem

Deciding when an agent should defer to a human is usually done with an uncalibrated confidence number, which gives no guarantee about how often the agent will be wrong when it proceeds. Set the bar too high and the human is flooded with needless questions; too low and the agent confidently acts on instructions it has misunderstood. The agent needs a principled, tunable rule for when to ask that comes with a real guarantee on task success.

Forces

  • Asking for help too rarely lets the agent act on tasks it cannot complete; asking too often destroys autonomy and overloads the human.
  • Raw model confidence scores are not calibrated, so a fixed threshold gives no guarantee on the error rate.
  • A statistical guarantee requires a held-out calibration set and a target coverage level chosen in advance.
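The last force can be made concrete with one common instantiation, split conformal prediction. The notation below is an assumption introduced for illustration and does not come from the pattern text: $\hat{p}(y \mid x)$ is the planner's score for action $y$ in situation $x$, $(x_i, y_i)$ for $i = 1, \dots, n$ are held-out calibration examples with their correct actions, and $1 - \epsilon$ is the target coverage chosen in advance.

```latex
\begin{align*}
  s_i &= 1 - \hat{p}(y_i \mid x_i), \qquad i = 1, \dots, n
      && \text{(nonconformity score of each calibration example)} \\
  \hat{q} &= \text{the } \lceil (n+1)(1-\epsilon) \rceil\text{-th smallest of } s_1, \dots, s_n
      && \text{(calibrated threshold)} \\
  C(x) &= \{\, y : 1 - \hat{p}(y \mid x) \le \hat{q} \,\}
      && \text{(prediction set at run time)} \\
  \Pr\bigl[\, y_{\text{test}} \in C(x_{\text{test}}) \,\bigr] &\ge 1 - \epsilon
      && \text{(coverage guarantee)}
\end{align*}
```

The guarantee holds on average over calibration and test draws, provided the run-time examples are exchangeable with the calibration examples. If $\lceil (n+1)(1-\epsilon) \rceil > n$, the calibration set is too small for the requested coverage and the set has to include every candidate.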

Example

A kitchen robot is told to 'put it in the bowl' with two bowls on the counter. Its planner scores the candidate placements, and conformal prediction returns a set containing both bowls. Because the set is not a singleton, the robot asks 'which bowl?' instead of guessing. When the planner is sure and only one candidate makes it into the set, the robot places the item without bothering the human. Across many tasks, the rate of correct completions meets the target coverage level chosen at calibration time.

Solution

Therefore:

Collect a calibration set of scored decisions with known correct actions and pick a target success level. At run time the planner emits candidate next actions with scores; conformal prediction turns those scores into a prediction set sized so that the correct action falls inside it with at least the chosen coverage probability. If the set contains exactly one action the agent acts autonomously; if it contains more than one, or none, the agent is uncertain and asks the human to choose. The coverage level tunes the trade-off: higher coverage yields larger sets and more questions, lower coverage yields more autonomy and more errors. The guarantee comes from the calibration data rather than the model's self-assessment, and it holds only as long as run-time tasks resemble the calibration tasks (statistically, the two are exchangeable).
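The calibrate-then-gate loop can be sketched in a few lines of Python. This is a minimal illustration using split conformal prediction, not the implementation of any particular system: the function names, the input format (one planner probability per candidate action), and the calibration numbers are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Made-up illustrative data: the planner's probability for the *correct*
# action on each held-out calibration example.
CALIBRATION_TRUE_ACTION_PROBS = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.6, 0.45, 0.3]


def calibrate_threshold(true_action_probs: Sequence[float], epsilon: float) -> float:
    """Return the conformal threshold q_hat for target coverage 1 - epsilon."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must be strictly between 0 and 1")
    n = len(true_action_probs)
    if n == 0:
        raise ValueError("calibration set is empty")
    # Nonconformity score: large when the planner gave the true action low probability.
    scores = sorted(1.0 - p for p in true_action_probs)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - epsilon)).
    rank = math.ceil((n + 1) * (1.0 - epsilon))
    if rank > n:
        # Too few calibration examples for this coverage: admit every candidate.
        return math.inf
    return scores[rank - 1]


def prediction_set(action_probs: Dict[str, float], q_hat: float) -> List[str]:
    """Keep every candidate whose nonconformity score is within the threshold."""
    return [action for action, p in action_probs.items() if 1.0 - p <= q_hat]


def help_gate(action_probs: Dict[str, float], q_hat: float) -> Tuple[str, List[str]]:
    """Act only on a singleton set; ask the human when the set has 0 or 2+ actions."""
    candidates = prediction_set(action_probs, q_hat)
    if len(candidates) == 1:
        return ("act", candidates)
    return ("ask", candidates)


if __name__ == "__main__":
    q_hat = calibrate_threshold(CALIBRATION_TRUE_ACTION_PROBS, epsilon=0.25)
    # Ambiguous instruction: both bowls stay in the set, so the gate asks.
    print(help_gate({"left bowl": 0.5, "right bowl": 0.48, "countertop": 0.02}, q_hat))
    # Clear instruction: only one candidate stays in the set, so the gate acts.
    print(help_gate({"left bowl": 0.92, "right bowl": 0.06, "countertop": 0.02}, q_hat))
```

Lowering `epsilon` raises the threshold, which enlarges the prediction sets and makes the agent ask more often; raising it does the opposite. The sketch treats an empty set the same way as a multi-element set, matching the rule that the agent asks whenever the set is not a singleton.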

What this pattern forbids. The agent acts autonomously only when its calibrated prediction set is a singleton; whenever the set holds more than one candidate, or none, it must stop and request human help rather than guess.

The smaller patterns that complete this one —

  • uses Human-in-the-Loop ★★: Require explicit human approval at defined points before the agent performs an action.

And the patterns that stand alongside it, or against it —

  • alternative-to Confidence Reporting: Surface the agent's uncertainty about its answer alongside the answer itself.
  • alternative-to Confidence-Checking Workflow: Always ask the agent, for each part of its output, to state its confidence and identify which parts need human verification, like triaging a junior analyst's work.
