VII · Verification & Reflection · Emerging

Confidence Reporting

also known as Uncertainty Surfacing, Calibrated Output

Surface the agent's uncertainty about its answer alongside the answer itself.

Context

A team ships an assistant whose answers feed into a downstream decision: a user choosing whether to trust a recommendation, a coder choosing whether to route a record to a senior reviewer, a workflow engine choosing whether to auto-approve a change. The cost of acting on a wrong answer is meaningfully higher than the cost of pausing to verify. The agent already produces answers; the question is how to attach a usable signal of how sure it is.

Problem

Large language models produce answers in the same confident tone whether they actually know the answer or are guessing, so downstream code and human readers cannot tell the two cases apart. Users either trust everything (and get burned on the cases the model fabricated) or distrust everything (and lose the value of the cases the model got right). A routing layer that should escalate uncertain cases to human review has no signal to route on, so it either escalates everything or nothing. Self-reports of confidence from the model are themselves miscalibrated, so simply asking the model whether it is sure does not solve the problem on its own.

Forces

  • The model's own confidence signals are themselves miscalibrated.
  • Surfacing uncertainty too often erodes user trust.
  • Sample-based confidence (self-consistency) costs N model calls per answer.

Example

A medical-coding assistant proposes ICD-10 codes for clinician review. Coders trust every suggestion equally because the tone is uniform, and miss the cases where the model was actually guessing. The team adds Confidence Reporting: each suggested code carries an explicit calibrated probability and a 'low / medium / high' band, surfaced beside the code. Coders now spend their attention on the low-confidence rows and rubber-stamp the high-confidence ones, and the workflow tool can auto-defer low-confidence cases to a senior coder.
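The banding and auto-defer step in this example can be sketched roughly as follows. This is a minimal illustration, not the team's implementation: the cut-offs `MEDIUM_FROM` and `HIGH_FROM`, the `CodeSuggestion` type, and the `band` and `triage` helpers are all hypothetical names, the probabilities are made-up values, and the sketch assumes the probability has already been calibrated upstream.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical cut-offs, not taken from the source; tune them on held-out data.
MEDIUM_FROM = 0.50
HIGH_FROM = 0.85


@dataclass(frozen=True)
class CodeSuggestion:
    code: str           # proposed ICD-10 code
    probability: float  # assumed to be calibrated upstream


def band(probability: float) -> str:
    """Map a probability in [0, 1] to a 'low' / 'medium' / 'high' band."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    if probability >= HIGH_FROM:
        return "high"
    if probability >= MEDIUM_FROM:
        return "medium"
    return "low"


def triage(
    suggestions: List[CodeSuggestion],
) -> Tuple[List[CodeSuggestion], List[CodeSuggestion]]:
    """Split suggestions into (coder_queue, senior_queue).

    Low-band suggestions are deferred to the senior coder; the rest stay
    with the regular coder.
    """
    coder_queue: List[CodeSuggestion] = []
    senior_queue: List[CodeSuggestion] = []
    for suggestion in suggestions:
        if band(suggestion.probability) == "low":
            senior_queue.append(suggestion)
        else:
            coder_queue.append(suggestion)
    return coder_queue, senior_queue


# Illustrative values only.
coder_queue, senior_queue = triage([
    CodeSuggestion("E11.9", 0.93),
    CodeSuggestion("I10", 0.41),
])
```

Keeping the numeric probability next to the band lets the UI show the coarse label while the workflow tool routes on the underlying number.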

Solution

Therefore:

Produce a confidence label (high/medium/low, or a numeric score) alongside each answer. Derive it from one or more signals: sample variance (self-consistency), an evaluator score, retrieval recall, or a rubric score. Render the label in the UI, and route low-confidence answers to a fallback or to human review.
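One way to make the label travel with the answer is a small envelope type plus a routing function. The sketch below is an assumption-laden illustration: `Confidence`, `LabelledAnswer`, `route`, and the destination strings `"human_review"` and `"deliver"` are hypothetical names, and how the label is computed from the chosen signal is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class LabelledAnswer:
    answer: str
    confidence: Confidence  # no default, so an answer cannot be built without it
    signal: str             # which signal produced the label, e.g. "self_consistency"

    def __post_init__(self) -> None:
        # Reject bare strings so downstream code can rely on the enum.
        if not isinstance(self.confidence, Confidence):
            raise TypeError("confidence must be a Confidence member")


def route(item: LabelledAnswer) -> str:
    """Return a (hypothetical) destination name for a labelled answer."""
    if item.confidence is Confidence.LOW:
        return "human_review"
    return "deliver"
```

For instance, `route(LabelledAnswer("Approve the change", Confidence.LOW, "rubric"))` returns `"human_review"`, while the same answer labelled `Confidence.HIGH` returns `"deliver"`.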

What this pattern forbids. Emitting an answer without a confidence label: confidence-aware downstream code cannot consume an unlabelled output.

The smaller patterns that complete this one —

  • uses: Self-Consistency (★★). Sample the same question multiple times at non-zero temperature and aggregate by majority or judge to mitigate hallucination.
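A minimal sketch of how self-consistency can yield a confidence signal, assuming a hypothetical zero-argument `sample` callable that wraps one model call at non-zero temperature. The agreement rate it returns is a raw signal rather than a calibrated probability, and the lower-case/strip normalisation is a simplifying assumption that only suits short answers.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistency(sample: Callable[[], str], n: int = 5) -> Tuple[str, float]:
    """Draw n samples and return (majority answer, share of samples agreeing).

    `sample` is a hypothetical callable that returns one model answer per call.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Normalise so trivially different spellings count as the same answer.
    answers = [sample().strip().lower() for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n
```

Each call to `self_consistency` costs `n` model calls, which is the cost named under Forces.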

And the patterns that stand alongside it, or against it —

  • complements: Disambiguation (★★). Have the agent ask a clarifying question before acting on an ambiguous request.
  • complements: Fallback Chain (★★). Try a primary handler; on failure or low confidence, fall through to a sequence of fallback handlers.
  • complements: Attention-Manipulation Explainability (·). Surface which input tokens caused a given output by perturbing attention across all transformer layers and measuring the resulting change in output probability, producing a per-token relevance map alongside the model's response.
  • complements: Hypothesis Tracking (·). Persist the agent's candidate provisional answers as a typed ledger of records carrying summary, confidence, status, and next-test, so guesses survive sessions and stay distinguishable from open questions.
  • complements: Reflexive Metacognitive Agent (·). Agent maintains an explicit self-model of its own capabilities, confidence and limitations, and reasons over that model when accepting / refusing / handing off tasks.
  • alternative-to: False Confidence Syndrome. Anti-pattern: the model produces incorrect answers with the same high confidence as correct ones, failing to vary its expressed certainty with its actual reliability (Oxford-documented for constraint-heavy prompts).
  • complements: Confidence-Checking Workflow. Always ask the agent, for each part of its output, to state its confidence and identify which parts need human verification, like triaging a junior analyst's work.
  • complements: Preference-Uncertain Agent (·). Agent treats its own reward/objective as a hidden variable to be inferred from human behaviour, not a fixed target.
  • complements: Risk-Averse Reward Proxy (·). When operating outside the distribution the reward was designed for, treat the specified objective as a noisy proxy and plan conservatively across plausible true objectives.
