Safety & Control

Degenerate-Output Detection

Detect when the agent is about to emit a near-duplicate of its own recent output and either drop, replace, or escalate to a stronger model rather than ship the loop.

Problem

The model produces visibly identical or near-identical replies turn after turn — 'How can I help today?' five times in a row — and from the user's side this looks like a broken machine. The model itself cannot detect the repetition because it does not see its own previous outputs as something to compare against, and because each generation samples without memory of the last. Without a layer outside the model that fingerprints recent outputs and reacts, shallow loops keep shipping to users as if each were a fresh answer.

Solution

Maintain a small ring buffer (e.g. last 8 outgoing messages). Before publishing a new reply, normalize (lowercase, strip punctuation) and compare: exact normalized match → duplicate; high Jaccard token overlap (≥0.7) on short replies → near-duplicate. On hit: replace the body with a transparent marker ('I caught myself looping — switching to <stronger-provider> for the next turn. Ask again.') and force-escalate the next turn through a stronger provider. Append a SYSTEM note to history telling the model exactly what it did wrong so it can self-correct.

When to use

  • The agent produces outputs in a loop where consecutive replies can be compared.
  • Near-duplicate outputs are observable failure mode (model wedged, decoding loop, prompt collapse).
  • Cost of detection (similarity check) is small relative to cost of shipping the duplicate.

Open the full interactive page

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Related