Workflow-Success vs Business-Validity Gap

Also known as: Green-Run Fallacy, Technically-Done-Not-Publishable, Success-Status Means Business-Correct

Anti-pattern: a terminal success status from the agent or its workflow engine is read as proof the deliverable is business-correct, when it certifies only technical completion.

Context

An agent runs inside a workflow or pipeline that publishes a terminal status when the run finishes. The run touched the right files, produced a format the downstream system accepts, raised no exception, and exited cleanly. A controller, a dashboard, or a human watching the queue then treats that green status as the answer to the question that actually matters: is this output something the business can ship?

Problem

Technical completion and business validity are two different properties, and the exit signal only measures the first. A run that finishes without error has proven that the steps executed, not that the deliverable is right for its purpose: a generated article can be on-format, on-length, and on-time yet factually wrong, off-brand, or unpublishable. When the green status is trusted as a quality verdict, business-invalid output flows downstream unreviewed and is discovered only by a customer, an auditor, or a regulator, long after the run was marked done.

Forces

A clean exit is cheap to compute and easy to surface, while business validity needs a separate, slower judgement that the workflow engine cannot make on its own.
Workflow engines and agent harnesses are built to report execution status, so their strongest, most visible signal is exactly the one that says nothing about correctness.
Volume pressure pushes operators to clear the queue on the green status alone, because reviewing every run for business correctness is the work the automation was meant to remove.

Example

A content agent regenerates marketing pages overnight. Each run reads the brief, rewrites the page, saves it in the CMS format, and exits without error, so the pipeline marks every page 'done'. One morning a customer points out a page that quotes the wrong price and mentions a competitor by name. The run was green: it finished cleanly. Nothing ever checked whether the page was actually right to publish.
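The check that was missing in this example can be sketched as a small business-rule validator. This is a minimal illustration, not the pipeline described above: it assumes the brief carries the correct price in a `$N/month` form plus a list of competitor names, and `Brief`, `business_violations`, `PRICE_PATTERN`, and both regexes are hypothetical. What matters is that its verdict is computed from the page content and the brief, independently of how the run exited.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Brief:
    """Hypothetical brief: the facts a generated page must agree with."""
    price: str                                   # e.g. "$49/month"
    competitor_names: list[str] = field(default_factory=list)


# Illustrative price format only: "$49/month" or "$49.99/month".
PRICE_PATTERN = re.compile(r"\$\d+(?:\.\d{2})?/month")


def business_violations(page_text: str, brief: Brief) -> list[str]:
    """Return business-rule violations found in a generated page.

    The result depends only on the page content and the brief, never on
    whether the run that produced the page exited cleanly.
    """
    violations = []
    # Every price quoted on the page must match the price in the brief.
    for quoted in PRICE_PATTERN.findall(page_text):
        if quoted != brief.price:
            violations.append(f"price {quoted} does not match brief {brief.price}")
    # The page must not name a competitor (case-insensitive, whole word).
    for name in brief.competitor_names:
        if re.search(rf"\b{re.escape(name)}\b", page_text, re.IGNORECASE):
            violations.append(f"mentions competitor {name!r}")
    return violations


if __name__ == "__main__":
    brief = Brief(price="$49/month", competitor_names=["Acme"])
    page = "Start today for just $39/month. Better than Acme!"
    # The run that wrote this page may well have exited cleanly.
    print(business_violations(page, brief))
    # ['price $39/month does not match brief $49/month', "mentions competitor 'Acme'"]
```

Real checks would cover far more (factual grounding, brand voice, completeness), but even two narrow rules like these would have caught the page in the example.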

Diagram

flowchart TD
    R[Agent run finishes] --> S{Terminal status}
    S -->|success: technical only| M[Misread as 'deliverable is correct']
    M --> P[Auto-publish business-invalid output]
    S -->|correct path| G[Business-validation gate]
    G -->|valid| OK[Publish]
    G -->|invalid| H[Hold for review]

Solution

Therefore:

The remedy is to split the two signals and never collapse them. Keep the workflow's terminal status as a statement about execution only, and add an explicit business-validation step that scores the deliverable against the rules that decide whether it can ship: factual grounding, brand and policy conformance, completeness against the brief, and any domain checks the format alone cannot express. A run that exits cleanly enters a held state pending that validation rather than a published state. Surface the two outcomes separately on the dashboard, so a green execution status with a failing business check is visible as a problem rather than hidden behind a single tick.
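A minimal sketch of keeping the two signals apart, assuming a hypothetical in-memory `RunRecord` and caller-supplied check functions that each return a list of violation messages. Execution status and business verdict are separate fields, a clean exit lands in `PENDING_VALIDATION` rather than an approved state, and the dashboard row prints both signals and flags the green-but-invalid combination. Every name here is illustrative, not the API of any existing workflow engine.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

# A business check takes the deliverable and returns violation messages
# (an empty list means the check passed).
Check = Callable[[str], list[str]]


class Execution(Enum):
    SUCCEEDED = "succeeded"  # steps ran and the run exited cleanly
    FAILED = "failed"


class Business(Enum):
    PENDING = "pending"      # no business validation has run yet
    VALID = "valid"
    INVALID = "invalid"


class Stage(Enum):
    PENDING_VALIDATION = "pending_validation"  # held: clean exit, not yet validated
    APPROVED = "approved"                      # cleared to publish
    HELD_FOR_REVIEW = "held_for_review"        # clean exit, failed a business check
    ERRORED = "errored"                        # the run itself failed


@dataclass
class RunRecord:
    run_id: str
    deliverable: str
    execution: Execution
    business: Business = Business.PENDING
    stage: Stage = Stage.PENDING_VALIDATION
    reasons: tuple[str, ...] = ()


def on_run_finished(run_id: str, deliverable: str, exited_cleanly: bool) -> RunRecord:
    """Record execution status only; a clean exit is held, never approved."""
    if not exited_cleanly:
        return RunRecord(run_id, deliverable, Execution.FAILED, stage=Stage.ERRORED)
    return RunRecord(run_id, deliverable, Execution.SUCCEEDED)


def validate(record: RunRecord, checks: list[Check]) -> RunRecord:
    """Apply every business check to a held run and set the business verdict."""
    if record.stage is not Stage.PENDING_VALIDATION:
        return record  # errored or already decided: nothing to validate
    reasons = [msg for check in checks for msg in check(record.deliverable)]
    record.reasons = tuple(reasons)
    if reasons:
        record.business, record.stage = Business.INVALID, Stage.HELD_FOR_REVIEW
    else:
        record.business, record.stage = Business.VALID, Stage.APPROVED
    return record


def dashboard_row(record: RunRecord) -> str:
    """Report both signals side by side so green-but-invalid stays visible."""
    green_but_invalid = (
        record.execution is Execution.SUCCEEDED
        and record.business is Business.INVALID
    )
    flag = "  <-- green run, failing business check" if green_but_invalid else ""
    return (f"{record.run_id}: execution={record.execution.value} "
            f"business={record.business.value}{flag}")


if __name__ == "__main__":
    def no_placeholder(text: str) -> list[str]:  # hypothetical check
        return ["contains unfilled TODO"] if "TODO" in text else []

    rec = validate(on_run_finished("run-7", "Price: TODO", True), [no_placeholder])
    print(dashboard_row(rec))
    # run-7: execution=succeeded business=invalid  <-- green run, failing business check
```

Keeping `execution` and `business` as independent enums, rather than deriving one from the other, is what prevents the dashboard from collapsing them back into a single tick.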

What this pattern forbids. A terminal success status must not be treated as a business-correctness verdict; a cleanly finished run cannot be published or marked correct before a separate business-validation step has checked the deliverable against the shipping rules.
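The prohibition can also be enforced at the side-effect boundary itself. The sketch below assumes a hypothetical `publish` function that every path to publication must go through (here it only appends to a list standing in for the real publish step); `PublishBlocked` and `Verdicts` are likewise illustrative. It treats a clean exit as necessary but never sufficient, and refuses to publish when the business verdict is missing (`None`) or failing.

```python
from dataclasses import dataclass
from typing import Optional


class PublishBlocked(RuntimeError):
    """Raised when a deliverable would be published without a passing business check."""


@dataclass(frozen=True)
class Verdicts:
    exited_cleanly: bool            # what the workflow engine reports
    business_valid: Optional[bool]  # None means no business validation has run


def publish(item_id: str, verdicts: Verdicts, sink: list[str]) -> None:
    """Append item_id to sink (standing in for the real publish side effect).

    A clean exit is necessary but never sufficient: a missing or failing
    business verdict blocks publication even when the run was green.
    """
    if not verdicts.exited_cleanly:
        raise PublishBlocked(f"{item_id}: run did not finish cleanly")
    if verdicts.business_valid is None:
        raise PublishBlocked(f"{item_id}: no business validation recorded")
    if not verdicts.business_valid:
        raise PublishBlocked(f"{item_id}: business validation failed")
    sink.append(item_id)


if __name__ == "__main__":
    published: list[str] = []
    publish("page-1", Verdicts(exited_cleanly=True, business_valid=True), published)
    try:
        publish("page-2", Verdicts(exited_cleanly=True, business_valid=None), published)
    except PublishBlocked as err:
        print(err)  # page-2: no business validation recorded
    print(published)  # ['page-1']
```

The `None` case is the one that matters most: it is exactly the green run that was never validated, which is the situation this anti-pattern describes.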

Related patterns:

complements: Phantom Action Completion. Anti-pattern: the agent reports a side-effecting action as complete from its own narration, when the tool call silently failed or never ran and nothing checked that the effect occurred.
complements: False Resolution. The agent proposes a compromise that satisfies each constraint individually but subtly violates one when the constraints are read together, so it ships as a success and is discovered as a failure at audit.
alternative-to: Supervisor-Plus-Gate. A supervisor controller that validates and gates LLM outputs against deterministic checks before they commit to side effects.
alternative-to: Deterministic-LLM Sandwich. Bracket every LLM call with deterministic checks on both sides.
complements: Silent Hypotheses in Generated Code. Anti-pattern: model-written code rests on an unstated runtime premise that passing tests and code review never surface, so the hidden assumption travels into production and fails there.