Phantom Action Completion

Also known as: Execution Hallucination, Claimed-Not-Done, Says-Done-Did-Nothing

Anti-pattern: the agent reports a side-effecting action as complete from its own narration, when the tool call silently failed or never ran and nothing checked that the effect occurred.

Context

An agent runs tasks that mutate the outside world: filing a ticket, sending an email, updating a record, writing a file, charging a card. The action is delegated to a tool, and the agent then composes a natural-language reply to the user that describes what it did. The loop that decides what to say to the user is the same loop that issued the tool call, so the agent infers success from its own intent rather than from a confirmed effect.

Problem

A model generates the most plausible continuation, and after issuing an action the most plausible next sentence is a confident confirmation that the action succeeded. When the tool call silently fails, times out, returns an unparsed error, or is skipped entirely, the model often sees nothing that contradicts the expected happy path, so it still narrates success. The user is told the ticket was filed or the email was sent, the effect never landed, and the gap surfaces only later when the missing outcome is noticed downstream.

Forces

The most statistically plausible token after an action is a confirmation, so the model drifts toward claiming success regardless of what the tool returned.
A side-effecting call can fail in ways that raise no exception the agent sees: a swallowed error, a timeout, a no-op response, or a call the model narrated but never actually emitted.
Adding an independent post-action check of the effect costs an extra read and slows the turn, so it is tempting to trust the call return instead.
Effects often land in a different system than the one the agent called, so confirming them requires querying that downstream system, not the tool response.

Example

A user asks a support agent to open a refund ticket. The agent calls the ticketing tool, the call times out without raising an error the agent reads, and the agent replies 'Your refund request has been successfully submitted.' No ticket exists. Days later the user follows up, the agent again reassures them it was filed, and only a human checking the queue discovers nothing was ever created.
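The failure in this example can be reproduced in a few lines. The following is a minimal sketch, not taken from any real ticketing system: `TicketStore`, `create_ticket`, and `naive_agent_reply` are hypothetical names, and the timeout is simulated by raising `TimeoutError` inside a wrapper that swallows it.

```python
class TicketStore:
    """Hypothetical in-memory stand-in for the system of record."""

    def __init__(self):
        self.tickets = {}

    def insert(self, ticket_id, body):
        # Simulate the backend timing out before the write lands.
        raise TimeoutError("ticket backend did not respond")


def create_ticket(store, body):
    """Hypothetical tool wrapper that swallows the failure."""
    try:
        store.insert("T-1", body)
        return {"ticket_id": "T-1"}
    except TimeoutError:
        # The error is dropped here, so the agent never sees it.
        return None


def naive_agent_reply(store, body):
    """Narrates success from intent, without reading anything back."""
    create_ticket(store, body)  # return value is ignored
    return "Your refund request has been successfully submitted."


store = TicketStore()
reply = naive_agent_reply(store, "refund request")
print(reply)               # the user is told the ticket was filed
print(len(store.tickets))  # the system of record is still empty
```

The reply and the state of the store disagree, and nothing in this flow ever compares the two.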

Diagram

sequenceDiagram
    participant U as User
    participant A as Agent
    participant T as Tool
    participant S as System of record
    U->>A: file the ticket
    A->>T: create_ticket(...)
    T--xA: silent failure / no-op (no error seen)
    A-->>U: "successfully submitted"
    Note over A,S: nothing ever read the effect back from S

Solution

Therefore:

Treat an action as complete only when an independent check observes its effect, not when the agent says so. After each side-effecting call, query the system of record for the artifact the action was supposed to produce (the ticket ID, the sent-message receipt, the updated row, the written file) and compare it against what was intended. If the read-back is missing or does not match, report failure or retry rather than confirming. Keep the verifier outside the agent's own reasoning loop so that a hallucinated confirmation cannot satisfy it. When the user later asks whether the action happened, have the agent answer from the read-back, never from its memory of what it meant to do.
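The read-back step can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `SystemOfRecord`, `ActionResult`, and `run_verified` are hypothetical names, the system of record is an in-memory dictionary, and "matches what was intended" is reduced to comparing a few expected fields.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ActionResult:
    verified: bool
    detail: str


class SystemOfRecord:
    """Hypothetical in-memory stand-in for the downstream ticket queue."""

    def __init__(self):
        self.tickets = {}

    def get(self, ticket_id):
        return self.tickets.get(ticket_id)


def run_verified(action: Callable[[], Optional[str]],
                 read_back: Callable[[str], Optional[dict]],
                 expected: dict,
                 max_attempts: int = 2) -> ActionResult:
    """Run a side-effecting action and confirm it by reading the effect back."""
    for attempt in range(1, max_attempts + 1):
        artifact_id = action()  # may be None after a silent failure
        if artifact_id is None:
            continue
        # Query the system of record, not the tool response.
        observed = read_back(artifact_id)
        if observed is not None and all(
            observed.get(key) == value for key, value in expected.items()
        ):
            return ActionResult(True, f"verified {artifact_id} on attempt {attempt}")
    return ActionResult(False, f"effect not observed after {max_attempts} attempts")


# Usage: the first call fails silently, the second one lands.
record = SystemOfRecord()
calls = {"n": 0}


def flaky_create():
    calls["n"] += 1
    if calls["n"] == 1:
        return None
    record.tickets["T-1"] = {"type": "refund", "status": "open"}
    return "T-1"


result = run_verified(flaky_create, record.get, {"type": "refund"})
print(result.verified, result.detail)

# A tool that returns an id for a ticket that never landed is not verified.
ghost = run_verified(lambda: "T-9", record.get, {"type": "refund"})
print(ghost.verified, ghost.detail)
```

The retry here simply calls the action again. In a real system that is only safe if the call carries an idempotency key, otherwise a retry after a slow but successful write could create a duplicate.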

What this pattern forbids. A side-effecting action is never reported as complete from the agent's own narration or from the tool-call return alone; success must not be claimed until an independent check has read the effect back from the system of record.
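One way to enforce this rule is to let the verifier's output, rather than the model's draft, decide what the user is told. The sketch below assumes a hypothetical `ReadBack` record that only the verifier produces and a hypothetical `render_reply` gate; the wording of the fallback messages is illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadBack:
    """Hypothetical record produced only by the verifier, never by the model."""
    artifact_id: str
    matches_intent: bool


def render_reply(draft: str, read_back: Optional[ReadBack]) -> str:
    """Gate the outgoing message: the model's draft is used only when verified."""
    if read_back is None:
        # No effect was observed, so the confident draft is discarded.
        return "I could not confirm that the action took effect, so I am treating it as not done."
    if not read_back.matches_intent:
        return f"Something was created ({read_back.artifact_id}) but it does not match the request."
    return f"{draft} (confirmed: {read_back.artifact_id})"


draft = "Your refund request has been successfully submitted."
print(render_reply(draft, None))
print(render_reply(draft, ReadBack("T-1", matches_intent=True)))
```

Because the gate takes the read-back as a separate argument, a confident draft on its own has no path to the user.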

The patterns that counter or replace it:

alternative-to: Planner-Executor-Verifier (PEV) ★ - Triadic specialization where a planner produces the plan, an executor runs it, and a separate verifier checks each step's effects against the original goal.
complements: Deception Manipulation ✕ - Anti-pattern: rely on the agent's own self-report of its actions for audit and oversight.
complements: Missing Idempotency on Agent Calls ✕ - Anti-pattern: retry state-mutating agent tool calls without idempotency keys, so retries multiply real-world side effects.
complements: Dry-Run Harness ★ - Simulate planned actions (and their projected side effects) without committing them, surfacing a reviewable diff before any commit.
complements: Workflow-Success vs Business-Validity Gap ✕ - Anti-pattern: a terminal success status from the agent or its workflow engine is read as proof the deliverable is business-correct, when it certifies only technical completion.
complements: Silent Hypotheses in Generated Code ✕ - Anti-pattern: model-written code rests on an unstated runtime premise that passing tests and code review never surface, so the hidden assumption travels into production and fails there.
complements: Silent External-Source Rot ✕ - Anti-pattern: an agent keeps reporting success while a wrapped external source has silently changed structure, so its tool returns valid-but-empty or degraded output that nothing watches.
complements: Physical Hallucination ✕ - Anti-pattern: an embodied or process-control agent issues a confidently-phrased command that is syntactically valid but physically infeasible or unsafe, because nothing checks it against geometry, dynamics, or actual plant state before actuation.
complements: Symptom-Remediation Thrashing ✕ - Anti-pattern: a stateless auto-remediation agent repeatedly applies symptom-level fixes that hit the target metric while masking the root cause and suppressing the page, so the underlying fault compounds across incidents into a larger outage.
complements: Ghost Delegation ✕ - Anti-pattern: in a multi-agent hierarchy a task handoff silently vanishes: the delegated work waits forever and the parent closes while its subtask is orphaned, and because no error fires nothing restarts it.
complements: Fail-Plausible Narration ✕ - Anti-pattern: when a step fails, the model fills the gap with fluent, plausible content instead of surfacing the error, so the deliverable reads as success and the observer is deceived rather than merely uninformed.