Cascading Agent Failures

also known as Kaskadierende Ausfälle, ASI08, Multi-Agent Cascade

Anti-pattern: build a multi-agent system where one agent's failure or hallucination propagates as input to peers, until the whole system has drifted.

Context

A multi-agent system has agents that consume each other's outputs — a researcher feeds a writer, a writer feeds an editor, a critic feeds a planner. Each agent treats its inbound messages as if they were trustworthy peer outputs. There is no circuit-breaker between agents.

Problem

A localised failure (a hallucinated fact, a corrupted memory write, a tool error misinterpreted as success) propagates through the message graph. Each downstream agent integrates the failure into its own reasoning and emits a confidently wrong output that the next agent in turn treats as input. The system fails as a unit, not as individual agents; classical per-agent retries do not help because the inputs are themselves poisoned.
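
A minimal sketch of why per-agent retries do not help, assuming hypothetical stub agents (`researcher`, `drafter`) and a made-up citation string; none of these names come from the catalog. Retrying the downstream agent varies the wording, but every attempt inherits the same poisoned input.

```python
POISON = "Smith 2021"  # hypothetical hallucinated citation


def researcher() -> dict:
    # Stub upstream agent: its output is already wrong when it leaves.
    return {"claim": "Churn fell 40%", "citation": POISON}


def drafter(note: dict, attempt: int) -> str:
    # Stub downstream agent: each attempt rephrases, none questions the input.
    style = ["clearly", "notably", "remarkably"][attempt % 3]
    return f"{note['claim']}, {style} ({note['citation']})."


def retry_downstream(note: dict, attempts: int = 3) -> list[str]:
    # A classical per-agent retry: same input, new attempt.
    return [drafter(note, i) for i in range(attempts)]


drafts = retry_downstream(researcher())
for draft in drafts:
    print(draft)
```

All three drafts differ in wording and all three carry the bad citation. Only re-running or validating the upstream step would change the outcome.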

Forces

Multi-agent systems gain throughput by delegating; eliminating inter-agent trust eliminates the gain.
Failures in one agent are silent at the message layer — bad outputs look syntactically valid.
Synchronous fan-out amplifies single failures into multi-agent failures within one trace.
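
The fan-out force can be made concrete by computing which agents a single bad output can reach. The sketch below assumes the message graph is available as an adjacency mapping; the `blast_radius` helper and the example graph (including the `fact_table` and `summariser` nodes) are illustrative assumptions, not part of the catalog.

```python
from collections import deque


def blast_radius(edges: dict[str, list[str]], source: str) -> set[str]:
    """Return every agent reachable from `source` via message edges."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for downstream in edges.get(node, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen - {source}


# Hypothetical message graph: producer -> list of consumers.
GRAPH = {
    "researcher": ["drafter", "fact_table"],
    "drafter": ["editor", "summariser"],
    "fact_table": ["planner"],
    "editor": [],
    "summariser": [],
    "planner": [],
}

print(sorted(blast_radius(GRAPH, "researcher")))
print(sorted(blast_radius(GRAPH, "editor")))
```

In this graph a failure at `researcher` reaches all five other agents, while a failure at `editor` reaches none, which suggests where validation on an edge buys the most.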

Example

A research pipeline of researcher → drafter → editor agents produces a customer-facing report. The researcher hallucinates a citation. The drafter integrates it with confident phrasing. The editor polishes the prose and adds three more references that 'support' the hallucinated one. The report ships. Postmortem: no inter-agent validation; the editor's job was prose, but the failure was factual, and no edge in the graph was responsible for catching it.
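
One way an edge could have owned the factual check in this pipeline, sketched under assumptions: research notes carry a `source_id`, and a trusted index of resolvable sources exists. `ResearchNote`, `KNOWN_SOURCES`, and the placeholder identifiers are hypothetical; the source does not describe the report's data model.

```python
from dataclasses import dataclass

# Hypothetical index of sources that can actually be resolved.
KNOWN_SOURCES = {"src-001", "src-002"}


@dataclass(frozen=True)
class ResearchNote:
    claim: str
    source_id: str


class UnverifiedCitation(Exception):
    """A note cites a source that the index cannot resolve."""


def check_citations(notes: list[ResearchNote], known: set[str]) -> list[ResearchNote]:
    # Runs on the researcher -> drafter edge, before any prose is written.
    missing = [n.source_id for n in notes if n.source_id not in known]
    if missing:
        raise UnverifiedCitation(f"unresolvable sources: {missing}")
    return notes


def drafter(notes: list[ResearchNote]) -> str:
    # Stub drafter: only ever sees notes that passed the edge check.
    return " ".join(f"{n.claim} [{n.source_id}]." for n in notes)


def run_pipeline(notes: list[ResearchNote], known: set[str]) -> str:
    return drafter(check_citations(notes, known))
```

With this check in place, a note citing an unknown `source_id` stops at the first edge instead of reaching the editor.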

Diagram

flowchart TD
    Trigger[Agent A emits bad output → Agent B treats as fact] --> Bad{Recognise as anti-pattern?}
    Bad -- no --> Harm[Harm propagates]
    Bad -- yes --> Mitigate[Apply mitigation pattern]
    Mitigate --> Safe[Risk bounded]
    classDef bad fill:#fee,stroke:#c33;
    class Trigger,Harm bad;

Solution

Therefore:

Don't. Apply per-edge validation between agents — type checks, schema validation, confidence thresholds. Use external-critic or agent-as-judge on intermediate messages, not just final output. Cap retry-fan-out so one root failure cannot recursively spawn more agents. See unbounded-subagent-spawn and unbounded-loop for related shapes.
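
The mitigation above can be sketched as a guarded edge plus a shared retry budget. This is a minimal illustration, not a definitive implementation: it assumes messages are dicts with `content` and a self-reported `confidence` field, and the names `validate_message`, `send_over_edge`, and `RetryBudget` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


class EdgeRejected(Exception):
    """A message failed validation on an inter-agent edge."""


class BudgetExhausted(Exception):
    """The shared retry budget for the whole trace is used up."""


@dataclass
class RetryBudget:
    remaining: int  # shared across every edge in one trace

    def spend(self) -> None:
        if self.remaining <= 0:
            raise BudgetExhausted("no retries left for this trace")
        self.remaining -= 1


def validate_message(msg: Any, min_confidence: float) -> dict:
    # Type and schema checks: a dict with non-empty string content.
    if not isinstance(msg, dict):
        raise EdgeRejected("message must be a dict")
    content = msg.get("content")
    if not isinstance(content, str) or not content.strip():
        raise EdgeRejected("content must be a non-empty string")
    # Confidence threshold (the field itself is an assumption of this sketch).
    confidence = msg.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise EdgeRejected("confidence must be a number")
    if confidence < min_confidence:
        raise EdgeRejected(f"confidence {confidence} below {min_confidence}")
    return msg


def send_over_edge(producer: Callable[[], Any], budget: RetryBudget,
                   min_confidence: float = 0.7) -> dict:
    # The first attempt is free; every retry draws on the shared budget,
    # so one root failure cannot trigger unbounded re-runs.
    while True:
        try:
            return validate_message(producer(), min_confidence)
        except EdgeRejected:
            budget.spend()  # raises BudgetExhausted once the cap is hit
```

Because the budget object is passed to every edge in a trace, retries caused by one poisoned upstream output compete for the same allowance rather than multiplying per agent. An external critic or judge could be slotted in as a further check inside `validate_message`.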

What this pattern forbids. No useful constraint; the missing constraint is per-edge validation.

The patterns that counter or replace it:

complements: Unbounded Subagent Spawn ✕. Anti-pattern: a supervisor or orchestrator spawns sub-agents that can themselves spawn sub-agents without a global cap.
complements: Unbounded Loop ✕. Anti-pattern: run the agent loop without a step budget and let model self-termination decide.
alternative-to: Agent-as-a-Judge ★. Evaluate an agent's full trajectory (steps, tool calls, intermediate states) by another agent rather than scoring only the final output.
alternative-to: Subagent Isolation ★. Run subagents in isolated workspaces so their writes do not collide and parallelism is safe.
complements: Memory Poisoning ✕. Anti-pattern: write to agent long-term memory (vector store, knowledge graph, episodic log) from any surface the agent reads, with no provenance check.
complements: Insecure Inter-Agent Channel ✕. Anti-pattern: pass messages between agents on shared transports without authenticating the sending agent, the message content, or the sequence.
complements: Agent Bullwhip Effect ✕. Anti-pattern: distributed supply-chain or replenishment agents, each optimising locally, amplify order variability through their own decision policy, so a local demand spike triggers synchronised chain-wide reordering and supplier stockouts that propagate backward.
complements: Ghost Delegation ✕. Anti-pattern: in a multi-agent hierarchy a task handoff silently vanishes; the delegated work waits forever and the parent closes while its subtask is orphaned, and because no error fires nothing restarts it.
complements: Hidden Distributed Monolith (Multi-Agent) ✕. Anti-pattern: a multi-agent system is presented as decoupled, independently deployable agents, but at runtime they share context, run in synchronous chains, and have no failure isolation, so it behaves as a tightly-coupled distributed monolith.

Provenance

Source: patterns/cascading-agent-failures.md on GitHub · commit 159e600
Added to catalog: 2026-05-21
Last updated: 2026-05-21
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.