Cascading Agent Failures
also known as Kaskadierende Ausfälle, ASI08, Multi-Agent Cascade
Anti-pattern: build a multi-agent system where one agent's failure or hallucination propagates as input to peers, until the whole system has drifted.
Context
A multi-agent system has agents that consume each other's outputs — a researcher feeds a writer, a writer feeds an editor, a critic feeds a planner. Each agent treats its inbound messages as if they were trustworthy peer outputs. There is no circuit-breaker between agents.
Problem
A localised failure (a hallucinated fact, a corrupted memory write, a tool error misinterpreted as success) propagates through the message graph. Each downstream agent integrates the failure into its own reasoning and emits a confidently wrong output, which the next agent in turn treats as trustworthy input. The system fails as a unit, not as individual agents; classical per-agent retries do not help because they re-run an agent on inputs that are themselves poisoned.
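A toy sketch of this propagation, assuming each agent is a plain Python function standing in for a model call on a researcher, drafter, editor chain. The `Message` shape, the `with_retry` helper, and the `FABRICATED-42` marker are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List, Union

# Hypothetical message shape: free text plus the claims it relies on.
Message = Dict[str, Union[str, List[str]]]


def researcher(topic: str) -> Message:
    # Stand-in for a model call; the second claim is a fabricated citation.
    return {"text": f"Notes on {topic}", "claims": ["real-source-1", "FABRICATED-42"]}


def drafter(msg: Message) -> Message:
    # Integrates every inbound claim without checking any of them.
    return {"text": f"{msg['text']} (drafted)", "claims": list(msg["claims"])}


def editor(msg: Message) -> Message:
    # Polishes prose only; the factual content passes through unchanged.
    return {"text": str(msg["text"]).upper(), "claims": list(msg["claims"])}


def with_retry(agent: Callable[[Message], Message], msg: Message, attempts: int = 3) -> Message:
    # Per-agent retry: re-runs only when the agent raises, never when its input is bad.
    last_error = None
    for _ in range(attempts):
        try:
            return agent(msg)
        except Exception as exc:
            last_error = exc
    raise RuntimeError("agent failed after retries") from last_error


# No agent raises, so the retry wrapper never fires and the bad claim ships.
report = with_retry(editor, with_retry(drafter, researcher("batteries")))
```

Every call succeeds from the wrapper's point of view, which is why retrying at the agent level leaves the fabricated claim in `report["claims"]`.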
Forces
- Multi-agent systems gain throughput by delegating; eliminating inter-agent trust eliminates the gain.
- Failures in one agent are silent at the message layer — bad outputs look syntactically valid.
- Synchronous fan-out amplifies single failures into multi-agent failures within one trace.
Example
A research pipeline of researcher → drafter → editor agents produces a customer-facing report. The researcher hallucinates a citation. The drafter integrates it with confident phrasing. The editor polishes the prose and adds three more references that 'support' the hallucinated one. The report ships. Postmortem: there was no inter-agent validation; the editor's job was prose, but the failure was factual, and no edge in the graph was responsible for catching it.
Solution
Therefore:
Don't. Apply per-edge validation between agents: type checks, schema validation, confidence thresholds. Use an external critic or an agent-as-judge on intermediate messages, not just on the final output. Cap retry fan-out so that one root failure cannot recursively spawn more agents. See unbounded-subagent-spawn and unbounded-loop for related shapes.
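A minimal sketch of a per-edge gate with a shared retry cap, under stated assumptions: messages are lists of a hypothetical `Claim` record, the confidence score comes from some upstream scorer (a critic or judge agent, not shown), and a set of known source IDs stands in for whatever external check a real system would use. All names here (`Claim`, `validate_edge`, `RetryBudget`, `send_over_edge`) are illustrative and not part of any library.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass(frozen=True)
class Claim:
    text: str
    source_id: Optional[str]  # provenance; None means the producer cited nothing
    confidence: float         # assumed to be assigned by a critic, in [0, 1]


class EdgeRejected(Exception):
    """The message failed validation on this edge."""


class BudgetExhausted(Exception):
    """The shared retry cap for this trace has been used up."""


def validate_edge(claims: object, known_sources: Set[str], min_confidence: float = 0.7) -> List[Claim]:
    # Type check: the message must be a list of Claim records.
    if not isinstance(claims, list) or not all(isinstance(c, Claim) for c in claims):
        raise EdgeRejected("message is not a list of Claim records")
    for c in claims:
        # Schema check: confidence must be a valid probability.
        if not 0.0 <= c.confidence <= 1.0:
            raise EdgeRejected(f"confidence out of range: {c.text!r}")
        # Confidence threshold.
        if c.confidence < min_confidence:
            raise EdgeRejected(f"confidence below threshold: {c.text!r}")
        # Provenance check against an external set, not the producer's own word.
        if c.source_id not in known_sources:
            raise EdgeRejected(f"unverified source: {c.source_id!r}")
    return claims


class RetryBudget:
    """One budget shared by every edge in a trace, so retries cannot fan out."""

    def __init__(self, limit: int) -> None:
        self.remaining = limit

    def spend(self) -> None:
        if self.remaining <= 0:
            raise BudgetExhausted("retry budget exhausted")
        self.remaining -= 1


def send_over_edge(produce: Callable[[], object], known_sources: Set[str], budget: RetryBudget) -> List[Claim]:
    # Re-ask the producer until its message validates or the shared budget trips.
    while True:
        budget.spend()
        try:
            return validate_edge(produce(), known_sources)
        except EdgeRejected:
            continue
```

The design choice worth noting is that the budget is created once per trace and passed to every edge: a producer that keeps emitting an unverifiable claim drains it and raises `BudgetExhausted`, which acts as the circuit-breaker, instead of triggering further retries or spawns downstream.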
What this pattern forbids. Nothing useful: as an anti-pattern it imposes no constraint, and the constraint it is missing is per-edge validation.
And the patterns that stand alongside it, or against it:
- complements: Unbounded Subagent Spawn ✕. Anti-pattern: a supervisor or orchestrator spawns sub-agents that can themselves spawn sub-agents without a global cap.
- complements: Unbounded Loop ✕. Anti-pattern: run the agent loop without a step budget and let model self-termination decide.
- alternative-to: Agent-as-a-Judge ★. Evaluate an agent's full trajectory (steps, tool calls, intermediate states) by another agent rather than scoring only the final output.
- alternative-to: Subagent Isolation ★. Run subagents in isolated workspaces so their writes do not collide and parallelism is safe.
- complements: Memory Poisoning ✕. Anti-pattern: write to agent long-term memory (vector store, knowledge graph, episodic log) from any surface the agent reads, with no provenance check.
- complements: Insecure Inter-Agent Channel ✕. Anti-pattern: pass messages between agents on shared transports without authenticating the sending agent, the message content, or the sequence.