Scatter-Gather Plus Saga
also known as Scatter-Gather Saga, Distributed-Transaction Fan-Out
Distribute tasks across worker agents and aggregate results while maintaining distributed-transaction semantics via compensating actions on partial failure.
This pattern helps complete certain larger patterns —
- specialises Parallelization ★★ — Run independent LLM calls concurrently and combine results.
Context
A team uses parallel agent fan-out for throughput. The workers produce side-effects, such as writes to systems of record. When some workers fail mid-flight, the commits already made by the others leave the system in an inconsistent state. Plain parallelization has no rollback story, and map-reduce assumes pure functions.
Problem
Without saga semantics, a partial failure in a fan-out leaves half-committed state. The system cannot recover atomically: workers that have already committed cannot un-commit, and no coordinator knows which compensating actions to run. This differs from parallelization, which has no transactional model, and from map-reduce, which assumes pure functions.
Forces
- Distributed transactions across heterogeneous side-effects are not natively supported.
- Compensating actions must be defined per worker, which means engineering work for every class of side-effect.
- Partial-failure detection requires per-worker confirmation tracking.
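The third force, per-worker confirmation tracking, can be pictured as a small ledger that holds one state per worker. The sketch below is illustrative only: `WorkerState`, `ConfirmationLedger`, and the four state names are hypothetical, and it assumes that a worker reporting failure has committed nothing.

```python
from enum import Enum


class WorkerState(Enum):
    PENDING = "pending"          # dispatched, no confirmation received yet
    COMMITTED = "committed"      # side-effect confirmed by the worker
    FAILED = "failed"            # worker reported failure (assumed: nothing committed)
    COMPENSATED = "compensated"  # side-effect undone by its compensating action


class ConfirmationLedger:
    """Hypothetical helper: one tracked state per worker in the fan-out."""

    def __init__(self, worker_names):
        self._states = {name: WorkerState.PENDING for name in worker_names}

    def mark(self, name, state):
        if name not in self._states:
            raise KeyError(f"unknown worker: {name}")
        self._states[name] = state

    def is_partial_failure(self):
        # Partial failure: at least one worker failed while another committed.
        states = set(self._states.values())
        return WorkerState.FAILED in states and WorkerState.COMMITTED in states

    def needs_compensation(self):
        # Only workers whose side-effect is confirmed need undoing.
        return [n for n, s in self._states.items() if s is WorkerState.COMMITTED]


ledger = ConfirmationLedger(["flight", "hotel", "car"])
ledger.mark("flight", WorkerState.COMMITTED)
ledger.mark("hotel", WorkerState.FAILED)
ledger.mark("car", WorkerState.COMMITTED)
```

With these three marks, `ledger.is_partial_failure()` is true and `ledger.needs_compensation()` names the flight and car workers, which is the information a coordinator needs before it can run compensations.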
Example
A booking agent fans out to 'reserve flight', 'reserve hotel', and 'reserve car'. Flight and car succeed; hotel fails. The saga coordinator runs flight.cancel() and car.cancel() before reporting BookingFailed to the user. Without the saga, the user is left with a flight and a car they no longer want and no hotel.
Solution
Therefore:
Each worker exposes a pair (do_action, compensate_action). The coordinator dispatches all workers in parallel. If every worker succeeds, it gathers the results and returns them. If any worker fails, it runs compensate_action for every worker that has already committed. The outcome is reported as atomic: either all workers committed (and their results were gathered) or none did. Pair with compensating-action, parallelization, map-reduce, supervisor-plus-gate.
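The coordinator described here might look roughly like the following asyncio sketch. Only `do_action` and `compensate_action` come from the pattern text; `Worker`, `SagaOutcome`, `scatter_gather_saga`, and the `reserve` helper are hypothetical names chosen for illustration. The sketch assumes a worker that raises has committed nothing, and it does not handle a compensating action that itself fails.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class Worker:
    name: str
    do_action: Callable[[], Awaitable[Any]]
    compensate_action: Callable[[], Awaitable[None]]


@dataclass
class SagaOutcome:
    committed: bool    # True only if every worker committed
    results: dict      # gathered results, empty on failure
    compensated: list  # names of workers whose commits were undone
    errors: dict       # worker name -> exception


async def scatter_gather_saga(workers):
    # Scatter: run every do_action concurrently; exceptions are returned, not raised.
    raw = await asyncio.gather(
        *(w.do_action() for w in workers), return_exceptions=True
    )
    succeeded = [(w, r) for w, r in zip(workers, raw) if not isinstance(r, BaseException)]
    errors = {w.name: r for w, r in zip(workers, raw) if isinstance(r, BaseException)}

    if not errors:
        # Gather: all workers committed, so return the combined results.
        return SagaOutcome(True, {w.name: r for w, r in succeeded}, [], {})

    # Partial failure: compensate every worker that already committed,
    # and only then report the outcome.
    await asyncio.gather(*(w.compensate_action() for w, _ in succeeded))
    return SagaOutcome(False, {}, [w.name for w, _ in succeeded], errors)


# Usage, mirroring the booking example: hotel fails, flight and car are undone.
log = []


def reserve(name, fail=False):
    async def do():
        if fail:
            raise RuntimeError(f"{name} unavailable")
        log.append(f"{name}.reserve")
        return f"{name}-confirmation"

    async def cancel():
        log.append(f"{name}.cancel")

    return Worker(name, do, cancel)


outcome = asyncio.run(
    scatter_gather_saga([reserve("flight"), reserve("hotel", fail=True), reserve("car")])
)
```

Compensations are awaited before the outcome object is built, so the caller never sees a failure report while the flight or car commit is still in place.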
What this pattern forbids. No worker may join the fan-out without a declared compensating action, and after any worker failure the coordinator may not report an outcome until the compensations have run.
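One way to enforce the first half of this rule is to reject a worker at registration time if it arrives without a compensating action. The following is a minimal sketch under that assumption; `SagaRegistry` and `MissingCompensationError` are hypothetical names, not part of any library.

```python
class MissingCompensationError(TypeError):
    """Raised when a worker is registered without a compensating action."""


class SagaRegistry:
    """Hypothetical registry that refuses workers lacking a compensation."""

    def __init__(self):
        self._workers = {}

    def register(self, name, do_action, compensate_action=None):
        if not callable(do_action):
            raise TypeError(f"{name}: do_action must be callable")
        if not callable(compensate_action):
            # The pattern forbids side-effecting workers with no undo path.
            raise MissingCompensationError(
                f"{name}: a compensating action is required"
            )
        self._workers[name] = (do_action, compensate_action)

    def names(self):
        return list(self._workers)


registry = SagaRegistry()
registry.register("flight", lambda: "reserved", lambda: "cancelled")

try:
    registry.register("hotel", lambda: "reserved")  # no compensation declared
    rejected = False
except MissingCompensationError:
    rejected = True
```

Failing at registration rather than at dispatch means a missing compensation is discovered before any side-effect has been committed.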
And the patterns that stand alongside it, or against it —
- alternative-to MapReduce for Agents ★ — Split an oversize task into independent chunks, process each in parallel, then aggregate.
- complements Compensating Action ★★ — Pair every irreversible-looking agent action with a compensating action that can undo or counteract it.
- complements Supervisor-Plus-Gate ★ — Supervisor controller that validates and gates LLM outputs against deterministic checks before they commit to side-effects.
- complements Missing Idempotency on Agent Calls ✕ — Anti-pattern: retry state-mutating agent tool calls without idempotency keys, so retries multiply real-world side effects.
- complements Parallel Fan-Out / Gather ★ — Multiple independent agents execute in parallel on a partitioned task; a dedicated aggregator agent reconciles their results into a single output.
- complements Contract Net Protocol ★★ — Classical bid-based multi-agent task allocation: a manager broadcasts a task announcement, contractors submit bids, and the manager awards the contract to the best bid.