XIV · Anti-Patterns

Orchestrator as Bottleneck

also known as Single-Process Scheduler Bottleneck, Centralized Orchestrator Cap

Anti-pattern: route all agent runs through a single-process orchestrator that becomes the system-wide concurrency ceiling.

Context

A team adopts a workflow engine or supervisor pattern early and runs it as a single process. Workers scale horizontally, but the orchestrator is one box managing state, dispatching events, and tracking run progress.

Problem

The orchestrator becomes the load-bearing single point of contention. The practical scaling ceiling sits around 10–100 concurrent workflows, depending on how chatty the orchestrator is, meaning how many round trips to it each workflow step needs. Adding workers does not help: they queue up waiting for orchestrator decisions. The fix is structural (a sharded orchestrator, event-driven dispatch, or a stateless reducer per workflow) and expensive to retrofit once business logic depends on the centralized view.
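Why extra workers stop helping can be shown with a back-of-the-envelope bottleneck model. This is a sketch under stated assumptions, not a measurement: each workflow step is assumed to need a fixed number of serial orchestrator decisions (the "chattiness"), and orchestrator shards are assumed to share no state. The function name, parameters, and numbers are all hypothetical.

```python
def max_step_throughput(
    workers: int,
    steps_per_worker_per_sec: float,
    orchestrator_decisions_per_sec: float,
    decisions_per_step: float,
    orchestrator_shards: int = 1,
) -> float:
    """Upper bound on workflow steps/sec for a dispatch pipeline.

    Hypothetical bottleneck model: each step needs `decisions_per_step`
    serial decisions from an orchestrator that can make
    `orchestrator_decisions_per_sec`; shards are assumed independent.
    """
    worker_capacity = workers * steps_per_worker_per_sec
    orchestrator_capacity = (
        orchestrator_shards * orchestrator_decisions_per_sec / decisions_per_step
    )
    # The slower stage caps the whole system.
    return min(worker_capacity, orchestrator_capacity)


# Hypothetical numbers: 500 decisions/s, 5 decisions per step.
for n in (10, 50, 200):
    print(n, "workers:", max_step_throughput(n, 2.0, 500.0, 5.0))
print("200 workers, 4 shards:",
      max_step_throughput(200, 2.0, 500.0, 5.0, orchestrator_shards=4))
```

With these numbers throughput climbs from 20 to 100 steps/s as workers are added and then flattens: 200 workers get no more than 50 did. Only adding orchestrator shards moves the ceiling, which is why the fix is structural rather than a matter of adding workers.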

Forces

  • Centralized orchestrators are dramatically easier to reason about, debug, and visualize.
  • Sharding orchestration breaks naive global views (cross-workflow queries become expensive).
  • The bottleneck only shows up at scale, after the architecture is hard to change.

Example

A team builds a multi-agent system on a single Python supervisor process. It works fine at 30 concurrent workflows. At 200 concurrent workflows the supervisor pegs its CPU dispatching events while workers sit idle waiting for assignments, and adding more workers changes nothing. The fix is to shard the supervisor by tenant id, which requires rewriting every cross-tenant analytics query that assumed a single in-memory view.
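Sharding by tenant id needs a routing function that every replica evaluates identically. A minimal sketch, assuming events are dicts carrying a `tenant_id` key and each shard's supervisor drains one queue; `shard_for_tenant`, `route_events`, and the tenant names are hypothetical. It hashes with SHA-256 rather than Python's built-in `hash()`, which is salted per process for strings and would let two replicas disagree about where a tenant lives.

```python
import hashlib
from collections import defaultdict


def shard_for_tenant(tenant_id: str, num_shards: int) -> int:
    """Map a tenant id to a shard index that every process agrees on."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Stable across processes and restarts, unlike the salted built-in hash().
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route_events(events: list[dict], num_shards: int) -> dict[int, list[dict]]:
    """Group events into per-shard queues, preserving per-tenant order."""
    queues: dict[int, list[dict]] = defaultdict(list)
    for event in events:
        queues[shard_for_tenant(event["tenant_id"], num_shards)].append(event)
    return dict(queues)


# Hypothetical tenants and run ids.
events = [
    {"tenant_id": t, "run_id": f"run-{i}"}
    for i, t in enumerate(["acme", "globex", "acme", "initech"])
]
for shard, queue in sorted(route_events(events, num_shards=4).items()):
    print(shard, [e["run_id"] for e in queue])
```

Plain modulo remaps most tenants whenever `num_shards` changes; consistent hashing or rendezvous hashing limits that movement if shard counts are expected to grow. Either way, any query spanning tenants now has to fan out to every shard, which is the analytics rewrite the example describes.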

Solution

Therefore:

Partition orchestrator state by run id, tenant, or workflow type. Use durable event stores (Kafka, Temporal, Postgres logical replication) so multiple orchestrator replicas can subscribe independently. Where a single global view is needed, build it as a materialized projection of the event log, not as the orchestrator's local state. Pair with Stateless Reducer Agent so each workflow can be rehydrated on any replica.
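The two halves of the solution, rehydrating a run on any replica and deriving the global view from the log, can be sketched together. This assumes an in-memory list standing in for the durable log (a Kafka topic, Temporal history, or Postgres table in practice); the event kinds, `RunState` fields, and function names are hypothetical. The reducer is pure, so two replicas reading the same events always reach the same state.

```python
from collections import Counter
from dataclasses import dataclass, replace
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class Event:
    run_id: str
    tenant_id: str
    kind: str  # hypothetical kinds: "started", "step_done", "completed"


@dataclass(frozen=True)
class RunState:
    run_id: str
    status: str = "pending"
    steps_done: int = 0


def reduce_run(state: RunState, event: Event) -> RunState:
    """Pure (state, event) -> new state: no I/O, no shared memory."""
    if event.kind == "started":
        return replace(state, status="running")
    if event.kind == "step_done":
        return replace(state, steps_done=state.steps_done + 1)
    if event.kind == "completed":
        return replace(state, status="completed")
    return state  # unknown kinds are ignored


def rehydrate(log: Iterable[Event], run_id: str) -> RunState:
    """Rebuild one run's state on whichever replica picks it up."""
    return reduce(reduce_run, (e for e in log if e.run_id == run_id), RunState(run_id))


def status_projection(log: Iterable[Event]) -> Counter:
    """Global 'runs by status' view, derived from the log, not from a replica's memory."""
    latest: dict[str, RunState] = {}
    for e in log:
        latest[e.run_id] = reduce_run(latest.get(e.run_id, RunState(e.run_id)), e)
    return Counter(s.status for s in latest.values())


log = [
    Event("r1", "acme", "started"),
    Event("r2", "globex", "started"),
    Event("r1", "acme", "step_done"),
    Event("r1", "acme", "completed"),
]
print(rehydrate(log, "r1"))
print(status_projection(log))
```

Here the projection is recomputed from scratch for clarity; a production projection would more likely be its own log consumer that updates a persisted table incrementally, so global queries never touch orchestrator replicas at all.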

What this pattern forbids. Nothing useful. The constraint it is missing is orchestration that can be partitioned horizontally from day one.

And the patterns that stand alongside it, or against it —

  • complements · Stateless Reducer Agent · Design the agent as a pure function (state, event) → newState; entire execution history is held in an external event log; enables pause / resume / replay / time-travel without bespoke checkpointing.
  • alternative-to · Event-Driven Agent ★★ · Trigger the agent on external events (webhooks, message queues, file changes) instead of user requests or schedules.
  • complements · Durable Workflow Snapshot · Capture workflow execution state as a snapshot in a pluggable storage provider so a paused run can resume across deployments, process restarts, and host crashes.
  • complements · Blocking Sync Calls in Agent Loop · Anti-pattern: run synchronous, blocking I/O inside the agent loop or HTTP handler, capping concurrency at the number of OS threads.
  • alternative-to · Supervisor ★★ · Place a coordinating agent above a set of specialised agents and route work to them.
  • complements · Infrastructure Burst Bottleneck (Agent Scale-Out) · Anti-pattern: deploy agents whose scale-out behavior triggers sudden data-and-compute bursts that on-prem or under-provisioned cloud infrastructure cannot absorb; agents work at small scale and freeze in production.
