Routing & Composition

Parallel Tool Calls

Allow the model to emit several independent tool calls in one assistant turn; the host executes them in parallel.

Problem

If the agent issues independent tool calls sequentially, wall-clock latency is the sum of every call's duration even though none of them depends on the others, and the product feels sluggish for no good reason. A full directed-acyclic-graph planner that schedules tool calls and tracks dependencies is heavyweight for the simple case where the model already knows which calls are independent. The team needs a lighter way to run independent calls at the same time without standing up a planner.

Solution

The pattern relies on a provider API that allows a single assistant turn to contain multiple tool calls. The host fans them out concurrently, with bounded concurrency and rate-limit handling. Each result returns as its own tool message, and the next assistant turn sees all of them.
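The fan-out step can be sketched as follows. This is a minimal illustration rather than any provider's actual API: the tool names (get_weather, get_time), the shape of the tool-call dictionaries, and the shape of the returned tool messages are all assumptions made for the example. It uses asyncio with a semaphore to bound concurrency and returns one tool message per call, in the order the calls were emitted.

```python
import asyncio
import json

# Hypothetical tools; the sleep stands in for real network latency.
async def get_weather(city: str) -> dict:
    await asyncio.sleep(0.1)
    return {"city": city, "temp_c": 21}

async def get_time(tz: str) -> dict:
    await asyncio.sleep(0.1)
    return {"tz": tz, "time": "12:00"}

# Assumed registry mapping tool names to handlers.
TOOLS = {"get_weather": get_weather, "get_time": get_time}

async def run_tool_calls(tool_calls: list[dict], max_concurrency: int = 4) -> list[dict]:
    """Run every tool call from one assistant turn concurrently.

    Returns one tool message per call, in the same order as tool_calls.
    The message shape (role / tool_call_id / content) is an assumption.
    """
    sem = asyncio.Semaphore(max_concurrency)  # caps in-flight calls

    async def run_one(call: dict) -> dict:
        async with sem:
            try:
                result = await TOOLS[call["name"]](**call["arguments"])
                content = json.dumps(result)
            except Exception as exc:
                # Report the failure to the model instead of aborting the batch.
                content = json.dumps({"error": str(exc)})
        return {"role": "tool", "tool_call_id": call["id"], "content": content}

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(run_one(c) for c in tool_calls))

if __name__ == "__main__":
    calls = [
        {"id": "call_1", "name": "get_weather", "arguments": {"city": "Oslo"}},
        {"id": "call_2", "name": "get_time", "arguments": {"tz": "UTC"}},
    ]
    for message in asyncio.run(run_tool_calls(calls)):
        print(message)
```

Catching errors per call is a design choice in this sketch: one failing tool becomes an error payload in its own tool message, so the next assistant turn still sees the results of the calls that succeeded.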

When to use

  • The model frequently issues multiple independent tool calls per turn.
  • The provider's API supports multiple tool calls in one assistant message.
  • The host can fan out concurrent calls with bounded concurrency and rate-limit handling.
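The rate-limit handling named in the last condition could take many forms; one common approach is retrying with exponential backoff and jitter, sketched below. The RateLimitError class and the call_with_backoff helper are hypothetical names introduced for illustration. A real host would catch whatever error its tool client raises when a rate limit is hit.

```python
import asyncio
import random

class RateLimitError(Exception):
    """Hypothetical error a tool client raises when it is rate limited."""

async def call_with_backoff(fn, *args, max_attempts: int = 4,
                            base_delay: float = 0.05, **kwargs):
    """Await fn(*args, **kwargs), retrying on RateLimitError.

    The delay doubles after each failed attempt, plus random jitter so
    that concurrent calls do not all retry at the same instant.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await fn(*args, **kwargs)
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay))

if __name__ == "__main__":
    state = {"calls": 0}

    async def flaky_lookup(key: str) -> str:
        # Fails twice with a rate-limit error, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RateLimitError()
        return f"value for {key}"

    print(asyncio.run(call_with_backoff(flaky_lookup, "k1")))
```

Wrapping each tool invocation in a helper like this keeps retries local to the call that was throttled, so the other calls in the same turn proceed unaffected.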

