IX · Routing & Composition · Mature ★★

Parallel Tool Calls

also known as Concurrent Function Calls, Multi-Tool Turn

Allow the model to emit several independent tool calls in one assistant turn; the host executes them in parallel.


Context

A tool-using agent is on a task where the next step naturally splits into several independent lookups or actions: fetch three records from different tables, read four files, query two APIs that have nothing to do with each other. The provider's chat API supports a single assistant turn that contains more than one tool call, and the model is capable of identifying these independent calls up front, in one turn, rather than discovering them one step at a time.

Problem

If the agent issues these calls sequentially, the wall-clock latency is the sum of every call even though none of them depend on the others, and the product feels sluggish for no good reason. Building a full directed-acyclic-graph planner that schedules tool calls and tracks dependencies is heavyweight for the simple case where the model already knows which calls are independent. The team needs a lighter way to let independent calls run at the same time without standing up a planner.

Forces

  • Concurrency limits per provider.
  • Provider must support multi-tool-call turns.
  • Results must be aggregated and fed back so the next turn sees all of them.
  • Models sometimes emit dependent calls in one turn despite the prompt; the host must detect or document this contract.

Example

An agent that summarises a support ticket needs to fetch the customer record, the recent invoice, and the last three tickets — three independent calls. Sequential dispatch takes a second per call and makes the bot feel sluggish. The team enables parallel-tool-calls in the provider API: the model emits all three tool calls in one assistant turn, the host fans them out concurrently with bounded concurrency, and the next assistant turn sees all three results. Latency drops from three seconds to about one without changing the model.
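The arithmetic behind that drop can be sketched in a few lines. This is a rough model only: it assumes each call's duration is known in advance, ignores model and network overhead, and assigns calls greedily to whichever worker slot frees up first. The function names `sequential_latency` and `parallel_latency` are hypothetical, introduced here for illustration.

```python
import heapq


def sequential_latency(durations: list[float]) -> float:
    """Wall-clock time when calls run one after another: the sum."""
    return sum(durations)


def parallel_latency(durations: list[float], max_concurrency: int) -> float:
    """Wall-clock time when at most `max_concurrency` calls run at once.

    Each call is handed to the worker slot that becomes free earliest,
    in the order the calls were emitted.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    if not durations:
        return 0.0
    # One entry per worker slot: the time at which that slot is next free.
    slots = [0.0] * min(max_concurrency, len(durations))
    heapq.heapify(slots)
    for duration in durations:
        free_at = heapq.heappop(slots)
        heapq.heappush(slots, free_at + duration)
    return max(slots)


calls = [1.0, 1.0, 1.0]  # customer record, recent invoice, last three tickets
print(sequential_latency(calls))   # 3.0
print(parallel_latency(calls, 3))  # 1.0
print(parallel_latency(calls, 2))  # 2.0
```

With unbounded fan-out the turn costs roughly as much as its slowest call; a concurrency limit below the number of calls moves the figure back towards the sequential sum.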


Solution

Therefore:

The provider's API allows the assistant turn to contain multiple tool calls. The host fans them out concurrently (with bounded concurrency and rate-limit handling). Results return as multiple tool messages; the next assistant turn sees all of them.
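A minimal sketch of the host side, assuming an asyncio-based host and tools exposed as async functions. The `ToolCall` type, the `run_tool_calls` helper, and the `{"role": "tool", "tool_call_id": ...}` message shape are illustrative assumptions rather than any particular provider's API; adapt them to whatever your provider actually returns and expects. Rate-limit handling (retries, backoff) is left out for brevity.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class ToolCall:
    """One tool call from the assistant turn (hypothetical shape)."""
    id: str
    name: str
    arguments: dict[str, Any]


ToolFn = Callable[..., Awaitable[Any]]


async def run_tool_calls(
    calls: list[ToolCall],
    registry: dict[str, ToolFn],
    max_concurrency: int = 4,
) -> list[dict[str, str]]:
    """Run every call in the turn concurrently, at most `max_concurrency` at once.

    Returns one tool message per call, in the same order as `calls`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(call: ToolCall) -> dict[str, str]:
        async with semaphore:  # bounded concurrency
            try:
                result = await registry[call.name](**call.arguments)
                content = json.dumps(result)
            except Exception as exc:
                # Report the failure to the model instead of aborting
                # the sibling calls in the same turn.
                content = json.dumps({"error": str(exc)})
        return {"role": "tool", "tool_call_id": call.id, "content": content}

    # gather() preserves input order, so results line up with the calls.
    return await asyncio.gather(*(run_one(call) for call in calls))
```

Each failure is folded into its own tool message so that one failing lookup does not discard the results of the others; whether that is the right policy depends on the tools involved.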

What this pattern forbids. Tool calls in the same assistant turn are treated as independent; cross-call dependencies are not allowed within one turn.
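One way a host might check this contract before fanning out is sketched below. It is a heuristic under a stated assumption: a dependent call tends to mention a sibling call's id somewhere in its arguments (for example a placeholder such as `<result of call_a>`). Models can express dependencies in other ways, so a clean result does not prove independence. The helper name `find_cross_call_references` and the dict shape of a call are hypothetical.

```python
from typing import Any, Iterator


def _strings(value: Any) -> Iterator[str]:
    """Yield every string found anywhere inside a JSON-like value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from _strings(item)


def find_cross_call_references(calls: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Return (call_id, referenced_call_id) pairs found within one turn.

    Substring matching is deliberately crude: an id such as "call_1"
    would also match inside "call_10".
    """
    ids = [call["id"] for call in calls]
    found = []
    for call in calls:
        texts = list(_strings(call["arguments"]))
        for other in ids:
            if other != call["id"] and any(other in text for text in texts):
                found.append((call["id"], other))
    return found


turn = [
    {"id": "call_a", "name": "get_customer", "arguments": {"customer_id": "42"}},
    {"id": "call_b", "name": "get_invoice", "arguments": {"customer": "<result of call_a>"}},
]
print(find_cross_call_references(turn))  # [('call_b', 'call_a')]
```

When the check fires, the host could reject the turn with an error message to the model or fall back to sequential execution; which of the two is appropriate is a design choice outside this sketch.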

The smaller patterns that complete this one —

  • uses · Tool Use ★★ · Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.

And the patterns that stand alongside it, or against it —

  • alternative-to · LLMCompiler · Take ReWOO's plan-as-DAG and run independent steps in parallel through a task-fetching dispatcher.
  • alternative-to · Code-as-Action Agent · Have the agent emit a code snippet as its action each step, executed in a constrained interpreter, instead of emitting JSON tool calls; tool composition becomes function nesting and control flow inside the snippet.
