II · Planning & Control Flow · Experimental

Speculative Agentic Actions

also known as Speculative Tool Execution, Action Lookahead, Preemptive Tool Bundling

Predict the tool calls the agent is most likely to issue next and execute them preemptively on the current turn, then keep the results that the confirmed trajectory needs and discard the rest.

Context

An agent works a long-horizon task as a strict request-act-observe loop: each turn it reads the prior observation, decides on one tool call, waits for the result, and only then plans the next call. On tasks that take dozens of turns this serialisation is the dominant cost. Every turn replays the growing transcript, pays a model round-trip, and waits on a tool call that was often predictable from the previous observation, so a long run can exhaust its turn or token budget before it reaches the goal.

Problem

Many of the tool calls an agent will make are highly predictable from the current state: after listing a directory it will read the obvious file, after a failing test it will open the file named in the top stack frame, after a search hit it will fetch the top result. Forcing each of these through its own confirm-then-act turn spends a full model round-trip and a transcript replay on a decision the agent had already implicitly made. The agent needs a way to run ahead of itself on the predictable stretches without committing to a wrong branch when the prediction misses.

Forces

  • Each extra turn replays the whole transcript and pays a model round-trip, so collapsing turns directly buys horizon under a fixed token budget — but speculation that misses wastes the very budget it tried to save.
  • The next tool call is often near-certain from the current observation, yet the loop treats every call as if it were a fresh, uncertain decision.
  • Executing a predicted call early overlaps its latency with the model's reasoning, but a speculative call with side effects cannot simply be thrown away if the prediction is wrong.
  • A bolder prediction horizon collapses more turns when right and burns more budget when wrong, so the speculation depth must be tuned to the prediction's confidence.
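
The trade-off in the last force can be made concrete with a small sketch. It assumes a deliberately simple linear cost model that is not part of the pattern itself: the predictor supplies a per-step confidence for each call in a predicted chain, a hit saves `turn_cost`, a miss wastes `spec_cost`, and both are measured in the same unit. The function name and parameters are hypothetical.

```python
from typing import Sequence


def speculation_depth(
    confidences: Sequence[float],  # per-step confidence for each predicted call in the chain
    turn_cost: float,              # assumed saving when a speculated step is a hit
    spec_cost: float,              # assumed waste when a speculated step is a miss
    max_depth: int = 4,            # hard cap on lookahead, whatever the confidence
) -> int:
    """Return how many steps of the predicted chain are worth speculating on."""
    depth = 0
    reach = 1.0  # probability that the trajectory actually reaches this step
    for p in confidences[:max_depth]:
        reach *= p
        expected_gain = reach * turn_cost - (1.0 - reach) * spec_cost
        if expected_gain <= 0:
            break  # deeper steps are reached even less often, so stop here
        depth += 1
    return depth
```

Under this model a step is worth speculating on only while the cumulative hit probability stays above `spec_cost / (turn_cost + spec_cost)`, so cheap read calls justify deeper lookahead than expensive ones at the same confidence.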

Example

A debugging agent is told a test failed. From the failure observation a predictor guesses the agent's next move is to open the file named in the top stack frame, so the harness reads that file speculatively while the model is still reasoning. When the model commits to exactly that read, the already-fetched contents are spliced in and the read-and-wait turn is skipped; when the model instead decides to re-run the test, the speculative read is dropped and the loop proceeds normally.
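
A predictor for this scenario can be very cheap. The sketch below is one hypothetical heuristic, assuming the failure observation contains a Python-style traceback and that the harness exposes a read tool named `read_file` (both are assumptions for illustration). It takes the last `File "..."` line, which in a Python traceback is the innermost frame.

```python
import re
from typing import Optional, Tuple

# Matches frame lines such as:  File "src/cart.py", line 42, in total
_FRAME = re.compile(r'File "([^"]+)", line (\d+)')


def predict_next_read(observation: str) -> Optional[Tuple[str, str]]:
    """Guess the agent's next call from a test-failure observation.

    Returns a (tool name, path) pair, or None when no frame is found,
    in which case the harness should not speculate at all.
    """
    frames = _FRAME.findall(observation)
    if not frames:
        return None
    path, _line = frames[-1]  # innermost frame in a Python traceback
    return ("read_file", path)  # "read_file" is a hypothetical tool name
```

Returning `None` on anything unrecognised keeps the predictor conservative: it only proposes a call when the observation names a file explicitly.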

Solution

Therefore:

Add a speculation step to the agent loop. From the current observation a lightweight predictor proposes the tool call (or short chain of calls) the agent is most likely to issue next, and the harness dispatches those calls speculatively while the main model reasons about the same turn. When the model commits to its actual next action, the harness checks it against the speculation: on a hit it splices in the already-computed result and skips the round-trip, collapsing two or more turns into one; on a miss it drops the speculative result and falls back to the normal act-observe step. Speculation is confined to read-only, idempotent, side-effect-free calls so a discarded prediction costs only wasted compute, never corrupted state. The prediction horizon is bounded by confidence, so the loop speculates aggressively where the next step is near-certain and conservatively where it is not.
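
The loop step described above might be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: a tool call is modelled as a `(name, argument)` pair, `predict` stands in for the lightweight predictor, `decide` stands in for the main model, and a thread pool provides the overlap between speculative dispatch and reasoning. All of these names are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, FrozenSet, List, Tuple

ToolCall = Tuple[str, str]  # (tool name, argument): a deliberately simple call shape


def speculative_step(
    observation: str,
    predict: Callable[[str], List[ToolCall]],  # cheap predictor (assumed)
    decide: Callable[[str], ToolCall],         # main model, slow (assumed)
    tools: Dict[str, Callable[[str], str]],
    read_only: FrozenSet[str],                 # names of tools safe to speculate on
    pool: ThreadPoolExecutor,
) -> Tuple[ToolCall, str, bool]:
    """Run one loop step; return (committed call, its result, was it a hit)."""
    # 1. Dispatch predicted calls, but only those on the read-only allowlist.
    in_flight: Dict[ToolCall, Future] = {}
    for call in predict(observation):
        name, arg = call
        if name in read_only and call not in in_flight:
            in_flight[call] = pool.submit(tools[name], arg)

    # 2. The main model reasons while the speculative calls are running.
    actual = decide(observation)

    # 3. Hit: reuse the precomputed result. Miss: execute the call normally.
    hit = actual in in_flight
    if hit:
        result = in_flight.pop(actual).result()
    else:
        result = tools[actual[0]](actual[1])

    # 4. Drop everything else. cancel() only stops calls that have not started;
    #    calls already running finish, but their results are never read.
    for future in in_flight.values():
        future.cancel()
    return actual, result, hit
```

On a hit the caller appends the result to the transcript as if the call had just been executed, and the separate wait on the tool disappears. On a miss the step behaves exactly like the unmodified loop, apart from the compute spent on the discarded calls.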

What this pattern forbids. Only read-only, idempotent, side-effect-free tool calls may be executed speculatively; state-changing actions must not run before the agent commits to them, and a mispredicted speculative result must be discarded rather than fed into the trajectory.
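
One way to enforce both halves of this rule is sketched below, assuming a hypothetical tool registry in which each tool carries an explicit `speculable` flag and speculative results live in a buffer outside the transcript until the agent commits. The class and method names are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]
    speculable: bool = False  # default-deny: a tool must opt in explicitly


class SpeculationBuffer:
    """Holds speculative results outside the transcript until the agent commits."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, str], str] = {}

    def run_speculatively(self, tool: Tool, arg: str) -> bool:
        """Execute the call only if the tool is marked safe; report whether it ran."""
        if not tool.speculable:
            return False  # state-changing tools never run ahead of the commit
        self._results[(tool.name, arg)] = tool.run(arg)
        return True

    def commit(self, name: str, arg: str) -> Optional[str]:
        """Return the buffered result for the committed call, or None on a miss.

        The buffer is emptied either way, so a mispredicted result cannot
        leak into a later turn.
        """
        result = self._results.pop((name, arg), None)
        self._results.clear()
        return result
```

Making `speculable` default to `False` means a newly registered tool is excluded from speculation until someone has confirmed that it is read-only, idempotent, and free of side effects.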

And the patterns that stand alongside it, or against it —

  • alternative-to · Parallel Tool Calls ★★ · Allow the model to emit several independent tool calls in one assistant turn; the host executes them in parallel.
  • complements · LLMCompiler · Take ReWOO's plan-as-DAG and run independent steps in parallel through a task-fetching dispatcher.
  • complements · Sleep-Time Compute · During idle or downtime, run the model offline against the user's standing context to pre-compute dense summaries and likely future answers, so test-time latency and cost drop when the user actually asks.
  • complements · World Model as Tool · Let a planning agent invoke a generative world model as a tool to roll out hypothetical futures before committing to an action, treating the world model as a callable simulator rather than a training target.
