World Model as Tool
also known as Foresight Simulator Call, Generative-Sim Lookahead, Dyna-Think, Sim-as-Tool
Let a planning agent invoke a generative world model as a tool to roll out hypothetical futures before committing to an action, treating the world model as a callable simulator rather than a training target.
This pattern helps complete certain larger patterns —
- specialises Tool Use ★★: Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.
- used-by Coalition Formation ·: Agents form temporary subgroups around a task because the coalition can achieve more value than the sum of its members acting alone, with explicit rules for who joins and how payoff or credit is shared.
Context
A team builds a planning agent that has to act in an environment where the consequences of an action depend on physics, geometry, or rich perceptual dynamics: a household robot, a game-playing agent, an embodied agent moving in a 3D scene, or a control system over a continuous process. A capable generative world model (a video diffusion model, a learned dynamics model, an external simulator) exists that can produce a plausible rollout when given a description of the current state and a candidate action. Some of the actions the agent might take are irreversible or expensive enough that the team would rather not learn about them by acting first.
Problem
Text-level lookahead, where the agent just thinks step by step about what would happen if it acted, is weak when the answer depends on physical or perceptual details the model never represented in its text reasoning: whether the glass will tip at the shelf edge, whether the gripper will collide with the cup behind it, whether the lever will jam. The model can write a confident paragraph about either outcome without that paragraph having any contact with the actual dynamics. Training a tightly integrated world model into the agent itself is expensive and locks the system to one model that quickly becomes stale. Acting without any lookahead is unsafe in environments where mistakes are not cheap to undo. The team needs grounded foresight without paying the cost of training its own world model from scratch.
Forces
- Text-level reasoning often underrates physical or perceptual consequences of an action.
- Generative world models are improving rapidly and are available off the shelf.
- Training a bespoke world model inside the agent is expensive, and the result quickly goes stale.
- World-model rollouts are themselves noisy and must not be trusted verbatim as ground truth.
- Many environments are partially irreversible, so acting without lookahead is costly.
Example
A household robot agent considers two candidate plans for placing a glass on a shelf. Before acting, it calls a generative world-model tool with the current scene and each candidate plan. The simulator returns two predicted rollouts; the second shows the glass tipping at the shelf edge with non-trivial probability. The agent picks the first plan and logs both rollouts alongside the action so the team can later audit why the second was rejected. The world model is not perfect, but its output catches a failure that text reasoning over the scene description had missed.
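The selection step in this example can be sketched roughly as follows. This is only an illustration under stated assumptions: `fake_world_model`, the plan names, the `tip_probability` field, and the numbers are all hypothetical stand-ins for whatever a real generative simulator would return.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class Rollout:
    plan: str               # name of the candidate plan that was simulated
    tip_probability: float  # predicted chance that the glass tips (assumed output field)


def fake_world_model(scene: str, plan: str) -> Rollout:
    """Hypothetical stand-in for the generative world-model tool."""
    canned = {"place_centre": 0.02, "place_edge": 0.35}  # illustrative numbers only
    return Rollout(plan=plan, tip_probability=canned[plan])


def choose_plan(scene: str, plans: list[str]) -> tuple[str, str]:
    """Simulate every plan, pick the lowest predicted risk, return (plan, audit record)."""
    rollouts = [fake_world_model(scene, p) for p in plans]
    best = min(rollouts, key=lambda r: r.tip_probability)
    audit = json.dumps({
        "scene": scene,
        "chosen": best.plan,
        # Rejected rollouts are kept too, so the rejection can be audited later.
        "rollouts": [asdict(r) for r in rollouts],
    })
    return best.plan, audit


chosen, audit_record = choose_plan(
    "glass in gripper, shelf at 1.2 m", ["place_centre", "place_edge"]
)
print(chosen)
```

The audit record deliberately contains both rollouts, not just the winning one, which is what lets the team reconstruct why the second plan was rejected.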
Solution
Therefore:
Register the generative world model behind a tool interface: input is a structured description of the current state plus a candidate action sequence; output is a generated rollout (video frames, simulated trajectory, predicted observations) plus optional model-side uncertainty. The planning agent calls this tool when it considers an action whose physical or perceptual consequence is hard to reason about. The agent compares predicted rollouts across candidate actions, weighs them against text-level reasoning, and uses simulator agreement as a gate before any irreversible or expensive action. The world model is treated as fallible — its output is evidence, not truth — and is logged alongside the action for later replay.
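One possible shape for such a tool interface is sketched below. It is not a definitive design: the type names (`RolloutRequest`, `RolloutResult`, `WorldModelTool`), the `failure_predicted` field, the `max_uncertainty` threshold, and the `toy_world_model` stub are all assumptions introduced for illustration, and a real rollout would carry video frames or trajectories, not small dictionaries.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class RolloutRequest:
    state: dict                # structured description of the current state
    actions: tuple[str, ...]   # candidate action sequence


@dataclass(frozen=True)
class RolloutResult:
    trajectory: tuple[dict, ...]         # predicted observations, one per step
    failure_predicted: bool              # did the rollout show an unwanted outcome?
    uncertainty: Optional[float] = None  # optional model-side uncertainty in [0, 1]


# The tool is any callable of this shape; the agent never sees the model's internals,
# so the underlying world model can be swapped without touching the planner.
WorldModelTool = Callable[[RolloutRequest], RolloutResult]


def toy_world_model(request: RolloutRequest) -> RolloutResult:
    """Hypothetical stub: flags any plan that contains a 'release_at_edge' step."""
    risky = "release_at_edge" in request.actions
    steps = tuple({"step": i, "action": a} for i, a in enumerate(request.actions))
    return RolloutResult(trajectory=steps, failure_predicted=risky, uncertainty=0.2)


def candidates_passing_gate(
    tool: WorldModelTool,
    state: dict,
    candidates: Sequence[Sequence[str]],
    max_uncertainty: float = 0.5,
) -> list[tuple[str, ...]]:
    """Return the candidates whose rollout shows no failure and is confident enough."""
    passed = []
    for actions in candidates:
        result = tool(RolloutRequest(state=state, actions=tuple(actions)))
        too_uncertain = (
            result.uncertainty is not None and result.uncertainty > max_uncertainty
        )
        if not result.failure_predicted and not too_uncertain:
            passed.append(tuple(actions))
    return passed


plans = [
    ["grasp", "move_to_shelf", "release_centre"],
    ["grasp", "move_to_shelf", "release_at_edge"],
]
safe = candidates_passing_gate(toy_world_model, {"object": "glass"}, plans)
```

In this sketch, passing the gate only means the simulator raised no objection. The surviving candidates would still be weighed against text-level reasoning before the agent commits to one.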
What this pattern forbids. Rollouts from the world model must be treated as evidence, never as ground truth. The agent must not carry out an irreversible operation on the strength of simulator output alone, and any rollout it acts on must be logged alongside the action for replay.
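A guard that enforces these rules might look something like the sketch below. Every name in it is hypothetical: `authorise`, the `independent_ok` flag (standing in for text-level reasoning, a verifier, or a human sign-off), and the `failure_predicted` key on the rollout are assumptions made for the example, not part of any particular framework.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Decision:
    allowed: bool
    reason: str


def authorise(
    action: str,
    irreversible: bool,
    rollout: dict,
    independent_ok: bool,
    log: list[str],
) -> Decision:
    """Gate an action on simulator evidence plus, if irreversible, a second source.

    rollout:        simulator output; assumed to carry a 'failure_predicted' flag
    independent_ok: evidence from outside the simulator also supports the action
    """
    if rollout["failure_predicted"]:
        decision = Decision(False, "rollout predicted a failure")
    elif irreversible and not independent_ok:
        # Simulator output alone is never enough for an irreversible action.
        decision = Decision(False, "irreversible action needs evidence beyond the simulator")
    else:
        decision = Decision(True, "approved")

    # Log the rollout next to the action and the outcome so the decision can be replayed.
    log.append(json.dumps({
        "action": action,
        "irreversible": irreversible,
        "rollout": rollout,
        "independent_ok": independent_ok,
        "allowed": decision.allowed,
        "reason": decision.reason,
    }))
    return decision


audit_log: list[str] = []
clean_rollout = {"failure_predicted": False}
blocked = authorise("pour_water", True, clean_rollout, False, audit_log)
approved = authorise("pour_water", True, clean_rollout, True, audit_log)
```

Here the first call is refused even though the rollout looks clean, because nothing other than the simulator vouches for an irreversible action. The second call is approved once independent evidence agrees.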
And the patterns that stand alongside it, or against it —
- complements World-Model Separation ★: Maintain an explicit, surprise-updated model of the environment (humans, repos, services, capabilities) in a separate file from the agent's self-model, so the two cannot be confused or co-mutated by reflection.
- complements Tree of Thoughts ★: Search over a tree of partial reasoning states with explicit lookahead, evaluation, and backtracking.
- complements Language Agent Tree Search ·: Lift the agent loop into a search tree with a learned value function and backtracking.
- complements Simulate Before Actuate ★: Before issuing an irreversible action, run a deterministic simulation that computes pre-conditions, invariants, and expected deltas; require a verifier, automated or human, to green-light the simulated outcome before the real command is sent.
- complements Hybrid Symbolic-Neural Routing ★: Per query, route between a symbolic path (rule engine, knowledge graph) and a neural path (LLM), using the LLM for interpretation and the symbolic layer for exact constraints.
- complements World-Model Graph Memory ★: Memory store structured as a typed entity-relation graph used as the agent's authoritative world model for planning, not only for retrieval.
- complements Mental-Model-In-The-Loop Simulator ·: Run candidate multi-step strategies inside an internal simulator of the environment before committing in the real world; broader than simulate-before-actuate (single action) because it simulates multi-step strategies.
- complements BDI Agent ★★: Agent maintains explicit Beliefs about the world, Desires (goals), and Intentions (committed plans), and reasons by reconciling the three.
- complements Joint Commitment Team ·: A team of agents adopts a shared goal plus the meta-commitment that each member will notify the others as soon as it believes the goal is achieved, impossible, or no longer relevant.
- complements Stigmergic Coordination ★★: Agents coordinate indirectly by leaving and reading marks in a shared environment (files, queues, scratchpads, world model) so that one agent's trace stimulates another's next action, with no direct messaging.
- alternative-to Distributed Constraint Optimization ·: A group of agents jointly assigns values to shared variables to minimise (or maximise) a global cost defined by inter-agent constraints, exchanging only the messages needed.
- complements Partial Global Planning ·: Each agent maintains a partial view of others' plans and incrementally merges local plans into a shared partial global plan, interleaving coordination with execution.