VI · Multi-Agent · Emerging

Talker-Reasoner

also known as Fast-Slow Agent, System-1 / System-2 Agent Split, 快思考与慢思考Agent (Fast-Thinking and Slow-Thinking Agent)

Split an interactive agent into a fast Talker for conversational responses and a slow Reasoner for deliberative planning and tool use, so the conversational loop never blocks on reasoning.

This pattern helps complete certain larger patterns —

  • specialises Augmented LLM ★★: Build the foundational agent block as an LLM augmented with retrieval, tools, and memory that the model actively chooses to use, rather than a bare-model call.

Context

A conversational agent has two responsibilities with different latency profiles. It must keep the user engaged with timely, fluent replies (sub-second), and it must make correct decisions on problems that need multi-step reasoning, tool use, and planning (multi-second to multi-minute). A single agent doing both either feels slow (because every reply waits for the reasoning chain) or feels shallow (because reasoning is truncated to meet the latency budget).

Problem

When one agent loop serves both conversation and deliberation, the system inherits the worse of two latencies. Conversational turns wait for any tool call or reasoning step the agent is doing, so the user perceives the agent as slow even on trivial replies. Compressing the reasoning to fit a chat latency budget gives shallow answers on the queries that actually needed deliberation. The two responsibilities pull the loop in incompatible directions and there is no clean way to honour both.

Forces

  • Conversational latency budget is sub-second; deliberation budget is multi-second to minutes.
  • Truncating deliberation to fit chat latency loses answer quality on hard queries.
  • Coupling the loops means every chat turn pays the deliberation cost.
  • Two loops need a shared memory or hand-off contract so the Talker can reflect the Reasoner's progress.
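The last force, a shared memory or hand-off contract, can be made concrete with a small data structure. The sketch below is one possible shape, not a prescribed schema: the names `SharedMemory`, `ReasonerStatus`, `request_deliberation`, `publish`, and `progress_line` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ReasonerStatus(Enum):
    IDLE = "idle"
    RUNNING = "running"
    DONE = "done"


@dataclass
class SharedMemory:
    """Hypothetical hand-off contract between the Talker and the Reasoner."""
    status: ReasonerStatus = ReasonerStatus.IDLE
    pending_query: Optional[str] = None                    # written by the Talker
    conclusions: List[str] = field(default_factory=list)   # written by the Reasoner
    version: int = 0                                       # bumped on each Reasoner write

    def request_deliberation(self, query: str) -> None:
        """Talker side: record the query and mark the Reasoner as in flight."""
        self.pending_query = query
        self.status = ReasonerStatus.RUNNING

    def publish(self, conclusion: str) -> None:
        """Reasoner side: append a conclusion and mark the work as done."""
        self.conclusions.append(conclusion)
        self.version += 1
        self.status = ReasonerStatus.DONE

    def progress_line(self) -> str:
        """Talker side: a one-line summary of Reasoner progress, read without waiting."""
        if self.status is ReasonerStatus.RUNNING:
            return "Still working on: " + (self.pending_query or "")
        if self.status is ReasonerStatus.DONE:
            return "Latest conclusion: " + self.conclusions[-1]
        return "Nothing in progress."
```

The `version` counter is an assumption of this sketch; it gives the Talker a cheap way to notice that something new has been written since its last turn.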

Example

A sleep-coaching agent gets the message 'I've been waking up at 3am for two weeks.' The Talker replies immediately with an empathetic acknowledgement and asks one clarifying question, while invoking the Reasoner with the case state. Over the next 30 seconds, the Reasoner plans a multi-week intervention (consult sleep-hygiene tools, check the user's history, design a protocol) and writes its conclusions to shared memory. On the user's next turn, the Talker fluently surfaces the protocol without the user ever having waited synchronously for it.

Solution

Therefore:

Stand up two sub-agents that share memory. The Talker (System 1) handles every user turn with low-latency intuitive replies grounded in the current shared state, including 'let me think about this' acknowledgements when the Reasoner is mid-flight. The Reasoner (System 2) runs asynchronously, invoked when the Talker recognises that a query requires deliberation, and writes its conclusions (plans, tool-call results, evidence) back to shared memory for the Talker to consume on the next turn. The Talker decides what to surface and when; the Reasoner is non-blocking.
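A minimal sketch of the two loops, using Python's `asyncio`, may help make the control flow concrete. Everything here is an assumption for illustration: the `Memory` class, the keyword-based `needs_deliberation` router, and the `asyncio.sleep` that stands in for real planning and tool calls. A real Talker and Reasoner would each wrap a model call.

```python
import asyncio
from typing import List, Optional


class Memory:
    """Hypothetical shared state read by the Talker and written by the Reasoner."""
    def __init__(self) -> None:
        self.plan: Optional[str] = None
        self.reasoner_busy = False


async def reasoner(memory: Memory, query: str, delay: float = 0.05) -> None:
    """System 2: slow and asynchronous; its only output is a write to shared memory."""
    await asyncio.sleep(delay)  # stand-in for planning and tool calls
    memory.plan = f"plan for: {query}"
    memory.reasoner_busy = False


def needs_deliberation(message: str) -> bool:
    # Hypothetical router; a real Talker would rely on the model's own judgement.
    return "weeks" in message or "plan" in message


def talker(memory: Memory, message: str, tasks: List[asyncio.Task]) -> str:
    """System 1: replies from current shared state and never awaits the Reasoner."""
    if memory.plan is not None:
        plan, memory.plan = memory.plan, None  # surface the result once
        return f"Here is what I worked out: {plan}"
    if memory.reasoner_busy:
        return "Still thinking about that; anything else in the meantime?"
    if needs_deliberation(message):
        memory.reasoner_busy = True  # mark in flight before scheduling
        tasks.append(asyncio.create_task(reasoner(memory, message)))
        return "That sounds rough. Let me think about this. When did it start?"
    return "Got it."


async def main() -> List[str]:
    memory, tasks, replies = Memory(), [], []
    replies.append(talker(memory, "I've been waking at 3am for two weeks.", tasks))
    replies.append(talker(memory, "Mostly on weekdays.", tasks))  # Reasoner mid-flight
    await asyncio.gather(*tasks)  # stands in for time passing between user turns
    replies.append(talker(memory, "Any ideas?", tasks))
    return replies


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

Note that `talker` is an ordinary function, not a coroutine: it schedules the Reasoner with `asyncio.create_task` and returns at once. The `await asyncio.gather(*tasks)` in `main` exists only so the demo can show the third turn; it is not part of the Talker's path.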

What this pattern forbids. The Talker must not block on the Reasoner: conversational turns must complete from current shared state regardless of Reasoner progress. The Reasoner must not speak directly to the user.
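One way to make these two prohibitions hard to violate by accident is to encode them in what each component is handed. The sketch below assumes a plain dictionary as shared state and a `send` callback as the user channel; `ReasonerHandle` and `Talker` are hypothetical names, and for brevity the Talker here only reads from the store.

```python
from types import MappingProxyType
from typing import Callable, Dict, Mapping


class ReasonerHandle:
    """All the Reasoner receives: write access to shared state, no user channel."""
    def __init__(self, store: Dict[str, str]) -> None:
        self._store = store

    def write(self, key: str, value: str) -> None:
        self._store[key] = value


class Talker:
    """Holds the only user channel and a live, read-only view of shared state."""
    def __init__(self, store: Dict[str, str], send: Callable[[str], None]) -> None:
        self._view: Mapping[str, str] = MappingProxyType(store)
        self._send = send

    def turn(self, message: str) -> None:
        # A plain (non-async) method with no reference to the Reasoner:
        # the turn completes from whatever the view holds right now.
        plan = self._view.get("plan")
        self._send(plan if plan else "Let me think about this.")
```

Because the Reasoner is never given `send`, it has no path to the user, and because `turn` holds no handle to the Reasoner, it has nothing to wait on.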

And the patterns that stand alongside it, or against it —

  • alternative-to Dual-System GUI Agent: Split a GUI agent into a decision model that plans and recovers from errors and a grounding model that observes pixels and emits the precise action; route each subproblem to the better-suited model.
  • composes-with Extended Thinking ★★: Spend a configurable budget of internal reasoning tokens before producing a user-visible answer.
  • composes-with Handoff: Transfer the active conversation from one agent to another, carrying context across the switch.
