Talker-Reasoner
Split an interactive agent into a fast Talker for conversational responses and a slow Reasoner for deliberative planning and tool use, so the conversational loop never blocks on reasoning.
Problem
When one agent loop serves both conversation and deliberation, the system inherits the worse of two latencies. Every conversational turn waits behind whatever tool call or reasoning step is in flight, so the user perceives the agent as slow even on trivial replies. Compressing the reasoning to fit a chat latency budget instead yields shallow answers on the queries that actually needed deliberation. The two responsibilities pull the loop in incompatible directions, and a single loop cannot honour both.
Solution
Stand up two sub-agents that share memory. The Talker (System 1) handles every user turn with low-latency intuitive replies grounded in the current shared state — including 'let me think about this' acknowledgements when the Reasoner is mid-flight. The Reasoner (System 2) runs asynchronously, invoked when the Talker recognises a query requires deliberation, and writes its conclusions (plans, tool-call results, evidence) back to shared memory for the Talker to consume on the next turn. The Talker decides what to surface and when; the Reasoner is non-blocking.
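The split described above can be sketched with Python's asyncio. This is a minimal illustration under stated assumptions, not a reference implementation: `SharedMemory`, `talker`, `reasoner`, and `needs_deliberation` are hypothetical names, the routing rule is a toy prefix check, and the Reasoner's deliberation is replaced by a short sleep.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SharedMemory:
    """State both sub-agents read and write (hypothetical structure)."""
    conclusions: dict[str, str] = field(default_factory=dict)
    pending: dict[str, asyncio.Task] = field(default_factory=dict)


async def reasoner(query: str, memory: SharedMemory, delay: float = 0.05) -> None:
    """Slow path: the sleep stands in for multi-step planning and tool calls."""
    await asyncio.sleep(delay)
    memory.conclusions[query] = f"plan for {query!r}"
    memory.pending.pop(query, None)


def needs_deliberation(query: str) -> bool:
    """Toy router (assumption); a real Talker would decide with its own model."""
    return query.startswith("plan")


async def talker(query: str, memory: SharedMemory) -> str:
    """Fast path: replies from shared state and never awaits the Reasoner."""
    if query in memory.conclusions:
        return f"Here is what I worked out: {memory.conclusions[query]}"
    if query in memory.pending:
        return "Still thinking about that."
    if needs_deliberation(query):
        # Start the Reasoner in the background and acknowledge immediately.
        memory.pending[query] = asyncio.create_task(reasoner(query, memory))
        return "Let me think about this."
    return f"Quick reply to {query!r}"


async def demo() -> list[str]:
    memory = SharedMemory()
    replies = [await talker("hello", memory)]
    replies.append(await talker("plan a trip", memory))
    replies.append(await talker("plan a trip", memory))  # Reasoner mid-flight
    await asyncio.sleep(0.1)  # time passes between user turns
    replies.append(await talker("plan a trip", memory))
    return replies


if __name__ == "__main__":
    for line in asyncio.run(demo()):
        print(line)
```

The design choice worth noting is that `talker` contains no `await` on the Reasoner's task: it only reads `SharedMemory`, so its reply time is independent of how long deliberation takes. A production version would also need to handle Reasoner failures and stale conclusions, which this sketch leaves out.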
When to use
- The agent serves an interactive conversational channel with a sub-second latency expectation.
- Some queries need multi-step deliberation that does not fit the conversational budget.
- Acknowledging 'I'm thinking' and surfacing partial progress is acceptable UX.
- The cost difference between a cheap, fast model and an expensive, slow one is large enough to justify the split.