Streaming & UX
How partial state reaches the user.
10 patterns in this book.
When to reach for each
01. Stop / Cancel
Let the user interrupt an in-flight agent run cleanly, releasing resources and surfacing partial state.
Best for: Long-running agents where the user may notice a wrong direction mid-run.
Tradeoff: Cancellation plumbing is non-trivial across providers.
Watch for: Runs are short and cancellation provides no real value.

02. Citation Streaming
Stream citations alongside generated text so the UI can render source links in place as content appears.
Best for: Outputs cite documents and users need to verify each claim.
Tradeoff: Streaming protocol is more complex.
Watch for: Outputs are creative and not grounded in retrievable documents.

03. Streaming Typed Events
Push partial results to the client as typed events as they become available, rather than waiting for the full response.
Best for: User-facing agents where time-to-first-token is perceived latency.
Tradeoff: Connection management complexity.
Watch for: Outputs are short enough that batching the full response is fine.

04. Salience-Triggered Output
Have the agent emit a message only when an internal salience signal crosses a threshold, not on every cycle.
Best for: The agent runs on a tick or always-on loop and emits too often or too seldom.
Tradeoff: Threshold tuning is fragile to context shifts.
Watch for: The agent is request-driven and emits exactly when asked.

05. Bidirectional Impulse Channel
Let the user inject impulses into the agent and let the agent push messages to the user, both through one channel.
Best for: The agent runs long enough that pure request-response chat misses the point.
Tradeoff: Salience threshold tuning is empirical.
Watch for: Interactions are bounded turn-pairs with no need for a back-channel.
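
As a rough illustration of how Stop / Cancel (01) and Streaming Typed Events (03) can fit together, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the event classes (TextDelta, Cancelled, Done), the run_agent generator, and the use of asyncio.Event as the stop signal are hypothetical names, not any provider's API. The agent yields one typed event per chunk, checks the stop signal between chunks, and on cancellation emits a final event that carries the partial text produced so far.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Union


@dataclass
class TextDelta:
    text: str  # one incremental chunk of output


@dataclass
class Cancelled:
    partial: str  # everything produced before the user stopped the run


@dataclass
class Done:
    full: str  # the complete output of an uninterrupted run


Event = Union[TextDelta, Cancelled, Done]


async def run_agent(tokens: list[str], stop: asyncio.Event) -> AsyncIterator[Event]:
    """Hypothetical agent loop: one typed event per chunk, stop checked between chunks."""
    produced: list[str] = []
    for tok in tokens:
        if stop.is_set():
            # Surface partial state instead of silently dropping it.
            yield Cancelled(partial="".join(produced))
            return
        await asyncio.sleep(0)  # stand-in for a model or tool call
        produced.append(tok)
        yield TextDelta(text=tok)
    yield Done(full="".join(produced))


async def demo() -> list[Event]:
    stop = asyncio.Event()
    events: list[Event] = []
    async for ev in run_agent(["Hel", "lo ", "wor", "ld"], stop):
        events.append(ev)
        # Simulate the user pressing Stop after the second chunk arrives.
        if len(events) == 2:
            stop.set()
    return events


events = asyncio.run(demo())
```

In this demo the stream ends with a Cancelled event whose partial field is "Hello ", so the client can still render what was produced. A real system would likely also need to abort the upstream provider request and release its connection, which is where the cross-provider plumbing cost named in the tradeoff shows up.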
All patterns in this book
Stop / Cancel
×8 Let the user interrupt an in-flight agent run cleanly, releasing resources and surfacing partial state.
Citation Streaming
×6 Stream citations alongside generated text so the UI can render source links in place as content appears.
Streaming Typed Events
×3 Push partial results to the client as typed events as they become available, rather than waiting for the full response.
Salience-Triggered Output
×3 Have the agent emit a message only when an internal salience signal crosses a threshold, not on every cycle.
Bidirectional Impulse Channel
×2 Let the user inject impulses into the agent and let the agent push messages to the user, both through one channel.
Embodied-Proxy Handoff
×1 Enable the human to share embodied state (energy, fatigue, environment) so the agent tailors response shape to the actual person rather than to a context-free abstract user.
Liminal-State Detection
×1 Infer the human's attentional state (just-woke, focused, winding-down, distracted) from message timing and tone, and adapt response shape so the agent meets the person where they actually are.
Delayed Streams Modeling
Convert streaming speech tasks into a single decoder-only autoregressive problem by time-aligning the parallel input and output streams with a fixed offset in preprocessing, eliminating the learned r…
Generative UI
Let the agent decide which interface components to render at runtime and stream them to the frontend over a typed protocol, so the surface follows the agent's output instead of being hardcoded.
Unified Voice Interface
Expose text-to-speech, speech-to-text, and real-time speech-to-speech through a single interface so a voice agent can swap providers without rewriting the loop.
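
To make the Unified Voice Interface idea concrete, here is a minimal Python sketch under stated assumptions. The VoiceProvider protocol, its method names, and both toy providers are hypothetical and do not correspond to any real SDK; the "audio" is just encoded text so the example stays self-contained. The point it shows is that the agent loop (voice_turn) is written only against the interface, so swapping providers does not change the loop.

```python
from typing import Callable, Protocol


class VoiceProvider(Protocol):
    """Hypothetical interface; method names are illustrative only."""

    def speech_to_text(self, audio: bytes) -> str: ...
    def text_to_speech(self, text: str) -> bytes: ...
    def speech_to_speech(self, audio: bytes) -> bytes: ...


class Utf8Provider:
    """Toy provider: 'audio' is the UTF-8 bytes of the text."""

    def speech_to_text(self, audio: bytes) -> str:
        return audio.decode("utf-8")

    def text_to_speech(self, text: str) -> bytes:
        return text.encode("utf-8")

    def speech_to_speech(self, audio: bytes) -> bytes:
        return audio  # trivial pass-through for the toy


class Utf16Provider:
    """Second toy provider with a different 'audio' encoding."""

    def speech_to_text(self, audio: bytes) -> str:
        return audio.decode("utf-16-le")

    def text_to_speech(self, text: str) -> bytes:
        return text.encode("utf-16-le")

    def speech_to_speech(self, audio: bytes) -> bytes:
        return audio


def voice_turn(
    provider: VoiceProvider, audio_in: bytes, respond: Callable[[str], str]
) -> bytes:
    """One turn of the agent loop, written only against the interface."""
    heard = provider.speech_to_text(audio_in)
    return provider.text_to_speech(respond(heard))


def respond(text: str) -> str:
    return text.upper()  # stand-in for the agent's reply logic


# The same loop runs unchanged against either provider.
results = {}
for provider in (Utf8Provider(), Utf16Provider()):
    audio_in = provider.text_to_speech("hello")
    audio_out = voice_turn(provider, audio_in, respond)
    results[type(provider).__name__] = provider.speech_to_text(audio_out)
```

Both providers produce the same transcript for the reply even though their byte formats differ, which is the property the pattern is after. Real providers differ in streaming behaviour, latency, and audio formats, so an actual interface would probably need async methods and format negotiation.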