Book XII

Streaming & UX

How partial state reaches the user.

10 patterns in this book.


When to reach for each

01. Stop / Cancel
Let the user interrupt an in-flight agent run cleanly, releasing resources and surfacing partial state.
Best for: Long-running agents where the user may notice a wrong direction mid-run.
Tradeoff: Cancellation plumbing is non-trivial across providers.
Watch for: Runs short enough that cancellation provides no real value.

02. Citation Streaming
Stream citations alongside generated text so the UI can render source links in place as content appears.
Best for: Outputs that cite documents, where users need to verify each claim.
Tradeoff: The streaming protocol is more complex.
Watch for: Outputs that are creative and not grounded in retrievable documents.

03. Streaming Typed Events
Push partial results to the client as typed events as they become available, rather than waiting for the full response.
Best for: User-facing agents where time-to-first-token drives perceived latency.
Tradeoff: Connection management adds complexity.
Watch for: Outputs short enough that sending the full response in one batch is fine.

04. Salience-Triggered Output
Have the agent emit a message only when an internal salience signal crosses a threshold, not on every cycle.
Best for: Agents that run on a tick or always-on loop and emit too often or too seldom.
Tradeoff: Threshold tuning is fragile to context shifts.
Watch for: Request-driven agents that already emit exactly when asked.

05. Bidirectional Impulse Channel
Let the user inject impulses into the agent and let the agent push messages to the user, both through one channel.
Best for: Agents that run long enough that pure request-response chat misses the point.
Tradeoff: Salience threshold tuning is empirical.
Watch for: Interactions that are bounded turn-pairs with no need for a back-channel.

All patterns in this book

Stop / Cancel

×8

Let the user interrupt an in-flight agent run cleanly, releasing resources and surfacing partial state.
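
As a rough illustration of clean interruption, the sketch below uses Python's `asyncio` task cancellation. The agent loop `run_agent`, its step names, and the `"[cancelled]"` marker are hypothetical stand-ins, not part of any particular framework; a real run would also abort provider requests in the cleanup branch.

```python
import asyncio

async def run_agent(steps: list[str], partial: list[str]) -> None:
    """Hypothetical agent loop: records each finished step in `partial`."""
    try:
        for step in steps:
            await asyncio.sleep(0)  # stand-in for a model or tool call
            partial.append(step)
    except asyncio.CancelledError:
        # Cleanup point: close streams, abort provider requests, etc.
        partial.append("[cancelled]")
        raise  # re-raise so the task is actually marked as cancelled

async def main() -> list[str]:
    partial: list[str] = []
    task = asyncio.create_task(
        run_agent(["plan", "search", "draft", "review"], partial)
    )
    # Simulate a user who watches two steps complete, then presses Stop.
    while len(partial) < 2:
        await asyncio.sleep(0)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the run was interrupted on purpose
    return partial  # partial state that the UI can still show

result = asyncio.run(main())
```

The design choice worth noting is that the agent writes into a list owned by the caller, so whatever was finished before the interrupt survives the cancellation.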

Citation Streaming

×6

Stream citations alongside generated text so the UI can render source links in place as content appears.
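
One possible shape for this, sketched under assumptions: the stream interleaves text deltas with citation events, and the renderer drops a numbered marker wherever a citation arrives. The event classes, `fake_stream`, and the example.com URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union

@dataclass
class TextDelta:
    text: str

@dataclass
class Citation:
    index: int
    url: str

Event = Union[TextDelta, Citation]

def fake_stream() -> Iterator[Event]:
    """Hypothetical model output: citations arrive right after the claim they support."""
    yield TextDelta("Python 3.0 was released in 2008")
    yield Citation(1, "https://example.com/python-history")
    yield TextDelta(". It was not backward compatible")
    yield Citation(2, "https://example.com/python-3-changes")
    yield TextDelta(".")

def render(events: Iterable[Event]) -> tuple[str, dict[int, str]]:
    """Build the display text with in-place markers plus a marker-to-URL map."""
    parts: list[str] = []
    sources: dict[int, str] = {}
    for event in events:
        if isinstance(event, TextDelta):
            parts.append(event.text)
        else:
            parts.append(f"[{event.index}]")  # marker lands where the claim ends
            sources[event.index] = event.url
    return "".join(parts), sources

text, sources = render(fake_stream())
```

A real UI would update incrementally on each event rather than joining at the end; the accumulation here only keeps the sketch short.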

Streaming Typed Events

×3

Push partial results to the client as typed events as they become available, rather than waiting for the full response.
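
A minimal sketch of typed events, assuming Server-Sent Events framing (`event:` and `data:` fields separated by a blank line) as the transport. The event names `token`, `tool_call`, and `done` and the payload shapes are illustrative assumptions, not a standard vocabulary.

```python
import json
from typing import Iterable, Iterator

def agent_events() -> Iterator[tuple[str, dict]]:
    """Hypothetical agent output as (event type, payload) pairs."""
    yield "token", {"text": "Hel"}
    yield "token", {"text": "lo"}
    yield "tool_call", {"name": "search", "args": {"q": "weather"}}
    yield "done", {"finish_reason": "stop"}

def to_sse(event_type: str, payload: dict) -> str:
    """Serialize one typed event as a Server-Sent Events frame."""
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"

def parse_sse(frame: str) -> tuple[str, dict]:
    """Parse a single-line-data frame produced by to_sse."""
    fields = dict(line.split(": ", 1) for line in frame.strip().split("\n"))
    return fields["event"], json.loads(fields["data"])

def handle(frames: Iterable[str]) -> tuple[str, list[str]]:
    """Client side: dispatch on the event type instead of guessing from content."""
    text = ""
    tool_calls: list[str] = []
    for frame in frames:
        kind, payload = parse_sse(frame)
        if kind == "token":
            text += payload["text"]
        elif kind == "tool_call":
            tool_calls.append(payload["name"])
        elif kind == "done":
            break
    return text, tool_calls

frames = [to_sse(kind, payload) for kind, payload in agent_events()]
text, tool_calls = handle(frames)
```

Because each frame carries its type, the client can render tokens immediately while routing tool calls to a separate part of the UI.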

Salience-Triggered Output

×3

Have the agent emit a message only when an internal salience signal crosses a threshold, not on every cycle.
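
The gating logic can be sketched as follows. This version treats "crosses a threshold" as an upward crossing, so the agent speaks once when salience rises past the threshold and stays quiet while it remains high; that reading, the function name, and the default of 0.7 are assumptions made for illustration.

```python
from typing import Iterable

def salient_ticks(scores: Iterable[float], threshold: float = 0.7) -> list[int]:
    """Return the ticks at which salience rises from below to at-or-above the threshold."""
    emitted: list[int] = []
    above = False  # whether the previous tick was already at or above the threshold
    for tick, score in enumerate(scores):
        if score >= threshold and not above:
            emitted.append(tick)  # the agent would emit a message here
        above = score >= threshold
    return emitted

# Five loop cycles, but only two messages.
ticks = salient_ticks([0.1, 0.8, 0.9, 0.3, 0.75])
```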

Bidirectional Impulse Channel

×2

Let the user inject impulses into the agent and let the agent push messages to the user, both through one channel.
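
A small in-process sketch of the idea, assuming the "one channel" is modeled as a single object holding a queue per direction (a WebSocket would play this role over a network). The `Channel` class, the tick loop, and the message strings are hypothetical.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Channel:
    """One shared channel with a queue for each direction."""
    to_agent: asyncio.Queue = field(default_factory=asyncio.Queue)
    to_user: asyncio.Queue = field(default_factory=asyncio.Queue)

async def agent(channel: Channel, ticks: int) -> None:
    focus = "idle"
    for tick in range(ticks):
        # Drain any impulses the user injected since the last tick.
        while not channel.to_agent.empty():
            focus = channel.to_agent.get_nowait()
            await channel.to_user.put(f"tick {tick}: switching to {focus}")
        await asyncio.sleep(0)  # stand-in for one cycle of agent work
    await channel.to_user.put(f"done, last focus: {focus}")

async def main() -> list[str]:
    channel = Channel()  # created inside the running event loop
    task = asyncio.create_task(agent(channel, ticks=3))
    await channel.to_agent.put("summarize inbox")  # user-injected impulse
    await task
    messages: list[str] = []
    while not channel.to_user.empty():
        messages.append(channel.to_user.get_nowait())
    return messages

messages = asyncio.run(main())
```

Neither side waits for the other to finish a turn: the user can push at any time, and the agent decides when to push back.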

Embodied-Proxy Handoff

×1

Enable the human to share embodied state (energy, fatigue, environment) so the agent tailors response shape to the actual person rather than to a context-free abstract user.

Liminal-State Detection

×1

Infer the human's attentional state (just-woke, focused, winding-down, distracted) from message timing and tone, and adapt response shape so the agent meets the person where they actually are.

Delayed Streams Modeling

Convert streaming speech tasks into a single decoder-only autoregressive problem by time-aligning the parallel input and output streams with a fixed offset in preprocessing, eliminating the learned r…
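
The description above is truncated, so the following is only a guess at the preprocessing step it names: two time-aligned token streams are zipped frame by frame after the output stream is shifted later by a fixed number of steps. The `PAD` token, the function name, and the toy frame labels are assumptions.

```python
PAD = "<pad>"  # hypothetical padding token

def delay_align(inputs: list[str], outputs: list[str], delay: int) -> list[tuple[str, str]]:
    """Pair input frame t with output frame t - delay, padding both ends."""
    if len(inputs) != len(outputs):
        raise ValueError("streams must already be time-aligned to equal length")
    if delay < 0:
        raise ValueError("delay must be non-negative")
    padded_in = list(inputs) + [PAD] * delay    # input runs out first
    padded_out = [PAD] * delay + list(outputs)  # output starts `delay` steps late
    return list(zip(padded_in, padded_out))

# With delay=1, the output for frame 0 is emitted only after input frame 1 is seen.
aligned = delay_align(["a0", "a1", "a2"], ["w0", "w1", "w2"], delay=1)
```

Each aligned pair can then be treated as one step of a single sequence, which is what would let an ordinary autoregressive decoder consume both streams.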

Generative UI

Let the agent decide which interface components to render at runtime and stream them to the frontend over a typed protocol, so the surface follows the agent's output instead of being hardcoded.
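
One way to picture the typed protocol, as a sketch under assumptions: the agent streams JSON messages naming a component and its props, and the frontend checks each message against a registry before rendering. The registry contents, the message shape, and the `validate` helper are hypothetical.

```python
import json

# Hypothetical registry: component name -> props the frontend requires.
REGISTRY: dict[str, set[str]] = {
    "card": {"title", "body"},
    "chart": {"series"},
}

def validate(message: str) -> dict:
    """Parse one streamed message and reject anything the frontend cannot render."""
    spec = json.loads(message)
    kind = spec.get("component")
    if kind not in REGISTRY:
        raise ValueError(f"unknown component: {kind!r}")
    missing = REGISTRY[kind] - set(spec.get("props", {}))
    if missing:
        raise ValueError(f"{kind} is missing props: {sorted(missing)}")
    return spec

# The agent picks the component at runtime; the frontend only checks the contract.
stream = [
    '{"component": "card", "props": {"title": "Forecast", "body": "Rain at 3pm"}}',
    '{"component": "chart", "props": {"series": [12, 14, 9]}}',
]
rendered = [validate(message)["component"] for message in stream]
```

Keeping the registry on the frontend side means the agent can choose freely among known components without being able to inject arbitrary markup.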

Unified Voice Interface

Expose text-to-speech, speech-to-text, and real-time speech-to-speech through a single interface so a voice agent can swap providers without rewriting the loop.
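
A minimal sketch of such an interface using `typing.Protocol`. The `VoiceProvider` protocol, the two toy providers, and `voice_turn` are hypothetical; real-time speech-to-speech is left out to keep the example short, and the "audio" is just UTF-8 bytes standing in for real samples.

```python
from typing import Callable, Protocol

class VoiceProvider(Protocol):
    """Hypothetical common surface that every provider adapter implements."""
    def transcribe(self, audio: bytes) -> str: ...
    def synthesize(self, text: str) -> bytes: ...

class EchoProvider:
    """Toy provider: treats audio as UTF-8 text."""
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")

class ShoutingProvider:
    """Second toy provider with different behavior behind the same methods."""
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8").lower()
    def synthesize(self, text: str) -> bytes:
        return text.upper().encode("utf-8")

def voice_turn(provider: VoiceProvider, audio_in: bytes,
               respond: Callable[[str], str]) -> bytes:
    """One loop iteration: speech in, agent reply, speech out."""
    heard = provider.transcribe(audio_in)
    return provider.synthesize(respond(heard))

def reply(heard: str) -> str:
    return f"you said {heard}"

# Swapping providers changes only the argument, not the loop.
out_echo = voice_turn(EchoProvider(), b"hello", reply)
out_shout = voice_turn(ShoutingProvider(), b"Hello", reply)
```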