Streaming Typed Events
also known as SSE Streaming, Typed Event Stream, Token Stream + Cards
Push partial results to the client as typed events as they become available, rather than waiting for the full response.
This pattern helps complete certain larger patterns —
- used-by Multilingual Voice Agent Stack ★: Compose a voice agent as a tightly co-located pipeline of speech-to-text, language-aware LLM reasoning, and text-to-speech, where one vendor owns all three so language and dialect propagate cleanly across stages.
Context
A team is building a user-facing agent where the time between the user pressing send and the first visible characters appearing is the latency the user actually perceives — what is often called time-to-first-token, or TTFT. The interface is not just plain prose: it shows cards, suggested follow-ups, tool-progress indicators, and progressively disclosed content. The team has to decide how the server should push partial results to the client as they become available.
Problem
Waiting until the full answer is generated before rendering anything feels sluggish even when the actual generation is fast, because the user has nothing to look at during the wait. Streaming a single channel of plain text helps with perceived latency but loses the structure the interface needs: the client receives a stream of characters with no way to tell apart a token of the main answer, the start of a tool call, a structured card, or an error. Without a typed event vocabulary on the stream, the client either waits for the end or guesses, and neither produces a good interface.
Forces
- Browser/network limits on long-lived connections.
- Event ordering and reconnection semantics.
- Backpressure when the client is slow.
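The backpressure force can be made concrete with a bounded queue between the generator and the connection writer. The following is a minimal sketch, assuming an asyncio-based server; `produce`, `slow_client`, `run_demo`, and the buffer size of 4 are hypothetical names and values chosen for illustration, not something the pattern prescribes.

```python
import asyncio

async def produce(queue: asyncio.Queue, events: list) -> None:
    """Generator side: put() suspends whenever the bounded queue is full."""
    for event in events:
        await queue.put(event)      # waits here while the client lags behind
    await queue.put(None)           # sentinel: the stream is finished

async def slow_client(queue: asyncio.Queue, received: list, depths: list) -> None:
    """Writer side: drains one event at a time, simulating a slow connection."""
    while True:
        depths.append(queue.qsize())    # record how much is buffered right now
        event = await queue.get()
        if event is None:
            break
        received.append(event)
        await asyncio.sleep(0.001)      # stand-in for a slow network write

async def run_demo(n_events: int = 20, max_buffered: int = 4):
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_buffered)
    events = [("text_delta", f"tok{i}") for i in range(n_events)]
    received: list = []
    depths: list = []
    await asyncio.gather(
        produce(queue, events),
        slow_client(queue, received, depths),
    )
    return received, max(depths)

received, peak = asyncio.run(run_demo())
```

Because the queue is bounded, a slow client slows the producer down instead of letting server memory grow without limit. Whether to block, drop, or coalesce `text_delta` events when the buffer fills is a separate design decision that this sketch does not settle.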
Example
A chat product streams a single text channel; the UI cannot tell apart token text, structured cards, suggestions, and tool progress until the whole response has arrived. The team switches to typed events over SSE: `text_delta`, `card`, `suggestions`, `tool_start`, `tool_end`, `done`, `error`. The client routes each event to the right widget as it arrives; perceived latency drops, structured content renders early, and the UI gains progress indicators.
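The routing step in this example can be sketched as a small parser plus a dispatcher. This is a sketch under assumptions: Python stands in for the browser client, `ChatView` and its handlers are hypothetical stand-ins for UI widgets, and the payload fields (`text`, `title`, `items`, `name`, `message`) are invented for illustration, since only the event type names come from the example.

```python
import json
from typing import Iterable, Iterator, Tuple

def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_type, payload) pairs; a blank line terminates a frame."""
    event_type, data_parts = "message", []
    for line in lines:
        if line == "":
            if data_parts:
                yield event_type, json.loads("\n".join(data_parts))
            event_type, data_parts = "message", []
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
        # Other fields (id:, retry:, comments) are ignored in this sketch.

class ChatView:
    """Hypothetical UI state; each handler stands in for one widget."""

    def __init__(self) -> None:
        self.text = ""
        self.cards: list = []
        self.suggestions: list = []
        self.running_tools: set = set()
        self.finished = False
        self.error = None
        self.handlers = {
            "text_delta": self._on_text_delta,
            "card": self._on_card,
            "suggestions": self._on_suggestions,
            "tool_start": self._on_tool_start,
            "tool_end": self._on_tool_end,
            "done": self._on_done,
            "error": self._on_error,
        }

    def dispatch(self, event_type: str, payload: dict) -> None:
        """Route one event to the handler registered for its type."""
        self.handlers[event_type](payload)

    def _on_text_delta(self, p: dict) -> None:
        self.text += p["text"]

    def _on_card(self, p: dict) -> None:
        self.cards.append(p)

    def _on_suggestions(self, p: dict) -> None:
        self.suggestions = p["items"]

    def _on_tool_start(self, p: dict) -> None:
        self.running_tools.add(p["name"])

    def _on_tool_end(self, p: dict) -> None:
        self.running_tools.discard(p["name"])

    def _on_done(self, p: dict) -> None:
        self.finished = True

    def _on_error(self, p: dict) -> None:
        self.error = p.get("message", "unknown error")

# A hand-written stream, as the lines would arrive over the wire.
stream = [
    "event: tool_start", 'data: {"name": "search"}', "",
    "event: text_delta", 'data: {"text": "Hel"}', "",
    "event: text_delta", 'data: {"text": "lo"}', "",
    "event: tool_end", 'data: {"name": "search"}', "",
    "event: card", 'data: {"title": "Result"}', "",
    "event: suggestions", 'data: {"items": ["More detail"]}', "",
    "event: done", "data: {}", "",
]

view = ChatView()
for event_type, payload in parse_sse(stream):
    view.dispatch(event_type, payload)
```

Each widget updates as soon as its event arrives: the tool indicator appears before any text, and the card renders without waiting for `done`.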
Solution
Therefore:
Use Server-Sent Events (or WebSocket) with a typed event vocabulary: `text_delta` (token), `card` (structured content), `suggestions`, `tool_start`, `tool_end`, `done`, `error`. The client routes each event to the right UI component. On reconnect, the client resumes from the last event it received; with SSE this is carried by the `Last-Event-ID` request header.
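On the server side, the framing and the resumption step might look like the sketch below. It assumes one in-memory log per response and integer event ids starting at 1; `format_sse` and `EventLog` are hypothetical names, and the payload shapes are illustrative. The `id:`, `event:`, and `data:` field names are the standard SSE wire format, and a browser `EventSource` sends the last `id` it saw back as `Last-Event-ID` when it reconnects.

```python
import json
from typing import Iterator, List, Optional, Tuple

def format_sse(event_id: int, event_type: str, payload: dict) -> str:
    """Serialize one event as an SSE frame: id, event, data, then a blank line."""
    # json.dumps emits a single line by default, so one data: field is enough.
    return f"id: {event_id}\nevent: {event_type}\ndata: {json.dumps(payload)}\n\n"

class EventLog:
    """Hypothetical per-response log so a reconnecting client can resume."""

    def __init__(self) -> None:
        self._events: List[Tuple[str, dict]] = []

    def append(self, event_type: str, payload: dict) -> int:
        """Record an event and return its id (ids start at 1)."""
        self._events.append((event_type, payload))
        return len(self._events)

    def frames_after(self, last_event_id: Optional[str]) -> Iterator[str]:
        """Yield frames for every event with an id greater than last_event_id."""
        start = int(last_event_id) if last_event_id else 0
        for event_id, (event_type, payload) in enumerate(
            self._events[start:], start=start + 1
        ):
            yield format_sse(event_id, event_type, payload)

log = EventLog()
log.append("text_delta", {"text": "Hel"})
log.append("text_delta", {"text": "lo"})
log.append("done", {})

# First connection: no Last-Event-ID header, so replay from the beginning.
first_connection = list(log.frames_after(None))

# Reconnect after the client saw event 2: only event 3 is sent again.
resumed = list(log.frames_after("2"))
```

A real deployment would also need to decide how long to retain the log and what to do when the requested id has already been evicted; the sketch leaves both open.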
What this pattern forbids. Events are typed; clients must not consume payloads outside the declared event vocabulary.
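One way to enforce this restriction is to decode every incoming frame against a closed vocabulary and reject anything else at the boundary. The following is a sketch, assuming Python's `enum` for the vocabulary; `EventType`, `TypedEvent`, `UnknownEventError`, and `decode_event` are hypothetical names, and requiring the payload to be a JSON object is an added assumption.

```python
import json
from dataclasses import dataclass
from enum import Enum

class EventType(str, Enum):
    """The declared vocabulary; nothing outside it is accepted."""
    TEXT_DELTA = "text_delta"
    CARD = "card"
    SUGGESTIONS = "suggestions"
    TOOL_START = "tool_start"
    TOOL_END = "tool_end"
    DONE = "done"
    ERROR = "error"

@dataclass(frozen=True)
class TypedEvent:
    type: EventType
    payload: dict

class UnknownEventError(ValueError):
    """Raised when a frame names an event type outside the vocabulary."""

def decode_event(event_name: str, data: str) -> TypedEvent:
    """Turn a raw (event, data) pair into a TypedEvent or raise."""
    try:
        event_type = EventType(event_name)
    except ValueError:
        raise UnknownEventError(
            f"event type {event_name!r} is not in the declared vocabulary"
        ) from None
    payload = json.loads(data)
    if not isinstance(payload, dict):
        # Assumption of this sketch: every payload is a JSON object.
        raise ValueError("payload must be a JSON object")
    return TypedEvent(event_type, payload)

accepted = decode_event("card", '{"title": "Result"}')

try:
    decode_event("raw_html", "{}")
    rejected = False
except UnknownEventError:
    rejected = True
```

Rejecting unknown types at decode time keeps the rest of the client free of guesswork: every handler downstream can assume it is looking at one of the seven declared events.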
The smaller patterns that complete this one —
- generalises Citation Streaming ★★: Stream citations alongside generated text so the UI can render source links in place as content appears.
And the patterns that stand alongside it, or against it —
- complements Structured Output ★★: Constrain the model's output to conform to a JSON Schema (or similar typed shape).
- complements Bidirectional Impulse Channel ·: Let the user inject impulses into the agent and let the agent push messages to the user, both through one channel.
- complements Salience-Triggered Output ·: Have the agent emit a message only when an internal salience signal crosses a threshold, not on every cycle.
- complements Stop / Cancel ★★: Let the user interrupt an in-flight agent run cleanly, releasing resources and surfacing partial state.
- alternative-to Delayed Streams Modeling ★: Convert streaming speech tasks into a single decoder-only autoregressive problem by time-aligning the parallel input and output streams with a fixed offset in preprocessing, eliminating the learned read/write policy that cascade pipelines require.
- complements Unified Voice Interface ★: Expose text-to-speech, speech-to-text, and real-time speech-to-speech through a single interface so a voice agent can swap providers without rewriting the loop.
- complements Generative UI ★: Let the agent decide which interface components to render at runtime and stream them to the frontend over a typed protocol, so the surface follows the agent's output instead of being hardcoded.