IV · Retrieval & RAGEmerging

Streaming Feature Pipeline

also known as Real-Time RAG Feature Pipeline, Bytewax-Style RAG Ingest

Process raw documents into RAG features as a continuous stream rather than a batch job, with typed models pinning each stage.

Context

An LLM application's vector index must stay close to the live state of an evolving corpus. Batch rebuilds run every N hours and lag the source. The team wants the pipeline to consume change events as they happen and update the index immediately.

Problem

Batch ingestion lags the source by the rebuild cadence and wastes compute re-processing unchanged documents. Ad-hoc streaming code without a stage-pinning discipline (raw → cleaned → chunked → embedded) accumulates implicit data shape transitions that break silently as the pipeline evolves. Without a typed stream pipeline, real-time RAG ingestion becomes a debug nightmare on every schema or chunking change.

Forces

  • Lag between source change and vector update should be seconds, not hours.
  • Each stage (clean, chunk, embed) has different cost and parallelism profile.
  • Typed data at each stage catches shape drift early.
  • Failure of one event should not poison the stream.

Example

A documentation platform's RAG index must stay current as engineers edit pages. A Bytewax pipeline consumes a Kafka topic of page-change events. Each event flows through RawPage → CleanedPage → ChunkedPage → EmbeddedPage stages; the embedded output upserts into Qdrant. A bad page (binary content masquerading as HTML) goes to the DLQ; the stream keeps moving. Engineers see edited pages in RAG within seconds.

Diagram

Solution

Therefore:

Use a streaming framework (Bytewax, Flink, Kafka Streams) to consume change events. Define a Pydantic (or equivalent) model per stage: RawDocument → CleanedDocument → ChunkedDocument → EmbeddedDocument. Each stage is a map operation that takes one model and emits the next; type errors surface at the stage boundary. Failed events go to a dead-letter queue for inspection rather than blocking the stream. Upserts to the vector index happen as the embedded model flows out of the last stage.

What this pattern forbids. Real-time RAG/feature ingestion must not use implicit data shapes across pipeline stages; a typed model is pinned at each stage transition.

The smaller patterns that complete this one —

  • usesVector Memory★★Store memories as embeddings in a vector index and retrieve the most semantically similar items at query time.

And the patterns that stand alongside it, or against it —

  • composes-withCDC-Driven Vector Sync★★Treat the source-of-truth document store as the only writer; keep the vector index in sync by emitting change-data-capture events onto a queue that the feature pipeline consumes.
  • composes-withFTI LLM Pipeline Split★★Decompose an LLM/RAG system into three independently-deployable pipelines — feature, training, inference — communicating only via a feature store and a model registry.
  • complementsEvent-Driven Agent★★Trigger the agent on external events (webhooks, message queues, file changes) instead of user requests or schedules.
  • complementsNaive RAG★★Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.

Neighbourhood

Click any neighbour to follow the language. Scroll to zoom, drag to pan.