Full-Code · Workflow Engines · active

Pathway

Type: full-code · Vendor: Pathway · Language: Python, Rust · License: BSL-1.1 · Status: active · Status in practice: emerging · First released: 2023


Pathway is a Python and Rust framework for building streaming data and RAG pipelines on a differential-dataflow engine. It keeps vector indexes synchronised with their sources as the data changes.

Description. Pathway runs the same pipeline code over batch and streaming data, with a Rust engine that recomputes results incrementally as inputs change. Its LLM extension provides parsers, embedders, splitters, and an in-memory real-time vector index, plus document-store components that re-index automatically on new data. Teams use it to build retrieval-augmented generation pipelines whose indexes stay current without a separate batch re-indexing job. The framework integrates with LlamaIndex and LangChain.
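
The incremental recomputation described above can be illustrated with a small, self-contained sketch. It does not use Pathway's API; the class and function names (`IncrementalWordCount`, `batch_word_count`) are hypothetical and stand in for the general idea of applying only the change (a retraction plus an insertion) rather than recomputing from scratch.

```python
from collections import Counter


def batch_word_count(documents):
    """Recompute word counts from scratch over the full document set."""
    counts = Counter()
    for text in documents.values():
        counts.update(text.split())
    return counts


class IncrementalWordCount:
    """Maintain the same counts by applying only the delta of each change.

    Hypothetical illustration, not Pathway's engine: each update is turned
    into (word, -1) retractions and (word, +1) insertions.
    """

    def __init__(self):
        self.documents = {}
        self.counts = Counter()

    def _apply(self, text, diff):
        # Touch only the words in the changed text, never the whole state.
        for word in text.split():
            self.counts[word] += diff
            if self.counts[word] == 0:
                del self.counts[word]

    def upsert(self, doc_id, text):
        old = self.documents.get(doc_id)
        if old is not None:
            self._apply(old, -1)  # retract the previous version
        self._apply(text, +1)     # insert the new version
        self.documents[doc_id] = text

    def delete(self, doc_id):
        old = self.documents.pop(doc_id, None)
        if old is not None:
            self._apply(old, -1)


live = IncrementalWordCount()
live.upsert("a", "stream batch stream")
live.upsert("b", "batch index")
live.upsert("a", "stream index")  # document "a" changes
live.delete("b")                  # document "b" disappears

# The incrementally maintained result matches a full batch recomputation.
assert live.counts == batch_word_count(live.documents)
```

The final assertion is the point of the sketch: the streaming path and the batch path agree on the result, while the streaming path only does work proportional to what changed.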

Agent loop shape. Pathway is a data-pipeline runtime rather than an agent loop. Connectors ingest documents and change events; the differential-dataflow engine parses, chunks, embeds, and indexes them incrementally; and a vector or document store serves retrieval queries. A question-answering layer retrieves the top documents for a query and passes them to an LLM, with the index kept in sync as sources change.
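
The flow above (change events in, incremental chunking and embedding, retrieval out) can be sketched in plain Python. This is a toy model under stated assumptions, not Pathway code: `LiveIndex`, the `(kind, doc_id, text)` event tuple, the bag-of-words `embed`, and the chunk size are all hypothetical stand-ins for real connectors, embedders, and the vector index.

```python
import math
from collections import Counter


def chunk(text, size=8):
    """Split text into pieces of at most `size` words (size is an assumption)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy sparse bag-of-words vector standing in for a real embedder."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class LiveIndex:
    """In-memory index that re-indexes only the document named in each event."""

    def __init__(self):
        self.entries = {}  # (doc_id, chunk_no) -> (chunk_text, vector)

    def apply(self, event):
        kind, doc_id, text = event  # kind is "upsert" or "delete"
        # Drop the stale chunks of this document; other documents are untouched.
        for key in [k for k in self.entries if k[0] == doc_id]:
            del self.entries[key]
        if kind == "upsert":
            for n, piece in enumerate(chunk(text)):
                self.entries[(doc_id, n)] = (piece, embed(piece))

    def query(self, question, k=2):
        """Return up to k (doc_id, chunk_text) pairs with non-zero similarity."""
        q = embed(question)
        scored = [
            (cosine(q, vec), doc_id, piece)
            for (doc_id, _), (piece, vec) in self.entries.items()
        ]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(doc_id, piece) for _, doc_id, piece in scored[:k]]


index = LiveIndex()
index.apply(("upsert", "policy.txt", "refunds are issued within 14 days"))
index.apply(("upsert", "faq.txt", "shipping takes three business days"))
before = index.query("how long do refunds take", k=1)

# The source document changes; only its chunks are re-embedded.
index.apply(("upsert", "policy.txt", "refunds are issued within 30 days"))
after = index.query("how long do refunds take", k=1)
```

In a question-answering layer, the pairs returned by `query` would be passed to an LLM as context. Here `before` and `after` differ only because the source document changed between the two queries.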

Primary use cases

  • real-time retrieval-augmented generation pipelines
  • live document indexing and vector search
  • incremental ETL over streaming and batch data
  • continuously synchronised knowledge bases for agents

Key concepts

  • DocumentStore (streaming-feature-pipeline): A pipeline component that automatically indexes documents and updates itself when new data arrives, so the served index reflects the source without a separate batch re-indexing job.
  • Differential dataflow engine: A scalable Rust engine that performs incremental computation, recomputing only what changed as inputs update, and runs the same Python pipeline code over both batch and streaming data.
  • In-memory real-time Vector Index (vector-memory): A vector index kept inside the running pipeline that stays synchronised with data sources in real time, serving similarity queries for RAG over live documents.
  • AdaptiveRAGQuestionAnswerer (agentic-rag): A RAG question-answerer that saves tokens by limiting how many documents are sent to the LLM, adapting the size of the retrieved set to the query.
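
One way to picture the adaptive retrieval idea in the last item is a loop that starts with a few documents and widens the context only when the model cannot answer. The sketch below is a minimal illustration under that assumption, not the class's implementation: `adaptive_answer`, its parameters (`n_start`, `factor`, `max_rounds`), and the stub `retrieve` and `ask_llm` functions are all hypothetical.

```python
def adaptive_answer(question, retrieve, ask_llm, n_start=2, factor=2, max_rounds=4):
    """Ask with a small context first; grow it only if no answer is found.

    Returns (answer, number_of_documents_used_in_the_last_call).
    """
    n = n_start
    used = 0
    for _ in range(max_rounds):
        docs = retrieve(question, n)
        used = len(docs)
        answer = ask_llm(question, docs)
        if answer is not None:
            return answer, used
        n *= factor  # widen the context geometrically and retry
    return None, used


# Stub retriever: documents are assumed to be pre-ranked for the question.
RANKED = [
    "Pathway is written in Python and Rust.",
    "The engine is based on differential dataflow.",
    "Connectors ingest documents and change events.",
    "The first release was in 2023.",
    "The license is BSL-1.1.",
]


def retrieve(question, n):
    return RANKED[:n]


# Stub LLM: answers only if the supporting document is in the context.
def ask_llm(question, docs):
    for doc in docs:
        if "2023" in doc:
            return "2023"
    return None


answer, docs_used = adaptive_answer(
    "When was Pathway first released?", retrieve, ask_llm
)
```

With these stubs the first call sees two documents and fails, and the second call sees four and succeeds, so the fifth document is never sent. Questions answerable from the top-ranked documents cost fewer tokens than a fixed large context would.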

Patterns this full-code implements —
