XI · Structure & Data · Mature ★★

FTI LLM Pipeline Split

also known as Feature-Training-Inference Split, FTI Architecture for LLMs

Decompose an LLM/RAG system into three independently deployable pipelines (feature, training, inference) that communicate only via a feature store and a model registry.

Context

An LLM application spans data ingestion (cleaning raw documents into RAG features), model adaptation (SFT / DPO over the resulting datasets), and serving (retrieval + generation). Each axis has its own cadence, hardware, and team ownership. Bundling all three into one repository and deploy cycle couples work that is otherwise independent.

Problem

A monolithic LLM application makes every change touch every team. Re-embedding the corpus requires a deploy that the inference path inherits. Bumping the SFT recipe forces retraining tied to the inference release cycle. Serving SLOs are held hostage by data-pipeline failures. Without a clean decomposition along the F/T/I axes, teams step on each other and the system drifts toward incoherent versioning.

Forces

  • Feature, training, and inference have different cadences (continuous, periodic, on-request).
  • Different teams (data, ML, platform) want to own different axes.
  • Feature store and model registry are the natural integration points.
  • Decomposition adds two integration surfaces that must be operated.
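Operating an integration surface mostly means owning its contract. A minimal sketch, assuming the feature-store row shape is published as a small typed model that the feature pipeline writes and the training and inference pipelines read. `ChunkFeature`, its fields, and the 4-dimensional `EMBEDDING_DIM` are hypothetical choices for illustration, not part of the pattern.

```python
from dataclasses import dataclass

EMBEDDING_DIM = 4  # hypothetical; must match the embedding model the feature pipeline pins


@dataclass(frozen=True)
class ChunkFeature:
    """One row on the feature-store surface (hypothetical contract).

    Written by the feature pipeline; read by the training and inference pipelines.
    By convention in this sketch, bumping schema_version signals a breaking change.
    """
    chunk_id: str
    source: str  # e.g. "confluence" or "salesforce"
    text: str
    embedding: tuple[float, ...]
    schema_version: int = 1

    def __post_init__(self) -> None:
        # Validate at construction so a malformed row is rejected before it is written.
        if len(self.embedding) != EMBEDDING_DIM:
            raise ValueError(f"embedding has {len(self.embedding)} dims, expected {EMBEDDING_DIM}")
        if not self.text.strip():
            raise ValueError("chunk text is empty")


row = ChunkFeature("kb-1#0", "confluence", "Reset your VPN password", (0.1, 0.2, 0.3, 0.4))
try:
    ChunkFeature("kb-1#1", "confluence", "Reset your VPN password", (0.1, 0.2))
except ValueError as err:
    print(f"rejected: {err}")
```

Keeping the contract in one place gives both sides of the surface a single thing to version, which is the cost the last force above refers to.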

Example

A product that combines RAG with a fine-tuned model splits into three pipelines. The data team owns the feature pipeline, which ingests documents from Confluence and Salesforce, embeds them, and writes to Pinecone. The ML team owns the training pipeline, which periodically pulls eval-curated feature subsets and produces DPO-tuned models registered in MLflow. The platform team owns the inference service, which reads Pinecone at request time and loads the current registered model. Each team deploys without coordinating with the others.
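"Loads the current registered model" is what lets the ML and platform teams ship independently. One common way to express "current" is a named alias that points at a registry version, roughly what MLflow's registered-model aliases provide. The sketch below uses a hypothetical in-memory `InMemoryRegistry`; the model name `support-dpo`, the alias `champion`, and the `s3://` URIs are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryRegistry:
    """Hypothetical stand-in for a model registry with named aliases."""
    versions: dict[str, list[str]] = field(default_factory=dict)  # name -> artifact URIs
    aliases: dict[tuple[str, str], int] = field(default_factory=dict)  # (name, alias) -> version

    def register(self, name: str, artifact_uri: str) -> int:
        # Training pipeline: append an immutable version (1-based); serving is unaffected.
        self.versions.setdefault(name, []).append(artifact_uri)
        return len(self.versions[name])

    def set_alias(self, name: str, alias: str, version: int) -> None:
        # Promotion step: repoint the alias; no service is redeployed.
        if not 1 <= version <= len(self.versions.get(name, [])):
            raise KeyError(f"{name} has no version {version}")
        self.aliases[(name, alias)] = version

    def resolve(self, name: str, alias: str) -> tuple[int, str]:
        # Inference pipeline: ask which version the alias points at right now.
        version = self.aliases[(name, alias)]
        return version, self.versions[name][version - 1]


registry = InMemoryRegistry()
v1 = registry.register("support-dpo", "s3://models/support-dpo/run-a")
registry.set_alias("support-dpo", "champion", v1)

# The ML team registers a new DPO run on its own schedule;
# inference keeps resolving "champion" to version 1.
v2 = registry.register("support-dpo", "s3://models/support-dpo/run-b")
print(registry.resolve("support-dpo", "champion"))

# A separate promotion step moves the alias; the next resolve sees version 2.
registry.set_alias("support-dpo", "champion", v2)
print(registry.resolve("support-dpo", "champion"))
```

The design choice here is that promotion is a data change in the registry rather than a code change in the inference service, so rolling back amounts to pointing the alias at an earlier version.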

Solution

Therefore:

Define three pipelines:

  • Feature pipeline: ingests raw documents, cleans, chunks, and embeds them, and writes to the feature store (typically a vector DB plus a document store).
  • Training pipeline: reads features from the store, fine-tunes (SFT, DPO), and writes models to the model registry.
  • Inference pipeline: reads from the feature store at request time, loads the model from the registry, and generates.

Communication happens only via the two integration surfaces; no direct code or service calls cross pipeline boundaries. Each pipeline deploys on its own cadence.
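A minimal, runnable sketch of the Solution's wiring under stated assumptions: `FeatureStore` and `ModelRegistry` are hypothetical in-memory stand-ins for a vector DB plus document store and for a registry such as MLflow; `embed` is a toy bag-of-words hashing function, not a real embedding model; and the "training" step records a placeholder artifact instead of running SFT or DPO. What the sketch is meant to show is that each pipeline function touches only the two surfaces and never calls another pipeline.

```python
import math
from dataclasses import dataclass, field

DIM = 16  # toy embedding width (hypothetical)


def embed(text: str) -> list[float]:
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[sum(map(ord, word)) % DIM] += 1.0  # deterministic bucket per word
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# --- Integration surface 1: feature store (vector index + document text) ---
@dataclass
class FeatureStore:
    rows: dict[str, tuple[str, list[float]]] = field(default_factory=dict)

    def upsert(self, chunk_id: str, text: str, vector: list[float]) -> None:
        self.rows[chunk_id] = (text, vector)

    def texts(self) -> list[str]:
        return [text for text, _ in self.rows.values()]

    def top_k(self, query: list[float], k: int) -> list[str]:
        # Brute-force cosine ranking; a real vector DB would use an ANN index.
        ranked = sorted(self.rows.values(), key=lambda row: cosine(query, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


# --- Integration surface 2: model registry ---
@dataclass
class ModelRegistry:
    versions: list[dict] = field(default_factory=list)

    def register(self, model: dict) -> int:
        self.versions.append(model)
        return len(self.versions)  # 1-based version number

    def latest(self) -> tuple[int, dict]:
        return len(self.versions), self.versions[-1]


# --- The three pipelines: each touches only the two surfaces ---
def feature_pipeline(raw_docs: dict[str, str], store: FeatureStore) -> None:
    for doc_id, raw in raw_docs.items():
        text = " ".join(raw.split())  # "clean": collapse whitespace; one chunk per doc
        store.upsert(doc_id, text, embed(text))


def training_pipeline(store: FeatureStore, registry: ModelRegistry) -> int:
    corpus = store.texts()
    # Placeholder for SFT/DPO: only record what the "model" was trained on.
    model = {"name": "toy-sft", "trained_on": len(corpus)}
    return registry.register(model)


def inference_pipeline(question: str, store: FeatureStore, registry: ModelRegistry, k: int = 1) -> str:
    context = store.top_k(embed(question), k)  # read features at request time
    version, model = registry.latest()  # load the current registered model
    # Placeholder for generation: report which model and context would be used.
    return f"{model['name']} v{version} | context: {context}"


store, registry = FeatureStore(), ModelRegistry()
feature_pipeline(
    {"kb-1": "Reset your VPN password   from the portal",
     "kb-2": "Invoices are sent monthly by email"},
    store,
)
training_pipeline(store, registry)
answer = inference_pipeline("How do I reset my VPN password", store, registry)
print(answer)
```

One thing the sketch glosses over: the query is embedded with the same `embed` helper the feature pipeline uses. In practice both sides must pin the same embedding model, which is shared configuration rather than a cross-pipeline call.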

What this pattern forbids. An LLM/RAG system must not couple feature ingestion, model adaptation, and serving in one deploy unit; the three pipelines communicate only through a feature store and a model registry.
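The prohibition can be checked mechanically. A minimal sketch, assuming each pipeline lives in its own top-level Python package (`feature_pipeline`, `training_pipeline`, `inference_pipeline`, all hypothetical names) and that client code for the two integration surfaces lives in a shared `integration` package (also hypothetical). It parses each module with the standard `ast` module and flags any import of a sibling pipeline package. It catches direct code imports only; direct service calls between pipelines would need a network-level check.

```python
import ast

# Hypothetical package layout: one top-level package per pipeline.
PIPELINES = {"feature_pipeline", "training_pipeline", "inference_pipeline"}


def imported_top_levels(source: str) -> set[str]:
    """Top-level package names imported (absolutely) by a module's source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and not node.level:
            # Relative imports (level > 0) stay inside the same package, so skip them.
            names.add(node.module.split(".")[0])
    return names


def boundary_violations(modules: dict[str, str]) -> list[tuple[str, str]]:
    """(module, forbidden package) pairs where one pipeline imports another."""
    violations = []
    for path, source in modules.items():
        owner = path.split(".")[0]
        if owner not in PIPELINES:
            continue
        for name in sorted(imported_top_levels(source)):
            if name in PIPELINES and name != owner:
                violations.append((path, name))
    return violations


# In CI these would be real file contents; inline strings keep the sketch self-contained.
modules = {
    "feature_pipeline.ingest": "from integration.feature_store import FeatureStore\n",
    "training_pipeline.sft": "import integration.model_registry\n",
    "inference_pipeline.serve": (
        "from integration.feature_store import FeatureStore\n"
        "from training_pipeline.sft import build_prompt  # crosses a boundary\n"
    ),
}
print(boundary_violations(modules))
```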

The smaller patterns that complete this one —

  • uses · Vector Memory ★★ · Store memories as embeddings in a vector index and retrieve the most semantically similar items at query time.

And the patterns that stand alongside it, or against it —

  • composes-with · Business + LLM Microservice Split ★★ · Split an LLM application into a CPU-bound business microservice (retrieval, prompt assembly, orchestration) and a GPU-bound LLM microservice (only model.generate behind REST), so each tier scales on its own hardware budget.
  • composes-with · CDC-Driven Vector Sync ★★ · Treat the source-of-truth document store as the only writer; keep the vector index in sync by emitting change-data-capture events onto a queue that the feature pipeline consumes.
  • composes-with · Streaming Feature Pipeline · Process raw documents into RAG features as a continuous stream rather than a batch job, with typed models pinning each stage.
  • complements · Naive RAG ★★ · Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
  • complements · Augmented LLM ★★ · Build the foundational agent block as an LLM augmented with retrieval, tools, and memory that the model actively chooses to use, rather than a bare-model call.
  • composes-with · Crawler Dispatcher ★★ · Route each incoming URL to a domain-specific crawler through a central dispatcher mapping URL patterns to registered crawler classes.
