FTI Pipeline Architecture
also known as feature-training-inference architecture, three-pipeline ML architecture
Split any machine-learning or LLM system into three separate pipelines: Feature, Training, and Inference. They connect only through a feature store and a model registry. Each pipeline runs on its own schedule, has its own dependencies, and scales on its own. They hand work to each other through versioned artefacts, never through direct calls. This lets each pipeline use the tools that fit it: for example, Spark for features, GPU jobs for training, and a low-latency serving stack for inference. It also lets different teams own different pipelines.
Methodology process overview
Intent. Split a machine-learning or LLM system into three separate pipelines, joined only by a feature store and a model registry, so each one can scale, be swapped out, and be owned on its own.
When to apply. Use this when you design any real machine-learning or LLM system that has to run beyond a notebook, such as recommendation, search, retrieval-augmented generation, an LLM application, or classification. It pays off most when different teams own different parts, when features and training need to scale at different rates, or when serving must be faster than training. Do not apply it to a single-team prototype that fits in one script. The three-way split is just overhead until you hit at least two of the three problems it solves: separate scale, separate ownership, and separate lifecycles.
Inputs
- Raw data sources — Your source-of-truth data, such as events, documents, telemetry, and transactional databases, that the feature pipeline will transform.
- Feature store choice — The feature store you picked, either offline plus online or a single store. It becomes the contract between the feature pipeline and the pipelines that read from it.
- Model registry choice — The registry you picked, such as Comet, MLflow, or SageMaker. It holds versioned trained models with their metadata and lineage.
Outputs
- Feature pipeline — One or more standalone jobs that read raw data, compute features, and write them to the feature store on a set schedule.
- Training pipeline — Jobs that read features and labels from the feature store, train, evaluate, and publish to the model registry.
- Inference pipeline — One or more services that load a registered model and serve predictions. They read any features needed at request time from the feature store.
Steps (6)
Name the three pipelines
Write down feature, training, and inference as three separate components. Each gets its own owner, its own schedule, and its own failure modes. Commit to this before any code lands.
Define the feature contract
Write the feature schema in the feature store. Producers and consumers agree on this contract only, not on the pipelines on either side. Treat any schema change as a versioned event.
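A minimal sketch of what a versioned feature contract could look like, assuming the schema is held as a plain Python object rather than in any particular feature store. `FeatureSchema`, `evolve`, and the `user_activity` columns are hypothetical names chosen for illustration; a real feature store would define the schema through its own API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSchema:
    """Hypothetical contract object: producers and consumers share only this."""
    name: str
    version: int
    columns: dict  # column name -> expected Python type

    def validate(self, row: dict) -> None:
        """Raise if a row does not match this exact schema version."""
        missing = set(self.columns) - set(row)
        extra = set(row) - set(self.columns)
        if missing or extra:
            raise ValueError(
                f"{self.name} v{self.version}: "
                f"missing={sorted(missing)} extra={sorted(extra)}"
            )
        for column, expected_type in self.columns.items():
            if not isinstance(row[column], expected_type):
                raise TypeError(f"{column} must be {expected_type.__name__}")


def evolve(schema: FeatureSchema, **added_columns) -> FeatureSchema:
    """A schema change yields a new version; the old one is left untouched."""
    return FeatureSchema(
        schema.name, schema.version + 1, {**schema.columns, **added_columns}
    )


# Illustrative schemas (assumed column names).
USER_V1 = FeatureSchema("user_activity", 1, {"user_id": str, "clicks_7d": int})
USER_V2 = evolve(USER_V1, avg_session_s=float)
```

Because `USER_V1` is never mutated, a consumer pinned to version 1 keeps working while a producer starts writing version 2 alongside it.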
Build the feature pipeline
Read the raw sources, compute features, and write them to the feature store. Pick the runtime that fits, such as Spark, Flink, or a change-data-capture stream. Do not tie it to the training pipeline's runtime.
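As a rough illustration of this step, the sketch below aggregates raw events into per-user features and upserts them into a store. It assumes an in-memory dict stands in for the feature store and that raw events are dicts with `user_id` and `type` keys; both are assumptions for the example, not part of the methodology. A production job would run on Spark, Flink, or a similar runtime.

```python
from collections import Counter


def compute_features(events):
    """Aggregate raw click events into one feature row per user."""
    clicks = Counter(e["user_id"] for e in events if e["type"] == "click")
    return {user: {"user_id": user, "clicks": n} for user, n in clicks.items()}


def run_feature_pipeline(events, store, feature_version="v1"):
    """Write feature rows keyed by (feature_version, user_id).

    `store` is a plain dict standing in for a feature store (assumption).
    Keyed upserts make a re-run over the same events idempotent.
    """
    rows = compute_features(events)
    for user, row in rows.items():
        store[(feature_version, user)] = row
    return len(rows)


# Illustrative raw events (made-up data).
raw_events = [
    {"user_id": "u1", "type": "click"},
    {"user_id": "u1", "type": "click"},
    {"user_id": "u2", "type": "view"},
    {"user_id": "u2", "type": "click"},
]
feature_store = {}
rows_written = run_feature_pipeline(raw_events, feature_store)
```

Nothing in this job imports training or serving code; the only thing downstream pipelines see is what lands in the store.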
Build the training pipeline
Read features and labels from the store. Train, evaluate, and publish the model to the registry with versioned metadata: data version, code version, and test scores. The registry entry is the only thing the inference pipeline sees.
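The publish step might look roughly like the sketch below. It assumes a toy in-memory registry and a trivial threshold "model", so that the focus stays on the metadata attached to each version. `ModelRegistry`, `RegistryEntry`, and `train_and_publish` are hypothetical names, not the API of Comet, MLflow, or SageMaker.

```python
from dataclasses import dataclass


@dataclass
class RegistryEntry:
    version: int
    model: dict
    metadata: dict


class ModelRegistry:
    """Toy in-memory registry (assumption); versions are append-only."""

    def __init__(self):
        self._entries = {}

    def publish(self, name, model, metadata):
        versions = self._entries.setdefault(name, [])
        entry = RegistryEntry(len(versions) + 1, model, metadata)
        versions.append(entry)
        return entry

    def get(self, name, version):
        return self._entries[name][version - 1]


def train_and_publish(rows, registry, data_version, code_version):
    """rows: list of (feature_value, label) pairs read from the feature store."""
    train, test = rows[::2], rows[1::2]  # simple alternating split
    positives = [x for x, y in train if y == 1]
    negatives = [x for x, y in train if y == 0]
    # "Training": put the threshold midway between the two classes.
    threshold = (min(positives) + max(negatives)) / 2
    accuracy = sum((x >= threshold) == bool(y) for x, y in test) / len(test)
    metadata = {
        "data_version": data_version,
        "code_version": code_version,
        "test_accuracy": accuracy,
    }
    return registry.publish("clicks_classifier", {"threshold": threshold}, metadata)


# Illustrative usage with made-up rows and version labels.
registry = ModelRegistry()
rows = [(1, 0), (2, 0), (8, 1), (9, 1), (3, 0), (7, 1)]
entry = train_and_publish(rows, registry, data_version="v1", code_version="abc123")
```

The returned entry carries everything the inference side needs to know about the model; nothing about the training code leaks through.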
Build the inference pipeline
Load the model from the registry by version tag. Read request-time features from the feature store, usually the online one. Serve predictions. Promote new model versions through the registry, not by redeploying code.
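One way to picture promotion through the registry is the sketch below, where a `production` tag is repointed from one model version to another and the service code does not change. `TaggedRegistry` and `InferenceService` are hypothetical, and a dict stands in for the online feature store; real registries expose tags or aliases through their own APIs.

```python
class TaggedRegistry:
    """Toy registry with movable tags (assumption, not a real library API)."""

    def __init__(self):
        self._models = {}  # version -> model
        self._tags = {}    # tag -> version

    def register(self, version, model):
        self._models[version] = model

    def promote(self, tag, version):
        if version not in self._models:
            raise KeyError(f"unknown model version: {version}")
        self._tags[tag] = version

    def load(self, tag):
        version = self._tags[tag]
        return version, self._models[version]


class InferenceService:
    def __init__(self, registry, online_store, tag="production"):
        self.registry = registry
        self.online_store = online_store  # dict standing in for the online store
        self.tag = tag

    def predict(self, user_id):
        # Resolved on every request to keep the sketch short;
        # a real service would cache the model and refresh it periodically.
        version, model = self.registry.load(self.tag)
        features = self.online_store[user_id]
        prediction = int(features["clicks"] >= model["threshold"])
        return {"user_id": user_id, "prediction": prediction, "model_version": version}


# Illustrative usage with made-up thresholds and features.
model_registry = TaggedRegistry()
model_registry.register("v1", {"threshold": 5})
model_registry.register("v2", {"threshold": 2})
model_registry.promote("production", "v1")

service = InferenceService(model_registry, {"u1": {"clicks": 3}})
before = service.predict("u1")
model_registry.promote("production", "v2")  # promotion: no code redeploy
after = service.predict("u1")
```

Returning `model_version` with each prediction makes it possible to trace any response back to a registry entry.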
Operate each pipeline independently
Each pipeline scales, schedules, alerts, and retries on its own. Feature lag, a training failure, and slow inference are three separate incidents with three separate owners, not one vague 'the ML system is broken' incident.
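Independent operation can be made concrete with a per-pipeline policy table, sketched below. Every owner name, schedule, metric name, and threshold here is a made-up placeholder; the point is only that each pipeline has its own entry and that an alert is routed to exactly one owner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpsPolicy:
    owner: str
    schedule: str
    max_retries: int
    alert_metric: str
    alert_threshold: float


# Hypothetical values for illustration only.
POLICIES = {
    "feature": OpsPolicy("data-eng", "hourly", 3, "feature_lag_minutes", 90),
    "training": OpsPolicy("ml-research", "weekly", 1, "failed_runs", 1),
    "inference": OpsPolicy("platform", "always-on", 0, "p99_latency_ms", 300),
}


def route_incidents(metrics):
    """Turn per-pipeline metrics into (pipeline, owner, detail) incidents."""
    incidents = []
    for pipeline, policy in POLICIES.items():
        value = metrics.get(pipeline, {}).get(policy.alert_metric)
        if value is not None and value >= policy.alert_threshold:
            incidents.append(
                (pipeline, policy.owner, f"{policy.alert_metric}={value}")
            )
    return incidents


# Stale features but healthy serving: only the feature owner is paged.
incidents = route_incidents(
    {
        "feature": {"feature_lag_minutes": 120},
        "inference": {"p99_latency_ms": 120},
    }
)
```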
Principles
- Three pipelines, two contracts: the feature store and the model registry. The contracts are the only coupling.
- Each pipeline picks its own runtime. If pipelines are coupled through shared code instead of artefacts, the split loses its benefit.
- Schema changes to a feature or a model are versioned events, not silent edits.
- Independence is the point: separate scale, separate ownership, separate failure surfaces.
Known failure modes (2)
Related patterns (4)
- ★★FTI LLM Pipeline Split
Decompose an LLM/RAG system into three independently-deployable pipelines — feature, training, inference — communicating only via a feature store and a model registry.
- ★Pipeline Triad Pattern
Staff each pipeline stage with a triad — Creator generates an artifact, Critic finds flaws, Arbiter makes a binding PASS/FAIL/PARTIAL decision — with four explicit human gates between stages.
- ★Streaming Feature Pipeline
Process raw documents into RAG features as a continuous stream rather than a batch job, with typed models pinning each stage.
- ★★Business + LLM Microservice Split
Split an LLM application into a CPU-bound business microservice (retrieval, prompt assembly, orchestration) and a GPU-bound LLM microservice (only model.generate behind REST), so each tier scales on its own hardware budget.
Related compositions (2)
- recipe · abstract shape: Production LLM Platform
Stand up a production LLM/RAG system whose data pipeline, model pipeline, and inference path scale and deploy independently.
- recipe · abstract shape: Production RAG
Retrieval-grounded generation built to be defensible: hybrid retrieval, reranking, contextualised chunks, citations rendered to the user, and verification before the answer ships.
Related methodologies (2)
- LLM Twin End-to-End Construction★
Produce a production-grade personalised LLM twin through a repeatable pipeline. The pipeline covers data collection, instruction-dataset generation, supervised fine-tuning, preference alignment, evaluation, deployment, and monitoring.
- RAG Microservice Inference Pipeline★★
Split LLM serving into a business microservice and an LLM microservice. The business side handles retrieval orchestration, prompt assembly, and an optional strong reference model. The LLM side loads a fine-tune from the registry and answers a clean prompt. Each side then scales and changes on its own.
Sources (2)
LLM Engineer's Handbook
Ch 1 'Understanding the LLM Twin Concept and Architecture', §'Building ML systems with feature/training/inference pipelines': “The FTI pipeline simplifies the architecture by dividing it into three critical pipelines: feature, training, and inference. ... This division is similar to the DB, business logic, and UI layers seen in classic software systems.”
PacktPublishing/LLM-Engineers-Handbook (book companion repo)
“The system processes data through sequential stages ... Instruct Dataset Pipeline: Generates instruction-following datasets ... Training encompasses SFT and DPO phases”
Provenance
- Added to catalog:
- Last updated:
- Verification status: verified