Production LLM Platform

Type: recipe

Stand up a production LLM/RAG system whose data pipeline, model pipeline, and inference path scale and deploy independently.

Description. End-to-end production architecture for an LLM application that performs retrieval-augmented generation (RAG) over a continuously changing corpus and, optionally, serves fine-tuned models. Its main structural choices:

  • The system is decomposed into feature, training, and inference pipelines (FTI) that communicate only through a feature store and a model registry.
  • The inference path is itself split into a business service running on CPU and an LLM service running on GPU.
  • The feature pipeline is fed by change-data-capture (CDC) events streamed through typed stages.
  • Production traffic is monitored by running LLM-judge evaluation on a sample of requests.
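A minimal in-process sketch of how these pieces couple. All names (`CdcEvent`, `Chunk`, `FeatureStore`, `ModelRegistry`, `feature_pipeline`, `training_pipeline`, `llm_service`, `business_service`, `toy_judge`, `evaluate_samples`) are hypothetical and do not correspond to any particular feature-store, registry, or serving product. Word-overlap ranking stands in for vector search, the "trained model" is a placeholder that echoes retrieved context, and a token-overlap score stands in for the LLM judge. What it is meant to show is the wiring: the pipelines share state only through the two stores, the CPU-side service reaches the GPU-side service through one narrow function, and only a sampled fraction of requests is queued for evaluation.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass(frozen=True)
class CdcEvent:
    """Hypothetical CDC record: the typed input of the feature pipeline."""
    op: str                      # "upsert" or "delete"
    doc_id: str
    text: Optional[str] = None   # absent for deletes

@dataclass(frozen=True)
class Chunk:
    """Typed output of the chunking stage; what the feature store holds."""
    doc_id: str
    text: str

def chunk_stage(event: CdcEvent, size: int = 40) -> list[Chunk]:
    # CdcEvent -> list[Chunk]; a delete yields no chunks.
    if event.op == "delete" or not event.text:
        return []
    return [Chunk(event.doc_id, event.text[i:i + size])
            for i in range(0, len(event.text), size)]

@dataclass
class FeatureStore:
    """Shared state #1: written by the feature pipeline, read by the others."""
    by_doc: dict[str, list[Chunk]] = field(default_factory=dict)

    def apply(self, event: CdcEvent) -> None:
        # Replace a document's chunks wholesale so replayed events are idempotent.
        self.by_doc.pop(event.doc_id, None)
        chunks = chunk_stage(event)
        if chunks:
            self.by_doc[event.doc_id] = chunks

    def search(self, query: str, k: int = 2) -> list[Chunk]:
        # Toy word-overlap ranking standing in for a vector index.
        words = set(query.lower().split())
        pool = [c for cs in self.by_doc.values() for c in cs]
        return sorted(pool, key=lambda c: -len(words & set(c.text.lower().split())))[:k]

@dataclass
class ModelRegistry:
    """Shared state #2: written by the training pipeline, read by inference."""
    versions: list[Callable[[str], str]] = field(default_factory=list)

    def register(self, model: Callable[[str], str]) -> int:
        self.versions.append(model)
        return len(self.versions)  # 1-based version number

    def latest(self) -> Callable[[str], str]:
        return self.versions[-1]

def feature_pipeline(events: list[CdcEvent], store: FeatureStore) -> None:
    for event in events:
        store.apply(event)

def training_pipeline(store: FeatureStore, registry: ModelRegistry) -> int:
    # Placeholder: a real job would fine-tune on a snapshot read from `store`.
    # This "model" just returns the first line of retrieved context.
    def model(prompt: str) -> str:
        lines = prompt.splitlines()
        return lines[1] if len(lines) > 1 else ""
    return registry.register(model)

def llm_service(prompt: str, registry: ModelRegistry) -> str:
    # GPU side (simulated in-process): run the latest registered model.
    return registry.latest()(prompt)

def business_service(query: str, store: FeatureStore, registry: ModelRegistry,
                     samples: list[tuple[str, str, str]], sample_rate: float = 0.1,
                     rng: Optional[random.Random] = None) -> str:
    # CPU side: retrieval, prompt assembly, and traffic sampling for evaluation.
    context = "\n".join(c.text for c in store.search(query))
    answer = llm_service(f"Context:\n{context}\nQuestion: {query}", registry)
    if (rng or random.Random()).random() < sample_rate:
        samples.append((query, context, answer))  # judged later, off the request path
    return answer

def toy_judge(context: str, answer: str) -> float:
    # Stand-in for an LLM judge: share of answer tokens that appear in the context.
    tokens, ctx = answer.lower().split(), set(context.lower().split())
    return sum(t in ctx for t in tokens) / len(tokens) if tokens else 0.0

def evaluate_samples(samples: list[tuple[str, str, str]]) -> float:
    # Offline monitoring job: mean judge score over the sampled requests.
    scores = [toy_judge(ctx, ans) for _, ctx, ans in samples]
    return sum(scores) / len(scores) if scores else 0.0
```

Because the serving code never imports training code and the training job never touches the request path, each pipeline could be redeployed or scaled on its own. In a real deployment `llm_service` would sit behind a network boundary on GPU nodes, and the judge would be an actual LLM call made by an asynchronous job over the queued samples.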

Patterns this recipe composes

Provenance

  • Last updated: