Production LLM Platform

Stand up a production LLM/RAG system whose data pipeline, model pipeline, and inference path scale and deploy independently.

Description

End-to-end production architecture for an LLM application with RAG over a continuously changing corpus and, optionally, fine-tuned models. The system is decomposed into feature, training, and inference (FTI) pipelines that communicate only via a feature store and a model registry. The inference path is itself split into a CPU business service and a GPU LLM service. The feature pipeline is fed by change-data-capture (CDC) events streamed through typed stages, and production traffic is monitored by LLM-judge evaluation on a sample of requests.
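The decoupling described above can be illustrated with a minimal sketch. It assumes in-memory stand-ins for the feature store and model registry, a simplified CDC event shape (`op`, `id`, `text`), and a keyword-overlap retriever in place of vector search. Every class and function name here (`FeatureStore`, `ModelRegistry`, `feature_pipeline`, `training_pipeline`, `business_service`, `llm_service`, `should_judge`) is hypothetical and chosen for illustration; none of them comes from a specific library.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FeatureStore:
    """Hypothetical in-memory stand-in for a feature / vector store."""
    docs: dict = field(default_factory=dict)

    def upsert(self, doc_id, text):
        self.docs[doc_id] = text

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)

    def search(self, query, k=1):
        # Assumption: naive keyword overlap instead of embedding similarity.
        terms = set(query.lower().split())
        ranked = sorted(
            self.docs.values(),
            key=lambda text: -len(terms & set(text.lower().split())),
        )
        return ranked[:k]


@dataclass
class ModelRegistry:
    """Hypothetical in-memory stand-in for a model registry."""
    versions: list = field(default_factory=list)

    def register(self, name):
        self.versions.append(name)

    def latest(self):
        return self.versions[-1]


def feature_pipeline(events, store):
    """Apply CDC events to the feature store; it touches nothing else."""
    for ev in events:
        if ev["op"] == "delete":
            store.delete(ev["id"])
        else:  # "insert" or "update"
            store.upsert(ev["id"], ev["text"])


def training_pipeline(store, registry):
    """Read from the feature store, write a model version to the registry."""
    version = len(registry.versions) + 1
    # Stand-in for fine-tuning: only the name of the artifact is recorded.
    registry.register(f"model-v{version}-trained-on-{len(store.docs)}-docs")


def llm_service(model_name, prompt):
    """Stand-in for the GPU LLM service; a real one would run generation."""
    return f"[{model_name}] answer based on: {prompt}"


def business_service(query, store, registry):
    """CPU-side service: retrieve context, build the prompt, call the LLM."""
    context = store.search(query, k=1)
    prompt = f"context={context} question={query}"
    return llm_service(registry.latest(), prompt)


def should_judge(rng, rate=0.1):
    """Decide whether a production request is sampled for LLM-judge evaluation."""
    return rng.random() < rate
```

In this sketch the three pipelines never call each other: `feature_pipeline` writes only to the store, `training_pipeline` reads the store and writes to the registry, and `business_service` reads both. That is what would let each one be scaled and deployed on its own, and it is also why the CPU and GPU halves of inference are separate functions.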
