Production LLM Platform
Type: recipe
Stand up a production LLM/RAG system whose data pipeline, model pipeline, and inference path scale and deploy independently.
Description. End-to-end production architecture for an LLM application that performs RAG over a continuously changing corpus and, optionally, serves fine-tuned models. The system is decomposed into feature, training, and inference (FTI) pipelines that communicate only through a feature store and a model registry. The inference path is itself split into a CPU business service and a GPU LLM service. The feature pipeline is fed by change-data-capture (CDC) events streamed through typed stages, and production traffic is monitored by sampled LLM-judge evaluation.
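The FTI boundary described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: `FeatureStore`, `ModelRegistry`, the event shape (`op`, `id`, `text`), and the vowel-count `embed` function are all hypothetical in-memory stand-ins for a real vector store, registry, CDC feed, and embedding model. The point it shows is that the three pipelines never call each other; each one only reads from or writes to the two shared stores.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def embed(text: str) -> List[float]:
    """Toy embedding (vowel counts); a real system would call an embedding model."""
    return [float(text.lower().count(c)) for c in "aeiou"]


@dataclass
class FeatureStore:
    """Hypothetical in-memory stand-in for the feature store / vector index."""
    vectors: Dict[str, List[float]] = field(default_factory=dict)

    def upsert(self, doc_id: str, vector: List[float]) -> None:
        self.vectors[doc_id] = vector

    def delete(self, doc_id: str) -> None:
        self.vectors.pop(doc_id, None)

    def nearest(self, query: List[float]) -> Optional[str]:
        """Return the id with the highest dot product against the query."""
        if not self.vectors:
            return None
        def score(doc_id: str) -> float:
            return sum(q * v for q, v in zip(query, self.vectors[doc_id]))
        return max(self.vectors, key=score)


@dataclass
class ModelRegistry:
    """Hypothetical in-memory stand-in for the model registry."""
    versions: Dict[str, List[str]] = field(default_factory=dict)

    def publish(self, name: str) -> str:
        tags = self.versions.setdefault(name, [])
        tags.append(f"v{len(tags) + 1}")
        return tags[-1]

    def latest(self, name: str) -> str:
        return self.versions[name][-1]


def feature_pipeline(event: dict, store: FeatureStore) -> None:
    """Apply one CDC event to the feature store (writes features only)."""
    if event["op"] == "delete":
        store.delete(event["id"])
    else:  # inserts and updates are both treated as upserts
        store.upsert(event["id"], embed(event["text"]))


def training_pipeline(store: FeatureStore, registry: ModelRegistry) -> str:
    """Read features, publish a model version. Actual training is elided."""
    _training_set = list(store.vectors.values())
    return registry.publish("llm")


def inference_pipeline(query: str, store: FeatureStore, registry: ModelRegistry) -> dict:
    """Read from both stores; never calls the other two pipelines."""
    return {"model": registry.latest("llm"), "context_id": store.nearest(embed(query))}


store, registry = FeatureStore(), ModelRegistry()
cdc_events = [
    {"op": "insert", "id": "d1", "text": "banana bandana"},
    {"op": "insert", "id": "d2", "text": "tree keeper"},
]
for event in cdc_events:
    feature_pipeline(event, store)
training_pipeline(store, registry)
answer = inference_pipeline("seen the bees", store, registry)
print(answer)
```

Because the only shared state is the two stores, any one of the three functions could in principle be moved to its own process and scaled or redeployed without touching the others, which is the property the recipe is after.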
Patterns this recipe composes:
- ★★FTI LLM Pipeline Split
Three independent pipelines; feature store + model registry as the only integration surfaces.
- ★★Business + LLM Microservice Split
CPU business service + GPU LLM service behind one REST contract.
- ★★CDC-Driven Vector Sync
Source-of-truth store emits CDC events; feature pipeline consumes them.
- ★Streaming Feature Pipeline
Typed per-stage models (raw → cleaned → chunked → embedded) on a streaming framework.
- ★★Crawler Dispatcher
URL pattern → crawler class registry for heterogeneous source ingestion.
- ★★Vector Memory
- ★★Naive RAG
- ★Sampled Prompt Trace Eval
Bounded-cost monitoring of production quality via random and slice-weighted sampling.
- ★Dimensional Synthetic Eval Set
Mode-collapse-resistant offline eval coverage.
- [evaluation-driven-development]
- ★★Prompt Caching
- ★Agent-as-a-Judge