Production LLM Platform

Type: recipe

Stand up a production LLM/RAG system whose data pipeline, model pipeline, and inference path scale and deploy independently.

Description. End-to-end production architecture for an LLM application that runs RAG over a continuously changing corpus and, optionally, serves fine-tuned models. The system is decomposed into feature, training, and inference (FTI) pipelines that communicate only through a feature store and a model registry. The inference path is itself split into a CPU business service and a GPU LLM service. The feature pipeline is fed by change-data-capture (CDC) events streamed through typed stages, and production traffic is monitored by LLM-judge evaluation of sampled traces.
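
A minimal sketch of the FTI contract, assuming in-memory stand-ins. `FeatureStore`, `ModelRegistry`, and the three pipeline functions are hypothetical names, and the "features" and "model" are toys. What the sketch shows is structural: no pipeline calls another directly. Each one only reads from or writes to one of the two integration surfaces, so each can be scheduled, scaled, and redeployed on its own.

```python
from dataclasses import dataclass, field
from typing import Callable

Model = Callable[[str], str]

@dataclass
class FeatureStore:
    """Hypothetical stand-in: the only place features are written and read."""
    rows: dict[str, list[float]] = field(default_factory=dict)

    def write(self, key: str, vector: list[float]) -> None:
        self.rows[key] = vector

    def read_all(self) -> dict[str, list[float]]:
        return dict(self.rows)

@dataclass
class ModelRegistry:
    """Hypothetical stand-in: the only place trained models are published."""
    versions: list[Model] = field(default_factory=list)

    def register(self, model: Model) -> int:
        self.versions.append(model)
        return len(self.versions)  # 1-based version number

    def latest(self) -> Model:
        return self.versions[-1]

def feature_pipeline(docs: dict[str, str], store: FeatureStore) -> None:
    # Toy features (length, space count) in place of a real embedding model.
    for doc_id, text in docs.items():
        store.write(doc_id, [float(len(text)), float(text.count(" "))])

def training_pipeline(store: FeatureStore, registry: ModelRegistry) -> int:
    n_docs = len(store.read_all())

    # Toy "model" that only records how much data it was trained on.
    def model(prompt: str) -> str:
        return f"answer to {prompt!r} (trained on {n_docs} docs)"

    return registry.register(model)

def inference_pipeline(query: str, store: FeatureStore, registry: ModelRegistry) -> str:
    context = sorted(store.read_all())  # toy retrieval: every stored key
    return f"{registry.latest()(query)} | context={context}"
```

The pipelines share no in-process state other than the store and registry objects passed in, which is the property that lets them run as separate jobs or services.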

Patterns this recipe composes:

★★FTI LLM Pipeline Split
core
Three independent pipelines; feature store + model registry as the only integration surfaces.
★★Business + LLM Microservice Split
core
CPU business service + GPU LLM service behind one REST contract.
★★CDC-Driven Vector Sync
core
Source-of-truth store emits CDC events; feature pipeline consumes them.
★Streaming Feature Pipeline
core
Typed per-stage models (raw → cleaned → chunked → embedded) on a streaming framework.
★★Crawler Dispatcher
core
URL pattern → crawler class registry for heterogeneous source ingestion.
★★Vector Memory
core
★★Naive RAG
core
★Sampled Prompt Trace Eval
hardening
Bounded-cost production-quality monitoring via random + slice-weighted sampling.
★Dimensional Synthetic Eval Set
hardening
Mode-collapse-resistant offline eval coverage.
Evaluation-Driven Development
hardening
★Tenant-Scoped Tool Binding
hardening
Multi-tenant isolation at the tool and retrieval layer.
★★Prompt Caching
optional
★Agent-as-a-Judge
optional
★Canonical-Entity Grounding
optional
Ground identifiers against the system of record before acting.
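
For the Business + LLM Microservice Split, a sketch of the CPU-side half, assuming a JSON-over-HTTP contract. The `/generate` path, the `GenerateRequest`/`GenerateResponse` shapes, and `HttpLLMClient` are hypothetical, not a specific serving framework's API. The business service depends only on the `LLMClient` protocol, so the GPU service behind it can be swapped or scaled without touching retrieval or prompt logic.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass
from typing import Protocol

@dataclass(frozen=True)
class GenerateRequest:
    prompt: str
    max_new_tokens: int = 256

@dataclass(frozen=True)
class GenerateResponse:
    text: str

class LLMClient(Protocol):
    def generate(self, req: GenerateRequest) -> GenerateResponse: ...

class HttpLLMClient:
    """Calls the GPU LLM service; the /generate path and JSON shape are assumptions."""

    def __init__(self, base_url: str, timeout_s: float = 30.0) -> None:
        self.base_url = base_url.rstrip("/")
        self.timeout_s = timeout_s

    def generate(self, req: GenerateRequest) -> GenerateResponse:
        body = json.dumps(asdict(req)).encode("utf-8")
        http_req = urllib.request.Request(
            f"{self.base_url}/generate",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(http_req, timeout=self.timeout_s) as resp:
            return GenerateResponse(**json.loads(resp.read()))

def answer(question: str, retrieved: list[str], llm: LLMClient) -> str:
    # CPU-side business logic: retrieval results and prompt assembly live here.
    context = "\n".join(f"- {c}" for c in retrieved)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm.generate(GenerateRequest(prompt=prompt)).text
```

In tests, any object with a matching `generate` method can stand in for the GPU service, so business logic is exercised without a GPU.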
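
For CDC-Driven Vector Sync and the Streaming Feature Pipeline, a sketch of a single event handler, assuming a simplified CDC event shape and a plain function call in place of the streaming framework. The stage models mirror the raw → cleaned → chunked → embedded progression; the chunk size, the placeholder embedding, and `VectorIndex` are hypothetical. Deleting a document's old chunks before re-inserting makes updates idempotent and leaves no orphaned chunks when a document shrinks.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)
class CDCEvent:  # simplified; real CDC envelopes also carry before/after images, offsets, etc.
    op: Literal["insert", "update", "delete"]
    doc_id: str
    content: str = ""

@dataclass(frozen=True)
class RawDoc:
    doc_id: str
    content: str

@dataclass(frozen=True)
class CleanedDoc:
    doc_id: str
    text: str

@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    doc_id: str
    text: str

@dataclass(frozen=True)
class EmbeddedChunk:
    chunk_id: str
    doc_id: str
    vector: tuple[float, ...]

def clean(raw: RawDoc) -> CleanedDoc:
    return CleanedDoc(raw.doc_id, " ".join(raw.content.split()))  # collapse whitespace

def chunk(doc: CleanedDoc, size: int = 3) -> list[Chunk]:
    words = doc.text.split()
    return [Chunk(f"{doc.doc_id}#{i // size}", doc.doc_id, " ".join(words[i:i + size]))
            for i in range(0, len(words), size)]

def embed(c: Chunk) -> EmbeddedChunk:
    # Placeholder embedding; a real pipeline calls an embedding model here.
    return EmbeddedChunk(c.chunk_id, c.doc_id, (float(len(c.text)),))

class VectorIndex:
    """Hypothetical vector store keyed by chunk_id."""

    def __init__(self) -> None:
        self.points: dict[str, EmbeddedChunk] = {}

    def delete_doc(self, doc_id: str) -> None:
        self.points = {k: v for k, v in self.points.items() if v.doc_id != doc_id}

    def upsert(self, e: EmbeddedChunk) -> None:
        self.points[e.chunk_id] = e

def handle(event: CDCEvent, index: VectorIndex) -> None:
    """Apply one CDC event: drop stale chunks, then re-derive them unless deleted."""
    index.delete_doc(event.doc_id)
    if event.op == "delete":
        return
    for c in chunk(clean(RawDoc(event.doc_id, event.content))):
        index.upsert(embed(c))
```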
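
For the Crawler Dispatcher, a sketch of the URL-pattern registry, assuming regex patterns and a fallback crawler. The crawler classes and the `extract` return shape are hypothetical placeholders for real scraping code. The first registered pattern that matches wins, so more specific patterns should be registered first.

```python
import re

class Crawler:
    """Base/fallback crawler; real crawlers would fetch and parse the page."""
    source = "generic"

    def extract(self, url: str) -> dict[str, str]:
        # Placeholder: returns metadata instead of scraping.
        return {"source": self.source, "url": url}

class GitHubCrawler(Crawler):
    source = "github"

class MediumCrawler(Crawler):
    source = "medium"

class CrawlerDispatcher:
    def __init__(self, default: type[Crawler] = Crawler) -> None:
        self._registry: list[tuple[re.Pattern[str], type[Crawler]]] = []
        self._default = default

    def register(self, pattern: str, crawler_cls: type[Crawler]) -> "CrawlerDispatcher":
        self._registry.append((re.compile(pattern), crawler_cls))
        return self  # allow chained registration

    def get_crawler(self, url: str) -> Crawler:
        # First registered pattern matching the start of the URL wins; else fall back.
        for pattern, crawler_cls in self._registry:
            if pattern.match(url):
                return crawler_cls()
        return self._default()

# Usage: register source-specific crawlers once at startup.
dispatcher = (
    CrawlerDispatcher()
    .register(r"https://(www\.)?github\.com/", GitHubCrawler)
    .register(r"https://(www\.)?medium\.com/", MediumCrawler)
)
```

Adding a new source then means adding one crawler class and one `register` call, with no changes to the ingestion loop.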
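
For Sampled Prompt Trace Eval, a sketch of the sampling step only, assuming each trace carries a slice label (for example tenant tier, intent, or language). `base_rate`, `slice_weights`, and `budget` are hypothetical tuning knobs, and the LLM-judge call itself is left out. Every trace gets the base rate, up-weighted slices get proportionally more, and a hard budget caps the number of judge calls, which is what bounds the cost.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Trace:
    trace_id: str
    slice_name: str  # hypothetical slice label
    prompt: str
    response: str

def sample_for_judging(
    traces: list[Trace],
    base_rate: float,
    slice_weights: dict[str, float],
    budget: int,
    rng: random.Random,
) -> list[Trace]:
    """Pick traces to send to the LLM judge under a hard per-run budget."""
    selected = []
    for t in traces:
        # Uniform base rate, scaled up (or down) for slices of interest, capped at 1.
        p = min(1.0, base_rate * slice_weights.get(t.slice_name, 1.0))
        if rng.random() < p:
            selected.append(t)
    # Hard cap on judge calls keeps evaluation cost bounded.
    if len(selected) > budget:
        selected = rng.sample(selected, budget)
    return selected
```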
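
For the Dimensional Synthetic Eval Set, a sketch assuming three hypothetical dimensions. Rather than asking an LLM for "50 varied queries", which tends to collapse onto a few phrasings, the sketch enumerates the cross product of dimension values, samples distinct tuples without replacement, and turns each tuple into its own generation prompt.

```python
import itertools
import random

# Hypothetical dimensions; choose ones that matter for the product under test.
DIMENSIONS: dict[str, list[str]] = {
    "persona": ["new user", "power user", "admin"],
    "intent": ["lookup", "troubleshoot", "compare"],
    "difficulty": ["easy", "hard"],
}

def sample_tuples(dimensions: dict[str, list[str]], n: int,
                  rng: random.Random) -> list[dict[str, str]]:
    """Sample up to n distinct dimension combinations (no repeats)."""
    combos = [dict(zip(dimensions, values))
              for values in itertools.product(*dimensions.values())]
    return rng.sample(combos, min(n, len(combos)))

def generation_prompt(t: dict[str, str]) -> str:
    # One LLM call per tuple, each conditioned on a different combination.
    spec = "; ".join(f"{k}: {v}" for k, v in t.items())
    return f"Write one realistic user query matching: {spec}"
```

Coverage comes from the tuple sampler rather than from the generator model, so the spread of the eval set is controlled and inspectable before any LLM call is made.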
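
For Tenant-Scoped Tool Binding, a sketch assuming an in-memory corpus and hypothetical names (`Doc`, `search_documents`, `bind_tools`). The tenant ID is captured in a closure from the authenticated session when the tools are built, so the tool surface exposed to the model takes only a query and has no argument the model could use to reach another tenant's data.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Doc:
    tenant_id: str
    doc_id: str
    text: str

# Hypothetical shared corpus; in production, a vector store queried with a tenant filter.
CORPUS = [
    Doc("acme", "a1", "Acme invoice policy"),
    Doc("globex", "g1", "Globex invoice policy"),
]

def search_documents(tenant_id: str, query: str) -> list[Doc]:
    # The tenant filter is applied inside retrieval, not left to the model.
    q = query.lower()
    return [d for d in CORPUS if d.tenant_id == tenant_id and q in d.text.lower()]

def bind_tools(tenant_id: str) -> dict[str, Callable[[str], list[Doc]]]:
    """Build one session's tool set; tenant_id comes from auth, never from the LLM."""
    def search(query: str) -> list[Doc]:
        return search_documents(tenant_id, query)
    return {"search": search}
```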
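
For Canonical-Entity Grounding, a sketch assuming a dictionary stands in for the system of record; `Customer`, `ground_customer`, and `issue_refund` are hypothetical. A model-supplied identifier is resolved to exactly one canonical record (by ID, then by exact name) before the side-effecting action runs, and anything unknown or ambiguous raises instead of guessing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str

# Hypothetical system of record keyed by canonical ID.
CUSTOMERS = {
    "C-1001": Customer("C-1001", "Acme Corp"),
    "C-2002": Customer("C-2002", "Globex"),
}

class UngroundedEntityError(ValueError):
    """Raised when a model-supplied identifier does not resolve to exactly one record."""

def ground_customer(mention: str) -> Customer:
    key = mention.strip()
    if key.upper() in CUSTOMERS:  # exact canonical ID
        return CUSTOMERS[key.upper()]
    matches = [c for c in CUSTOMERS.values() if c.name.lower() == key.lower()]
    if len(matches) == 1:  # unambiguous exact name match
        return matches[0]
    raise UngroundedEntityError(f"cannot ground {mention!r} to a single customer")

def issue_refund(customer_mention: str, amount_cents: int) -> str:
    customer = ground_customer(customer_mention)  # ground before acting
    # A real implementation would call the billing system here.
    return f"refund {amount_cents} to {customer.customer_id}"
```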

Neighbourhood


Instantiated by (1)

Amazon Bedrock AgentCore (full-code framework)

Provenance

Last updated: 2026-06-10