LangSmith
Type: app · Vendor: LangChain Inc. · Language: TypeScript · License: proprietary · Status: active · Status in practice: mature · First released: 2023
LangSmith is a hosted platform for tracing, evaluating, and monitoring LLM applications across development and production.
Description. LangSmith captures traces of LLM and agent runs and evaluates them with code-rule, human-review, and LLM-as-judge evaluators. It supports offline experiments that gate changes during development and online evaluators that run automatically on production traces for monitoring and alerting. Failing production traces can be added back to datasets to drive targeted offline experiments.
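An evaluator can be pictured as a function that maps a run's inputs and outputs to a score. The sketch below is a minimal model under stated assumptions. The names `RunRecord`, `Feedback`, `exactMatch`, and `makeLlmJudge` are hypothetical and are not the LangSmith SDK API. The judge model is injected as a plain synchronous function so the example runs offline; a real judge call would be an asynchronous model request. Human review has no code counterpart here because it happens in annotation queues.

```typescript
// Hypothetical shapes for illustration only; not the LangSmith SDK API.
interface RunRecord {
  inputs: Record<string, string>;
  outputs: Record<string, string>;
}

interface Feedback {
  key: string;   // metric name, e.g. "exact_match"
  score: number; // 0 or 1 in this sketch
  comment?: string;
}

// Code-rule evaluator: a deterministic comparison against a reference answer.
function exactMatch(run: RunRecord, reference?: Record<string, string>): Feedback {
  const matches =
    reference !== undefined && run.outputs.answer?.trim() === reference.answer?.trim();
  return { key: "exact_match", score: matches ? 1 : 0 };
}

// Assumed judge interface: takes a grading prompt, returns a verdict string.
// Kept synchronous for brevity; a real LLM call would be asynchronous.
type JudgeModel = (prompt: string) => string;

// LLM-as-judge evaluator: asks the judge model to grade the answer against criteria.
function makeLlmJudge(criteria: string, judge: JudgeModel): (run: RunRecord) => Feedback {
  return (run) => {
    const prompt =
      `Criteria: ${criteria}\n` +
      `Question: ${run.inputs.question}\n` +
      `Answer: ${run.outputs.answer}\n` +
      `Reply with PASS or FAIL.`;
    const verdict = judge(prompt).trim().toUpperCase();
    return { key: "llm_judge", score: verdict.startsWith("PASS") ? 1 : 0, comment: verdict };
  };
}
```

Injecting the judge keeps the evaluator testable with a stub; in practice the same evaluator function could be reused in both the offline and online tracks.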
Agent loop shape. LangSmith does not run the application's agent loop; it observes and evaluates it. Application runs send traces to LangSmith, where evaluators score them. In the offline track, evaluators run over curated datasets as experiments. In the online track, evaluators run automatically on incoming production runs or threads to provide monitoring and alerting, and flagged runs feed back into datasets.
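The two tracks and the feedback loop described above can be sketched as three small functions. This is an illustrative model, not how LangSmith implements them: `runExperiment`, `onlineEvaluate`, and `addToDataset` are hypothetical names, scores are assumed to lie in 0..1, and the sampler is injectable so the online track can be tested deterministically.

```typescript
// Hypothetical model of the offline and online evaluation tracks.
interface Example {
  inputs: string;
  reference?: string; // may be filled in later, e.g. by a human reviewer
}

interface ProdRun {
  id: string;
  inputs: string;
  output: string;
}

type App = (inputs: string) => string;
type Scorer = (output: string, reference?: string) => number; // 0..1

// Offline track: run one app version over a curated dataset, return the mean score.
function runExperiment(app: App, dataset: Example[], scorer: Scorer): number {
  if (dataset.length === 0) return 0;
  const total = dataset.reduce((sum, ex) => sum + scorer(app(ex.inputs), ex.reference), 0);
  return total / dataset.length;
}

// Online track: score a sampled fraction of production runs with a
// reference-free scorer and return the runs that fall below the threshold.
function onlineEvaluate(
  runs: ProdRun[],
  scorer: Scorer,
  sampleRate: number,
  threshold: number,
  random: () => number = Math.random,
): ProdRun[] {
  const flagged: ProdRun[] = [];
  for (const run of runs) {
    if (random() >= sampleRate) continue; // run falls outside the sample
    if (scorer(run.output) < threshold) flagged.push(run);
  }
  return flagged;
}

// Feedback loop: flagged production runs become new dataset examples
// (without a reference answer until someone supplies one).
function addToDataset(dataset: Example[], flagged: ProdRun[]): Example[] {
  return [...dataset, ...flagged.map((r) => ({ inputs: r.inputs }))];
}
```

A deploy gate could then be a comparison such as `runExperiment(app, dataset, scorer) >= minScore`, with the threshold chosen per project.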
Primary use cases
- tracing LLM and agent runs
- offline evaluation experiments before deploy
- online evaluation and monitoring of production traffic
Key concepts
- Trace and run → decision-log (docs) — A run is a single span of work (an LLM call, a retrieval, a formatting step) and a trace is the collection of runs for one end-to-end operation; together they form the observability record LangSmith stores per request.
- Experiment → dual-evaluation-offline-online (docs) — An offline evaluation run that applies a set of evaluators over a curated dataset to score an application version before deploy, the gate side of LangSmith's two evaluation tracks.
- Online evaluator → dual-evaluation-offline-online (docs) — An evaluator configured to run automatically on incoming production runs or threads (optionally sampled) to provide live monitoring, anomaly detection, and alerting on production traffic.
- Annotation queue (docs) — A human-review surface where runs are queued for people to label or score, complementing the automated code-rule and LLM-as-judge evaluators.
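The trace/run relationship is a tree: each run may point to a parent run, and the root run stands for the end-to-end operation. The sketch below assumes a simplified, hypothetical schema (`Run`, `Trace`, `parentId`, `runType`, millisecond timestamps) rather than LangSmith's actual data model, and renders a trace as an indented tree for after-the-fact inspection.

```typescript
// Hypothetical run/trace model; field names are illustrative, not LangSmith's schema.
type RunType = "llm" | "retriever" | "tool" | "chain";

interface Run {
  id: string;
  name: string;
  runType: RunType;
  parentId?: string; // undefined for the root run of a trace
  startMs: number;
  endMs: number;
}

interface Trace {
  traceId: string;
  runs: Run[];
}

// Render a trace as an indented tree: one line per run with its type and duration.
function renderTrace(trace: Trace): string {
  // Group runs by parent id; the root runs are stored under the undefined key.
  const children = new Map<string | undefined, Run[]>();
  for (const run of trace.runs) {
    const siblings = children.get(run.parentId) ?? [];
    siblings.push(run);
    children.set(run.parentId, siblings);
  }

  const lines: string[] = [];
  const visit = (parentId: string | undefined, depth: number): void => {
    for (const run of children.get(parentId) ?? []) {
      lines.push(`${"  ".repeat(depth)}${run.name} [${run.runType}] ${run.endMs - run.startMs}ms`);
      visit(run.id, depth + 1); // depth-first, so child runs appear under their parent
    }
  };
  visit(undefined, 0);
  return lines.join("\n");
}
```

For a retrieval-augmented answer, the root chain run would list a retriever run and an LLM run as its children, each with its own timing.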
Patterns this app implements —
- ★Dual Evaluation (Offline + Online)
LangSmith provides offline experiments to gate during development and online evaluators that run automatically on production traces for real-time monitoring, feeding failing production traces back in…
- ★★LLM-as-Judge
LangSmith offers LLM-as-judge evaluators that use an LLM to score application outputs against criteria, available alongside code-rule and human-review evaluators in both the offline and online tracks.
- ★★Decision Log
Tracing records each application run as a tree of spans capturing the LLM calls, retrieval calls, and intermediate steps, so a past execution can be inspected after the fact to explain what the appli…
- ★★Prompt Versioning
Prompts stored in LangSmith are versioned: every saved update creates a new immutable commit with a unique hash, and a specific commit or tag can be pulled into application code, so prompts are deplo…
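The prompt-versioning pattern (immutable commits identified by a hash, plus tags that point at specific commits) can be modelled with a small in-memory registry. This is a sketch only: `PromptRegistry`, `push`, `tag`, `pull`, and the `name:selector` reference syntax are hypothetical stand-ins, not the LangSmith SDK. FNV-1a is used as a dependency-free stand-in for a real content hash and is not collision-resistant.

```typescript
// Hypothetical in-memory prompt registry; not the LangSmith SDK.

// FNV-1a 32-bit hash as a stand-in for a real content hash (not cryptographic).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

interface PromptCommit {
  readonly hash: string;
  readonly name: string;
  readonly template: string;
  readonly parent?: string;
}

class PromptRegistry {
  private commits = new Map<string, PromptCommit>();
  private heads = new Map<string, string>(); // prompt name -> latest commit hash
  private tags = new Map<string, string>();  // "name:tag" -> commit hash

  // Every save creates a new frozen commit; hashing the parent too means
  // re-saving identical text still yields a distinct commit.
  push(name: string, template: string): string {
    const parent = this.heads.get(name);
    const hash = fnv1a(`${name}\n${parent ?? ""}\n${template}`);
    this.commits.set(hash, Object.freeze({ hash, name, template, parent }));
    this.heads.set(name, hash);
    return hash;
  }

  // Tags are movable pointers to existing commits.
  tag(name: string, tag: string, hash: string): void {
    if (this.commits.get(hash)?.name !== name) throw new Error(`unknown commit ${hash}`);
    this.tags.set(`${name}:${tag}`, hash);
  }

  // Resolve "name" (latest), "name:<tag>", or "name:<commit hash>" to a template.
  pull(ref: string): string {
    const idx = ref.indexOf(":");
    const name = idx === -1 ? ref : ref.slice(0, idx);
    const selector = idx === -1 ? undefined : ref.slice(idx + 1);
    const hash =
      selector === undefined
        ? this.heads.get(name)
        : this.tags.get(`${name}:${selector}`) ??
          (this.commits.get(selector)?.name === name ? selector : undefined);
    const commit = hash === undefined ? undefined : this.commits.get(hash);
    if (commit === undefined) throw new Error(`cannot resolve ${ref}`);
    return commit.template;
  }
}
```

Pinning application code to a tag such as `greeter:prod` lets the tag be moved to a new commit without a code change, while pinning to a commit hash freezes the exact prompt text.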