Framework · Enterprise Platforms

Arize Phoenix / Arize AX

Arize Phoenix and Arize AX trace agent runs and evaluate them with LLM judges, scoring both individual steps and the full trajectory of tool calls.

Description

Arize provides the open-source Phoenix and the hosted Arize AX platform for tracing and evaluating LLM and agent applications. Both capture spans and traces of agent runs and evaluate them with LLM-based judges. Trajectory evaluations send the ordered list of an agent's tool calls to an LLM judge, which classifies the whole run as correct or incorrect. This catches mistakes that single-step evaluations miss, because a judge that sees only one span cannot tell whether the sequence of calls made sense. Phoenix Evals is the component that runs these LLM classifications against captured traces.
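A trajectory evaluation boils down to three steps: format the ordered tool calls into a prompt, ask a judge for a label, and snap the reply onto the allowed labels. The sketch below shows those steps in plain Python. It does not use Phoenix's API. `TRAJECTORY_TEMPLATE`, `ToolCall`, `evaluate_trajectory` and the `stub_judge` (a rule-based stand-in for a real LLM call) are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical prompt template; Phoenix Evals ships its own templates.
TRAJECTORY_TEMPLATE = """You are evaluating an AI agent's trajectory.
User request: {query}
Tool calls, in the order the agent made them:
{tool_calls}
Answer with one word: "correct" if this sequence of tool calls is a
reasonable way to fulfil the request, otherwise "incorrect"."""


@dataclass
class ToolCall:
    name: str
    arguments: dict


def format_trajectory(calls: Sequence[ToolCall]) -> str:
    # One numbered line per call, preserving the agent's call order.
    return "\n".join(f"{i}. {c.name}({c.arguments})" for i, c in enumerate(calls, start=1))


def build_prompt(query: str, calls: Sequence[ToolCall]) -> str:
    return TRAJECTORY_TEMPLATE.format(query=query, tool_calls=format_trajectory(calls))


def parse_label(raw: str) -> Optional[str]:
    # Map the judge's free-text reply onto the allowed labels.
    # "incorrect" is checked first because it contains "correct".
    text = raw.strip().lower()
    for label in ("incorrect", "correct"):
        if label in text:
            return label
    return None  # reply could not be mapped to a label


def evaluate_trajectory(query: str, calls: Sequence[ToolCall],
                        judge: Callable[[str], str]) -> Optional[str]:
    return parse_label(judge(build_prompt(query, calls)))


def stub_judge(prompt: str) -> str:
    # Stand-in for an LLM call: flags booking a flight before searching.
    search, book = prompt.find("search_flights("), prompt.find("book_flight(")
    if book != -1 and (search == -1 or book < search):
        return "Incorrect."
    return "Correct"


calls = [ToolCall("book_flight", {"flight_id": "XY123"}),
         ToolCall("search_flights", {"to": "Paris"})]
print(evaluate_trajectory("Book me a flight to Paris", calls, stub_judge))  # incorrect
```

Swapping `stub_judge` for a function that sends the prompt to a real model keeps the rest unchanged. Constraining the judge to a fixed set of labels and parsing defensively is what makes its output usable as a correct/incorrect classification.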

Solution

Arize observes the agent rather than running its loop. It ingests spans and traces from an instrumented agent. Evaluation jobs then send the captured artifacts, either an individual span or the ordered list of tool calls that forms a trajectory, to an LLM judge that classifies each as correct or incorrect against the given criteria. Failures surface at the step or trajectory level for the developer to act on.
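The observer pattern described above can be sketched as follows: reconstruct each trajectory from ingested spans by grouping tool spans by trace and ordering them by start time, then run a span-level and a trajectory-level check. The `Span` fields are loosely modelled on tracing conventions such as OpenTelemetry trace and span IDs, but the exact field names and the `"TOOL"` kind label are assumptions. `span_ok` and `order_ok` are hypothetical rule-based stand-ins for the LLM judges.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Span:
    trace_id: str
    span_id: str
    start_time: float
    kind: str   # e.g. "LLM", "TOOL" (assumed labels)
    name: str
    attributes: dict = field(default_factory=dict)


def trajectories(spans: List[Span]) -> Dict[str, List[Span]]:
    """Group TOOL spans by trace and order each group by start time."""
    by_trace: Dict[str, List[Span]] = defaultdict(list)
    for s in spans:
        if s.kind == "TOOL":
            by_trace[s.trace_id].append(s)
    return {t: sorted(ss, key=lambda s: s.start_time) for t, ss in by_trace.items()}


def evaluate(spans: List[Span],
             span_judge: Callable[[Span], bool],
             trajectory_judge: Callable[[List[Span]], bool]) -> Dict[str, List[str]]:
    """Return failing span IDs (step level) and failing trace IDs (trajectory level)."""
    step = [s.span_id for s in spans if s.kind == "TOOL" and not span_judge(s)]
    traj = [t for t, calls in trajectories(spans).items() if not trajectory_judge(calls)]
    return {"step": step, "trajectory": traj}


def span_ok(s: Span) -> bool:
    # Stand-in for a per-span judge: did this single tool call succeed?
    return s.attributes.get("status") == "ok"


def order_ok(calls: List[Span]) -> bool:
    # Stand-in for a trajectory judge: a booking must follow a search.
    names = [c.name for c in calls]
    return "book_flight" not in names or (
        "search_flights" in names
        and names.index("search_flights") < names.index("book_flight"))


spans = [
    Span("t1", "a", 0.5, "LLM", "plan"),
    Span("t1", "b", 1.0, "TOOL", "search_flights", {"status": "ok"}),
    Span("t1", "c", 2.0, "TOOL", "book_flight", {"status": "ok"}),
    Span("t2", "d", 1.0, "TOOL", "book_flight", {"status": "ok"}),
    Span("t2", "e", 2.0, "TOOL", "search_flights", {"status": "ok"}),
]
print(evaluate(spans, span_ok, order_ok))  # {'step': [], 'trajectory': ['t2']}
```

In trace `t2` every tool call passes the step-level check, yet the trajectory check flags the run. This is the kind of between-steps mistake that only a trajectory evaluation surfaces.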

Primary use cases

tracing LLM and agent runs
LLM-as-judge evaluation of agent trajectories
scoring individual tool-calling spans
detecting mistakes an agent makes between steps
