Intermediate Artifact Evaluation
Evaluate intermediate artifacts (plans, tool-call traces, guardrail reactions), not only final outputs, so that a failure can be isolated to a specific pipeline node.
Problem
Evaluating only the final output is coarse: it shows that something failed but not where. When a pipeline has many nodes (plan, tools, guardrails, reflection), the team cannot improve any specific node without per-node signal. This pattern differs from eval-harness (full-run eval) and eval-as-contract (boundary contract).
Solution
Each pipeline node emits a named artifact (plan, tool-call trace, guardrail decision, reflection output). The eval suite defines a rubric for each artifact type. Per-artifact pass/fail rates then show which node to improve. Pair with eval-harness, eval-as-contract, llm-as-judge, agent-evaluator, and dual-evaluation-offline-online.
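A minimal Python sketch of per-artifact rubrics and per-node pass rates. Every name here (Artifact, RUBRICS, pass_rates, weakest_node), the artifact shapes, and the rubric checks are assumptions made for illustration and are not part of the pattern definition. Real rubrics would more likely be LLM judges or richer validators than these toy predicates.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Artifact:
    node: str     # name of the pipeline node that emitted the artifact
    payload: Any  # the artifact itself (plan steps, tool calls, decision, ...)


# Hypothetical rubrics: one pass/fail check per artifact type.
def plan_rubric(plan: Any) -> bool:
    # Passes when the plan is a non-empty list of steps.
    return isinstance(plan, list) and len(plan) > 0


def tool_trace_rubric(trace: Any) -> bool:
    # Passes when every recorded tool call reports status "ok".
    return all(call.get("status") == "ok" for call in trace)


def guardrail_rubric(decision: Any) -> bool:
    # Passes when the guardrail produced a recognised action.
    return decision.get("action") in {"allow", "block"}


RUBRICS: dict[str, Callable[[Any], bool]] = {
    "plan": plan_rubric,
    "tool_trace": tool_trace_rubric,
    "guardrail": guardrail_rubric,
}


def pass_rates(runs: list[list[Artifact]], rubrics: dict) -> dict[str, float]:
    """Return the fraction of artifacts that pass their rubric, per node."""
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for run in runs:
        for artifact in run:
            rubric = rubrics.get(artifact.node)
            if rubric is None:
                continue  # no rubric defined for this node; skip it
            total[artifact.node] += 1
            passed[artifact.node] += int(rubric(artifact.payload))
    return {node: passed[node] / total[node] for node in total}


def weakest_node(rates: dict[str, float]) -> str:
    """Return the node with the lowest pass rate."""
    return min(rates, key=rates.get)


# Two illustrative runs; the second has a failing tool call.
RUNS = [
    [
        Artifact("plan", ["search", "summarize"]),
        Artifact("tool_trace", [{"tool": "search", "status": "ok"}]),
        Artifact("guardrail", {"action": "allow"}),
    ],
    [
        Artifact("plan", ["search"]),
        Artifact("tool_trace", [{"tool": "search", "status": "error"}]),
        Artifact("guardrail", {"action": "allow"}),
    ],
]

rates = pass_rates(RUNS, RUBRICS)
print(rates)
print(weakest_node(rates))
```

With these two runs, the plan and guardrail artifacts pass every time while the tool-call trace passes once out of twice, so the tool node is where improvement work would start. A final-output-only eval would have reported one failed run without that attribution.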
When to use
- Multi-node pipelines where failure attribution matters.
- The team has engineering capacity for per-node instrumentation and rubrics.
- Improvement work benefits from targeted node-level signal.