Promptfoo
Type: full-code · Vendor: Promptfoo, Inc. · Language: TypeScript · License: MIT · Status: active · Status in practice: mature · First released: 2023
Promptfoo is an open-source command-line tool that runs declarative assertion-based test suites against prompts, models, and RAG or agent systems, and can red-team them for vulnerabilities.
Description. Promptfoo evaluates prompts, models, and RAG or agent pipelines against a YAML test suite of assertions, returning pass or fail and a non-zero exit code in CI when a test fails. Assertions include deterministic checks and model-graded checks such as llm-rubric, where an LLM grades the output against custom criteria. It also provides a red-teaming mode that generates simulated adversarial inputs to find vulnerabilities before deployment.
Agent loop shape. Promptfoo has no agent loop of its own. It is run from the command line over a configuration that lists prompts, providers, and test cases with assertions. For each test case it calls the configured provider, applies each assertion to the output, and aggregates pass or fail results, exiting non-zero in CI on any failure. In red-team mode it instead generates adversarial inputs and runs them against the target to surface failures.
Primary use cases
- assertion-based prompt and model evaluation in CI
- model-graded scoring of open-ended outputs
- red-teaming LLM applications for vulnerabilities
Key concepts
- Assertion → eval-as-contract (docs) — A declarative output check attached to a test case; deterministic assertions (contains, cost, latency) and model-graded assertions (llm-rubric) together decide whether the case passes.
- llm-rubric → llm-as-judge (docs) — A model-graded assertion type where an LLM grades a free-form output against stated criteria, used when no exact-match check fits the expected behaviour.
- Provider → prompt-variant-evaluation (docs) — A configured model or endpoint under test; listing several providers lets one test suite run across multiple models for side-by-side comparison.
- Red team (Promptfoo) → red-team-sandbox-reproduction (docs) — A mode that curates and generates a diverse set of malicious intents targeting potential vulnerabilities and runs them against the application, either as one-off scans or continuously in the deployment pipeline.
Patterns this full-code implements —
- ★★Eval as Contract
Promptfoo runs declarative assertion-based eval suites that return pass/fail and exit non-zero in CI, so a release is blocked unless the evals pass — the eval suite literally is the release gate.
- ★★LLM-as-Judge
Promptfoo supports model-graded assertions such as llm-rubric that use an LLM to evaluate outputs against custom criteria or rubrics when no exact-match metric applies.
- ★Red-Team Sandbox Reproduction
Promptfoo's red-team mode generates simulated adversarial inputs to find vulnerabilities in an LLM application before deployment, running an adversarial regression suite against the target; note Prom…
- ★★Prompt Variant Evaluation
Promptfoo runs the same test cases across many prompts, models, and providers and produces a matrix view that compares the outputs side by side, so competing prompt or model variants are scored as a…
Neighbourhood
Click any neighbour to follow the lineage. Scroll to zoom, drag to pan.