
Promptfoo

Type: full-code · Vendor: Promptfoo, Inc. · Language: TypeScript · License: MIT · Status: active · Status in practice: mature · First released: 2023


Promptfoo is an open-source command-line tool that runs declarative assertion-based test suites against prompts, models, and RAG or agent systems, and can red-team them for vulnerabilities.

Description. Promptfoo evaluates prompts, models, and RAG or agent pipelines against a YAML test suite of assertions. Each test case passes or fails, and the run exits with a non-zero code when any test fails, so a CI job can gate on the result. Assertions include deterministic checks and model-graded checks such as llm-rubric, where an LLM grades the output against custom criteria. Promptfoo also provides a red-teaming mode that generates simulated adversarial inputs to find vulnerabilities before deployment.

Agent loop shape. Promptfoo has no agent loop of its own. It is run from the command line over a configuration that lists prompts, providers, and test cases with assertions. For each test case it calls the configured provider, applies each assertion to the output, and aggregates pass or fail results, exiting non-zero in CI on any failure. In red-team mode it instead generates adversarial inputs and runs them against the target to surface failures.
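The evaluation flow described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not Promptfoo's implementation: the `Provider`, `Assertion`, `TestCase`, and `runEval` names are hypothetical, the provider is a synchronous stub rather than a real model call, and the exit code of 1 is only a placeholder for "non-zero".

```typescript
// Hypothetical shapes for illustration only; these are not Promptfoo's internal types.
type Provider = { id: string; call: (prompt: string) => string };
type Assertion =
  | { type: "contains"; value: string }
  | { type: "equals"; value: string };
type TestCase = { vars: Record<string, string>; assert: Assertion[] };
type Result = { provider: string; prompt: string; output: string; pass: boolean };

// Fill {{name}} placeholders in a prompt template from the test case's variables.
function render(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match, name: string) =>
    name in vars ? vars[name] : ""
  );
}

// Apply one deterministic assertion to a provider output.
function check(assertion: Assertion, output: string): boolean {
  if (assertion.type === "contains") {
    return output.indexOf(assertion.value) !== -1;
  }
  return output === assertion.value;
}

// For each provider, prompt, and test case: call the provider, apply every
// assertion, and aggregate. Any failing case yields a non-zero exit code.
function runEval(
  prompts: string[],
  providers: Provider[],
  tests: TestCase[]
): { results: Result[]; exitCode: number } {
  const results: Result[] = [];
  for (const provider of providers) {
    for (const template of prompts) {
      for (const test of tests) {
        const prompt = render(template, test.vars);
        const output = provider.call(prompt);
        const pass = test.assert.every((a) => check(a, output));
        results.push({ provider: provider.id, prompt, output, pass });
      }
    }
  }
  const failures = results.filter((r) => !r.pass).length;
  return { results, exitCode: failures > 0 ? 1 : 0 }; // 1 is a placeholder value
}

// Stub provider standing in for a real model endpoint.
const echo: Provider = { id: "echo-stub", call: (p) => "Echo: " + p };

const report = runEval(["Say hello to {{name}}"], [echo], [
  { vars: { name: "Ada" }, assert: [{ type: "contains", value: "Ada" }] },
  { vars: { name: "Bob" }, assert: [{ type: "equals", value: "Hello Bob" }] },
]);
```

In this sketch the first case passes and the second fails, so `report.exitCode` is non-zero, which is the signal a CI job would act on.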

Primary use cases

  • assertion-based prompt and model evaluation in CI
  • model-graded scoring of open-ended outputs
  • red-teaming LLM applications for vulnerabilities

Key concepts

  • Assertion (eval-as-contract): A declarative output check attached to a test case; deterministic assertions (contains, cost, latency) and model-graded assertions (llm-rubric) together decide whether the case passes.
  • llm-rubric (llm-as-judge): A model-graded assertion type where an LLM grades a free-form output against stated criteria, used when no exact-match check fits the expected behaviour.
  • Provider (prompt-variant-evaluation): A configured model or endpoint under test; listing several providers lets one test suite run across multiple models for side-by-side comparison.
  • Red team (Promptfoo) (red-team-sandbox-reproduction): A mode that curates and generates a diverse set of malicious intents targeting potential vulnerabilities and runs them against the application, either as one-off scans or continuously in the deployment pipeline.
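To make the relationship between deterministic and model-graded assertions concrete, here is a small TypeScript sketch. It assumes a hypothetical `Judge` function in place of a real LLM grader; the `CaseCheck`, `Verdict`, `evaluateCase`, and `casePasses` names are invented for illustration and do not correspond to Promptfoo's API. The stub judge uses a keyword test purely so the example runs offline.

```typescript
// Hypothetical shapes for illustration only; not Promptfoo's actual types.
type Verdict = { pass: boolean; reason: string };
type Judge = (output: string, rubric: string) => Verdict;
type CaseCheck =
  | { kind: "deterministic"; name: string; test: (output: string) => boolean }
  | { kind: "model-graded"; rubric: string };

// Run every check against one output: deterministic checks are evaluated
// locally, model-graded checks are delegated to the judge.
function evaluateCase(output: string, checks: CaseCheck[], judge: Judge): Verdict[] {
  return checks.map((c): Verdict => {
    if (c.kind === "deterministic") {
      const pass = c.test(output);
      return { pass, reason: c.name + (pass ? " passed" : " failed") };
    }
    return judge(output, c.rubric);
  });
}

// A case passes only if every check passes.
function casePasses(verdicts: Verdict[]): boolean {
  return verdicts.every((v) => v.pass);
}

// Stub judge standing in for an LLM call: a real grader would send the
// output and rubric to a model and parse its verdict.
const stubJudge: Judge = (output, rubric) => {
  const pass = output.toLowerCase().indexOf("sorry") !== -1;
  return { pass, reason: (pass ? "met: " : "not met: ") + rubric };
};

const checks: CaseCheck[] = [
  { kind: "deterministic", name: "contains 'refund'", test: (o) => o.indexOf("refund") !== -1 },
  { kind: "model-graded", rubric: "The reply apologises politely" },
];

const verdicts = evaluateCase(
  "Sorry for the delay, your refund is on its way.",
  checks,
  stubJudge
);
```

The design point is that both kinds of check reduce to the same pass-or-fail verdict, so a test case can mix them freely.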

Patterns this full-code implements: none listed.
