Framework · Orchestration Frameworks

Ragas (synthetic testset generation)

Ragas generates synthetic test sets across named question dimensions and scores RAG and agent outputs with LLM-based evaluation metrics.

Description

Ragas is a Python library for evaluating retrieval-augmented and agentic LLM applications. Its synthetic test data generator enumerates named question types (such as reasoning, conditional, and multi-context) according to a configurable distribution and seeds generation with named personas. This produces a more diverse evaluation set than free-form LLM prompting, which tends to mode-collapse onto a narrow range of similar questions. Its metrics include LLM-based metrics that use a configured LLM to score outputs against stated criteria. Ragas is released under the Apache 2.0 licence.
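The idea of spreading a test set over question types and personas can be sketched in plain Python. This is an illustrative sketch only, not the Ragas API: the function `plan_testset`, the weights, and the persona names are all assumptions made up for the example.

```python
import itertools
import random

# Illustrative values only; these names are assumptions, not Ragas identifiers.
DISTRIBUTION = {"reasoning": 0.5, "conditional": 0.25, "multi_context": 0.25}
PERSONAS = ["new employee", "compliance auditor", "support engineer"]


def plan_testset(size, distribution, personas, seed=0):
    """Return a shuffled list of (question_type, persona) slots for `size` questions."""
    if abs(sum(distribution.values()) - 1.0) > 1e-9:
        raise ValueError("distribution weights must sum to 1")
    # Allocate whole-number counts per type, then give any leftover
    # questions to the types with the largest fractional remainders.
    raw = {qtype: weight * size for qtype, weight in distribution.items()}
    counts = {qtype: int(value) for qtype, value in raw.items()}
    leftover = size - sum(counts.values())
    by_remainder = sorted(raw, key=lambda q: raw[q] - counts[q], reverse=True)
    for qtype in by_remainder[:leftover]:
        counts[qtype] += 1
    # Rotate through personas so each one appears across the plan.
    persona_cycle = itertools.cycle(personas)
    slots = [
        (qtype, next(persona_cycle))
        for qtype, n in counts.items()
        for _ in range(n)
    ]
    random.Random(seed).shuffle(slots)
    return slots


plan = plan_testset(8, DISTRIBUTION, PERSONAS)
```

Each slot in `plan` would then be handed to a generator prompt. Fixing the type and persona up front, rather than asking a model to "write varied questions", is what keeps the mix under the caller's control.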

Solution

From a set of source documents, Ragas synthesises test questions by enumerating question evolution types across a configurable distribution and seeding generation with named personas, producing a diverse held-out set. At evaluation time, each sample is scored by one or more metrics. LLM-based metrics issue one or more LLM calls to grade the output against criteria, yielding scores intended to approximate human judgement.
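How an LLM-based metric turns judge calls into a score can be sketched as follows. This is a minimal sketch under stated assumptions, not how Ragas implements its metrics: `llm_metric`, `fake_judge`, the prompt wording, and the yes/no verdict format are all hypothetical, and the judge is a deterministic stand-in so the example runs without a model.

```python
from typing import Callable, Dict, List


def llm_metric(sample: Dict[str, str], criteria: List[str],
               judge: Callable[[str], str]) -> float:
    """Score one sample as the fraction of criteria the judge marks as met."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    verdicts = []
    for criterion in criteria:
        prompt = (
            f"Question: {sample['question']}\n"
            f"Context: {sample['context']}\n"
            f"Answer: {sample['answer']}\n"
            f"Criterion: {criterion}\n"
            "Reply with 'yes' or 'no'."
        )
        # One judge call per criterion; a real judge would be an LLM client.
        verdicts.append(judge(prompt).strip().lower().startswith("yes"))
    return sum(verdicts) / len(verdicts)


def fake_judge(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: approves only the conciseness criterion."""
    return "yes" if "concise" in prompt else "no"


sample = {
    "question": "What licence is the library released under?",
    "context": "The library is released under Apache 2.0.",
    "answer": "Apache 2.0.",
}
criteria = ["The answer is supported by the context.", "The answer is concise."]
score = llm_metric(sample, criteria, fake_judge)
```

Passing the judge in as a callable keeps the scoring logic testable in isolation. Swapping `fake_judge` for a function that calls a configured model would be the only change needed for real use.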

Primary use cases

  • synthetic test set generation for RAG
  • LLM-based evaluation of RAG and agent outputs
  • measuring retrieval and answer quality
  • regression testing of LLM pipelines
