Ragas (synthetic testset generation)
Ragas generates synthetic test sets across named question dimensions and scores RAG and agent outputs with LLM-based evaluation metrics.
Description
Ragas is a Python library for evaluating retrieval-augmented generation (RAG) and agentic LLM applications. Its synthetic test data generator enumerates named question types (such as reasoning, conditioning, and multi-context), sampled according to a configurable distribution, and seeds generation with named personas. The result is a more diverse evaluation set than free-form LLM prompting, which tends to mode-collapse onto a narrow range of similar questions. Its metrics include LLM-based metrics, which use a configured LLM to score outputs against stated criteria. Ragas is released under the Apache 2.0 license.
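To make the idea of a configurable distribution concrete, the following is a minimal standard-library sketch of how a test set plan could be spread across question types and personas. It does not use or reproduce the Ragas API: `QuestionSpec`, `plan_testset`, the persona names, and the weights are all hypothetical names chosen for illustration, and the largest-remainder allocation is an assumption rather than the library's method.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class QuestionSpec:
    """One planned test question: which type to write and for whom (hypothetical)."""
    question_type: str
    persona: str


def plan_testset(size: int, distribution: Dict[str, float],
                 personas: List[str]) -> List[QuestionSpec]:
    """Allocate `size` questions across types in proportion to `distribution`."""
    if abs(sum(distribution.values()) - 1.0) > 1e-9:
        raise ValueError("distribution weights must sum to 1")
    if not personas:
        raise ValueError("at least one persona is required")

    # Largest-remainder allocation: floor each share, then hand out the
    # leftover slots to the types with the biggest fractional parts.
    raw = {qtype: weight * size for qtype, weight in distribution.items()}
    counts = {qtype: int(share) for qtype, share in raw.items()}
    leftover = size - sum(counts.values())
    by_remainder = sorted(raw, key=lambda q: raw[q] - counts[q], reverse=True)
    for qtype in by_remainder[:leftover]:
        counts[qtype] += 1

    # Cycle through personas so every persona appears across the plan.
    specs: List[QuestionSpec] = []
    for qtype, count in counts.items():
        for _ in range(count):
            persona = personas[len(specs) % len(personas)]
            specs.append(QuestionSpec(qtype, persona))
    return specs


plan = plan_testset(
    size=10,
    distribution={"reasoning": 0.5, "conditioning": 0.25, "multi_context": 0.25},
    personas=["new hire", "support engineer"],
)
```

Each `QuestionSpec` in `plan` would then be handed to an LLM prompt that writes one question of that type from that persona's point of view. Fixing the plan up front, rather than asking the LLM for "varied questions", is what keeps the mix of types under the caller's control.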
Solution
From a set of source documents, Ragas synthesises test questions by enumerating question evolution types across a configurable distribution and seeding generation with named personas, producing a diverse held-out set. At evaluation time, each sample is scored by one or more metrics. LLM-based metrics issue one or more LLM calls per sample to grade the output against criteria, with the aim of producing scores that track human judgement.
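The scoring step can be sketched in the same spirit. The example below shows the general shape of an LLM-based metric: the answer is split into claims, a judge is called once per claim, and the verdicts are averaged into a score. It is not Ragas's implementation or API. `Sample`, `support_score`, and `substring_judge` are hypothetical names, and the judge here is a deterministic stand-in so the example runs without a model; in practice that callable would wrap a call to the configured LLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """One evaluation sample (hypothetical structure)."""
    question: str
    answer: str
    contexts: List[str]


# A judge decides whether a claim is supported by the context.
# In a real setup this would be an LLM call; here it is any callable.
Judge = Callable[[str, str], bool]


def split_claims(answer: str) -> List[str]:
    """Naive claim splitter: one claim per sentence, split on full stops."""
    return [part.strip() for part in answer.split(".") if part.strip()]


def support_score(sample: Sample, judge: Judge) -> float:
    """Fraction of answer claims the judge considers supported by the contexts."""
    claims = split_claims(sample.answer)
    if not claims:
        return 0.0
    context = " ".join(sample.contexts)
    # One judge call per claim, mirroring "one or more LLM calls" per sample.
    supported = sum(1 for claim in claims if judge(claim, context))
    return supported / len(claims)


def substring_judge(claim: str, context: str) -> bool:
    """Stand-in for an LLM judge: a claim counts only if it appears verbatim."""
    return claim.lower() in context.lower()


sample = Sample(
    question="What is the capital of France?",
    answer="Paris is the capital of France. Paris has ten million residents.",
    contexts=["Paris is the capital of France. It lies on the Seine."],
)
score = support_score(sample, substring_judge)  # 0.5: one of two claims supported
```

Passing the judge in as a callable keeps the metric logic testable without network access, and makes it straightforward to compare different judge models on the same samples.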
Primary use cases
- synthetic test set generation for RAG
- LLM-based evaluation of RAG and agent outputs
- measuring retrieval and answer quality
- regression testing of LLM pipelines