Ragas

Stale

GitHub Python Apache-2.0

Description

Ragas is a framework for evaluating RAG (Retrieval-Augmented Generation) systems. It provides evaluation metrics such as faithfulness, answer relevance, and context precision to help developers optimize RAG application performance.

Key Features

Objective evaluation metrics using both LLM-based and traditional approaches for precise LLM app assessment
Automatic test data generation covering a wide range of scenarios for RAG systems
Seamless integrations with LangChain, major observability tools, and popular LLM frameworks
Production-aligned feedback loops leveraging real data to continually improve LLM applications
Pre-built quickstart templates for RAG evaluation, agent evaluation, and LLM benchmarking
DiscreteMetric support for custom aspect evaluation with fine-grained scoring and reasoning
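
To make the idea of a "traditional" (non-LLM) metric concrete, here is a minimal sketch of a rank-aware context precision score. It does not call Ragas: the function name and the boolean relevance labels are assumptions for illustration, and in practice the relevance judgments would come from an LLM judge or a comparison against reference answers.

```python
from typing import Sequence


def context_precision_at_k(relevance: Sequence[bool]) -> float:
    """Rank-aware precision over the retrieved chunks (illustrative helper).

    relevance[i] is True when the chunk at rank i + 1 was judged relevant.
    The score is the mean of precision@k taken at each relevant rank, so
    relevant chunks that appear earlier in the ranking score higher.
    """
    hits = 0
    weighted_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # precision@rank, accumulated only at ranks holding a relevant chunk
            weighted_sum += hits / rank
    # No relevant chunk retrieved: define the score as 0.0
    return weighted_sum / hits if hits else 0.0


# Hypothetical judgments for three retrieved chunks: relevant, irrelevant, relevant
score = context_precision_at_k([True, False, True])
print(round(score, 4))
```

A ranking that places every relevant chunk first scores 1.0, which is why the metric rewards retrievers that surface useful context early.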

Use Cases

💡 Evaluating RAG pipeline quality with faithfulness, relevance, and context precision metrics

💡 Benchmarking different LLM prompts and configurations to find optimal settings

💡 Generating synthetic test datasets for stress-testing retrieval and generation components

💡 Building CI/CD evaluation gates for LLM-powered applications in production
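
The CI/CD gate use case above can be sketched as a simple threshold check. This is a hedged illustration, not part of Ragas: the threshold values, the metric keys, and the `failing_metrics` and `gate` helpers are all hypothetical, and the scores dictionary stands in for whatever an evaluation run produces.

```python
from typing import List, Mapping

# Hypothetical minimum acceptable scores; tune these per application.
THRESHOLDS = {
    "faithfulness": 0.80,
    "answer_relevance": 0.75,
    "context_precision": 0.70,
}


def failing_metrics(scores: Mapping[str, float],
                    thresholds: Mapping[str, float]) -> List[str]:
    """Return the names of metrics that are missing or below their threshold."""
    failures = []
    for name, minimum in thresholds.items():
        value = scores.get(name)
        if value is None or value < minimum:
            failures.append(name)
    return failures


def gate(scores: Mapping[str, float],
         thresholds: Mapping[str, float] = THRESHOLDS) -> int:
    """Return a process exit code: 0 when every metric passes, 1 otherwise."""
    failures = failing_metrics(scores, thresholds)
    for name in failures:
        print(f"FAIL {name}: {scores.get(name)} < {thresholds[name]}")
    return 1 if failures else 0


# Made-up scores standing in for the output of an evaluation run
example_scores = {
    "faithfulness": 0.91,
    "answer_relevance": 0.72,
    "context_precision": 0.85,
}
exit_code = gate(example_scores)
```

In a pipeline, the script would end with `sys.exit(exit_code)` so that a non-zero code fails the build step.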

Quick Start

pip install ragas && ragas quickstart rag_eval -o ./my-project


Related Projects

TruLens

3.4k · Python

Active

TruLens is an open-source tool for evaluating and tracking LLM apps. It provides specialized evaluation for RAG applications including context relevance, groundedness, and answer relevance.

llm, evaluation, observability +1

DeepEval

16.6k · Python

Active

DeepEval is an open-source evaluation framework for LLM applications. It provides rich evaluation metrics and tools, supporting unit testing and integration testing to help developers build reliable LLM applications.

llm, evaluation, testing +1

PromptTools

3.0k · Python

Stale

PromptTools provides open-source tools for prompt testing and experimentation, supporting multiple LLMs (OpenAI, LLaMA) and vector databases (Chroma, Weaviate, LanceDB) to help developers systematically evaluate and optimize RAG systems.

prompt-testing, rag, evaluation +3

Production Agentic RAG Course

7.3k · Python

Active

A production-focused Agentic RAG course teaching how to build scalable, reliable RAG agent systems with indexing strategies, retrieval optimization, and monitoring.

rag, production, course +2

Related Articles

RAG, Evaluation, Ragas

RAG System Evaluation in Practice: Building High-Quality RAG Apps with Ragas and DeepEval

Learn how to evaluate RAG systems using Ragas and DeepEval, including measuring key metrics like faithfulness, answer relevance, and context precision.
