DeepEval
DeepEval is an open-source evaluation framework for LLM applications. It provides rich evaluation metrics and tools, supporting unit testing and integration testing to help developers build reliable LLM applications.
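The unit-testing idea behind DeepEval can be sketched in plain Python: score an LLM output with a metric and assert it clears a threshold. The keyword-overlap metric and the 0.5 threshold below are illustrative stand-ins, not DeepEval's actual API.

```python
import re

def overlap_score(question: str, answer: str) -> float:
    """Toy relevance metric: fraction of question keywords found in the answer."""
    q = {w for w in re.findall(r"[a-z]+", question.lower()) if len(w) > 3}
    a = set(re.findall(r"[a-z]+", answer.lower()))
    return len(q & a) / len(q) if q else 0.0

def test_answer_relevance():
    # Unit-test style: the evaluation is a pass/fail assertion on a metric,
    # so it can run in CI like any other test.
    question = "What metrics does the framework provide?"
    answer = "The framework does provide many evaluation metrics."
    assert overlap_score(question, answer) >= 0.5

test_answer_relevance()
```

In DeepEval itself the metric would typically be LLM-judged rather than keyword-based, but the test harness shape is the same.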
Ragas is a framework for evaluating RAG (Retrieval-Augmented Generation) systems. It provides evaluation metrics including faithfulness, answer relevancy, and context precision, helping developers optimize RAG application performance.
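To make the context-precision idea concrete, here is a toy computation in plain Python (not the Ragas API): given retrieved chunks in ranked order, each labeled relevant or not, average the precision@k at every rank that holds a relevant chunk. Retrievers that rank relevant context higher score closer to 1.0.

```python
def context_precision(relevant_flags):
    """Toy context precision over ranked retrieved chunks.

    relevant_flags[i] is True if the chunk at rank i (0-based)
    was actually useful for answering the question.
    """
    if not any(relevant_flags):
        return 0.0
    score, hits = 0.0, 0
    for k, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            score += hits / k  # precision@k at each relevant rank
    return score / hits

# Ranking relevant chunks first scores higher:
context_precision([True, True, False])   # -> 1.0
context_precision([False, True, True])   # -> ~0.58
```

Ragas computes the relevance judgments themselves with an LLM; this sketch only shows how those judgments aggregate into a ranking-sensitive score.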
AgentLabs is a toolkit for agent development and testing, focused on experimentation, replay, and workflow support to improve iteration speed.
An open-source tool from Meta for LLM prompt optimization that automates the process of continuously improving and refining prompts.
PromptTools provides open-source tools for prompt testing and experimentation, supporting multiple LLMs (OpenAI, LLaMA) and vector databases (Chroma, Weaviate, LanceDB) to help developers systematically evaluate and optimize RAG systems.
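The systematic-experimentation workflow such tools support amounts to running every prompt variant against every model and tabulating the results for side-by-side comparison. A minimal plain-Python sketch of that grid (the function names and the stub generator are illustrative, not the PromptTools API):

```python
from itertools import product

def run_experiment(prompts, models, generate):
    """Run every prompt against every model; collect one row per combination."""
    return [
        {"model": m, "prompt": p, "output": generate(m, p)}
        for m, p in product(models, prompts)
    ]

# Stub standing in for real API calls to each provider:
def fake_generate(model, prompt):
    return f"[{model}] response to: {prompt}"

rows = run_experiment(
    prompts=["Summarize RAG.", "Define faithfulness."],
    models=["gpt-4", "llama-2"],
    generate=fake_generate,
)
# 2 models x 2 prompts -> 4 rows to compare side by side
```

In practice each row would also carry metric scores (latency, cost, an evaluation score), which is what turns the grid into a decision table for picking a prompt/model pair.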