DeepEval
Description
DeepEval is an open-source evaluation framework for LLM applications. It provides rich evaluation metrics and tools, supporting unit testing and integration testing to help developers build reliable LLM applications.
Key Features
- Pytest-compatible LLM evaluation framework with ready-to-use metrics for agents, RAG, and chatbots
- Agentic metrics including Task Completion, Tool Correctness, Step Efficiency, and Plan Adherence
- RAG metrics covering Answer Relevancy, Faithfulness, Contextual Recall/Precision/Relevancy, and RAGAS
- Multi-turn metrics for Knowledge Retention, Conversation Completeness, and Turn Relevancy
- MCP metrics for evaluating Model Context Protocol agent task completion and tool usage
- G-Eval and DAG metrics for evaluating custom criteria with an LLM as a judge, aiming for human-like accuracy
Quick Start
Install with `pip install deepeval`. Write test cases using metrics such as `AnswerRelevancyMetric` and `FaithfulnessMetric`, then run them with `deepeval test run`, which works much like pytest. Results appear in the terminal and can also be viewed on the Confident AI platform.