AWS Agent Evaluation
Description
Amazon's AI agent evaluation tool for automated quality assessment of Bedrock Agents and other LLM agents with multi-dimensional metrics and benchmarks.
Similar Tools

An open-source evaluation and testing library for LLM agents that provides automated model scanning, bias detection, performance benchmarking, and compliance checks.
AgentLabs is a toolkit for agent development and testing, focused on experimentation, replay, and workflow support to improve iteration speed.
DeepEval is an open-source evaluation framework for LLM applications. It provides a rich set of evaluation metrics and supports both unit and integration testing, helping developers build reliable LLM applications.
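For illustration, a minimal sketch of a DeepEval-style unit test following the pytest pattern from DeepEval's quickstart; the threshold, input, and output strings here are invented placeholders, and metrics such as AnswerRelevancyMetric rely on an LLM judge (OpenAI by default), so results depend on your configuration:

```python
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    # A test case pairs the user input with the model's actual output.
    test_case = LLMTestCase(
        input="Why did the battery drain so fast?",
        actual_output="Background apps were consuming power; disabling them extends battery life.",
    )
    # Scores relevancy with an LLM judge; fails the test below the threshold.
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])
```

Such a test can be run with plain pytest or with `deepeval test run`; an API key for the judge model is assumed to be set in the environment.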
Ragas is a framework for evaluating RAG (Retrieval-Augmented Generation) systems. It provides evaluation metrics such as faithfulness, answer relevancy, and context precision, helping developers measure and optimize RAG application performance.
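A hedged sketch of what a Ragas evaluation run typically looks like, using the classic `ragas.evaluate` dataset API (newer releases reorganized this interface); the question, answer, and contexts are invented placeholders, and an LLM backend (an OpenAI key by default) is assumed:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

# One evaluation record: the question, the generated answer, the retrieved
# contexts, and a reference answer (needed by context_precision here).
data = {
    "question": ["When was the Eiffel Tower completed?"],
    "answer": ["The Eiffel Tower was completed in 1889."],
    "contexts": [["The Eiffel Tower, finished in 1889, is a landmark in Paris."]],
    "ground_truth": ["1889"],
}

result = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(result)  # per-metric scores between 0 and 1
```

Faithfulness checks whether the answer is grounded in the retrieved contexts, answer relevancy checks whether it addresses the question, and context precision checks whether the relevant context is ranked highly.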