AgentList

AWS Agent Evaluation

Status: Normal
GitHub · Python · Apache-2.0

Description

Amazon's evaluation tool for automated quality assessment of Bedrock Agents and other LLM agents, using multi-dimensional metrics and benchmarks.
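The idea of multi-dimensional scoring can be illustrated with a short, purely hypothetical sketch; `run_agent`, the dimensions, and the weights below are invented for illustration and are not this project's actual API.

```python
from dataclasses import dataclass

@dataclass
class Score:
    correctness: float
    helpfulness: float
    safety: float

    def overall(self) -> float:
        # Weighted aggregate across dimensions (weights are illustrative).
        return 0.5 * self.correctness + 0.3 * self.helpfulness + 0.2 * self.safety

def run_agent(prompt: str) -> str:
    return "stub response"  # replace with a real Bedrock Agent invocation

def evaluate_case(prompt: str, expected: str) -> Score:
    response = run_agent(prompt)
    # Toy judges: real tools use LLM-as-judge or rubric-based scoring.
    correctness = 1.0 if expected.lower() in response.lower() else 0.0
    helpfulness = min(len(response) / 100, 1.0)
    safety = 1.0  # a safety classifier would run here
    return Score(correctness, helpfulness, safety)

print(evaluate_case("What is the refund policy?", "30 days").overall())
```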

Tags

aws · evaluation · benchmark · agent-quality · testing

Categories

📊 Observability

Project Metrics

Stars 358
Forks 65
Watchers 10
Issues 18
Created May 1, 2025
Last commit March 5, 2026

Deployment

Local

Related Projects

Giskard

5.3k · Python
Active

An open-source evaluation and testing library for LLM agents, providing automated model scanning, bias detection, performance benchmarking, and compliance checks.

evaluation · testing · llm-safety +3
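As a rough illustration, a scan can be wired up like the following sketch, which assumes Giskard's `giskard.Model` / `giskard.scan` interface; the prediction stub, names, and file paths are placeholders (the scan itself also needs an LLM judge configured).

```python
import pandas as pd
import giskard

def my_agent(prompt: str) -> str:
    return "stub answer"  # replace with a real LLM or agent call

def predict(df: pd.DataFrame) -> list[str]:
    # Giskard calls this with a DataFrame of inputs when probing the model.
    return [my_agent(p) for p in df["prompt"]]

model = giskard.Model(
    model=predict,
    model_type="text_generation",
    name="my-agent",
    description="Answers customer support questions.",
    feature_names=["prompt"],
)

# Runs automated probes (prompt injection, harmful content, bias, ...)
# and collects the findings into a report.
report = giskard.scan(model)
report.to_html("scan_report.html")
```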

AgentLabs

546 · TypeScript
Stale

AgentLabs is a toolkit for agent development and testing, focused on experimentation, replay, and workflow support to improve iteration speed.

testing · developer-tools · evaluation +1

DeepEval

14.9k · Python
Active

DeepEval is an open-source evaluation framework for LLM applications. It provides a broad set of evaluation metrics and supports both unit and integration testing to help developers build reliable LLM applications.

llm · evaluation · testing +1
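A minimal DeepEval unit test, based on its documented pytest-style API, looks roughly like this; the question, answer, and threshold are illustrative.

```python
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_answer_relevancy():
    # A test case pairs an input with the output your LLM app produced.
    test_case = LLMTestCase(
        input="What is the capital of France?",
        actual_output="Paris is the capital of France.",
    )
    # Fails the test if the metric scores below the 0.7 threshold.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```

Files like this are typically executed with `deepeval test run test_example.py`, which wraps pytest.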

Ragas

13.6k · Python
Normal

Ragas is a framework for evaluating RAG (Retrieval-Augmented Generation) systems. It provides evaluation metrics including faithfulness, answer relevance, and context precision, helping developers optimize the performance of RAG applications.

rag · evaluation · llm +1
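A sketch of how those metrics are typically invoked, assuming the classic datasets-based Ragas API (newer releases have reworked it); the sample row is illustrative.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# Each row holds a question, the generated answer, the retrieved
# contexts, and a reference answer (used by context_precision).
data = {
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["Paris is the capital and largest city of France."]],
    "ground_truth": ["Paris"],
}

result = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(result)  # per-metric scores between 0 and 1
```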