Coval
Coval is an evaluation tool for voice and conversational agents, helping teams test response quality, interaction stability, and real dialog behavior.
An open-source evaluation and testing library for LLM agents that provides automated model scanning, bias detection, performance benchmarking, and compliance checks.
An automatic prompt optimization framework by Salesforce AI Research that leverages LLMs to search for and refine prompts for improved model performance.
A comprehensive benchmark for evaluating LLMs as agents (ICLR 2024), covering operating systems, databases, knowledge graphs, digital card games, and more.
AgentLabs is a toolkit for agent development and testing, focused on experimentation, replay, and workflow support that speeds up iteration.