Coval
Coval is an evaluation tool for voice and conversational agents, helping teams test response quality, interaction stability, and real dialog behavior.
An open-source evaluation and testing library for LLM agents that provides automated model scanning, bias detection, performance benchmarking, and compliance checks.
An automatic prompt optimization framework by Salesforce AI Research that leverages LLMs to search for and refine prompts for improved model performance.
A comprehensive benchmark for evaluating LLMs as agents (ICLR 2024), covering operating systems, databases, knowledge graphs, digital card games, and more.
AgentLabs is a toolkit for agent development and testing, focused on experimentation, replay, and workflow support that speeds up iteration.