Coval
Coval is an evaluation tool for voice and conversational agents that helps teams test response quality, interaction stability, and realistic dialog behavior.
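A common pattern behind this kind of testing is simulated dialog: a scripted "user" drives the agent turn by turn while assertions check each reply. The sketch below uses a placeholder agent and hypothetical test script, not Coval's actual API:

```python
# Minimal sketch of simulated-dialog testing for a conversational agent.
# `agent_reply` is a stand-in for the system under test.

def agent_reply(user_turn: str) -> str:
    # Placeholder agent: a real test would call the deployed voice agent.
    if "book" in user_turn:
        return "Sure, what date works for you?"
    return "Could you tell me more?"

script = [
    ("I want to book a table.", "date"),  # (user turn, expected keyword)
    ("Something random.", "more"),
]

for turn, expected in script:
    reply = agent_reply(turn)
    assert expected in reply.lower(), f"unexpected reply: {reply!r}"
print("all scripted turns passed")
```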
An open-source evaluation and testing library for LLM agents, providing automated model scanning, bias detection, performance benchmarking, and compliance checks.
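To illustrate what automated bias detection involves, here is a minimal sketch: counterfactual prompt pairs probe whether the agent's behavior changes when only a demographic term changes. The `agent` function and the probes are hypothetical stand-ins, not the library's API:

```python
# Minimal sketch of an automated bias check for an LLM agent.
# A real scanner would generate these probes and score responses itself.

def agent(prompt: str) -> str:
    # Placeholder for the model under test (deliberately biased here).
    return "approved" if prompt.startswith("He ") else "denied"

# Counterfactual prompt pairs: identical except for a demographic term.
probe_pairs = [
    ("The male engineer asked for a loan.", "The female engineer asked for a loan."),
    ("He applied as a nurse.", "She applied as a nurse."),
]

# Flag pairs where the agent's output changes with the demographic term.
flagged = [(a, b) for a, b in probe_pairs if agent(a) != agent(b)]
print(f"{len(flagged)}/{len(probe_pairs)} probes show divergent behavior")
```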
An automatic prompt optimization framework by Salesforce AI Research that leverages LLMs to search for and refine prompts for improved model performance.
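The core idea of LLM-driven prompt search can be sketched as a simple loop: score candidate prompts, ask a proposer model to rewrite the best one, and repeat. The scorer and proposer below are toy stand-ins, not the framework's actual components:

```python
import random

# Toy sketch of LLM-in-the-loop prompt search. A real framework would
# score candidates against a dev set and use an LLM as the proposer.

def score(prompt: str) -> float:
    # Stand-in scorer: reward prompts that ask for a bare numeric answer.
    return sum(word in prompt for word in ("answer", "only", "number")) / 3

def propose_rewrites(prompt: str, n: int = 3) -> list[str]:
    # Stand-in for an LLM rewriting the prompt based on failures.
    edits = ["answer", "only", "number", "briefly"]
    return [prompt + " " + random.choice(edits) for _ in range(n)]

best = "Solve the problem."
for _ in range(5):  # a few rounds of search
    candidates = [best] + propose_rewrites(best)
    best = max(candidates, key=score)

print(best, score(best))
```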
An open-source AI training tracking and visualization tool with a modern design. Supports PyTorch, Transformers, and more, letting teams monitor and evaluate AI agent training runs.
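Experiment trackers of this kind generally follow an init-log-finish pattern around the training loop. The `Tracker` class below is a self-contained local stand-in for that pattern, not any particular tool's API:

```python
import json
import time

# Sketch of the experiment-tracking pattern: start a run, log scalar
# metrics per step, then persist the run for a UI to visualize.

class Tracker:
    def __init__(self, project: str):
        self.run = {"project": project, "start": time.time(), "metrics": []}

    def log(self, step: int, **metrics: float):
        self.run["metrics"].append({"step": step, **metrics})

    def finish(self, path: str = "run.json"):
        with open(path, "w") as f:
            json.dump(self.run, f, indent=2)

tracker = Tracker(project="demo")
for step in range(100):
    loss = 1.0 / (step + 1)   # stand-in for a real training loss
    tracker.log(step, loss=loss)
tracker.finish()              # a dashboard would visualize run.json
```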
A comprehensive benchmark to evaluate LLMs as agents (ICLR 2024), covering operating systems, databases, knowledge graphs, digital card games and more.
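Agent benchmarks spanning heterogeneous environments typically wrap each one in a common observe/step/done interface and score task success under a step cap. The toy environment and policy below illustrate that harness loop only; they are not the benchmark's real interface:

```python
# Sketch of a generic agent-benchmark loop over a common env interface.

class CountdownEnv:
    """Toy environment: the agent must count the state down to zero."""
    def __init__(self, start: int = 3):
        self.state = start
    def observe(self) -> str:
        return f"counter={self.state}"
    def step(self, action: str) -> None:
        if action == "decrement":
            self.state -= 1
    def done(self) -> bool:
        return self.state <= 0

def agent_policy(observation: str) -> str:
    # Stand-in for an LLM choosing an action from the observation.
    return "decrement"

env = CountdownEnv()
steps = 0
while not env.done() and steps < 10:  # step cap, as benchmarks enforce
    env.step(agent_policy(env.observe()))
    steps += 1
print("success" if env.done() else "failure", f"in {steps} steps")
```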