Hugging Face Evaluate
A library by Hugging Face for easily evaluating machine learning models and datasets, providing a wide range of metrics and evaluation methods.
Argilla is a collaboration platform for AI engineers and domain experts to build high-quality datasets, collect human feedback, and evaluate models.
A toolkit by Weights & Biases for developing AI-powered applications, providing LLM call tracing, evaluation experiment management, and versioning from prototype to production.
An automatic prompt optimization framework by Salesforce AI Research that leverages LLMs to search for and refine prompts for improved model performance.
A comprehensive benchmark for evaluating LLMs as agents (ICLR 2024), covering operating systems, databases, knowledge graphs, digital card games, and more.