Inspect AI
Active
Description
A framework for large language model evaluations developed by the UK AI Safety Institute (AISI), providing comprehensive model capability assessment tools with support for safety and alignment testing.
CVS Health's open-source uncertainty quantification library for language models, providing UQ-based hallucination detection with confidence scoring and mitigation tools to identify and reduce unreliable LLM outputs.
LangEvals aggregates various language model evaluators into a single platform, providing a standardized LLM evaluation interface with safety checks.
A framework for few-shot evaluation of language models by EleutherAI, providing standardized evaluation pipelines supporting hundreds of benchmark tasks and widely adopted as a core LLM evaluation tool in the community.
HuggingFace's all-in-one toolkit for evaluating LLMs across multiple backends, deeply integrated with the HuggingFace ecosystem and providing flexible evaluation metrics and benchmark configuration.