Inspect AI
A framework for large language model evaluations developed by the UK AI Safety Institute (AISI), providing a comprehensive toolkit for assessing model capabilities, with support for safety and alignment testing.
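A minimal sketch of an Inspect eval following its Task/solver/scorer pattern; the toy sample, task name, and model id here are illustrative:

```python
from inspect_ai import Task, eval, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import exact
from inspect_ai.solver import generate

@task
def arithmetic_check():
    # toy one-sample dataset; real evals typically load datasets from files or hubs
    return Task(
        dataset=[Sample(input="What is 2 + 2? Answer with the number only.", target="4")],
        solver=[generate()],  # the solver chain that produces model output
        scorer=exact(),       # grades the output against the sample target
    )

# programmatic run; the CLI equivalent is `inspect eval <file.py> --model <provider/model>`
eval(arithmetic_check(), model="openai/gpt-4o-mini")
```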
lm-evaluation-harness
A framework for few-shot evaluation of language models from EleutherAI, providing a standardized evaluation pipeline that covers hundreds of benchmark tasks; it is widely adopted as a core LLM evaluation tool in the community.
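A hedged sketch using the harness's Python entry point, `lm_eval.simple_evaluate`; the model id and task name are illustrative, and the `lm_eval` CLI is the more common path:

```python
import lm_eval

# evaluate a small Hugging Face model on one bundled benchmark task
results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",  # any HF causal-LM id
    tasks=["hellaswag"],                             # one of the hundreds of bundled tasks
    num_fewshot=0,                                   # zero-shot here; few-shot is a parameter
)
print(results["results"])
```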
LightEval
Hugging Face's all-in-one toolkit for evaluating LLMs across multiple backends, deeply integrated with the Hugging Face ecosystem and providing flexible evaluation metrics and benchmark configuration.
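LightEval is usually driven from its CLI; below is a hedged sketch of its documented task-spec format ("suite|task|num_fewshot|truncate_fewshots"), shelled out from Python. The model-args key has varied across versions (`pretrained=` vs. `model_name=`), so treat the exact arguments as assumptions:

```python
import subprocess

# "suite|task|few-shot count|truncate-few-shots flag"
task_spec = "leaderboard|truthfulqa:mc|0|0"

# run the accelerate backend on a small HF model; arg names may differ by version
subprocess.run(
    ["lighteval", "accelerate", "model_name=openai-community/gpt2", task_spec],
    check=True,
)
```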
Opik
An open-source LLM observability platform providing agent tracing, evaluation testing, and prompt experiment management to help developers monitor and optimize AI agent systems.
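A minimal tracing sketch with Opik's `@track` decorator; the function body is a stand-in for a real LLM call:

```python
import opik
from opik import track

opik.configure()  # point the SDK at a Comet-hosted or self-hosted Opik instance

@track  # records inputs, outputs, and timing for this call as a trace
def answer(question: str) -> str:
    # stand-in for a real LLM call (e.g., via an OpenAI or LiteLLM client)
    return f"echo: {question}"

answer("What does Opik trace?")
```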
UQLM
CVS Health's open-source uncertainty quantification (UQ) library for language models, providing UQ-based hallucination detection with confidence scoring and mitigation tools to identify and reduce unreliable LLM outputs.
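A hedged sketch of UQLM's black-box scorer, which samples several responses per prompt and scores their consistency as a confidence signal; the LangChain model and scorer name follow the uqlm README, but exact parameters may vary by version:

```python
import asyncio

from langchain_openai import ChatOpenAI
from uqlm import BlackBoxUQ

async def main():
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=1.0)  # sampling needs temperature > 0
    uq = BlackBoxUQ(llm=llm, scorers=["semantic_negentropy"])
    results = await uq.generate_and_score(
        prompts=["Who wrote 'The Hobbit'?"],
        num_responses=5,  # responses sampled per prompt for consistency scoring
    )
    print(results.to_df())  # per-prompt responses with confidence scores

asyncio.run(main())
```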