UpTrain
An evaluation and monitoring tool for LLM applications and agent systems that scores response quality, context relevance, and factual accuracy, and incorporates user feedback.
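
A minimal sketch of running UpTrain's prebuilt checks over a single RAG-style record. The EvalLLM entry point and the Evals check names follow UpTrain's documented Python API, but the record contents and the choice of checks here are illustrative.

```python
from uptrain import EvalLLM, Evals

# One RAG-style record: the user question, the retrieved context,
# and the model's response.
data = [{
    "question": "What is the capital of France?",
    "context": "France is a country in Western Europe. Its capital is Paris.",
    "response": "The capital of France is Paris.",
}]

# EvalLLM uses an LLM (here via OpenAI) as the judge for each check.
eval_llm = EvalLLM(openai_api_key="sk-...")

results = eval_llm.evaluate(
    data=data,
    checks=[
        Evals.CONTEXT_RELEVANCE,   # is the retrieved context relevant to the question?
        Evals.FACTUAL_ACCURACY,    # is the response grounded in the context?
        Evals.RESPONSE_RELEVANCE,  # does the response actually answer the question?
    ],
)
print(results)
```
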
lm-evaluation-harness
EleutherAI's framework for few-shot evaluation of language models. It provides a standardized evaluation pipeline supporting hundreds of benchmark tasks and is widely adopted as a core LLM evaluation tool in the community.
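
A sketch of the harness's Python entry point; simple_evaluate is part of the library's documented API, though the model, task, and settings below are only examples (the same run is also available through the lm_eval CLI).

```python
import lm_eval

# Evaluate a HuggingFace model on one benchmark task. model_args is
# forwarded to the HF loader; num_fewshot controls the few-shot prompt.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["hellaswag"])
```
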
Giskard
An open-source evaluation and testing library for LLM agents that provides automated model scanning, bias detection, performance benchmarking, and compliance checks.
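
Assuming this entry refers to Giskard, here is a minimal sketch of its scan workflow: wrap a prediction function as a model, run the automated scan, and export the findings. The wrapper and scan calls follow Giskard's documented pattern; the prediction function and its metadata are placeholders.

```python
import pandas as pd
import giskard

# Placeholder prediction function: takes a DataFrame of inputs and
# returns one generated answer per row (wire in the real agent here).
def predict(df: pd.DataFrame) -> list[str]:
    return ["stub answer" for _ in df["question"]]

model = giskard.Model(
    model=predict,
    model_type="text_generation",
    name="my-llm-agent",                                  # hypothetical name
    description="Answers questions about product docs.",  # used by the scanner
    feature_names=["question"],
)

# Run the automated scan and write the findings to an HTML report.
scan_report = giskard.scan(model)
scan_report.to_html("scan_report.html")
```
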
garak
NVIDIA's open-source LLM vulnerability scanner, which automatically detects security issues in language models, including safety vulnerabilities, hallucination tendencies, jailbreak risks, and prompt injection attacks.
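
garak is driven from the command line, so this sketch simply shells out to it; the --model_type, --model_name, and --probes flags come from its documented interface, while the target model and probe choice are illustrative.

```python
import subprocess

# Probe a HuggingFace model for prompt-injection weaknesses.
# "promptinject" is one of garak's built-in probe modules; other
# modules cover jailbreaks, toxicity, hallucination, and more.
subprocess.run([
    "python", "-m", "garak",
    "--model_type", "huggingface",
    "--model_name", "gpt2",
    "--probes", "promptinject",
], check=True)
```
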
Inspect
A framework for large language model evaluations developed by the UK AI Safety Institute (AISI). It provides comprehensive tools for assessing model capabilities, with support for safety and alignment testing.
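
A minimal sketch of an Inspect task using the inspect_ai package: a Task bundles a dataset, a solver, and a scorer, and can be run from Python or via the inspect eval CLI. The sample data and model string below are invented for illustration.

```python
from inspect_ai import Task, eval, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import exact
from inspect_ai.solver import generate

@task
def arithmetic():
    return Task(
        dataset=[Sample(input="What is 2 + 2? Answer with the number only.",
                        target="4")],
        solver=generate(),  # just have the model answer each sample
        scorer=exact(),     # exact-match the completion against the target
    )

# Run the evaluation against a model, identified as provider/model.
eval(arithmetic, model="openai/gpt-4o-mini")
```
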