UpTrain
An evaluation and monitoring tool for LLM applications and agent systems that scores response quality, context relevance, and factual accuracy, and incorporates user feedback.
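
A minimal sketch of running UpTrain's prebuilt checks over a single RAG-style record. The EvalLLM entry point and the Evals check names follow UpTrain's documented Python API, but the record contents and the choice of checks here are illustrative.

```python
from uptrain import EvalLLM, Evals

# One RAG-style record: the user question, the retrieved context,
# and the model's response.
data = [{
    "question": "What is the capital of France?",
    "context": "France is a country in Western Europe. Its capital is Paris.",
    "response": "The capital of France is Paris.",
}]

# EvalLLM uses an LLM (here via OpenAI) as the judge for each check.
eval_llm = EvalLLM(openai_api_key="sk-...")

results = eval_llm.evaluate(
    data=data,
    checks=[
        Evals.CONTEXT_RELEVANCE,   # is the retrieved context relevant to the question?
        Evals.FACTUAL_ACCURACY,    # is the response grounded in the context?
        Evals.RESPONSE_RELEVANCE,  # does the response actually answer the question?
    ],
)
print(results)
```
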
lm-evaluation-harness
EleutherAI's framework for few-shot evaluation of language models. It provides a standardized evaluation pipeline supporting hundreds of benchmark tasks and is widely adopted as a core LLM evaluation tool in the community.
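
A sketch of the harness's Python entry point; simple_evaluate is part of the library's documented API, though the model, task, and settings below are only examples (the same run is also available through the lm_eval CLI).

```python
import lm_eval

# Evaluate a HuggingFace model on one benchmark task. model_args is
# forwarded to the HF loader; num_fewshot controls the few-shot prompt.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["hellaswag"])
```
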
Giskard
An open-source evaluation and testing library for LLM agents that provides automated model scanning, bias detection, performance benchmarking, and compliance checks.
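
Assuming this entry refers to Giskard, here is a minimal sketch of its scan workflow: wrap a prediction function as a model, run the automated scan, and export the findings. The wrapper and scan calls follow Giskard's documented pattern; the prediction function and its metadata are placeholders.

```python
import pandas as pd
import giskard

# Placeholder prediction function: takes a DataFrame of inputs and
# returns one generated answer per row (wire in the real agent here).
def predict(df: pd.DataFrame) -> list[str]:
    return ["stub answer" for _ in df["question"]]

model = giskard.Model(
    model=predict,
    model_type="text_generation",
    name="my-llm-agent",                                  # hypothetical name
    description="Answers questions about product docs.",  # used by the scanner
    feature_names=["question"],
)

# Run the automated scan and write the findings to an HTML report.
scan_report = giskard.scan(model)
scan_report.to_html("scan_report.html")
```
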
garak
NVIDIA's open-source LLM vulnerability scanner, which automatically detects security issues in language models, including safety vulnerabilities, hallucination tendencies, jailbreak risks, and prompt injection attacks.
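
garak is driven from the command line, so this sketch simply shells out to it; the --model_type, --model_name, and --probes flags come from its documented interface, while the target model and probe choice are illustrative.

```python
import subprocess

# Probe a HuggingFace model for prompt-injection weaknesses.
# "promptinject" is one of garak's built-in probe modules; other
# modules cover jailbreaks, toxicity, hallucination, and more.
subprocess.run([
    "python", "-m", "garak",
    "--model_type", "huggingface",
    "--model_name", "gpt2",
    "--probes", "promptinject",
], check=True)
```
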
Inspect
A framework for large language model evaluations developed by the UK AI Safety Institute (AISI). It provides comprehensive tools for assessing model capabilities, with support for safety and alignment testing.
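
A minimal sketch of an Inspect task using the inspect_ai package: a Task bundles a dataset, a solver, and a scorer, and can be run from Python or via the inspect eval CLI. The sample data and model string below are invented for illustration.

```python
from inspect_ai import Task, eval, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import exact
from inspect_ai.solver import generate

@task
def arithmetic():
    return Task(
        dataset=[Sample(input="What is 2 + 2? Answer with the number only.",
                        target="4")],
        solver=generate(),  # just have the model answer each sample
        scorer=exact(),     # exact-match the completion against the target
    )

# Run the evaluation against a model, identified as provider/model.
eval(arithmetic, model="openai/gpt-4o-mini")
```
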