UpTrain

Stale

GitHub · Python · Apache-2.0

Description

An evaluation and monitoring tool for LLM applications. It checks response quality, context relevance, and factuality, and incorporates user feedback for agent systems.

Key Features

LLM response quality evaluation with automated scoring across multiple dimensions
Context relevance checking to verify retrieved information matches queries
Factuality verification to detect hallucinations and unsupported claims
User feedback integration for continuous improvement of agent outputs
One-click evaluation dashboard for visualizing evaluation results over time
Support for evaluating multi-step agent workflows end-to-end

Use Cases

💡 Monitor and improve LLM-powered customer support agents in production

💡 Evaluate prompt engineering iterations before deploying to users

💡 Detect quality regressions in retrieval-augmented generation pipelines

💡 Benchmark different LLM providers for specific agent tasks

Quick Start

Install via `pip install uptrain`. Initialize an UpTrain evaluation object, define your checks (response quality, context relevance, factuality), and run evaluations against your LLM outputs. Results appear in a local dashboard for analysis.
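The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive integration: the `EvalLLM` class, the `Evals` members, and the `question`/`context`/`response` row keys are assumptions based on UpTrain's published README and may not match the version you install, which matters given the project's Stale status. The helpers `build_eval_rows` and `run_uptrain_checks` and the sample record are hypothetical and exist only for illustration.

```python
def build_eval_rows(records):
    """Turn (question, context, response) tuples into row dicts.

    The key names are an assumption about the input format UpTrain expects.
    """
    return [
        {"question": q, "context": c, "response": r}
        for q, c, r in records
    ]


def run_uptrain_checks(rows, openai_api_key):
    """Run the three checks named in the Quick Start against the rows.

    Assumes `pip install uptrain` has been run. The import is deferred so the
    rest of this module still loads when uptrain is not installed.
    """
    from uptrain import EvalLLM, Evals  # assumed API, verify against your version

    eval_llm = EvalLLM(openai_api_key=openai_api_key)
    return eval_llm.evaluate(
        data=rows,
        checks=[
            Evals.RESPONSE_RELEVANCE,   # response quality
            Evals.CONTEXT_RELEVANCE,    # context relevance
            Evals.FACTUAL_ACCURACY,     # factuality
        ],
    )


# Illustrative sample data, not taken from the project.
rows = build_eval_rows([
    (
        "What is the capital of France?",
        "France is a country in Western Europe. Its capital is Paris.",
        "Paris is the capital of France.",
    ),
])
```

To run the checks, call `run_uptrain_checks(rows, openai_api_key=...)` with a valid key. The call needs network access and an installed `uptrain` package, so it is left out of the block itself.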


Related Projects

Deepchecks

4.0k · Python

Stale

Testing and monitoring platform for ML and LLM applications: unit tests for AI.

testing, monitoring, llm-eval (+1 more)

LM Evaluation Harness

13.1k · Python

Active

A framework for few-shot evaluation of language models by EleutherAI. It provides standardized evaluation pipelines supporting hundreds of benchmark tasks and is widely adopted in the community as a core LLM evaluation tool.

llm-evaluation, benchmark, evaluation-framework (+2 more)

Giskard

5.5k · Python

Active

An open-source evaluation and testing library for LLM agents providing automated model scanning, bias detection, performance benchmarking, and compliance checks.

evaluation, testing, llm-safety (+3 more)

Garak

8.3k · Python

Active

NVIDIA's open-source LLM vulnerability scanner that automatically detects security issues in language models including safety vulnerabilities, hallucination tendencies, jailbreak risks, and prompt injection attacks.

llm-security, vulnerability-scanner, llm-evaluation (+2 more)

Related Articles

RAG, hallucination-detection, agent-evaluation

Agent Hallucination Defense: Practical Mitigation Patterns Beyond Guardrails

Why do LLM agents hallucinate? This article traces root causes and systematically reviews practical mitigation patterns: retrieval augmentation, confidence scoring, multi-agent cross-validation, forced citation backtracking, and observability with UpTrain, Giskard, RagaAI Catalyst, Comet Opik, and NVIDIA Garak.
