Deepchecks

Stale

GitHub · Python · License: NOASSERTION

Description

A testing and monitoring platform for ML and LLM applications, positioned as unit tests for AI.

Key Features

ML testing: Automatically checks for data drift, label leakage, and model performance issues before and after training
LLM evals: Built-in checks for hallucination, bias, and toxicity
CI-friendly: Integrates with pytest in a few lines
Visualization: HTML reports present check results in a readable form
Open source and self-hostable: Data stays local, which suits sensitive industries
Extensible: Custom Checks and Suites for business-specific needs

Use Cases

💡 Establish regression tests for ML teams before model rollout.

💡 Auto-check LLM outputs for hallucination and toxicity.

💡 Run data-drift checks in CI to prevent model degradation.
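
The drift-in-CI use case can be sketched in plain Python. The sketch below does not use the Deepchecks API. It computes a Population Stability Index (PSI) by hand to show the kind of gate a drift check places in a pytest run. The helper name psi, the 0.2 threshold, and the synthetic data are all assumptions made for illustration.

```python
import math
from typing import Sequence

# Hypothetical cut-off; 0.2 is a common rule of thumb for PSI, not a Deepchecks default.
DRIFT_THRESHOLD = 0.2


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two numeric samples (illustrative helper)."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if the range is zero.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def test_feature_has_not_drifted():
    # Synthetic stand-ins for a training-time feature and its production counterpart.
    reference = [i / 100 for i in range(1000)]
    current = [i / 100 + 0.05 for i in range(1000)]  # slightly shifted
    assert psi(reference, current) < DRIFT_THRESHOLD
```

Saved in a test file, this runs under pytest like any other test, so a shift large enough to push PSI past the threshold fails the build before the model ships.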

Quick Start

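# Install (the nlp extras add text support and model-based properties)
pip install -U "deepchecks[nlp]" "deepchecks[nlp-properties]"

# Text toxicity example (deepchecks.nlp)
from deepchecks.nlp import TextData

data = TextData(raw_text=['I hate this product'])
# Toxicity is a model-based property, so long calculations must be enabled
data.calculate_builtin_properties(
    include_properties=['Toxicity'],
    include_long_calculation_properties=True,
)
print(data.properties)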

Related Projects

UpTrain

2.4k · Python

Stale

An evaluation and monitoring tool for LLM applications that checks response quality, context relevance, factuality, and user feedback for agent systems.

llm-evaluation · monitoring · testing

Giskard

5.5k · Python

Active

An open-source evaluation and testing library for LLM agents providing automated model scanning, bias detection, performance benchmarking, and compliance checks.

evaluation · testing · llm-safety +3

AgentOps

5.7k · Python

Active

AgentOps is an observability platform for AI agents, providing monitoring, debugging, and evaluation to help developers optimize agent performance.

observability · monitoring · debugging +1

Crucix

10.4k · JavaScript

Normal

Crucix is a personal intelligence agent that watches the world from multiple data sources and pings you when something changes, helping you stay on top of information in real time.

agent · automation · monitoring +2