TruLens

Active

Description

TruLens is an open-source tool for evaluating and tracking LLM apps. It provides specialized evaluation for RAG applications including context relevance, groundedness, and answer relevance.

Key Features

OpenTelemetry-based tracing with structured OTEL spans
7 agentic evaluators: consistency, efficiency, plan adherence, quality, tool selection, tool calling, tool quality
Batch and inline evaluation with configurable workers
MCP tool call instrumentation for latency and output tracking
RAG Triad evaluation: context relevance, groundedness, answer relevance
Multi-provider support: OpenAI, Anthropic, Google, Bedrock, Snowflake, HuggingFace
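
The RAG Triad listed above scores three relationships: retrieved context against the query, the answer against the context, and the answer against the query. The sketch below illustrates only that structure. It is not the TruLens API: the function names are hypothetical, and plain token overlap stands in for the LLM-based judgments a real provider would make.

```python
from __future__ import annotations

import re


def _tokens(text: str) -> set[str]:
    """Lowercased word tokens; a crude stand-in for semantic comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(source: str, target: str) -> float:
    """Fraction of target tokens that also appear in source (0.0 to 1.0)."""
    target_tokens = _tokens(target)
    if not target_tokens:
        return 0.0
    return len(_tokens(source) & target_tokens) / len(target_tokens)


def rag_triad(query: str, context: str, answer: str) -> dict[str, float]:
    """Return the three triad scores for one query/context/answer record."""
    return {
        # Does the retrieved context cover what the query asks about?
        "context_relevance": overlap(context, query),
        # Is the answer supported by the retrieved context?
        "groundedness": overlap(context, answer),
        # Does the answer address the query?
        "answer_relevance": overlap(answer, query),
    }


if __name__ == "__main__":
    scores = rag_triad(
        query="capital of France",
        context="Paris is the capital of France",
        answer="Paris is the capital",
    )
    for name, value in scores.items():
        print(f"{name}: {value:.2f}")
```

A low groundedness score with a high context relevance score would point at the generation step rather than retrieval, which is the kind of failure the triad is meant to separate.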

Use Cases

💡 Systematically evaluating LLM application quality during development

💡 Monitoring RAG pipeline performance with the RAG Triad metrics

💡 Instrumenting agentic workflows for failure mode detection

💡 Running batch evaluations on datasets to compare model versions

💡 Integrating observability into existing OpenTelemetry infrastructure
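
As a rough illustration of the batch-evaluation use case, the sketch below scores two versions of an app over the same question set using a configurable worker pool. Everything here is an assumption made for illustration: `app_v1`, `app_v2`, `answer_relevance`, and `evaluate_batch` are hypothetical names, the scoring is simple word overlap, and none of it is taken from the TruLens API.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Sequence


def app_v1(question: str) -> str:
    """Hypothetical baseline app that never engages with the question."""
    return "I do not know."


def app_v2(question: str) -> str:
    """Hypothetical improved app that echoes the topic in its answer."""
    return f"Answer about {question}"


def answer_relevance(question: str, answer: str) -> float:
    """Toy metric: share of question words that reappear in the answer."""
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    return len(q_words & a_words) / len(q_words) if q_words else 0.0


def evaluate_batch(
    app: Callable[[str], str],
    questions: Sequence[str],
    workers: int = 4,
) -> float:
    """Run the app on every question in parallel and return the mean score."""

    def score_one(question: str) -> float:
        return answer_relevance(question, app(question))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score_one, questions))
    # Guard against an empty dataset, where mean() would raise.
    return mean(scores) if scores else 0.0


if __name__ == "__main__":
    dataset = ["vector search", "prompt caching"]
    for name, app in [("v1", app_v1), ("v2", app_v2)]:
        print(name, evaluate_batch(app, dataset, workers=2))
```

Threads suit this shape of workload because real evaluations mostly wait on network calls to a model provider; the worker count then bounds how many requests are in flight at once.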

Quick Start

Run pip install trulens-core, then pip install trulens-providers-openai (or the package for your provider). Import the instrument decorator, wrap your RAG functions with @instrument, define feedback functions, and run evaluations through the dashboard or the Python API.
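
To make the decorator step concrete, here is a minimal sketch of what wrapping RAG functions for tracing can look like. The `instrument` defined below is a local stand-in written for this example, not the decorator shipped with TruLens, and `Span`, `TRACE`, `retrieve`, `generate`, and `answer` are likewise hypothetical. It only shows the general idea of recording each call's name, inputs, output, and latency so that feedback functions can later score them.

```python
from __future__ import annotations

import functools
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Span:
    """One recorded function call."""
    name: str
    inputs: tuple
    output: Any
    latency_s: float


# In-memory trace store; a real tracer would export spans elsewhere.
TRACE: list[Span] = []


def instrument(func: Callable[..., Any]) -> Callable[..., Any]:
    """Stand-in decorator: record positional inputs, output, and latency."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        TRACE.append(Span(func.__name__, args, result, elapsed))
        return result

    return wrapper


@instrument
def retrieve(query: str) -> list[str]:
    """Hypothetical retriever returning a fixed context."""
    return ["TruLens evaluates and tracks LLM apps."]


@instrument
def generate(query: str, contexts: list[str]) -> str:
    """Hypothetical generator that simply returns the first context."""
    return contexts[0]


@instrument
def answer(query: str) -> str:
    """RAG entry point: retrieve, then generate."""
    return generate(query, retrieve(query))


if __name__ == "__main__":
    answer("What is TruLens?")
    for span in TRACE:
        print(f"{span.name}: {span.latency_s:.6f}s")
```

Spans are appended when a call finishes, so inner calls appear in the trace before the outer call that invoked them.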


Related Projects

Ragas

14.6k · Python

Stale

Ragas is a framework for evaluating RAG (Retrieval-Augmented Generation) systems. It provides evaluation metrics including faithfulness, answer relevance, and context precision, which help developers optimize RAG application performance.

rag · evaluation · llm +1

SwanLab

4.0k · Python

Active

An open-source tool with a modern design for tracking and visualizing AI training. It supports PyTorch, Transformers, and more, and can monitor and evaluate AI agent training processes.

python · observability · evaluation +2

OpenInference

1.1k · Python

Active

OpenTelemetry instrumentation for AI observability, providing standardized tracing, metrics collection, and span definitions for LLM inference processes to help developers monitor and debug AI agent systems.

observability · python · llm +2

DeepEval

16.6k · Python

Active

DeepEval is an open-source evaluation framework for LLM applications. It provides rich evaluation metrics and tools, supporting unit testing and integration testing to help developers build reliable LLM applications.

llm · evaluation · testing +1

Related Articles

RAG Evaluation · Ragas

RAG System Evaluation in Practice: Building High-Quality RAG Apps with Ragas and DeepEval

Learn how to evaluate RAG systems using Ragas and DeepEval, including measuring key metrics like faithfulness, answer relevance, and context precision.
