Empirica

Active

Description

A toolkit for making AI agents and workflows measurably reliable, with epistemic measurement, Noetic RAG, sentinel gating, and grounded calibration.

Related Projects

Ragas

13.9k · Python

Normal

Ragas is a framework for evaluating Retrieval-Augmented Generation (RAG) systems. It provides evaluation metrics including faithfulness, answer relevance, and context precision, which help developers optimize RAG application performance.

rag · evaluation · llm +1

TruLens

3.3k · Python

Active

TruLens is an open-source tool for evaluating and tracking LLM apps. It provides specialized evaluation for RAG applications including context relevance, groundedness, and answer relevance.

llm · evaluation · observability +1

AutoRAG

4.8k · Python

Active

AutoRAG is an open-source RAG evaluation and optimization framework that uses AutoML-style automation to help developers find and benchmark the best RAG pipeline configurations.

rag · evaluation · optimization +2

PromptTools

3.0k · Python

Normal

PromptTools provides open-source tools for prompt testing and experimentation. It supports multiple LLMs (such as OpenAI models and LLaMA) and vector databases (Chroma, Weaviate, LanceDB) to help developers systematically evaluate and optimize RAG systems.

prompt-testing · rag · evaluation +3
