Empirica
Active
Description
A toolkit for making AI agents and workflows measurably reliable, with epistemic measurement, Noetic RAG, sentinel gating, and grounded calibration.
Ragas is a framework for evaluating RAG (Retrieval-Augmented Generation) systems. It provides evaluation metrics such as faithfulness, answer relevance, and context precision, helping developers optimize RAG application performance.
TruLens is an open-source tool for evaluating and tracking LLM apps. It provides specialized evaluation for RAG applications including context relevance, groundedness, and answer relevance.
AutoRAG is an open-source RAG evaluation and optimization framework that uses AutoML-style automation to help developers find and benchmark the best RAG pipeline configurations.
PromptTools provides open-source tools for prompt testing and experimentation. It supports multiple LLMs (OpenAI models, LLaMA) and vector databases (Chroma, Weaviate, LanceDB), helping developers systematically evaluate and optimize RAG systems.