
Best Observability Top 20

The 20 most popular open-source observability projects, ranked by GitHub stars.

1

Kong

43.7k Stars

The cloud-native API and AI Gateway providing LLM request routing, rate limiting, load balancing and observability for AI agent applications.

observability, api, agent, lua
2

Prompt Optimizer

31.6k Stars

An AI prompt optimizer that helps users write better prompts and achieve improved AI results.

prompt-engineering, evaluation, llm, typescript
3

Langfuse

30.2k Stars

Open-source LLM engineering platform providing tracing, evaluations, prompt management, and dataset management with integrations for LangChain, OpenAI, Anthropic, and more.

observability, tracing, llm-evaluation, prompt-management
4

Langfuse

30.2k Stars

Open-source LLM observability: tracing, evals, prompt management.

langfuse, observability, tracing, evals
5

Langfuse

30.2k Stars

Langfuse is an open-source observability platform for LLM applications, supporting tracing, evaluation, prompt versioning, and cost analytics.

observability, tracing, llm, analytics
6

MLflow

26.8k Stars

MLflow is the open-source AI engineering platform for debugging, evaluating, monitoring, and optimizing AI agents and LLM applications, with model and data access management.

mlflow, llmops, evaluation, observability
7

12 Factor Agents

23.9k Stars

What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?

agent, framework, evaluation, observability
8

Promptfoo

22.8k Stars

CLI tool that combines LLM prompt testing with red-teaming.

promptfoo, testing, red-team, cli
9

Promptfoo

22.8k Stars

Test and evaluate LLM prompts, agents, and RAG pipelines. Built-in red teaming and security evaluation for reliable AI applications.

testing, evaluation, red-teaming, prompt-testing
10

Promptfoo

22.8k Stars

Promptfoo is an evaluation and regression testing tool for LLM apps and agents, useful for comparing prompts, tool-call results, and model outputs over time.

evaluation, testing, prompts, typescript
11

Agents Towards Production

20.9k Stars

End-to-end, code-first tutorials for building production-grade GenAI agents. From prototype to enterprise deployment.

agent, framework, evaluation, observability
12

Opik

20.2k Stars

Opik is an open-source LLM observability platform providing agent tracing, evaluation testing, and prompt experiment management to help developers monitor and optimize AI agent systems.

observability, llm-evaluation, tracing, prompt-testing
13

openobserve

19.6k Stars

OpenObserve is a high-performance observability platform for logs, metrics, and traces, well suited for monitoring AI agent runtimes and tool calls.

observability, logs, metrics, tracing
14

OpenAI Evals

18.8k Stars

OpenAI's framework for evaluating LLMs and LLM systems, providing an open-source registry of benchmarks and tools for systematic model assessment.

llm-evaluation, benchmark, evals, red-teaming
15

ccusage

16.7k Stars

Analyze coding (agent) CLI token usage and costs from local data.

token-usage, cost-analysis, cli, rust
16

DeepEval

16.6k Stars

DeepEval is an open-source evaluation framework for LLM applications. It provides rich evaluation metrics and tools, supporting unit testing and integration testing to help developers build reliable LLM applications.

llm, evaluation, testing, rag
17

RagaAI Catalyst

16.1k Stars

RagaAI Catalyst is an observability, monitoring, and evaluation framework for Agent AI, supporting agent/LLM/tool tracing, multi-agent debugging, and self-hosted dashboard analytics.

observability, tracing, evaluation, agent-monitoring
18

Ragas

14.6k Stars

Ragas is a framework for evaluating RAG (Retrieval Augmented Generation) systems. It provides evaluation metrics including faithfulness, answer relevance, and context precision, helping developers optimize RAG application performance.

rag, evaluation, llm, testing
19

OpenMetadata

14.4k Stars

OpenMetadata is a unified metadata platform for data and AI, providing data asset discovery, lineage, governance, and agent context retrieval capabilities.

observability, metadata, data-governance, lineage
20

LM Evaluation Harness

13.1k Stars

EleutherAI's framework for few-shot evaluation of language models. It provides standardized evaluation pipelines for hundreds of benchmark tasks and is widely adopted in the community as a core LLM evaluation tool.

llm-evaluation, benchmark, evaluation-framework, language-model
