Arize Phoenix

Active

GitHub · Python · License: NOASSERTION

Description

Phoenix is an open-source observability and evaluation tool for LLM and agent applications, supporting online tracing and offline diagnosis.

Key Features

OpenTelemetry-based tracing for LLM application runtime observability
LLM-powered evaluation for response and retrieval quality benchmarking
Versioned datasets for experimentation, evaluation, and fine-tuning
Prompt management with version control, tagging, and experimentation
Playground to optimize prompts, compare models, and replay traced calls
Built-in PXI agent for debugging traces and navigating Phoenix

Use Cases

💡 Trace and debug LLM calls across LangChain, LlamaIndex, and OpenAI SDK

💡 Evaluate RAG pipeline retrieval quality with built-in evals

💡 Compare prompt versions and model variants systematically

💡 Monitor production LLM performance and detect regressions

💡 Manage prompt libraries with version control and A/B testing

Quick Start

pip install arize-phoenix → import phoenix as px → px.launch_app() → open http://localhost:6006 → instrument your LLM code with OpenTelemetry
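The steps above can be sketched in Python as follows. This is a minimal sketch, not the project's official quick start: `px.launch_app()` and the `http://localhost:6006` address come from the steps above, while the use of `phoenix.otel.register`, its `project_name` and `endpoint` arguments, the `/v1/traces` path, the project name `quickstart-demo`, and the span and attribute names are assumptions made for illustration. Check them against the Phoenix docs for your installed version.

```python
PHOENIX_UI_URL = "http://localhost:6006"  # local address named in the Quick Start


def launch_and_instrument(project_name: str = "quickstart-demo"):
    """Start a local Phoenix app and record one hand-written span.

    `project_name` is a hypothetical label chosen for this sketch.
    """
    # Imports are deferred so this module loads even before
    # `pip install arize-phoenix` has been run.
    import phoenix as px
    from phoenix.otel import register  # assumption: installed alongside arize-phoenix

    session = px.launch_app()  # starts the local Phoenix app

    # Assumption: register() returns an OpenTelemetry TracerProvider
    # configured to export spans to the given Phoenix endpoint.
    tracer_provider = register(
        project_name=project_name,
        endpoint=f"{PHOENIX_UI_URL}/v1/traces",  # assumed HTTP collector path
    )
    tracer = tracer_provider.get_tracer(__name__)

    # Wrap the LLM call in a span; the call itself is a placeholder here.
    with tracer.start_as_current_span("llm-call") as span:
        span.set_attribute("input.value", "Hello, Phoenix")
        # ... call your LLM client here ...
        span.set_attribute("output.value", "placeholder response")

    return session
```

Calling `launch_and_instrument()` and then opening http://localhost:6006 should show the span under the chosen project name, provided the assumptions above hold. For framework-level tracing (LangChain, LlamaIndex, OpenAI SDK), an auto-instrumentation package would normally replace the hand-written span.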


Related Projects

Braintrust

3.7k · TypeScript

Active

Braintrust is an evaluation and observability platform for AI applications, providing experiment tracking, scoring, prompt management, and production monitoring for LLM-powered systems.

observability, eval, prompt-management +1

Opik

20.2k · Python

Active

Opik is an open-source LLM observability platform providing agent tracing, evaluation testing, and prompt experiment management to help developers monitor and optimize AI agent systems.

observability, llm-evaluation, tracing +2

Langfuse

30.2k · TypeScript

Active

Open-source LLM engineering platform providing tracing, evaluations, prompt management, and dataset management with integrations for LangChain, OpenAI, Anthropic, and more.

observability, tracing, llm-evaluation +2

Related Articles

AI Agent, Observability, Distributed Tracing

Building Agent Observability: From Distributed Tracing to Automated Evaluation

A systematic guide to the three pillars of agent observability — distributed tracing, metrics monitoring, and automated evaluation — for building production-grade agent monitoring.

Langfuse, Observability, Tracing

Agent Observability Playbook: End-to-End Tracing with Langfuse

Based on real production experience, this guide explains how to build a closed loop of tracing, evaluation, and cost analytics for AI agents with Langfuse.
