TensorZero

Active

GitHub Rust Apache-2.0

Description

TensorZero is an open-source inference gateway and optimization platform for LLM apps and agent systems, focused on high-performance serving, experimentation, routing, and production observability.

Key Features

Unified LLM gateway: access every major LLM provider through a single API with <1ms p99 latency overhead at 10k+ QPS
Production observability: store inferences and feedback in your own database with full UI and programmatic access
Automated optimization (Autopilot): analyzes observability data, sets up evals, optimizes prompts, and runs A/B tests automatically
Evaluation framework: benchmark individual inferences or end-to-end workflows using heuristics, LLM judges, and custom metrics
Experimentation built-in: A/B testing, routing, fallbacks, retries, load balancing, and granular rate limits for production reliability
OpenAI SDK compatible: integrate with one line change — works with existing OpenAI, OpenTelemetry, and provider SDKs

Use Cases

💡 Unified API gateway replacing per-provider integrations with centralized routing and fallbacks

💡 Automated prompt optimization and A/B testing for LLM agents and chatbots in production

💡 Cost tracking, usage monitoring, and custom rate limits across teams and use cases

💡 Performance benchmarking of prompt changes and model swaps with built-in evaluation pipelines

Quick Start

Deploy TensorZero Gateway via Docker (`docker compose up`), point your OpenAI-compatible client to `http://localhost:3000/openai/v1`, and change the model name to `tensorzero::model_name::provider::model`. Full setup takes ~5 minutes per the quickstart guide.

Visit GitHub Visit Website View Docs

TensorZero

Description

Key Features

Use Cases

Tags

Categories

Quick Start

Related Projects

TensorZero

DeepEval

Harbor

Plano