TensorZero

Active
GitHub Rust Apache-2.0

Description

TensorZero is an open-source LLMOps platform for production agents. It unifies an LLM gateway, observability, evaluation, optimization, and A/B testing in a single stack.

Key Features

  • Unified LLM gateway - one API for Anthropic, OpenAI, Bedrock, Gemini, vLLM, and 20+ providers
  • Sub-1ms p99 overhead at 10k+ QPS - Rust core built for production-grade throughput
  • Inference and feedback storage - own your data in your own database
  • OpenTelemetry export - feed OTLP traces and Prometheus metrics into your existing stack
  • Built-in A/B testing, routing, retries, and fallbacks for confident rollouts
  • Optimization flywheel - SFT, RLHF, MIPRO, and GEPA turn production data into better models
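
To make the "retries and fallbacks" feature concrete, here is a minimal sketch of the general pattern: try providers in order, retry each a few times, then fall back to the next. This illustrates the idea only and is not TensorZero's implementation; the function name call_with_fallbacks, its parameters, and the two stub providers are hypothetical.

```python
import time
from typing import Callable, Optional, Sequence


def call_with_fallbacks(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    retries_per_provider: int = 2,
    backoff_seconds: float = 0.0,
) -> str:
    """Try each provider in order, retrying failures before falling back."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(retries_per_provider):
            try:
                return provider(prompt)
            except Exception as exc:  # a real gateway would only retry retryable errors
                last_error = exc
                # Exponential backoff between attempts (zero by default here).
                time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError("all providers failed") from last_error


# Hypothetical stand-ins for real provider calls.
def flaky(prompt: str) -> str:
    raise TimeoutError("primary provider unavailable")


def stable(prompt: str) -> str:
    return f"echo: {prompt}"


result = call_with_fallbacks([flaky, stable], "hi")
print(result)
```

In a gateway, this logic lives behind the API, so application code makes one call and never sees the failed attempts.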

Use Cases

💡 Routing a fleet of AI agents through one OpenAI-compatible gateway
💡 Debugging and replaying LLM calls in production with full OpenTelemetry traces
💡 Running LLM-as-a-judge evals on production data to detect regressions early
💡 A/B testing two prompt variants with adaptive statistical routing
💡 Fine-tuning models with SFT/RLHF on captured production data and feedback
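
As an illustration of what adaptive routing between two prompt variants can look like, the sketch below uses Beta-Bernoulli Thompson sampling. This technique is chosen here for illustration; the source does not say which algorithm TensorZero uses. The class ThompsonRouter, the variant names, and the simulated success rates are all assumptions.

```python
import random


class ThompsonRouter:
    """Route traffic between variants using Beta-Bernoulli Thompson sampling."""

    def __init__(self, variants, seed=None):
        self._rng = random.Random(seed)
        # [alpha, beta] per variant, starting from a uniform Beta(1, 1) prior.
        self.stats = {name: [1, 1] for name in variants}

    def choose(self):
        # Sample a plausible success rate per variant and pick the highest.
        samples = {
            name: self._rng.betavariate(a, b) for name, (a, b) in self.stats.items()
        }
        return max(samples, key=samples.get)

    def record(self, name, success):
        # Successes increment alpha, failures increment beta.
        self.stats[name][0 if success else 1] += 1


# Simulated feedback: prompt_b is assumed to succeed far more often.
true_rates = {"prompt_a": 0.2, "prompt_b": 0.8}
outcomes = random.Random(0)
router = ThompsonRouter(true_rates.keys(), seed=0)
counts = {name: 0 for name in true_rates}

for _ in range(2000):
    variant = router.choose()
    counts[variant] += 1
    router.record(variant, outcomes.random() < true_rates[variant])

print(counts)
```

Unlike a fixed 50/50 split, this kind of router shifts traffic toward the better variant as feedback accumulates, while still occasionally exploring the weaker one.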

Quick Start

# 1. Deploy the TensorZero Gateway (one Docker container).
#    Assumes ClickHouse and Postgres are reachable at the hostnames below
#    and that ANTHROPIC_API_KEY is set in your shell.
docker run -d -p 3000:3000 \
  -e TENSORZERO_CLICKHOUSE_URL=http://clickhouse:8123 \
  -e TENSORZERO_POSTGRES_URL=postgresql://user:pass@postgres:5432/db \
  -e ANTHROPIC_API_KEY \
  tensorzero/gateway

# 2. Point your OpenAI client at the gateway (Python; pip install openai)
from openai import OpenAI

# The gateway holds the provider credentials, so the client key is a placeholder.
client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::model_name::anthropic::claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Share a fun fact about TensorZero."}],
)
print(response.choices[0].message.content)
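
The model string in the quick start appears to follow the pattern tensorzero::model_name::<provider>::<model>. Assuming that pattern holds for other providers, a small helper keeps the prefix in one place. The function tensorzero_model is hypothetical and not part of any TensorZero client library.

```python
def tensorzero_model(provider: str, model: str) -> str:
    """Build a gateway model string, assuming the pattern shown in the quick start."""
    if not provider or not model:
        raise ValueError("provider and model must be non-empty")
    return f"tensorzero::model_name::{provider}::{model}"


model_name = tensorzero_model("anthropic", "claude-sonnet-4-6")
print(model_name)
```

The result can be passed as the model argument of client.chat.completions.create in place of the hard-coded string.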

Related Projects