TensorZero

Active
GitHub Rust Apache-2.0

Description

TensorZero is an open-source inference gateway and optimization platform for LLM apps and agent systems, focused on high-performance serving, experimentation, routing, and production observability.

Key Features

  • Unified LLM gateway: access every major LLM provider through a single API with <1ms p99 latency overhead at 10k+ QPS
  • Production observability: store inferences and feedback in your own database with full UI and programmatic access
  • Automated optimization (Autopilot): analyzes observability data, sets up evals, optimizes prompts, and runs A/B tests automatically
  • Evaluation framework: benchmark individual inferences or end-to-end workflows using heuristics, LLM judges, and custom metrics
  • Experimentation built-in: A/B testing, routing, fallbacks, retries, load balancing, and granular rate limits for production reliability
  • OpenAI SDK compatible: integrate with one line change — works with existing OpenAI, OpenTelemetry, and provider SDKs

Use Cases

💡 Unified API gateway replacing per-provider integrations with centralized routing and fallbacks
💡 Automated prompt optimization and A/B testing for LLM agents and chatbots in production
💡 Cost tracking, usage monitoring, and custom rate limits across teams and use cases
💡 Performance benchmarking of prompt changes and model swaps with built-in evaluation pipelines

Quick Start

Deploy TensorZero Gateway via Docker (`docker compose up`), point your OpenAI-compatible client to `http://localhost:3000/openai/v1`, and change the model name to `tensorzero::model_name::provider::model`. Full setup takes ~5 minutes per the quickstart guide.

Related Projects