TensorZero
ActiveDescription
TensorZero is an open-source inference gateway and optimization platform for LLM apps and agent systems, focused on high-performance serving, experimentation, routing, and production observability.
Key Features
- Unified LLM gateway: access every major LLM provider through a single API with <1ms p99 latency overhead at 10k+ QPS
- Production observability: store inferences and feedback in your own database with full UI and programmatic access
- Automated optimization (Autopilot): analyzes observability data, sets up evals, optimizes prompts, and runs A/B tests automatically
- Evaluation framework: benchmark individual inferences or end-to-end workflows using heuristics, LLM judges, and custom metrics
- Experimentation built-in: A/B testing, routing, fallbacks, retries, load balancing, and granular rate limits for production reliability
- OpenAI SDK compatible: integrate with one line change — works with existing OpenAI, OpenTelemetry, and provider SDKs
Use Cases
Tags
Categories
Quick Start
Deploy TensorZero Gateway via Docker (`docker compose up`), point your OpenAI-compatible client to `http://localhost:3000/openai/v1`, and change the model name to `tensorzero::model_name::provider::model`. Full setup takes ~5 minutes per the quickstart guide.