Braintrust
Description
Braintrust is an evaluation and observability platform for AI applications, providing experiment tracking, scoring, prompt management, and production monitoring for LLM-powered systems.
Key Features
- Experiment tracking and comparison: record LLM inputs, outputs, parameters, and results so that versions can be compared
- Automated and human scoring: supports LLM-as-judge, manual labeling, and custom evaluators
- Dataset management with versioning, so datasets can be reused across experiments
- Prompt management with version control and A/B experimentation
- Production monitoring: track latency, error rates, and quality metrics for live LLM calls
- SDK ecosystem: Python and TypeScript/JavaScript SDKs, with integrations for LangChain, LlamaIndex, and the Vercel AI SDK
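At their simplest, the custom evaluators mentioned above are plain functions that compare a task's output with the expected answer and return a score between 0 and 1. The sketch below assumes Braintrust's convention of passing `input`, `output`, and `expected` to a scorer as keyword arguments. `levenshtein`, `similarity_score`, and `exact_match` are hypothetical names written for illustration, and the edit-distance scorer is a hand-rolled normalized Levenshtein distance, not any library's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    # prev[j] = distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]


def similarity_score(input: str, output: str, expected: str) -> float:
    """Hypothetical scorer: 1.0 for identical strings, falling toward 0.0 as edits grow."""
    longest = max(len(output), len(expected))
    if longest == 0:
        return 1.0  # two empty strings are identical
    # The distance never exceeds the longer length, so the result stays in [0, 1].
    return 1.0 - levenshtein(output, expected) / longest


def exact_match(input: str, output: str, expected: str) -> float:
    """Hypothetical scorer: all-or-nothing string equality."""
    return 1.0 if output == expected else 0.0
```

Either function could then be listed in an `Eval` call's `scores=[...]` argument. The graded scorer gives partial credit for near misses, which is often more informative than exact match for free-form LLM text.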
Quick Start
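```shell
pip install braintrust autoevals
```

```python
from braintrust import Eval
from autoevals import Levenshtein  # string-similarity scorer from Braintrust's autoevals package

Eval(
    "my-eval",  # project name
    data=lambda: [{"input": "Foo", "expected": "Hi Foo"}],
    task=lambda input: "Hi " + input,  # replace with your LLM call
    scores=[Levenshtein],
)
# Or stream production traces via the Braintrust proxy.
```

Run the file with `braintrust eval <file>.py`, with `BRAINTRUST_API_KEY` set in the environment.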
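What `Eval` automates can be pictured as a small loop: load the cases from `data`, call `task` on each input, apply every scorer, and aggregate the results. The sketch below is a hypothetical, purely local illustration of that pattern. `run_eval` is not part of the Braintrust SDK, and it skips everything the platform adds, such as logging results to an experiment and comparing runs.

```python
from statistics import mean
from typing import Any, Callable, Iterable

Scorer = Callable[..., float]


def run_eval(
    data: Callable[[], Iterable[dict[str, Any]]],
    task: Callable[[Any], Any],
    scores: list[Scorer],
) -> dict[str, float]:
    """Toy local eval loop: run the task on each case, score it, average per scorer."""
    # One list of per-case scores for each scorer, keyed by the function's name.
    results: dict[str, list[float]] = {s.__name__: [] for s in scores}
    for case in data():
        output = task(case["input"])
        for scorer in scores:
            results[scorer.__name__].append(
                scorer(input=case["input"], output=output, expected=case.get("expected"))
            )
    # NaN signals "no cases" rather than a misleading 0.0.
    return {name: mean(vals) if vals else float("nan") for name, vals in results.items()}


def exact_match(input: str, output: str, expected: str) -> float:
    return 1.0 if output == expected else 0.0


summary = run_eval(
    data=lambda: [
        {"input": "Foo", "expected": "Hi Foo"},
        {"input": "Bar", "expected": "Hello Bar"},
    ],
    task=lambda input: "Hi " + input,
    scores=[exact_match],
)
print(summary)  # {'exact_match': 0.5}
```

Averaging each scorer over all cases is a simplifying choice of this sketch; keying results by `__name__` also means two lambdas would collide, so named functions work best as scorers here.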