Weights & Biases

Active
GitHub Python MIT

Description

Weights & Biases is an experiment tracking, visualization, and collaboration platform for ML and LLM applications, covering agent training and evaluation, hyperparameter management, and model registry workflows.

Key Features

  • Experiment tracking — Automatically log hyperparameters, metrics, system resources, and code versions with side-by-side comparison
  • W&B Models — Model artifact registry with versioning and promotion to production
  • W&B Weave — LLM and agent tracing tool with prompt evaluation, conversation replay, and quality scoring
  • Sweeps hyperparameter search — Built-in Bayesian, grid, and random search to find the best hyperparameter combinations at scale
  • Team collaboration — Shareable experiment reports and dashboards with comments and access control
  • Reports and dashboards — Drag-and-drop authoring of publishable experiment reports with embedded charts and interactive components
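
The Sweeps feature listed above can be sketched roughly as follows. This is a minimal illustration, not a recommended setup: the parameter names, ranges, the toy_loss function, and the 'agent-eval' project name are assumptions made for the example, and toy_loss stands in for a real training loop.

```python
import math

# Search space for a Bayesian sweep. The parameter names and ranges are
# illustrative assumptions, not values taken from the project description.
sweep_config = {
    "method": "bayes",  # "grid" and "random" are the other built-in methods
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "lr": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [16, 32, 64]},
    },
}


def toy_loss(lr: float, batch_size: int) -> float:
    """Hypothetical stand-in for a training loop; smallest near lr=1e-3."""
    return (math.log10(lr) + 3.0) ** 2 + 1.0 / batch_size


def train() -> None:
    """Run one trial: read the sampled config, compute a loss, log it."""
    import wandb  # imported lazily so the config can be inspected offline

    with wandb.init() as run:
        loss = toy_loss(run.config.lr, run.config.batch_size)
        run.log({"loss": loss})


def launch_sweep(project: str = "agent-eval", count: int = 10) -> str:
    """Register the sweep, then run `count` trials in this process."""
    import wandb

    sweep_id = wandb.sweep(sweep=sweep_config, project=project)
    wandb.agent(sweep_id, function=train, project=project, count=count)
    return sweep_id
```

Calling launch_sweep() needs an installed and logged-in wandb client. The configuration dictionary and toy_loss can be checked without one.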

Use Cases

💡 Track agent training and fine-tuning experiments, comparing different models and hyperparameter combinations
💡 Use Weave to record LLM call traces, debug agent decision chains, and evaluate output quality
💡 Manage prompt engineering experiments for agents with prompt versioning and evaluation scores
💡 Share experiment reports and dashboards across teams to standardize agent R&D workflows
💡 Register trained models in W&B Artifacts and publish them to production inference services
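
As a rough sketch of the Weave tracing use case above, a plain Python function can be wrapped as a Weave op so that its inputs and outputs are recorded. The scorer, its exact-match logic, and the 'agent-eval' project name are hypothetical and only serve the example. The sketch assumes the separate weave package is installed.

```python
def score_answer(expected: str, answer: str) -> dict:
    """Toy exact-match scorer; a hypothetical stand-in for a real evaluator."""
    match = expected.strip().lower() == answer.strip().lower()
    return {"exact_match": match, "answer_length": len(answer)}


def traced_scoring_demo(project: str = "agent-eval") -> dict:
    """Wrap the scorer as a Weave op and call it once so the call is traced."""
    import weave  # lazy import: needs `pip install weave` and a W&B login

    weave.init(project)
    traced_score = weave.op()(score_answer)
    return traced_score("Paris", " paris ")
```

In application code the more common form is to put @weave.op() directly on the function definition. The wrapper is applied inside traced_scoring_demo here only so that score_answer stays usable without Weave installed.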

Quick Start

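# Shell: install the client and authenticate
pip install wandb
wandb login

# Python: log placeholder metrics to a run
import wandb

wandb.init(project='agent-eval', config={'lr': 0.001, 'model': 'claude-sonnet-4-6'})
for step in range(100):
    # Synthetic values for illustration: loss decays, accuracy rises
    wandb.log({'loss': 1.0 / (step + 1), 'accuracy': 0.9 + 0.001 * step})
wandb.finish()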
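
Beyond logging metrics, a run can also version a model file as an artifact, which is the starting point for the registry workflow described earlier. The following is a minimal sketch under stated assumptions: the checkpoint contents are placeholder bytes, and the artifact name 'agent-policy', the job type, and the helper names are hypothetical.

```python
from pathlib import Path


def write_dummy_checkpoint(path: Path) -> Path:
    """Write placeholder bytes standing in for real model weights."""
    path.write_bytes(b"placeholder-weights")
    return path


def log_model_artifact(checkpoint: Path, project: str = "agent-eval") -> None:
    """Upload the checkpoint as a versioned artifact of type 'model'."""
    import wandb  # lazy import: needs `pip install wandb` and `wandb login`

    with wandb.init(project=project, job_type="upload-model") as run:
        artifact = wandb.Artifact(name="agent-policy", type="model")
        artifact.add_file(str(checkpoint))
        run.log_artifact(artifact)
```

A typical call would be log_model_artifact(write_dummy_checkpoint(Path('model.bin'))). Logging the same artifact name again with changed contents creates a new version.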

Related Projects