SwanLab

Active

GitHub Python Apache-2.0

Description

An open-source tool for tracking and visualizing AI training, with a modern interface. It supports PyTorch, Transformers, and other frameworks, and can monitor and evaluate AI agent training processes.

Key Features

Integration with 50+ mainstream frameworks: native support for PyTorch, Transformers, HuggingFace Accelerate, PaddleNLP, NVIDIA NeMo RL and more; connecting a training pipeline takes two lines of code
Rich visualization system: 20+ chart types, including line charts, scalar plots, PR curves, ROC curves, confusion matrices, 3D point clouds, molecular structures, and custom ECharts charts
Multi-dimensional hardware monitoring: real-time monitoring of GPU (NVIDIA/AMD ROCm/Hygon DCU/Cambricon MLU/Moore Threads/Muxi/Iluvatar/Kunlun), disk utilization, network traffic and other hardware metrics
LightningBoard dashboard: high-performance dashboard built for massive chart volumes, supporting chart grouping, local zoom, relative time display, and regex search
Flexible self-hosted deployment: one-click Docker and Kubernetes deployment, plus online cloud version, with full data ownership
Experiment collaboration and management: supports project pinning, experiment grouping, experiment copying, baseline comparison, parallel mode recording, and collaborator invitations

Use Cases

💡 LLM training process monitoring: real-time tracking of loss, learning rate, gradients and other key metrics across pre-training, SFT, and RLHF stages, with early anomaly detection

💡 AI experiment comparison and hyperparameter tuning: quickly evaluate the impact of different hyperparameter combinations on model performance through baseline comparison and multi-experiment grouping

💡 Team training project management: multi-organization, multi-project management with experiment tags, pinning, filtering and sorting for collaborative training progress tracking

💡 Training automation alerts: webhook integration with Slack, Discord, Feishu/Lark, email and other notification channels, with automatic alerts when training completes or an anomaly occurs

💡 Model evaluation and visualization reports: integrates with EvalScope and other evaluation frameworks to visualize model evaluation results and generate GitHub training project badges
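
The "early anomaly detection" mentioned in the first use case can be illustrated with a small, library-independent check over logged loss values. This is only a sketch of the general idea, not SwanLab's own mechanism: the function name `find_loss_anomaly`, the window size, and the spike threshold are all assumptions chosen for illustration.

```python
import math
from typing import List, Optional


def find_loss_anomaly(
    losses: List[float], window: int = 5, spike_factor: float = 3.0
) -> Optional[int]:
    """Return the index of the first anomalous loss, or None if none is found.

    A value counts as anomalous if it is NaN/inf, or if it exceeds
    `spike_factor` times the mean of the previous `window` values.
    (Hypothetical helper; thresholds are illustrative assumptions.)
    """
    for i, loss in enumerate(losses):
        # Non-finite losses usually mean the run has diverged.
        if not math.isfinite(loss):
            return i
        # Only compare against a full window of earlier values.
        history = losses[max(0, i - window):i]
        if len(history) == window and loss > spike_factor * (sum(history) / window):
            return i
    return None
```

A check like this could run on the same values that are sent to the tracker, with the returned index used to trigger whichever notification channel the team has configured.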

Quick Start

Install: pip install swanlab
Add to training code: import swanlab; swanlab.init(project="my-project"); swanlab.log({"loss": loss.item()})
View results: run training, then visit swanlab.cn to open the visualization dashboard
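
The Quick Start calls can be placed in a loop roughly as follows. This is a minimal sketch, not an official example: the loss is simulated rather than produced by a model, `train` and `run_with_swanlab` are hypothetical helper names, and the closing `swanlab.finish()` call is an assumption about how a run is normally ended. `swanlab` is imported lazily so the toy loop can be tried without the package installed.

```python
import math
from typing import Callable, Dict, List


def simulated_loss(step: int) -> float:
    """Stand-in for a real training loss: decays smoothly toward zero."""
    return math.exp(-0.1 * step)


def train(log: Callable[[Dict[str, float]], None], steps: int = 10) -> List[float]:
    """Run a toy loop and report one metrics dict per step through `log`."""
    losses = []
    for step in range(steps):
        loss = simulated_loss(step)
        log({"loss": loss})  # same dict shape as in the Quick Start
        losses.append(loss)
    return losses


def run_with_swanlab() -> None:
    """Wire the loop to SwanLab; requires `pip install swanlab`."""
    import swanlab  # imported here so `train` works without the package

    swanlab.init(project="my-project")  # project name from the Quick Start
    train(swanlab.log)
    swanlab.finish()  # assumption: explicitly close the run when done
```

Calling `run_with_swanlab()` starts a run and logs ten loss values. In a real script, `simulated_loss` would be replaced by the model's actual loss, for example `loss.item()` in PyTorch.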

Related Projects

OpenInference

Purple Llama

Weave

Agents Towards Production