SwanLab

Status: Active
Source: GitHub · Language: Python · License: Apache-2.0

Description

An open-source AI training tracking and visualization tool with a modern design. It supports PyTorch, Transformers, and more, and can be used to monitor and evaluate AI agent training processes.

Key Features

  • Seamless integration with 50+ mainstream frameworks: native support for PyTorch, Transformers, Hugging Face Accelerate, PaddleNLP, NVIDIA NeMo RL, and more; connecting a training pipeline takes two lines of code
  • Rich visualization system: 20+ chart types, including line charts, scalar plots, PR curves, ROC curves, confusion matrices, 3D point clouds, molecular structures, and custom ECharts charts
  • Multi-dimensional hardware monitoring: real-time monitoring of GPUs and accelerators (NVIDIA, AMD ROCm, Hygon DCU, Cambricon MLU, Moore Threads, Muxi, Iluvatar, Kunlun), plus disk utilization, network traffic, and other hardware metrics
  • LightningBoard dashboard: a high-performance dashboard built to handle very large numbers of charts, with chart grouping, local zoom, relative time display, and regex search
  • Flexible self-hosted deployment: one-click Docker and Kubernetes deployment that keeps full ownership of your data, plus a hosted cloud version
  • Experiment collaboration and management: project pinning, experiment grouping, experiment copying, baseline comparison, parallel-mode logging, and collaborator invitations
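To make LightningBoard's chart grouping and regex search more concrete, here is a hypothetical Python sketch of the idea. It is not LightningBoard's implementation. It assumes, as many experiment trackers do, that metric keys follow a "section/name" convention (for example train/loss) and are grouped under the prefix before the first slash; the helper names group_metric_keys and search_metric_keys are invented for illustration.

```python
import re
from collections import defaultdict


def group_metric_keys(keys):
    """Group metric keys by the prefix before the first '/'; keys without one go to 'default'."""
    groups = defaultdict(list)
    for key in keys:
        prefix, sep, _ = key.partition("/")
        groups[prefix if sep else "default"].append(key)
    return dict(groups)


def search_metric_keys(keys, pattern):
    """Return the keys whose names match a regular expression anywhere in the string."""
    regex = re.compile(pattern)
    return [key for key in keys if regex.search(key)]


# Hypothetical metric names, as they might be passed to a logging call.
keys = ["train/loss", "train/lr", "val/loss", "val/acc", "gpu0/util", "epoch"]
groups = group_metric_keys(keys)
loss_keys = search_metric_keys(keys, r"loss$")
```

With these sample keys, train/loss and train/lr land in the train group, epoch falls into the assumed default group, and the pattern loss$ selects ["train/loss", "val/loss"].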

Use Cases

💡 LLM training monitoring: track loss, learning rate, gradients, and other key metrics in real time across pre-training, SFT, and RLHF, and catch anomalies early
💡 AI experiment comparison and hyperparameter tuning: use baseline comparison and multi-experiment grouping to quickly assess how different hyperparameter combinations affect model performance
💡 Team training project management: manage multiple organizations and projects, using experiment tags, pinning, filtering, and sorting to track collaborative training progress
💡 Automated training alerts: webhook integrations with Slack, Discord, Feishu/Lark, email, and other channels send automatic alerts when training finishes or runs into an anomaly
💡 Model evaluation and visualization reports: integrate with EvalScope and other evaluation frameworks to visualize evaluation results, and generate badges for GitHub training projects
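SwanLab provides its own alerting integrations, so the following is not its API. It is a generic, standard-library Python sketch of the early-anomaly-detection and webhook-alert use cases: flag a non-finite loss, or a loss above a multiple of the recent moving average, and wrap the reason in a JSON body with a "text" field, which is the shape Slack incoming webhooks accept. LossSpikeDetector, build_alert_payload, and post_alert are hypothetical names, and the window and factor values are arbitrary.

```python
import json
import math
import urllib.request
from collections import deque


class LossSpikeDetector:
    """Flag non-finite losses or losses far above a moving average (hypothetical helper)."""

    def __init__(self, window=20, factor=3.0):
        self.history = deque(maxlen=window)  # most recent accepted losses
        self.factor = factor

    def check(self, loss):
        """Return a reason string if the loss looks anomalous, else None."""
        if not math.isfinite(loss):
            return "non-finite loss"
        if len(self.history) == self.history.maxlen:
            mean = sum(self.history) / len(self.history)
            if loss > self.factor * mean:
                # Spikes are not added to the history, so one outlier does not inflate the baseline.
                return f"loss {loss:.4f} exceeds {self.factor}x moving average {mean:.4f}"
        self.history.append(loss)
        return None


def build_alert_payload(project, step, reason):
    """Build a JSON body with a "text" field, as Slack incoming webhooks expect."""
    return json.dumps({"text": f"[{project}] step {step}: {reason}"})


def post_alert(webhook_url, payload):
    """POST the payload to a webhook URL and return the HTTP status (not called in this sketch)."""
    request = urllib.request.Request(
        webhook_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Simulated loss stream: a smooth decrease, then a spike, then a NaN.
detector = LossSpikeDetector(window=5, factor=3.0)
losses = [1.0, 0.9, 0.85, 0.8, 0.78, 0.75, 5.0, float("nan")]
alerts = []
for step, loss in enumerate(losses):
    reason = detector.check(loss)
    if reason is not None:
        alerts.append(build_alert_payload("my-project", step, reason))
```

On this stream the detector raises two alerts, one for the spike at step 6 and one for the NaN at step 7. A moving-average threshold is only one simple heuristic; noisier losses may need a longer window or a larger factor.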

Quick Start

Install: pip install swanlab
Add to your training code (call swanlab.log once per step inside the training loop):
    import swanlab
    swanlab.init(project="my-project")
    swanlab.log({"loss": loss.item()})
Then run training and visit swanlab.cn to view the visualization dashboard.
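The Quick Start calls can be placed in a complete, if toy, training loop. The Python sketch below keeps the training code independent of the tracker by passing a logging function in, so it runs without SwanLab installed. The train function fits a single weight to y = 2x by gradient descent on mean squared error and is purely illustrative. run_with_swanlab shows where swanlab.init and swanlab.log fit. The config argument, swanlab.finish(), and starting one experiment per learning rate to compare runs are assumptions about SwanLab's API, so check them against the documentation for your installed version.

```python
def train(num_steps, lr, log_fn):
    """Fit y = w * x to targets y = 2x by gradient descent, logging each step via log_fn."""
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0 * x for x in xs]  # toy targets: the true weight is 2.0
    n = len(xs)
    w = 0.0
    for _ in range(num_steps):
        preds = [w * x for x in xs]
        # Mean squared error and its gradient with respect to w.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        grad = sum(2.0 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        w -= lr * grad
        # Logs the loss measured before this step's update, plus the updated weight.
        log_fn({"loss": loss, "w": w})
    return w


def run_with_swanlab(learning_rates=(0.01, 0.05)):
    """Requires `pip install swanlab`; defined here but not called."""
    import swanlab  # imported lazily so train() works without SwanLab

    for lr in learning_rates:
        # Assumption: each init/finish pair records a separate experiment in the project.
        swanlab.init(project="my-project", config={"lr": lr})
        train(num_steps=100, lr=lr, log_fn=swanlab.log)
        swanlab.finish()


# Usage without SwanLab: collect the logged dicts in a plain list.
records = []
final_w = train(num_steps=100, lr=0.01, log_fn=records.append)
```

Passing log_fn rather than calling swanlab.log directly inside train lets the same loop be tested offline and then pointed at the tracker unchanged; with records.append, the first logged loss is 30.0 and the weight converges toward 2.0.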
