HolmesGPT
ActiveDescription
A CNCF Sandbox SRE Agent that automatically analyzes infrastructure logs and metrics to assist with incident diagnosis and system operations.
Key Features
- Agentic loop for querying live observability data and identifying root causes
- Deep integrations with Prometheus, Grafana, Datadog, Kubernetes, and more
- Operator mode for 24/7 background monitoring with Slack alerts and auto-PRs
- Bidirectional alert integration with AlertManager, PagerDuty, OpsGenie, Jira
- Petabyte-scale data handling with server-side filtering and memory-safe execution
- Supports any LLM provider: OpenAI, Anthropic, Azure, Bedrock, Gemini
Use Cases
Tags
Categories
Quick Start
1. Install: pip install holmesgpt
2. Configure data sources in config.yaml (Kubernetes, Prometheus, etc.)
3. Set your LLM API key (e.g. OPENAI_API_KEY)
4. Run: holmes investigate "Why is my service unhealthy?"
5. Holmes will query connected data sources and report root causes