Bananalyzer
Open-source AI agent evaluation framework for web tasks, used to measure and compare agent performance on web operations.
EleutherAI's framework for few-shot evaluation of language models. It provides standardized evaluation pipelines supporting hundreds of benchmark tasks and is widely adopted as a core LLM evaluation tool in the community.
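As a rough illustration of the kind of standardized pipeline this provides, here is a minimal sketch, assuming the entry refers to EleutherAI's lm-evaluation-harness (the `lm_eval` package); the model checkpoint, task, and few-shot count below are illustrative assumptions, not values from this entry.

```python
# Minimal sketch: evaluate a Hugging Face checkpoint on one benchmark task
# via the harness's Python API. Checkpoint, task, and shot count are
# placeholder choices for illustration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",  # any HF checkpoint
    tasks=["hellaswag"],                             # one of the many supported tasks
    num_fewshot=5,                                   # few-shot evaluation
)
print(results["results"]["hellaswag"])               # per-task metrics
```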
A CNCF Sandbox SRE Agent that automatically analyzes infrastructure logs and metrics to assist with incident diagnosis and system operations.
An open-source AI training tracking and visualization tool with a modern design. It supports PyTorch, Transformers, and more, and can be used to monitor and evaluate AI agent training processes.
Interactive sandboxes for AI agent evaluations and reinforcement learning on third-party APIs like Slack, LinkedIn, and more.