AgentLab
An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and reproducibility.
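To make the workflow concrete, here is a minimal sketch of what running an agent through a benchmark task looks like in such a framework; the `WebAgent` and `run_episode` names and the task-dict layout are illustrative assumptions for this example, not AgentLab's actual API.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    task_id: str
    success: bool
    steps: int


class WebAgent:
    """Hypothetical LLM-backed agent: maps an observation to an action."""

    def act(self, observation: str) -> str:
        # A real agent would prompt an LLM with the page state here.
        return "click('submit')"


def run_episode(agent: WebAgent, task: dict, max_steps: int = 10) -> EpisodeResult:
    """Roll the agent through one task and record the outcome."""
    observation, done, steps = task["initial_observation"], False, 0
    while not done and steps < max_steps:
        action = agent.act(observation)
        # A real framework would execute the action in a browser and
        # return the next page state plus a success signal.
        observation, done = task["oracle"](action)
        steps += 1
    return EpisodeResult(task["id"], done, steps)


task = {
    "id": "demo-form",
    "initial_observation": "<form><button>submit</button></form>",
    # Toy oracle: succeeds as soon as the agent clicks submit.
    "oracle": lambda action: ("done", "submit" in action),
}
print(run_episode(WebAgent(), task))
```

Running many such episodes across tasks, with results logged per episode, is what makes the benchmarking scalable and reproducible.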
The first LLM-based agent and benchmark for generalist web agents, providing datasets, evaluation frameworks, and baseline methods for building agents that operate on real websites.
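As a rough illustration of how such an evaluation framework can score an agent against a dataset, the sketch below compares predicted (element, operation) steps with reference steps; the `Step` layout and the `step_accuracy` metric are assumptions made for this example, not the project's real schema.

```python
from typing import NamedTuple


class Step(NamedTuple):
    element: str     # target DOM element, e.g. a CSS selector
    operation: str   # e.g. "CLICK", "TYPE"


def step_accuracy(predicted: list[Step], gold: list[Step]) -> float:
    """Fraction of reference steps the agent reproduced exactly, in order."""
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold) if gold else 0.0


gold = [Step("button#search", "CLICK"), Step("input#query", "TYPE")]
pred = [Step("button#search", "CLICK"), Step("input#q", "TYPE")]
print(step_accuracy(pred, gold))  # 0.5: one of two steps matched
```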
WebArena is a realistic benchmark environment for evaluating autonomous web agents. It provides Gym-like interactive website simulations covering e-commerce, forums, content management, and more, enabling end-to-end task evaluation and serving as a standard testbed for web agent research.
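The Gym-like interface implies the familiar reset/step control loop. The sketch below shows that loop against a stubbed environment, since the real one requires the Dockerized WebArena sites; `FakeWebEnv` and its observation strings are stand-ins, not WebArena's actual classes.

```python
class FakeWebEnv:
    """Stub standing in for the real browser environment, which needs
    the Dockerized WebArena sites to run."""

    def reset(self):
        return "accessibility tree of the start page", {}

    def step(self, action: str):
        obs = f"page state after executing: {action}"
        # One-step toy episode: reward 1.0 and terminate immediately.
        return obs, 1.0, True, False, {}


env = FakeWebEnv()
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action = "click [42]"  # a real agent would choose this from obs
    obs, reward, terminated, truncated, info = env.step(action)
print("episode reward:", reward)
```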
A research project exploring how models understand web interfaces, decompose tasks into action steps, and complete complex online workflows through browser-agent capabilities.
Amazon's AI agent evaluation tool, which automates quality assessment of Bedrock Agents and other LLM agents using multi-dimensional metrics and benchmarks.
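As a sketch of what multi-dimensional quality assessment can look like, the example below scores a single agent transcript along several dimensions and averages them; the check names and the `evaluate` helper are hypothetical illustrations, not the tool's actual metrics or API.

```python
from typing import Callable


def evaluate(transcript: str, checks: dict[str, Callable[[str], bool]]) -> dict[str, float]:
    """Score one agent transcript on several quality dimensions (0.0 or 1.0)."""
    return {name: float(check(transcript)) for name, check in checks.items()}


# The dimensions below are illustrative, not the tool's built-in metrics.
checks = {
    "task_completion": lambda t: "order confirmed" in t,
    "groundedness": lambda t: "source:" in t,      # cites its evidence
    "safety": lambda t: "password" not in t,       # leaks no secrets
}

transcript = "Looked up the item (source: catalog) ... order confirmed."
scores = evaluate(transcript, checks)
overall = sum(scores.values()) / len(scores)
print(scores, "overall:", overall)
```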