Mind2Web
The first LLM-based web agent and benchmark for generalist web agents, providing datasets, evaluation frameworks and baseline methods for building agents that operate on real websites.
An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and reproducibility.
WebArena is a realistic benchmark environment for evaluating autonomous web agents. It provides Gym-like interactive website simulations covering e-commerce, forums, CMS, and more, enabling end-to-end task evaluation as a standard framework for web agent research.
A research project exploring how models understand web interfaces, decompose tasks into action steps, and complete complex online tasks as browser agents.
A system for generalist web agents that autonomously carry out tasks on any given website, leveraging large multimodal models like GPT-4V.