Windows Agent Arena
Windows Agent Arena (WAA) is a scalable OS platform for testing and benchmarking multi-modal AI agents.
An open-source evaluation framework for measuring and comparing AI agent performance on web tasks.
An autonomous web browser QA agent that evaluates performance, functionality, and user experience through GUI or CLI workflows.
A framework for few-shot evaluation of language models by EleutherAI, providing standardized evaluation pipelines supporting hundreds of benchmark tasks and widely adopted as a core LLM evaluation tool in the community.
Windows MCP is an MCP server for the Windows desktop, providing AI agents with computer-use capabilities for desktop automation and system operations.