AIPex
An AI browser automation assistant delivered as a Chrome extension: privacy-first, with MCP support, and an alternative to Claude for Chrome and Manus Browser Operator.
Browser and web automation agents
An MCP server that gives coding agents Chrome DevTools capabilities, enabling web debugging, performance analysis, and automated DOM manipulation.
An adaptive web scraping framework designed for AI agent data collection. It intelligently handles anti-bot measures at any scale, from single requests to full-scale crawls.
A fully local alternative to Manus AI that autonomously browses the web, writes code, and interacts by voice, with no API costs.
Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider, using Stream's edge network for ultra-low-latency, real-time interactions.
Open-source computer-use agents that operate across platforms, including Windows, macOS, Ubuntu, and Android. The project accompanies an ICLR 2026 Oral paper.
Give your AI agent eyes to see the entire internet. Read and search Twitter, Reddit, YouTube, GitHub, Bilibili, and XiaoHongShu with one CLI and zero API fees.
A Claude Skill that lets your AI coding agent drive a web browser for automation tasks.
A lightweight AI browser automation agent framework providing a clean API for building web interaction automation tools.
Open-source Computer-Use-Agent that automates GUI interactions through natural language instructions, enabling intelligent desktop automation.
A browser automation tool for AI agents and humans, built in Go for high-performance web interaction.
The first open-source Artificial Narrow Intelligence generalist agent that fully operates GUIs using only natural language. It uses Visualization-of-Thought and Chain-of-Thought reasoning for spatial perception and HID simulation.
The Zero-Server Code Intelligence Engine: a client-side knowledge graph creator that runs entirely in your browser, with a built-in Graph RAG Agent for code exploration.
Let your AI agent use your browser. Actionbook makes browser automation actually work through natural language instructions.
Next-generation agentic proxy for AI agents and MCP servers, providing unified traffic management, routing, and security control.
Page Agent is a JavaScript in-page GUI agent by Alibaba that controls web interfaces with natural language, enabling automated form filling, page navigation, and element interaction.
A browser runtime and control platform for AI agents, providing programmatic access to web sessions, page interactions, and automation workflows.
An MCP server that gives AI agents all-in-one access to the public web, with web scraping and structured data extraction.
Browser Use Agent SDK, from the browser-use team, is a toolkit for building browser automation agents, letting developers quickly create AI agents that interact with the web.
Browser Harness: a self-healing harness that enables LLMs to complete any task.
browser-use enables browser automation for agents, allowing LLMs to understand pages and perform complex web interactions.
An MCP tool for automated QA testing that uses browser-use agents to run browser-based quality assurance checks.
A web interface for running AI agents in the browser, providing a visual experience for browser automation operations.
An automation workflow project in the browser-use ecosystem that enables AI agents to operate browsers and complete multi-step web tasks.
Browserable is a self-hostable browser automation tool purpose-built for AI agents. It provides secure Docker-based browser environments with a JavaScript SDK, achieving 90.4% accuracy on the WebVoyager benchmark for autonomous web navigation.
Browserbase MCP server allows LLMs to control a browser with Browserbase and Stagehand, providing cloud-based browser automation capabilities for AI agents including web interaction, data scraping, and automated testing.
An open-source template for building web agents with Stagehand on Browserbase, providing serverless browser automation for AI agents to safely execute web tasks in the cloud.
The SDK for browser agents by Browserbase. Provides act, extract, and observe primitives for AI agents to naturally browse and interact with web pages.
Deploy headless browsers in Docker. Run them in the cloud or bring your own infrastructure. Provides powerful web automation and rendering capabilities for AI agents. Free for non-commercial use.
BrowserMCP is a browser extension-based MCP server that allows AI applications like Claude and Cursor to directly control and automate your browser.
The open-source agentic browser that transforms your browser into an AI-powered operating system. An alternative to ChatGPT Atlas, Perplexity Comet, and Dia.
BrowserWing turns browser actions into MCP commands or Claude Skills, allowing AI agents to control browsers efficiently and reliably with reduced dependency on heavy LLM interactions.
ByteDance's open-source multimodal AI agent stack connecting cutting-edge AI models with agent infrastructure for GUI automation and computer control.
Open-source AI sandbox infrastructure for code execution, browser use, and AI agent runtimes.
A state-of-the-art computer-using agent scoring 82% on OSWorld-Verified: fully open-source, safe, auditable, and production-ready for desktop automation.
Windows MCP is an MCP server for the Windows desktop, providing AI agents with computer-use capabilities for desktop automation and system operations.
A Python SDK for AI browser automation that enables models to locate elements, perform web actions, and extract structured data from web pages.
DO Browser is a browser-task agent tool focused on page understanding, action planning, and automation, serving as a lighter alternative to browser-use or Stagehand.
An AI-powered research assistant that performs iterative deep research on any topic by combining search engines, web scraping, and LLMs.
AI computer use powered by open-source LLMs and the E2B Desktop Sandbox.
An MCP server and CLI that turns the browser into an API, allowing AI agents to control Chrome with existing login sessions for web operations, data scraping, and automation tasks without re-authentication.
A Playwright Model Context Protocol server for automating browsers and APIs in Claude Desktop, Cline, Cursor IDE, and other AI coding tools.
Firecrawl is a web scraping and search engine designed for AI agents. It converts any webpage into clean, structured Markdown and offers search, scrape, and cleanup capabilities for building AI applications powered by web data.
Open-source web data agent optimized for structured web research, capable of autonomously browsing websites and extracting structured data.
A research project exploring how models understand web interfaces, decompose action steps, and complete complex online tasks through browser agent capabilities.
An AI-powered PPT generation tool that creates natively editable PPTX files from any document, producing real PowerPoint shapes instead of images.
HyperAgent is a Playwright-based AI browser automation framework offering high-level APIs like page.ai(), page.perform(), and page.extract(). It features built-in MCP client support and action caching, enabling AI agents to browse, interact, and extract data using natural language.
Camofox Browser is a headless browser automation server powered by Camoufox, a Firefox fork with C++-level fingerprint spoofing. It bypasses Google, Cloudflare, and most bot detection, providing token-efficient accessibility snapshots and stable element references for AI agents.
LaVague is a Large Action Model (LAM) framework for developing AI web agents, combining RAG techniques for natural-language-driven browser automation.
A lightweight browser runtime designed for automation and scraping scenarios, offering lower overhead than traditional browsers for headless tasks.
A state-of-the-art (SOTA) open-source browser agent that autonomously performs complex web tasks driven by natural-language instructions.
An MCP-native browser agent that gives AI systems a real browser for web tasks while keeping a human in the loop.
An open-source, vision-first browser agent that drives web automation through visual understanding, supporting complex web interaction tasks for QA testing and workflow automation.
A Playwright-like tool for Windows desktop automation, enabling AI agents to control desktop applications through natural language.
UFO is a Windows GUI automation agent by Microsoft that understands screen interfaces and executes complex OS tasks through natural language commands.
A simple SWE-style browser agent framework that achieves SOTA results on long-horizon web tasks.
Microsoft's open-source browser and web task agent that uses large models to understand pages, plan actions, and complete real web workflows.
A research prototype of a human-centered web agent from Microsoft Research, emphasizing human-in-the-loop interaction for collaborative web browsing and data collection tasks.
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking multimodal AI agents.
An autonomous web browser QA agent that evaluates performance, functionality, and user experience through GUI or CLI workflows.
A framework that lets AI agents use real Android and iOS apps just as a human would, operating and interacting with mobile interfaces autonomously.
NanoBrowser is an open-source Chrome extension for AI-powered multi-agent browser automation, supporting web task workflows with your own LLM API key.
Notte is a framework for building web agents and deploying serverless browser automation functions, providing reliable browser infrastructure and web-aware agent capabilities.
AI-powered autonomous web browsing framework that enables agents to click, type, navigate, and extract data like a human, with support for OpenAI, Anthropic, and Google models.
OpenAdapt is an open-source agent tool for desktop automation and computer-use scenarios, capturing user interactions, replaying tasks, and enabling GUI automation workflows.
The official sample application for OpenAI's Computer-Using Agent (CUA), showing how to use CUA through the API across multiple computer environments.
The first LLM-based web agent and benchmark for generalist web agents, providing datasets, evaluation frameworks and baseline methods for building agents that operate on real websites.
A system for generalist web agents that autonomously carry out tasks on any given website, leveraging large multimodal models like GPT-4V.
Oxylabs AI Studio Python SDK provides an all-in-one AI-powered web scraping toolkit integrating an AI scraper, crawler, browser agent, search engine, and sitemap tool for structured data extraction driven by natural language instructions.
An advanced browser AI tool developed by Oxylabs AI Studio that automates real user browsing tasks using natural language instructions.
Give AI agents access to your live Chrome session. Works out of the box, connects to tabs you already have open.
Open-source framework for building browser agents for real-world tasks, learning from user demonstrations to automate web interactions.
A self-hosted AI chat platform with a web UI and terminal CLI, supporting any model, web search, browser-agent automation, persistent memory, and analytics.
Anti-detection patches for Playwright and browser automation scenarios, helping automated browsers appear more like real user sessions.
A Chrome extension and CLI that lets agents control your browser. It runs Playwright snippets in a stateful sandbox and is available as a CLI or an MCP server.
An open-source evaluation framework for measuring and comparing AI agent performance on web tasks.
An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and reproducibility.
A curated list of papers and resources on multimodal Graphical User Interface agents, systematically covering computer use, mobile interaction, and more.
An out-of-the-box (OOTB) GUI agent for Windows and macOS.
ShowUI is an open-source, end-to-end Vision-Language-Action model for GUI agents and computer use, capable of understanding screenshots and executing precise interface interactions.
Open-source agentic framework that uses computers like a human, capable of completing complex GUI tasks with autonomous learning and experience accumulation.
Skyvern is an agent platform for browser task automation, using page understanding and action planning to complete complex web workflows such as forms and back-office tasks.
A project combining browser-use agent control with Steel's cloud browser infrastructure for scalable web automation.
Steel Browser is an open-source browser sandbox purpose-built for AI agents and applications. It provides a full browser API with session management, proxy integration, and built-in anti-detection, enabling web automation without infrastructure headaches.
A desktop app that lets AI control your computer through your terminal, browser, mouse, and keyboard.
AppAgent is an LLM-based multimodal agent framework designed to operate smartphone apps like a human, supporting touch interaction and autonomous exploration.
The first open-source testing agent that enables UI, API, security, accessibility, and visual validations without writing code or maintaining tests.
Next-generation personal AI assistant powered by LLM, RAG and agent loops, supporting computer-use, browser-use and coding agent capabilities.
A suite of tools for connecting AI to the web with a query language and Playwright integrations for precise, scalable web element interaction and data extraction.
CUA provides open-source infrastructure for Computer-Use Agents, including sandboxes, SDKs, and benchmarks to train and evaluate AI agents that control full desktops (macOS, Linux, Windows).
An open-source browser automation CLI for AI agents by Vercel, built with Rust for high performance and programmability.
WebArena is a realistic benchmark environment for evaluating autonomous web agents. It provides Gym-like interactive website simulations covering e-commerce, forums, CMS, and more, enabling end-to-end task evaluation as a standard framework for web agent research.
AI-powered, vision-driven UI automation that lets you describe actions in natural language instead of writing selectors, on both browser and mobile platforms.
A macOS browser agent that completes web tasks through autonomous execution, chat-based clarification, and resumable local workflows.
Open Foundations for Computer-Use Agents. Provides datasets, benchmarks, and foundation models for training and evaluating AI agents that control desktop environments.
A Manus-like, AI-driven local automation assistant: a computer-use agent that makes your computer work autonomously from natural-language instructions.
Open-AutoGLM is an open phone agent model and framework enabling AI to autonomously operate smartphone interfaces, unlocking the AI Phone experience for everyone.
A deep architectural comparison of seven open-source coding agents across three paradigms (CLI-first, IDE-integrated, and fully autonomous), examining context management, tool access, and autonomy levels to help you pick the right tool for each development scenario.
A breakdown of three abstraction layers for browser automation, from raw Playwright to structured extraction, with production patterns, runnable code, and common pitfalls.
A practical breakdown of browser-use strengths and limits in web task automation, with strategies for stable execution and failure recovery.