browser-use
Active
Description
browser-use enables browser automation for agents, allowing LLMs to understand pages and perform complex web interactions.
Open-source computer-use agents that operate across platforms, including Windows, macOS, Ubuntu, and Android. ICLR 2026 Oral paper project.
Open-source Computer-Use-Agent that automates GUI interactions through natural language instructions, enabling intelligent desktop automation.
The first open-source Artificial Narrow Intelligence generalist agent that fully operates GUIs using only natural language. Uses Visualization-of-Thought and Chain-of-Thought reasoning for spatial perception and HID simulation.
AI-powered PPT generation tool that creates natively editable PPTX from any document, producing real PowerPoint shapes instead of images.
Breaking down three abstraction layers for browser automation—from raw Playwright to structured extraction—with production patterns, runnable code, and common pitfalls.
A practical breakdown of browser-use strengths and limits in web task automation, with strategies for stable execution and failure recovery.