SeeAct
A system for generalist web agents that autonomously carry out tasks on any given website, leveraging large multimodal models like GPT-4V.
The first LLM-based web agent and benchmark for generalist web agents, providing datasets, evaluation frameworks, and baseline methods for building agents that operate on real websites.
AppAgent is an LLM-based multimodal agent framework designed to operate smartphone apps like a human, supporting touch interaction and autonomous exploration.
Fully local Manus AI alternative that autonomously browses the web, writes code, and interacts via voice, with no API costs.
Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider, using Stream's edge network for ultra-low latency realtime interactions.