SWE-bench
SWE-bench is a benchmark for evaluating language models on real-world GitHub issue resolution. It draws genuine problems from popular Python repositories and has become a core standard for measuring AI coding agent capabilities.
Augment SWE-bench Agent is the number one open-source SWE-bench Verified implementation, demonstrating how to build high-performance software engineering agents to automatically resolve GitHub issues.
AutoCodeRover is a project-structure-aware autonomous software engineering agent that performs automated program repair and issue resolution by understanding the overall codebase architecture.
An AI agent that writes actually useful code for you by writing tests first, then generating code to pass them.
DeepCode is an open agentic coding platform supporting Paper2Code, Text2Web, and Text2Backend, leveraging agent technology for automated software development workflows.