SWE-bench
SWE-bench is a benchmark for evaluating language models on real-world GitHub issue resolution. It features genuine problems drawn from popular Python repositories and has become a core standard for measuring the capabilities of AI coding agents.
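The benchmark's pass/fail criterion can be sketched in a few lines: an instance counts as resolved only if the model's patch makes every previously failing test pass (FAIL_TO_PASS) without breaking any previously passing test (PASS_TO_PASS). The field names below follow the published dataset schema, but `is_resolved`, the toy instance, and the test results are illustrative assumptions, not the official evaluation harness.

```python
def is_resolved(fail_to_pass, pass_to_pass, test_results):
    """Return True if the patch resolves the instance.

    test_results maps a test id to True (passed) or False (failed)
    after the candidate patch has been applied.
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    # Every required test must pass; a missing result counts as a failure.
    return all(test_results.get(test, False) for test in required)


# Toy instance using the dataset's field names (values are illustrative).
instance = {
    "repo": "astropy/astropy",
    "instance_id": "astropy__astropy-12907",
    "FAIL_TO_PASS": ["test_separability"],          # broken before the fix
    "PASS_TO_PASS": ["test_identity", "test_cos"],  # must not regress
}

results = {"test_separability": True, "test_identity": True, "test_cos": True}
print(is_resolved(instance["FAIL_TO_PASS"], instance["PASS_TO_PASS"], results))  # → True
```

The strict two-sided check is what distinguishes SWE-bench from patch-similarity metrics: a patch that fixes the issue but regresses an unrelated test is scored as a failure.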
Augment SWE-bench Agent is the top-ranked open-source implementation on SWE-bench Verified, demonstrating how to build a high-performance software engineering agent that automatically resolves GitHub issues.
AutoCodeRover is a project-structure-aware autonomous software engineering agent that achieves automated program repair and issue resolution by understanding the overall codebase architecture.
DeepCode is an open agentic coding platform supporting Paper2Code, Text2Web, and Text2Backend, leveraging agent technology to automate software development workflows.
Kodezi Chronos is a debugging-first language model achieving state-of-the-art performance on SWE-bench, capable of autonomously handling software debugging and code repair tasks.