Multi-SWE-bench
A multilingual benchmark for issue resolution. It extends SWE-bench to multiple programming languages to evaluate AI agent capabilities across diverse codebases.
Scaling data for SWE-agents (NeurIPS 2025 D&B Spotlight). A toolkit for automatically generating large-scale training datasets for software engineering agents.
SWE-bench is a benchmark for evaluating language models on real-world GitHub issue resolution. It features genuine issues from popular Python repositories and has become a core standard for measuring AI coding agent capabilities.
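For orientation, SWE-bench instances are distributed as a Hugging Face dataset. The sketch below is a minimal example of inspecting a few instances; it assumes the `datasets` library and the public `princeton-nlp/SWE-bench_Verified` dataset name, and the field names shown are those commonly exposed by SWE-bench releases rather than anything specified here.

```python
# Minimal sketch (assumption: the `datasets` library is installed and the
# "princeton-nlp/SWE-bench_Verified" dataset is reachable on the Hugging Face Hub).
from datasets import load_dataset

dataset = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")

# Peek at a few task instances; field names follow common SWE-bench releases.
for instance in dataset.select(range(3)):
    print(instance["instance_id"])              # unique identifier for the task
    print(instance["repo"])                     # source GitHub repository of the issue
    print(instance["problem_statement"][:200])  # issue text an agent must resolve
```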
Augment SWE-bench Agent is the top-ranked open-source submission on SWE-bench Verified, demonstrating how to build a high-performance software engineering agent that automatically resolves GitHub issues.
An open-source asynchronous coding agent from LangChain, built on LangGraph, that autonomously handles software engineering tasks including code generation, debugging, and file editing.