Multi-SWE-bench
A multilingual benchmark for issue resolution. It extends SWE-bench to multiple programming languages to evaluate AI agent capabilities across diverse codebases.
Scaling data for SWE-agents (NeurIPS 2025 D&B Spotlight). A toolkit for automatically generating large-scale training datasets for software engineering agents.
SWE-bench is a benchmark for evaluating language models on real-world GitHub issue resolution. It features genuine issues from popular Python repositories and has become a core standard for measuring AI coding agent capabilities.
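For orientation, SWE-bench instances are distributed as a Hugging Face dataset. The sketch below is a minimal example of inspecting a few instances; it assumes the `datasets` library and the public `princeton-nlp/SWE-bench_Verified` dataset name, and the field names shown are those commonly exposed by SWE-bench releases rather than anything specified here.

```python
# Minimal sketch (assumption: the `datasets` library is installed and the
# "princeton-nlp/SWE-bench_Verified" dataset is reachable on the Hugging Face Hub).
from datasets import load_dataset

dataset = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")

# Peek at a few task instances; field names follow common SWE-bench releases.
for instance in dataset.select(range(3)):
    print(instance["instance_id"])              # unique identifier for the task
    print(instance["repo"])                     # source GitHub repository of the issue
    print(instance["problem_statement"][:200])  # issue text an agent must resolve
```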
Augment SWE-bench Agent is the top-ranked open-source submission on SWE-bench Verified, demonstrating how to build a high-performance software engineering agent that automatically resolves GitHub issues.
An open-source asynchronous coding agent from LangChain, built on LangGraph, that autonomously handles software engineering tasks including code generation, debugging, and file editing.