Computer Code

Code Generation

Generating code from natural language descriptions (HumanEval, MBPP).

8 datasets · 49 results

Code generation is a core task in the Computer Code area. The standard benchmarks used to evaluate models are listed below, along with the current state-of-the-art result for each benchmark where one is tracked.

Benchmarks & SOTA

SWE-bench Verified

SWE-bench Verified Subset

2024 · 29 results

500 manually verified GitHub issues confirmed solvable by human engineers. High-quality subset of SWE-bench.

State of the Art

Claude Opus 4.5

Anthropic

80.9

resolve-rate

HumanEval

HumanEval: Hand-Written Evaluation Set

2021 · 18 results

164 hand-crafted Python programming problems with function signatures, docstrings, and unit tests. Standard benchmark for code generation.

State of the Art

o4-mini

OpenAI

97.3

pass@1
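
The pass@1 score above is the k = 1 case of the pass@k metric: the probability that at least one of k sampled completions for a problem passes all of its unit tests, averaged over problems. A minimal sketch of the commonly used unbiased estimator follows, assuming n completions are sampled per problem and c of them pass. The function names and the (n, c) input layout are illustrative assumptions, not part of any benchmark's official harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: completions sampled for the problem
    c: completions that passed every unit test
    k: sampling budget being scored (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k draw contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(per_problem: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n, c) pairs (hypothetical layout)."""
    return sum(pass_at_k(n, c, k) for n, c in per_problem) / len(per_problem)
```

For k = 1 the estimator reduces to c / n, so a benchmark-level pass@1 is the mean fraction of passing samples per problem.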

MBPP

Mostly Basic Python Problems

2021 · 2 results

974 crowd-sourced Python programming problems suitable for beginners. Covers programming fundamentals and standard library.

State of the Art

Claude 3.5 Sonnet

Anthropic

89.2

pass@1

HumanEval+

HumanEval+ Extended Version

2023 · 0 results

Extended HumanEval with 80x more test cases. Tests code robustness and edge case handling.

No results tracked yet
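
To illustrate why a larger test suite matters, the sketch below shows a solution that passes a small base suite but fails once edge-case inputs are added. The problem, the candidate function, and both test lists are hypothetical and are not taken from HumanEval or HumanEval+.

```python
def candidate_max_gap(nums):
    """Hypothetical model output: largest gap between adjacent sorted values."""
    s = sorted(nums)
    # Bug: max() raises ValueError when nums has fewer than two elements.
    return max(b - a for a, b in zip(s, s[1:]))


def passes(func, tests):
    """Return True only if func returns the expected value for every test."""
    for args, expected in tests:
        try:
            if func(*args) != expected:
                return False
        except Exception:
            # Any exception counts as a failed test.
            return False
    return True


# Small base suite: only "typical" inputs.
base_tests = [(([1, 5, 2],), 3), (([10, 20],), 10)]
# Extra edge cases of the kind an extended suite might add.
extra_tests = [(([7],), 0), (([],), 0)]

base_ok = passes(candidate_max_gap, base_tests)
extended_ok = passes(candidate_max_gap, base_tests + extra_tests)
```

Here `base_ok` is `True` while `extended_ok` is `False`, so the same completion counts as a pass under the base suite and as a failure under the extended one.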

APPS

Automated Programming Progress Standard

2021 · 0 results

10,000 coding problems from Codewars, AtCoder, Kattis, and Codeforces. Ranges from introductory to competition level.

No results tracked yet

MBPP+

MBPP+ Extended Version

2023 · 0 results

Extended MBPP with additional test cases. Uses 399 hand-verified problems from MBPP-sanitized.

No results tracked yet

SWE-bench

SWE-bench: Software Engineering Benchmark

2023 · 0 results

2,294 real GitHub issues from popular Python repositories. Tests ability to resolve real-world software engineering tasks.

No results tracked yet

CodeContests

CodeContests Competitive Programming

2022 · 0 results

13,610 competitive programming problems from Codeforces. ~200 private test cases per problem. 12+ programming languages.

No results tracked yet
