Who leads the CodeContests benchmark?

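GPT-4 + AlphaCodium currently leads CodeContests with a score of 44. Codesota lists it under pass@1, although the row's own note reports the figure as pass@5 on the validation set.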

What is the state-of-the-art score on CodeContests?

The state-of-the-art result on CodeContests is 44, achieved by GPT-4 + AlphaCodium as of 2024. Codesota lists it under pass@1; the row's note describes it as pass@5 on the validation set.

How many models are tracked on CodeContests?

Codesota tracks 3 models on CodeContests.

When was the CodeContests leaderboard last updated?

The CodeContests leaderboard on Codesota includes results through 2024.

CodeContests.

Name: CodeContests Benchmark Results
Creator: Unknown
Published: 2024-01-01
License: https://creativecommons.org/licenses/by/4.0/

13,610 competitive programming problems from Codeforces and other competitive programming sites. ~200 private test cases per problem. 12+ programming languages.

§ 01 · Leaderboard

Results by metric.

pass@1

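Pass@1 is the metric under which Codesota lists CodeContests results. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families. The row notes show that the sources do not all report strict pass@1: the GPT-4 + AlphaCodium score is described as pass@5 and the AlphaCode 2 score as pass@10, so the rows are not strictly comparable.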
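For readers unfamiliar with the pass@k family of metrics, here is a minimal sketch of the commonly used unbiased estimator, 1 - C(n - c, k) / C(n, k), where n samples are drawn per problem and c of them pass all tests. This is an assumption for illustration only: the page does not say how any listed source computed its score, and the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass every test
    k: number of attempts allowed by the metric

    Uses the unbiased estimator 1 - C(n - c, k) / C(n, k).
    Assumption: the sources on this leaderboard may define
    their metric differently.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every k-subset contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With the same samples, pass@5 or pass@10 can never be lower than pass@1, which is one reason scores reported at different k should not be ranked against each other directly.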

Higher is better

Trust tiers for pass@1: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.

Rank	Model	Notes	Trust	Score	Year
01	GPT-4 + AlphaCodium	GPT-4 with AlphaCodium multi-stage flow. pass@5 44% on CodeContests validation set. COLM 2024.	verified	44	2024
02	AlphaCode 2	AlphaCode 2 (Google DeepMind, Dec 2023). 43% solve rate within 10 samples on 77 recent Codeforces problems. Note: this is pass@10, not strict pass@1.	verified	43	2024
03	GPT-4	GPT-4 direct prompting. pass@1 19% on CodeContests validation set. From AlphaCodium paper (COLM 2024), Table 1.	verified	19	2024
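
The muting rule stated above the table (a row is muted when an earlier or same-year result already scored better) can be sketched as follows. This is one reading of that sentence, not Codesota's actual implementation; the `Row` type and `muted_models` function are hypothetical names, and only the year is available here, so same-year results are compared without regard to publication order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    score: float
    year: int


def muted_models(rows: list[Row]) -> set[str]:
    """Return models that were not state of the art when published.

    Assumed rule: a row is muted if any other row from the same
    or an earlier year has a strictly higher score.
    """
    muted = set()
    for row in rows:
        for other in rows:
            if other is row:
                continue
            if other.year <= row.year and other.score > row.score:
                muted.add(row.model)
                break
    return muted


# Scores and years copied from the table above.
rows = [
    Row("GPT-4 + AlphaCodium", 44, 2024),
    Row("AlphaCode 2", 43, 2024),
    Row("GPT-4", 19, 2024),
]
muted = muted_models(rows)
```

Under this reading, all three rows share the year 2024, so only the top-scoring row stays unmuted.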