Codesota · Benchmark · CodeContests

CodeContests.

13,610 competitive programming problems from Codeforces. ~200 private test cases per problem. 12+ programming languages.

§ 01 · Leaderboard

Results by metric.

Only 3 models on this benchmark

pass@1

Pass@1 is the reported evaluation metric for CodeContests. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
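The page does not say how each source computed its pass@1 figure. As a hedged illustration only, the sketch below uses the widely cited unbiased pass@k estimator (1 - C(n-c, k) / C(n, k), averaged over problems), where n samples are drawn per problem and c of them pass every test. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical and not taken from Codesota or from any of the cited papers.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct.

    Assumption: this is the common unbiased estimator, not necessarily the
    procedure used by any result listed on this page.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over (n, c) pairs, one pair per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts: three problems, 10 samples each.
    demo = [(10, 3), (10, 0), (10, 10)]
    print(f"pass@1 = {benchmark_pass_at_k(demo, k=1):.3f}")
    print(f"pass@5 = {benchmark_pass_at_k(demo, k=5):.3f}")
```

With k = 1 the estimator reduces to c / n, the fraction of samples that pass. This is why a pass@5 or pass@10 figure is not directly comparable to a strict pass@1 score.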

Trust tiers for pass@1: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.
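One possible reading of the muting rule can be sketched as follows. The site's actual logic is not shown, so this assumes that a row is muted when any other row from the same or an earlier year has a strictly higher score. The `Result` class and `is_muted` function are hypothetical names, and the sample rows reuse the scores and years listed on this leaderboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    score: float
    year: int


def is_muted(row: Result, rows: list[Result]) -> bool:
    """Return True if an earlier or same-year result already scored better.

    Assumption: "better" means a strictly higher score, since the metric
    is marked "Higher is better".
    """
    return any(
        other.year <= row.year and other.score > row.score
        for other in rows
    )


if __name__ == "__main__":
    rows = [
        Result("GPT-4 + AlphaCodium", 44, 2024),
        Result("AlphaCode 2", 43, 2024),
        Result("GPT-4", 19, 2024),
    ]
    for r in rows:
        print(f"{r.model}: {'muted' if is_muted(r, rows) else 'state of the art'}")
```

Because years are the only date granularity in this sketch, two results from the same year cannot be ordered by publication date, so the lower-scoring one is always treated as muted.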

| Rank | Model | Trust | Score | Year | Notes |
|------|-------|-------|-------|------|-------|
| 01 | GPT-4 + AlphaCodium | verified | 44 | 2024 | GPT-4 with AlphaCodium multi-stage flow. pass@5 44% on CodeContests validation set. COLM 2024. |
| 02 | AlphaCode 2 | verified | 43 | 2024 | AlphaCode 2 (Google DeepMind, Dec 2023). 43% solve rate within 10 samples on 77 recent Codeforces problems. Note: this is pass@10, not strict pass@1. |
| 03 | GPT-4 | verified | 19 | 2024 | GPT-4 direct prompting. pass@1 19% on CodeContests validation set. From AlphaCodium paper (COLM 2024), Table 1. |
Lineage

CodeContests in context.

This benchmark (1)
active · 2022-02
CodeContests
None yet — this is the current frontier.
§ 04 · Submit a result

Add to the leaderboard.
