Codesota · Benchmark · CodeContests

CodeContests.

13,610 competitive programming problems from Codeforces. About 200 private test cases per problem. 12+ programming languages.

§ 01 · SOTA history

Year over year.

Not enough data to show trend.
§ 02 · Leaderboard

Results by metric.

Only 3 models on this benchmark
Help build the community leaderboard — submit your model results.

pass@1

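Pass@1, the share of problems solved by a model's first generated solution, is the metric Codesota tracks for CodeContests. Published scores are collected so readers can compare state-of-the-art results across sources and model families. Some entries below report pass@5 or pass@10 instead, as stated in their descriptions, so the scores are not strictly comparable.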
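For readers unfamiliar with the pass@k family of metrics, the sketch below shows one common way to compute it: the unbiased estimator popularized by the Codex paper (Chen et al., 2021). This is an illustration only. The page does not say how any listed score was computed, and the function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n sampled solutions, c of them correct.

    Uses 1 - C(n - c, k) / C(n, k): one minus the probability that a random
    subset of k samples contains no correct solution.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates; results holds (n, c) per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimate reduces to c / n, the fraction of samples that pass. For example, `pass_at_k(10, 3, 1)` is approximately 0.3, while `pass_at_k(10, 3, 5)` is higher because any of five attempts may succeed. That difference is why pass@5 and pass@10 figures should not be read as pass@1.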

Higher is better

Trust tiers for pass@1: verified, paper, vendor, community, unverified
Rank  Model                Trust     Score  Year  Source
01    GPT-4 + AlphaCodium  verified  44     2024  Source ↗
      GPT-4 with the AlphaCodium multi-stage flow. 44% pass@5 on the CodeContests validation set. COLM 2024.
02    AlphaCode 2          verified  43     2024  Source ↗
      AlphaCode 2 (Google DeepMind, Dec 2023). 43% solve rate within 10 samples on 77 recent Codeforces problems. Note: this is pass@10, not strict pass@1.
03    GPT-4                verified  19     2024  Source ↗
      GPT-4 with direct prompting. 19% pass@5 on the CodeContests validation set. From the AlphaCodium paper (COLM 2024), Table 1. Note: the paper reports this baseline as pass@5, not pass@1.
§ 03 · Lineage

CodeContests in context.

This benchmark (1)
active · 2022-02
CodeContests
None yet — this is the current frontier.
§ 04 · Submit a result

Add to the leaderboard.
