13,610 competitive programming problems from Codeforces. ~200 private test cases per problem. 12+ programming languages.
Pass@1, the probability that a single generated solution passes all of a problem's tests, is the reported evaluation metric for CodeContests. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
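For reference, pass@k is commonly computed with the unbiased estimator from the Codex paper (Chen et al., 2021): sample n programs per problem, count the c that pass, and take 1 − C(n−c, k)/C(n, k), which reduces to c/n when k = 1. The sketch below assumes that protocol and reports the benchmark mean as a percentage. The page does not state how each cited source sampled or submitted solutions, and `pass_at_k` and `benchmark_pass_at_1` are hypothetical helper names.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of sampled programs, c: how many passed every test, k: budget.
    """
    if n - c < k:
        # Every size-k subset contains at least one passing program.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over problems, as a percentage; results holds (n, c) pairs."""
    return 100.0 * sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


# Hypothetical run: one sample per problem, two of four problems solved.
print(benchmark_pass_at_1([(1, 1), (1, 0), (1, 0), (1, 1)]))  # 50.0
```

With one sample per problem, pass@1 is simply the share of problems solved; drawing more samples lowers the variance of the same estimate.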
Higher is better
Muted rows were not state of the art when published — an earlier or same-year result already scored better.
| Rank | Model | Trust | Pass@1 (%) | Year | Links |
|---|---|---|---|---|---|
| 01 | GPT-4 + AlphaCodium | verified | 44 | 2024 | Source ↗ |
| 02 | AlphaCode 2 | verified | 43 | 2024 | Source ↗ |
| 03 | GPT-4 | verified | 19 | 2024 | Source ↗ |
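The muting rule stated above the table can be sketched as a simple predicate. This is a minimal sketch under stated assumptions: results are compared at year granularity (the site may use exact publication dates), "scored better" means a strictly higher score so ties are not muted, and `Result` and `is_muted` are hypothetical names, not Codesota's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    score: float  # Pass@1; higher is better
    year: int     # assumed granularity; real dates may be finer


def is_muted(row: Result, rows: list[Result]) -> bool:
    """True if an earlier or same-year result strictly outscored `row`."""
    return any(
        other.year <= row.year and other.score > row.score
        for other in rows
    )


# Hypothetical rows, not taken from the table above.
rows = [
    Result("model-a", 30, 2022),
    Result("model-b", 25, 2023),
    Result("model-c", 40, 2023),
]
print([r.model for r in rows if is_muted(r, rows)])  # ['model-b']
```

A row never mutes itself because its own score is not strictly greater than itself, so no identity check is needed.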