LiveCodeBench
Contamination-free coding benchmark collecting new problems from LeetCode, AtCoder, and CodeForces after model knowledge cutoffs. Updated continuously with fresh problems. Primary metric is pass@1 on the full test set.
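As a rough illustration of the primary metric: pass@1 is the probability that a single sampled solution passes every test for a problem, averaged over all problems. The sketch below uses the unbiased pass@k estimator that is common in code-generation evaluation. Whether the leaderboard computes its scores exactly this way is an assumption, and the function names and the `results` layout are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of sampled solutions, c: how many passed all tests,
    k: the sampling budget being scored.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures: any k samples contain at least one pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: dict[str, list[bool]]) -> float:
    """Mean pass@1 across problems, as a percentage.

    `results` maps a (hypothetical) problem id to one pass/fail flag
    per sampled solution.
    """
    per_problem = [pass_at_k(len(r), sum(r), 1) for r in results.values()]
    return 100.0 * sum(per_problem) / len(per_problem)
```

For k = 1 the estimator reduces to c / n, so a problem solved by 3 of 10 samples contributes 0.3 to the average.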
Current State of the Art
DeepSeek R1-0528 (DeepSeek): 73.3 pass@1
pass@1 Progress Over Time
4 breakthroughs from Sep 2024 to May 2025
Key Milestones
Total Improvement: 133.4%
Time Span: 9 months
Breakthroughs: 4
Current SOTA: 73.3
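The page does not state how Total Improvement is calculated. One reading consistent with the numbers shown is relative change from the first milestone to the current SOTA. The sketch below works under that assumption and back-solves the implied starting score, which is inferred here rather than reported by the page. Both function names are illustrative.

```python
def relative_improvement(baseline: float, current: float) -> float:
    """Percentage change from baseline to current."""
    return 100.0 * (current - baseline) / baseline


def implied_baseline(current: float, improvement_pct: float) -> float:
    """Invert relative_improvement to recover the starting score."""
    return current / (1.0 + improvement_pct / 100.0)


# Assumption: Total Improvement = (SOTA - first milestone) / first milestone.
start = implied_baseline(73.3, 133.4)
print(round(start, 1))                               # 31.4
print(round(relative_improvement(31.4, 73.3), 1))    # 133.4
```

Under this reading the first milestone would sit near 31.4 pass@1.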
Top Models Performance Comparison
Top 10 models ranked by pass@1
Best Score: 73.3
Top Model: DeepSeek R1-0528
Models Compared: 10
Score Range: 15.5
Metric: pass@1 (primary)
| # | Model | Access | Organization | Score (pass@1) | Date |
|---|---|---|---|---|---|
| 1 | DeepSeek R1-0528 | Open Source | DeepSeek | 73.3 | May 2025 |
| 2 | o4-mini | API | OpenAI | 72.8 | Mar 2024 |
| 3 | Qwen3-235B-A22B | | Alibaba | 70.7 | May 2025 |
| 4 | o3-mini | API | OpenAI | 66.9 | Mar 2024 |
| 5 | DeepSeek-R1 | Open Source | DeepSeek | 65.9 | Jan 2025 |
| 6 | o3 | API | OpenAI | 65.3 | Mar 2024 |
| 7 | DeepSeek-R1-Distill-Llama-70B | Open Source | DeepSeek | 65.2 | Jan 2025 |
| 8 | Kimi k1.5 | API | Moonshot AI | 62.5 | Jan 2025 |
| 9 | DeepSeek-R1-Distill-Qwen-32B | Open Source | DeepSeek | 62.1 | Jan 2025 |
| 10 | Claude Opus 4 | API | Anthropic | 57.8 | Mar 2024 |
| 11 | GPT-4.1 | API | OpenAI | 54.4 | Mar 2024 |
| 12 | Claude Sonnet 4 | API | Anthropic | 52.8 | Mar 2024 |
| 13 | DeepSeek V3 | Open Source | DeepSeek | 49.2 | Mar 2024 |
| 14 | DeepSeek-V3-0324 | Open Source | DeepSeek | 49.2 | Mar 2025 |
| 15 | Qwen2.5-Coder-32B-Instruct | Open Source | Alibaba | 47.8 | Mar 2024 |
| 16 | DeepSeek-Coder-V2-Instruct | Open Source | DeepSeek | 43.4 | Mar 2024 |
| 17 | Llama 4 Maverick | Open Source | Meta | 43.4 | Apr 2025 |
| 18 | GPT-4o | API | OpenAI | 40.8 | Mar 2024 |
| 19 | DeepSeek V3 | Open Source | DeepSeek | 40.5 | Dec 2024 |
| 20 | Gemma 3 27B IT | | Google DeepMind | 39 | Mar 2025 |
| 21 | Llama 4 Scout | Open Source | Meta | 32.8 | Apr 2025 |
| 22 | Gemma 3 12B IT | | Google DeepMind | 32 | Mar 2025 |
| 23 | Qwen2.5-Coder-32B-Instruct | Open Source | Alibaba | 31.4 | Nov 2024 |
| 24 | Codestral 22B | | Mistral | 29.5 | Mar 2024 |
| 25 | Gemma 3 4B IT | | Google DeepMind | 23 | Mar 2025 |
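The summary figures can be rechecked from the table. A minimal sketch, assuming Score Range means the gap between the best score and the tenth-ranked score (the variable names are illustrative):

```python
# pass@1 scores of ranks 1 through 10, copied from the table above.
top10 = [73.3, 72.8, 70.7, 66.9, 65.9, 65.3, 65.2, 62.5, 62.1, 57.8]

best = max(top10)
# Round to one decimal to absorb floating-point noise in the subtraction.
score_range = round(best - min(top10), 1)

print(best, score_range)  # 73.3 15.5
```

The result matches the reported Score Range of 15.5, which suggests the range covers only the 10 compared models and not all 25 rows.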
Related Papers (1)
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Mar 2024 · Models: o4-mini, o3-mini, o3, +8 more