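Metric: pass@1 on LiveCodeBench (LCB), in percent; higher is better. Entries come from different reports and LCB time windows, so scores from different windows are not strictly comparable.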
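pass@1 is commonly computed with the unbiased pass@k estimator of Chen et al. (2021): for a problem with n sampled solutions of which c pass, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The sketch below assumes each problem's result is given as a hypothetical `(n, c)` pair; the function names are illustrative and not taken from the LCB harness. For k = 1 the estimator reduces to c/n.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over all problems, reported as a percentage."""
    scores = [pass_at_k(n, c, 1) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Three hypothetical problems, 4 samples each: all, half, and none correct.
    print(benchmark_pass_at_1([(4, 4), (4, 2), (4, 0)]))
```

Sampling several completions per problem and averaging c/n gives a lower-variance pass@1 estimate than scoring a single completion.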
| Rank | Model | Evaluation setting / reference | pass@1 (%) | Year | Source |
|---|---|---|---|---|---|
| 1 | DeepSeek-R1-0528 | LCB window Aug 2024–May 2025, pass@1-COT | 73.3 | 2025 | Community |
| 2 | Qwen3-235B-A22B | Qwen3 tech report, LCB v5 | 70.7 | 2025 | Community |
| 3 | DeepSeek-R1 | LCB window Aug 2024–Jan 2025, pass@1-COT | 65.9 | 2025 | Community |
| 4 | DeepSeek-R1-Distill-Llama-70B | DeepSeek-R1 tech report, Table 2 | 65.2 | 2025 | Community |
| 5 | OpenAI o1 (Dec 2024) | LCB window Aug 2024–Jan 2025; from DeepSeek-R1 paper | 63.4 | 2024 | Community |
| 6 | Kimi k1.5 (long-CoT) | Kimi k1.5 tech report, Table 2, long-CoT setting | 62.5 | 2025 | Community |
| 7 | DeepSeek-R1-Distill-Qwen-32B | DeepSeek-R1 tech report, Table 2 | 62.1 | 2025 | Community |
| 8 | DeepSeek-R1-Distill-Qwen-14B | DeepSeek-R1 tech report, Table 2 | 59.1 | 2025 | Community |
| 9 | o1-mini | LCB window Aug 2024–Jan 2025; from DeepSeek-R1 paper | 53.8 | 2024 | Community |
| 10 | DeepSeek-V3-0324 | +10.0 over DeepSeek-V3 (39.2 → 49.2) | 49.2 | 2025 | Community |
| 11 | DeepSeek-R1-Distill-Qwen-7B | DeepSeek-R1 tech report, Table 2 | 49.1 | 2025 | Community |
| 12 | DeepSeek-R1-Distill-Llama-8B | DeepSeek-R1 tech report, Table 2 | 49.0 | 2025 | Community |
| 13 | Kimi k1.5 (short-CoT) | Kimi k1.5 tech report, Table 3, short-CoT setting | 47.3 | 2025 | Community |
| 14 | Llama 4 Maverick (17B-128E) | LCB window Oct 2024–Feb 2025 | 43.4 | 2025 | Community |
| 15 | DeepSeek-V3 | LCB window Aug 2024–Jan 2025, pass@1-COT | 40.5 | 2024 | Community |
| 16 | Gemma 3 27B IT | Gemma 3 tech report | 39.0 | 2025 | Community |
| 17 | Claude 3.5 Sonnet | LCB window Aug 2024–Jan 2025; from DeepSeek-R1 paper | 38.9 | 2024 | Community |
| 18 | GPT-4o | LCB window Aug 2024–Jan 2025; from DeepSeek-R1 paper | 32.9 | 2024 | Community |
| 19 | Llama 4 Scout (17B-16E) | LCB window Oct 2024–Feb 2025 | 32.8 | 2025 | Community |
| 20 | Gemma 3 12B IT | Gemma 3 tech report | 32.0 | 2025 | Community |
| 21 | Qwen2.5-Coder-32B-Instruct | Qwen2.5-Coder tech report, Table 16 | 31.4 | 2024 | Community |
| 22 | Gemma 3 4B IT | Gemma 3 tech report | 23.0 | 2025 | Community |
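Because the rows above were scored on different LCB time windows, one conservative way to read the table is to rank each model only against others evaluated on the same window. The sketch below copies just the rows whose notes name a window explicitly; `ROWS` and `rank_within_windows` are hypothetical names used for illustration.

```python
from collections import defaultdict

# (model, LCB window, pass@1) for rows whose notes state the window explicitly.
ROWS = [
    ("DeepSeek-R1-0528", "Aug 2024–May 2025", 73.3),
    ("DeepSeek-R1", "Aug 2024–Jan 2025", 65.9),
    ("OpenAI o1 (Dec 2024)", "Aug 2024–Jan 2025", 63.4),
    ("o1-mini", "Aug 2024–Jan 2025", 53.8),
    ("Llama 4 Maverick (17B-128E)", "Oct 2024–Feb 2025", 43.4),
    ("DeepSeek-V3", "Aug 2024–Jan 2025", 40.5),
    ("Claude 3.5 Sonnet", "Aug 2024–Jan 2025", 38.9),
    ("GPT-4o", "Aug 2024–Jan 2025", 32.9),
    ("Llama 4 Scout (17B-16E)", "Oct 2024–Feb 2025", 32.8),
]


def rank_within_windows(rows):
    """Group rows by evaluation window and sort each group by score, best first."""
    groups = defaultdict(list)
    for model, window, score in rows:
        groups[window].append((model, score))
    return {
        window: sorted(entries, key=lambda entry: entry[1], reverse=True)
        for window, entries in groups.items()
    }


if __name__ == "__main__":
    for window, ranked in rank_within_windows(ROWS).items():
        print(window, ranked)
```

Rows without a stated window (the tech-report entries) are left out rather than guessed.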