Code Generation · benchmark dataset · 2024 · EN

LiveCodeBench.

A contamination-free coding benchmark that collects new problems from LeetCode, AtCoder, and Codeforces published after model knowledge cutoffs. The problem set is updated continuously. The primary metric is pass@1 on the full test set.
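A minimal sketch of how a pass@1 score over post-cutoff problems could be computed. It assumes the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), which reduces to c/n for k = 1; this page does not state which estimator or how many samples per problem any listed result used. The `ProblemResult` record, its field names, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from math import comb


@dataclass
class ProblemResult:
    problem_id: str   # hypothetical identifier
    released: date    # date the problem appeared on its source site
    n_samples: int    # completions generated for this problem
    n_correct: int    # completions that passed every test


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_at_1_after_cutoff(results: list[ProblemResult], cutoff: date) -> float:
    """Mean pass@1 (as a percentage) over problems released after the cutoff."""
    fresh = [r for r in results if r.released > cutoff]
    if not fresh:
        raise ValueError("no problems released after the cutoff")
    total = sum(pass_at_k(r.n_samples, r.n_correct, 1) for r in fresh)
    return 100.0 * total / len(fresh)


# Illustrative data only, not taken from the benchmark.
results = [
    ProblemResult("p1", date(2024, 1, 10), 10, 10),  # before cutoff, excluded
    ProblemResult("p2", date(2024, 6, 1), 10, 5),
    ProblemResult("p3", date(2024, 7, 1), 10, 10),
]
score = pass_at_1_after_cutoff(results, cutoff=date(2024, 3, 1))
print(f"pass@1 = {score:.2f}")
```

Filtering by release date before averaging is what keeps a problem the model may have seen during training out of the score.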

§ 01 · Leaderboard

Best published scores.

54 results indexed across 2 metrics. Shaded row marks current SOTA; ties broken by submission date.


Primary metric: pass@1 · higher is better
All metrics: pass-1, pass@1
pass-1 · 24 rows
# | Model | Org | Submitted | Paper / code | pass-1
01 | DeepSeek-V4-Pro Max | DeepSeek | Apr 2026 | pwc-dump · code | 93.50
02 | DeepSeek-V4-Flash Max | DeepSeek | Apr 2026 | pwc-dump · code | 91.60
03 | Kimi K2.6 | - | Apr 2026 | pwc-dump | 89.60
04 | DeepSeek-V3.2-Speciale [Open] | DeepSeek | Dec 2025 | DeepSeek-V3.2: Pushing the Frontier of Open Large Langua… | 88.70
05 | Kimi-K2.5 [Open] | Moonshot.AI | Feb 2026 | Kimi K2.5: Visual Agentic Intelligence · code | 85
06 | Qwen3.6-27B | - | Apr 2026 | pwc-dump · code | 83.90
07 | Qwen3.5-397B-A17B [Open] | Alibaba | Feb 2026 | pwc-dump · code | 83.60
08 | DeepSeek-V3.2 [Open] | DeepSeek | Dec 2025 | DeepSeek-V3.2: Pushing the Frontier of Open Large Langua… | 83.30
09 | NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | - | Dec 2025 | NVIDIA Nemotron 3: Efficient and Open Intelligence | 81.19
10 | Qwen3.6-35B-A3B | - | Apr 2026 | pwc-dump · code | 80.40
11 | Gemma 4 31B | Google | Apr 2026 | pwc-dump | 80
12 | Intern-S1-Pro | Shanghai AI Lab | Mar 2026 | Intern-S1-Pro: Scientific Multimodal Foundation Model at… | 74.30
13 | Gemini 2.5 Pro | - | Jul 2025 | Gemini 2.5: Pushing the Frontier with Advanced Reasoning… | 74.20
14 | GLM-4.5 [Open] | Zhipu AI | Aug 2025 | GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation… · code | 72.90
15 | GLM-4.5-Air [Open] | Zhipu AI | Aug 2025 | GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation… · code | 70.70
16 | Qwen3-235B-A22B [Open] | Alibaba | May 2025 | Qwen3 Technical Report · code | 70.70
17 | Qwen3-VL-235B-A22B-Thinking | Qwen | Nov 2025 | Qwen3-VL Technical Report · code | 70.10
18 | NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | - | Dec 2025 | Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybr… · code | 68.30
19 | Gemini 2.5 Flash | - | Jul 2025 | Gemini 2.5: Pushing the Frontier with Advanced Reasoning… | 59.30
20 | Qwen3-Coder-Next | Qwen | Feb 2026 | Qwen3-Coder-Next Technical Report · code | 58.93
21 | Qwen2.5-72B-Instruct | - | Dec 2024 | Qwen2.5 Technical Report · code | 55.50
22 | Qwen3-VL-235B-A22B-Instruct | Qwen | Nov 2025 | Qwen3-VL Technical Report · code | 54.30
23 | Qwen3-VL-8B-Instruct | Qwen | Nov 2025 | Qwen3-VL Technical Report · code | 39.30
24 | Gemma 3 (27B, IT) | - | Mar 2025 | Gemma 3 Technical Report · code | 29.70
pass@1 · primary · 30 rows
# | Model | Org | Submitted | Paper / code | pass@1
01 | Gemini 3 Pro Preview | Google | Mar 2026 | vendor | 91.70
02 | Gemini 3 Flash [API] | Google | Mar 2026 | vendor | 90.80
03 | GPT-5 | OpenAI | Apr 2026 | artificial-analysis | 85
04 | Grok 4 [API] | xAI | Apr 2026 | xai-grok-4-announcement | 79
05 | Gemini 2.5 Pro | Google | Apr 2026 | google-io-2025 | 75.60
06 | DeepSeek-R1-0528 [Open] | DeepSeek | May 2025 | deepseek-model-card | 73.30
07 | o4-mini | OpenAI | Mar 2024 | LiveCodeBench: Holistic and Contamination Free Evaluatio… · code | 72.80
08 | Qwen3-235B-A22B [Open] | Alibaba | May 2025 | arxiv-2505.09388 | 70.70
09 | o3-mini [API] | OpenAI | Mar 2024 | LiveCodeBench: Holistic and Contamination Free Evaluatio… · code | 66.90
10 | DeepSeek R1 [Open] | DeepSeek | Jan 2025 | arxiv-2501.12948 | 65.90
11 | o3 | OpenAI | Mar 2024 | LiveCodeBench: Holistic and Contamination Free Evaluatio… · code | 65.30
12 | DeepSeek-R1-Distill-Llama-70B [Open] | DeepSeek | Jan 2025 | arxiv-2501.12948 | 65.20
13 | Gemini 2.5 Flash | Google | Apr 2026 | llm-stats | 63.90
14 | Kimi k1.5 [API] | Moonshot AI | Jan 2025 | arxiv-2501.12599 | 62.50
15 | DeepSeek-R1-Distill-Qwen-32B [Open] | DeepSeek | Jan 2025 | arxiv-2501.12948 | 62.10
16 | Claude Opus 4 | Anthropic | Mar 2024 | LiveCodeBench: Holistic and Contamination Free Evaluatio… · code | 57.80
17 | GPT-4.1 | OpenAI | Mar 2024 | LiveCodeBench: Holistic and Contamination Free Evaluatio… · code | 54.40
18 | Claude Sonnet 4 | Anthropic | Mar 2024 | LiveCodeBench: Holistic and Contamination Free Evaluatio… · code | 52.80
19 | DeepSeek-V3 [Open] | DeepSeek | Mar 2024 | LiveCodeBench: Holistic and Contamination Free Evaluatio… · code | 49.20
20 | DeepSeek-v3-0324 [Open] | DeepSeek | Mar 2025 | deepseek-model-card | 49.20
21 | GPT-4.1 mini [API] | OpenAI | Apr 2026 | pricepertoken-leaderboard | 48.30
22 | Qwen2.5-Coder 32B [Open] | Alibaba | Mar 2024 | LiveCodeBench: Holistic and Contamination Free Evaluatio… · code | 47.80
23 | DeepSeek-Coder-V2-Instruct [Open] | DeepSeek | Mar 2024 | LiveCodeBench: Holistic and Contamination Free Evaluatio… · code | 43.40
24 | Llama 4 Maverick [Open] | Meta | Apr 2025 | meta-model-card | 43.40
25 | GPT-4o [API] | OpenAI | Mar 2024 | LiveCodeBench: Holistic and Contamination Free Evaluatio… · code | 40.80
26 | Gemma-3-27b [Open] | Google | Mar 2025 | arxiv-2503.19786 | 39
27 | Llama-4-Scout [Open] | Meta | Apr 2025 | meta-model-card | 32.80
28 | Gemma 3 12B IT [Open] | Google DeepMind | Mar 2025 | arxiv-2503.19786 | 32
29 | Codestral 22B [Open] | Mistral | Mar 2024 | LiveCodeBench: Holistic and Contamination Free Evaluatio… · code | 29.50
30 | Gemma 3 4B IT [Open] | Google DeepMind | Mar 2025 | arxiv-2503.19786 | 23
Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
§ 03 · Progress

3 steps of state of the art.

Each row below marks a model that broke the previous record on pass@1. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · pass@1
  1. Mar 12, 2024 · o4-mini · OpenAI · 72.80
  2. May 28, 2025 · DeepSeek-R1-0528 · DeepSeek · 73.30
  3. Mar 15, 2026 · Gemini 3 Pro Preview · Google · 91.70
Fig 3 · SOTA-setting models only. 3 entries span Mar 2024 to Mar 2026.
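The record-setting rows above can be derived from the full leaderboard with a running maximum over date-ordered entries. The sketch below is one way to do that, not Codesota's actual pipeline: the `Entry` type and `sota_steps` function are hypothetical names, and rows whose source date is month-only are given the 1st of the month as a placeholder day.

```python
from datetime import date
from typing import NamedTuple


class Entry(NamedTuple):
    submitted: date
    model: str
    score: float


def sota_steps(entries: list[Entry]) -> list[Entry]:
    """Return the entries that strictly beat every earlier score."""
    steps: list[Entry] = []
    best = float("-inf")
    for entry in sorted(entries, key=lambda e: e.submitted):
        if entry.score > best:  # a tie does not count as a new record
            steps.append(entry)
            best = entry.score
    return steps


# A subset of the pass@1 rows from the leaderboard above.
entries = [
    Entry(date(2024, 3, 12), "o4-mini", 72.80),
    Entry(date(2025, 1, 1), "DeepSeek R1", 65.90),  # placeholder day
    Entry(date(2025, 5, 28), "DeepSeek-R1-0528", 73.30),
    Entry(date(2026, 3, 15), "Gemini 3 Pro Preview", 91.70),
    Entry(date(2026, 4, 1), "GPT-5", 85.0),  # placeholder day
]

for step in sota_steps(entries):
    print(step.submitted.isoformat(), step.model, step.score)
```

Entries that score below the running best, such as DeepSeek R1 and GPT-5 in this subset, stay in the leaderboard but never appear as a step.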
§ 04 · Literature

13 papers tied to this benchmark.

Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

What a submission needs
  • 01 · A public checkpoint or API endpoint
  • 02 · A reproduction script with frozen commit + seed
  • 03 · Declared evaluation environment (Python, deps)
  • 04 · One row per metric declared by this dataset
  • 05 · A contact so we can follow up on discrepancies
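The checklist above could be captured as a small record and checked before sending. This is only an illustration: the page does not publish a submission schema, so the `Submission` class, its field names, and the `missing_items` helper are all hypothetical. The metric names are the two listed for this dataset.

```python
from dataclasses import dataclass

# Metrics this dataset declares on the leaderboard page.
DECLARED_METRICS = ("pass-1", "pass@1")


@dataclass
class Submission:
    model_ref: str               # public checkpoint location or API endpoint
    repro_commit: str            # frozen commit of the reproduction script
    seed: int                    # random seed used for the run
    environment: dict[str, str]  # e.g. Python version and pinned dependencies
    scores: dict[str, float]     # one score per declared metric
    contact: str                 # where to follow up on discrepancies


def missing_items(sub: Submission) -> list[str]:
    """List the checklist items this submission does not yet satisfy."""
    missing = []
    if not sub.model_ref:
        missing.append("checkpoint or API endpoint")
    if not sub.repro_commit:
        missing.append("frozen commit")
    if "python" not in sub.environment:
        missing.append("declared Python version")
    for metric in DECLARED_METRICS:
        if metric not in sub.scores:
            missing.append(f"score row for {metric}")
    if not sub.contact:
        missing.append("contact")
    return missing


# Placeholder values throughout; none refer to a real model or result.
example = Submission(
    model_ref="https://example.org/my-model",
    repro_commit="0123abc",
    seed=0,
    environment={"python": "3.11"},
    scores={"pass@1": 50.0},
    contact="me@example.org",
)
print(missing_items(example))
```

Running the check locally surfaces gaps, such as a metric without a score row, before the maintainers have to ask.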