Who leads the LiveCodeBench benchmark?

Gemini 3 Pro Preview currently leads LiveCodeBench with a score of 91.70 on pass@1, the benchmark's primary metric.

What is the state-of-the-art score on LiveCodeBench?

The state-of-the-art result on LiveCodeBench is 91.70 (pass@1), achieved by Gemini 3 Pro Preview as of 2026.

How many models are tracked on LiveCodeBench?

Codesota tracks 51 models on LiveCodeBench across 2 metrics.

When was the LiveCodeBench leaderboard last updated?

The LiveCodeBench leaderboard on Codesota includes results through 2026, with the earliest tracked result from 2024.


Code Generation · benchmark dataset · 2024 · EN

LiveCodeBench.

Name: LiveCodeBench Benchmark Results
Creator: Codesota
Published: 2024-01-01
License: https://creativecommons.org/licenses/by/4.0/

Contamination-free coding benchmark that collects new problems from LeetCode, AtCoder, and Codeforces published after model knowledge cutoffs. Updated continuously with fresh problems. The primary metric is pass@1 on the full test set.
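As a rough illustration of the primary metric, the sketch below computes pass@1 with the unbiased pass@k estimator commonly used for code-generation benchmarks. The exact harness behind the scores on this page is not shown here, so the function names and the sample data are assumptions for illustration only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass all tests."""
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over all problems, as a percentage. Each item is (n, c)."""
    return 100.0 * sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


# Hypothetical per-problem results: (samples generated, samples passing).
example = [(10, 10), (10, 5), (10, 0), (10, 9)]
score = benchmark_pass_at_1(example)
print(f"pass@1 = {score:.2f}")
```

For k = 1 the estimator reduces to c / n per problem, so the benchmark score is simply the average fraction of passing samples.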


§ 01 · Leaderboard

Best published scores.

54 results indexed across 2 metrics. Shaded row marks current SOTA; ties broken by submission date.

Primary: pass@1 · higher is better
All metrics: pass-1, pass@1

pass-1

24 rows

#	Model	Org	Submitted	Paper / code	pass-1
01	DeepSeek-V4-Pro Max	DeepSeek	Apr 2026	pwc-dump · code	93.50
02	DeepSeek-V4-Flash Max	DeepSeek	Apr 2026	pwc-dump · code	91.60
03	Kimi K2.6	—	Apr 2026	pwc-dump	89.60
04	DeepSeek-V3.2-Speciale · Open	DeepSeek	Dec 2025	DeepSeek-V3.2: Pushing the Frontier of Open Large Langua…	88.70
05	Kimi-K2.5 · Open	Moonshot.AI	Feb 2026	Kimi K2.5: Visual Agentic Intelligence · code	85
06	Qwen3.6-27B	—	Apr 2026	pwc-dump · code	83.90
07	Qwen3.5-397B-A17B · Open	Alibaba	Feb 2026	pwc-dump · code	83.60
08	DeepSeek-V3.2 · Open	DeepSeek	Dec 2025	DeepSeek-V3.2: Pushing the Frontier of Open Large Langua…	83.30
09	NVIDIA-Nemotron-3-Super-120B-A12B-BF16	—	Dec 2025	NVIDIA Nemotron 3: Efficient and Open Intelligence	81.19
10	Qwen3.6-35B-A3B	—	Apr 2026	pwc-dump · code	80.40
11	Gemma 4 31B	Google	Apr 2026	pwc-dump	80
12	Intern-S1-Pro	Shanghai AI Lab	Mar 2026	Intern-S1-Pro: Scientific Multimodal Foundation Model at…	74.30
13	Gemini 2.5 Pro	—	Jul 2025	Gemini 2.5: Pushing the Frontier with Advanced Reasoning…	74.20
14	GLM-4.5 · Open	Zhipu AI	Aug 2025	GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation… · code	72.90
15	GLM-4.5-Air · Open	Zhipu AI	Aug 2025	GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation… · code	70.70
16	Qwen3-235B-A22B · Open	Alibaba	May 2025	Qwen3 Technical Report · code	70.70
17	Qwen3-VL-235B-A22B-Thinking	Qwen	Nov 2025	Qwen3-VL Technical Report · code	70.10
18	NVIDIA-Nemotron-3-Nano-30B-A3B-BF16	—	Dec 2025	Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybr… · code	68.30
19	Gemini 2.5 Flash	—	Jul 2025	Gemini 2.5: Pushing the Frontier with Advanced Reasoning…	59.30
20	Qwen3-Coder-Next	Qwen	Feb 2026	Qwen3-Coder-Next Technical Report · code	58.93
21	Qwen2.5-72B-Instruct	—	Dec 2024	Qwen2.5 Technical Report · code	55.50
22	Qwen3-VL-235B-A22B-Instruct	Qwen	Nov 2025	Qwen3-VL Technical Report · code	54.30
23	Qwen3-VL-8B-Instruct	Qwen	Nov 2025	Qwen3-VL Technical Report · code	39.30
24	Gemma 3 (27B, IT)	—	Mar 2025	Gemma 3 Technical Report · code	29.70

pass@1 · primary

30 rows

#	Model	Org	Submitted	Paper / code	pass@1
01	Gemini 3 Pro Preview	Google	Mar 2026	vendor	91.70
02	Gemini 3 Flash · API	Google	Mar 2026	vendor	90.80
03	GPT-5	OpenAI	Apr 2026	artificial-analysis	85
04	Grok 4 · API	xAI	Apr 2026	xai-grok-4-announcement	79
05	Gemini 2.5 Pro	Google	Apr 2026	google-io-2025	75.60
06	DeepSeek-R1-0528 · Open	DeepSeek	May 2025	deepseek-model-card	73.30
07	o4-mini	OpenAI	Mar 2024	LiveCodeBench: Holistic and Contamination Free Evaluatio… · code	72.80
08	Qwen3-235B-A22B · Open	Alibaba	May 2025	arxiv-2505.09388	70.70
09	o3-mini · API	OpenAI	Mar 2024	LiveCodeBench: Holistic and Contamination Free Evaluatio… · code	66.90
10	DeepSeek R1 · Open	DeepSeek	Jan 2025	arxiv-2501.12948	65.90
11	o3	OpenAI	Mar 2024	LiveCodeBench: Holistic and Contamination Free Evaluatio… · code	65.30
12	DeepSeek-R1-Distill-Llama-70B · Open	DeepSeek	Jan 2025	arxiv-2501.12948	65.20
13	Gemini 2.5 Flash	Google	Apr 2026	llm-stats	63.90
14	Kimi k1.5 · API	Moonshot AI	Jan 2025	arxiv-2501.12599	62.50
15	DeepSeek-R1-Distill-Qwen-32B · Open	DeepSeek	Jan 2025	arxiv-2501.12948	62.10
16	Claude Opus 4	Anthropic	Mar 2024	LiveCodeBench: Holistic and Contamination Free Evaluatio… · code	57.80
17	GPT-4.1	OpenAI	Mar 2024	LiveCodeBench: Holistic and Contamination Free Evaluatio… · code	54.40
18	Claude Sonnet 4	Anthropic	Mar 2024	LiveCodeBench: Holistic and Contamination Free Evaluatio… · code	52.80
19	DeepSeek-V3 · Open	DeepSeek	Mar 2024	LiveCodeBench: Holistic and Contamination Free Evaluatio… · code	49.20
20	DeepSeek-v3-0324 · Open	DeepSeek	Mar 2025	deepseek-model-card	49.20
21	GPT-4.1 mini · API	OpenAI	Apr 2026	pricepertoken-leaderboard	48.30
22	Qwen2.5-Coder 32B · Open	Alibaba	Mar 2024	LiveCodeBench: Holistic and Contamination Free Evaluatio… · code	47.80
23	DeepSeek-Coder-V2-Instruct · Open	DeepSeek	Mar 2024	LiveCodeBench: Holistic and Contamination Free Evaluatio… · code	43.40
24	Llama 4 Maverick · Open	Meta	Apr 2025	meta-model-card	43.40
25	GPT-4o · API	OpenAI	Mar 2024	LiveCodeBench: Holistic and Contamination Free Evaluatio… · code	40.80
26	Gemma-3-27b · Open	Google	Mar 2025	arxiv-2503.19786	39
27	Llama-4-Scout · Open	Meta	Apr 2025	meta-model-card	32.80
28	Gemma 3 12B IT · Open	Google DeepMind	Mar 2025	arxiv-2503.19786	32
29	Codestral 22B · Open	Mistral	Mar 2024	LiveCodeBench: Holistic and Contamination Free Evaluatio… · code	29.50
30	Gemma 3 4B IT · Open	Google DeepMind	Mar 2025	arxiv-2503.19786	23

Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
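The ordering rule (sort by score within a metric, break ties by submission date) can be sketched as a two-part sort key. This is a minimal sketch, not Codesota's implementation: the `Result` type, the "YYYY-MM" date normalisation, and the choice that the earlier submission wins a tie are all assumptions, since the page does not state the tie-break direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    submitted: str  # assumed "YYYY-MM" form of the Submitted column
    score: float    # higher is better


def rank(results: list[Result]) -> list[Result]:
    # Highest score first; among equal scores, the earlier submission
    # comes first (assumed tie-break direction).
    return sorted(results, key=lambda r: (-r.score, r.submitted))


# A few pass@1 rows taken from the table above, deliberately shuffled.
rows = [
    Result("DeepSeek-v3-0324", "2025-03", 49.20),
    Result("GPT-4.1 mini", "2026-04", 48.30),
    Result("DeepSeek-V3", "2024-03", 49.20),
    Result("Gemini 3 Pro Preview", "2026-03", 91.70),
]

ranked = rank(rows)
sota = ranked[0]  # the row that would be shaded as current SOTA
for position, r in enumerate(ranked, start=1):
    print(f"{position:02d}\t{r.model}\t{r.submitted}\t{r.score:.2f}")
```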

§ 03 · Progress

3 steps of state of the art.

Each row below marks a model that broke the previous record on pass@1. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · pass@1

Mar 12, 2024	o4-mini	OpenAI	72.80
May 28, 2025	DeepSeek-R1-0528	DeepSeek	73.30
Mar 15, 2026	Gemini 3 Pro Preview	Google	91.70

Fig 3 · SOTA-setting models only. 3 entries span Mar 2024 → Mar 2026.
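One way to derive such a SOTA line from leaderboard rows is a running maximum over date-ordered results. The sketch below is an assumption about the procedure, not the site's code: the function name and the month-level "YYYY-MM" dates are illustrative, and entries sharing a month are ordered best-first so that only the top score of that month can set a record.

```python
def sota_steps(results: list[tuple[str, str, float]]) -> list[tuple[str, str, float]]:
    """Return the entries that strictly beat every earlier score.

    Each result is (date as "YYYY-MM", model, score); higher is better.
    """
    steps = []
    best = float("-inf")
    # Chronological order; within the same month, highest score first.
    for date, model, score in sorted(results, key=lambda r: (r[0], -r[2])):
        if score > best:
            best = score
            steps.append((date, model, score))
    return steps


# A subset of the pass@1 rows from the leaderboard above.
rows = [
    ("2026-04", "GPT-5", 85.0),
    ("2026-03", "Gemini 3 Flash", 90.80),
    ("2026-03", "Gemini 3 Pro Preview", 91.70),
    ("2025-05", "Qwen3-235B-A22B", 70.70),
    ("2025-05", "DeepSeek-R1-0528", 73.30),
    ("2025-01", "DeepSeek R1", 65.90),
    ("2024-03", "o4-mini", 72.80),
]

for date, model, score in sota_steps(rows):
    print(f"{date}\t{model}\t{score:.2f}")
```

On this subset the function returns the same three record-setting models listed in the SOTA line.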

§ 04 · Literature

13 papers tied to this benchmark.

Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale
Mar 2026 · Intern-S1-Pro · arXiv

Qwen3-Coder-Next Technical Report
Feb 2026 · Qwen3-Coder-Next · arXiv · code

Kimi K2.5: Visual Agentic Intelligence
Feb 2026 · Kimi-K2.5 · arXiv · code

NVIDIA Nemotron 3: Efficient and Open Intelligence
Dec 2025 · NVIDIA-Nemotron-3-Super-120B-A12B-BF16 · arXiv

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Dec 2025 · NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 · arXiv · code

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Dec 2025 · DeepSeek-V3.2-Speciale, DeepSeek-V3.2 · arXiv

Qwen3-VL Technical Report
Nov 2025 · Qwen3-VL-235B-A22B-Thinking, Qwen3-VL-235B-A22B-Instruct, Qwen3-VL-8B-Instruct · arXiv · code

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Aug 2025 · GLM-4.5, GLM-4.5-Air · arXiv · code

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Jul 2025 · Gemini 2.5 Pro, Gemini 2.5 Flash · arXiv

Qwen3 Technical Report
May 2025 · Qwen3-235B-A22B · arXiv · code

Gemma 3 Technical Report
Mar 2025 · Gemma 3 (27B, IT) · arXiv · code

Qwen2.5 Technical Report
Dec 2024 · Qwen2.5-72B-Instruct · arXiv · code

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Mar 2024 · o4-mini, o3-mini, o3 +8 · arXiv · code

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.


What a submission needs

01 · A public checkpoint or API endpoint
02 · A reproduction script with frozen commit + seed
03 · Declared evaluation environment (Python, deps)
04 · One row per metric declared by this dataset
05 · A contact so we can follow up on discrepancies
