Codesota · Benchmark · GPQA

GPQA.

448 expert-level questions in biology, physics, and chemistry. Designed to be unsearchable.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


accuracy

Accuracy is the reported evaluation metric for GPQA. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
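
As a rough illustration of what a score on this metric represents, the sketch below computes accuracy as the percentage of questions answered correctly. It assumes answers are compared by exact match on an answer letter. The function name `accuracy` and the toy data are hypothetical, and this is not Codesota's or any vendor's scoring code.

```python
def accuracy(predictions, answers):
    """Return the percentage of predictions that match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty answer key")
    # Count exact matches between predicted and reference answer letters.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical toy data: four questions with answer letters A-D.
answers = ["A", "C", "B", "D"]
predictions = ["A", "C", "D", "D"]
print(f"{accuracy(predictions, answers):.1f}")  # three of four correct
```

Reported scores can still differ between sources for the same model, since prompting setup (for example zero-shot CoT) and the question subset used (for example GPQA Diamond) vary, as the notes on individual rows indicate.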

Trust tiers for accuracy: verified, paper, vendor, community, unverified
| Rank | Model | Trust | Score | Year | Links | Notes |
|---|---|---|---|---|---|---|
| 01 | Gemini 3 Pro | unverified | 91.9 | 2026 | Source | |
| 02 | Claude Opus 4.6 | unverified | 91.3 | 2026 | Source | |
| 03 | Kimi K2.6 | unverified | 90.5 | 2026 | Paper | |
| 04 | Gemini 3 Flash | unverified | 90.4 | 2026 | Source | |
| 05 | DeepSeek-V4-Pro Max | unverified | 90.1 | 2026 | Paper, Code | |
| 06 | Claude Sonnet 4.6 | unverified | 89.9 | 2026 | Source | |
| 07 | GPT-5 | unverified | 89 | 2026 | Source | |
| 08 | Qwen3.5-397B-A17B | unverified | 88.4 | 2026 | Paper, Code | |
| 09 | DeepSeek-V4-Flash Max | unverified | 88.1 | 2026 | Paper, Code | |
| 10 | Grok 4 | unverified | 88 | 2026 | Source | |
| 11 | Qwen3.6-27B | unverified | 87.8 | 2026 | Paper, Code | |
| 12 | Kimi-K2.5 | unverified | 87.6 | 2026 | Paper, Code | |
| 13 | Qwen3.5-122B-A10B | unverified | 86.6 | 2026 | Paper, Code, Source | |
| 14 | Gemini 2.5 Pro | unverified | 86.4 | 2025 | Paper | |
| 15 | GLM-5.1 | unverified | 86.2 | 2026 | Paper, Code | |
| 16 | Qwen3.6-35B-A3B | unverified | 86 | 2026 | Paper, Code | |
| 17 | GLM-5 | unverified | 86 | 2026 | Paper, Code, Source | |
| 18 | GLM-4.7 | unverified | 85.7 | 2025 | Paper, Code, Source | |
| 19 | DeepSeek-V3.2-Speciale | unverified | 85.7 | 2025 | Paper, Source | |
| 20 | Qwen3.5-27B | unverified | 85.5 | 2026 | Paper, Code, Source | |
| 21 | MiniMax-M2.5 | unverified | 85.2 | 2026 | Paper, Code | |
| 22 | Step-3.5-Flash PaCoRe | unverified | 85 | 2026 | Paper, Code | |
| 23 | Gemma 4 31B | unverified | 84.3 | 2026 | Paper | |
| 24 | Qwen3.5-35B-A3B | unverified | 84.2 | 2026 | Paper, Code, Source | |
| 25 | Qwen3.5-Omni-Plus | unverified | 83.9 | 2026 | Paper | |
| 26 | Step-3.5-Flash | unverified | 83.5 | 2026 | Paper, Code | |
| 27 | o3 | paper | 82.8 | 2026 | Source | |
| 28 | Gemini 2.5 Flash | unverified | 82.8 | 2026 | Source | |
| 29 | DeepSeek-V3.2 | unverified | 82.4 | 2025 | Paper, Source | |
| 30 | NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | unverified | 79.23 | 2025 | Paper, Source | |
| 31 | GLM-4.5 | unverified | 79.1 | 2025 | Paper, Code | |
| 32 | o4-mini | paper | 77.6 | 2026 | Source | |
| 33 | Qwen3-VL-235B-A22B-Thinking | unverified | 77.1 | 2025 | Paper, Code | |
| 34 | Claude Opus 4 | verified | 76.7 | 2026 | Source | GPQA Diamond, 0-shot CoT. Source: Claude Opus 4 model card, Anthropic (2025). |
| 35 | o1 | paper | 75.7 | 2026 | Source | |
| 36 | GLM-4.5-Air | unverified | 75 | 2025 | Paper, Code, Source | |
| 37 | o3-mini | unverified | 74.9 | 2026 | Source | Zero-shot CoT, pass@1. Default reasoning effort. |
| 38 | Claude Opus 4.5 | verified | 74.9 | 2026 | Source | GPQA Diamond, 0-shot CoT. Source: Claude Opus 4.5 model card, Anthropic (2025). |
| 39 | Qwen3-Coder-Next | unverified | 74.49 | 2026 | Paper, Code | |
| 40 | Qwen3-VL-235B-A22B-Instruct | unverified | 74.3 | 2025 | Paper, Code | |
| 41 | o1-preview | paper | 73.3 | 2026 | Source | |
| 42 | Qwen3-Omni-Flash-Thinking | unverified | 73.1 | 2025 | Paper, Code | |
| 43 | NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | unverified | 73 | 2025 | Paper, Code, Source | |
| 44 | DeepSeek R1 | verified | 71.5 | 2026 | Source | GPQA Diamond, 0-shot CoT. Source: DeepSeek-R1 paper Table 3, arxiv:2501.12948 (Jan 2025). |
| 45 | Qwen3-235B-A22B | unverified | 71.1 | 2025 | Paper, Code | |
| 46 | ZAYA1-8B | unverified | 71 | 2026 | Paper, Source | |
| 47 | Claude Sonnet 4 | verified | 70 | 2026 | Source | GPQA Diamond, 0-shot CoT. Source: Claude Sonnet 4 model card, Anthropic (2025). |
| 48 | Llama-4-Maverick | verified | 69.8 | 2026 | Source | GPQA Diamond, 0-shot CoT. Source: Meta Llama 4 blog post (April 2025). |
| 49 | gpt-45-preview | paper | 69.5 | 2026 | Source | |
| 50 | GPT-4.5 Preview | unverified | 69.5 | 2026 | Source | Zero-shot CoT. |
| 51 | MiMo-V2.5-Pro | unverified | 66.7 | 2026 | Paper | |
| 52 | GPT-4.1 mini | unverified | 66.4 | 2026 | Source | |
| 53 | gpt-41 | paper | 66.3 | 2026 | Source | |
| 54 | GPT-4.1 | unverified | 66.3 | 2026 | Source | Zero-shot CoT. |
| 55 | Trinity Large Preview | unverified | 63.32 | 2026 | Paper, Code | |
| 56 | o1-mini | unverified | 60 | 2026 | Source | Zero-shot CoT, pass@1. |
| 57 | claude-35-sonnet | paper | 59.4 | 2026 | Source | |
| 58 | Claude 3.5 Sonnet | unverified | 59.4 | 2026 | Source | Third-party reported. |
| 59 | grok-2 | paper | 56 | 2026 | Source | |
| 60 | Grok 2 | unverified | 56 | 2026 | Source | Third-party reported. |
| 61 | MiniMax-Text-01 | unverified | 54.4 | 2025 | Paper, Code | |
| 62 | Llama 3 (405B, Instruct) | unverified | 51.1 | 2024 | Paper, Code | |
| 63 | llama-31-405b | paper | 50.7 | 2026 | Source | |
| 64 | Llama 3.1 405B | unverified | 50.7 | 2026 | Source | Third-party reported. |
| 65 | Claude 3 Opus | unverified | 50.4 | 2026 | Source | Third-party reported. |
| 66 | claude-3-opus | paper | 50.4 | 2026 | Source | |
| 67 | GPT-4o | unverified | 49.9 | 2026 | Source | Zero-shot CoT. gpt-4o-2024-05-13. |
| 68 | Qwen2.5-Plus | unverified | 49.7 | 2024 | Paper, Code | |
| 69 | GPT-4 Turbo | unverified | 49.3 | 2026 | Source | Zero-shot CoT. |
| 70 | gpt-4-turbo | paper | 49.3 | 2026 | Source | |
| 71 | Qwen2.5-72B-Instruct | verified | 49 | 2026 | Source | Qwen2.5-72B-Instruct. GPQA Diamond. Table 6 in Qwen2.5 Technical Report. |
| 72 | Qwen2.5-VL-72B | unverified | 49 | 2025 | Paper, Code | |
| 73 | Gemini 1.5 Pro | unverified | 46.2 | 2026 | Source | From Google blog. |
| 74 | gemini-15-pro | paper | 46.2 | 2026 | Source | |
| 75 | Gemma 3 (27B, IT) | unverified | 42.4 | 2025 | Paper, Code | |
| 76 | llama-31-70b | paper | 41.7 | 2026 | Source | |
| 77 | Step-3.5-Flash Base | unverified | 41.7 | 2026 | Paper, Code | |
| 78 | Llama 3.1 70B | unverified | 41.7 | 2026 | Source | Third-party reported. |
| 79 | GPT-4o mini | unverified | 40.2 | 2026 | Source | Zero-shot CoT. |
| 80 | gpt-4o-mini | paper | 40.2 | 2026 | Source | |
| 81 | Qwen3-VL-8B-Instruct | unverified | 34.7 | 2025 | Paper, Code | |

§ 03 · Lineage

GPQA in context.

This benchmark (1)
GPQA · active · 2023-11
§ 04 · Submit a result

Add to the leaderboard.
