| 01 | Gemini 3 Pro | unverified | 91.9 | 2026 | Source ↗ |
| 02 | Claude Opus 4.6 | unverified | 91.3 | 2026 | Source ↗ |
| 03 | Kimi K2.6 | unverified | 90.5 | 2026 | Paper ↗ |
| 04 | Gemini 3 Flash | unverified | 90.4 | 2026 | Source ↗ |
| 05 | DeepSeek-V4-Pro Max | unverified | 90.1 | 2026 | Paper ↗ Code ↗ |
| 06 | Claude Sonnet 4.6 | unverified | 89.9 | 2026 | Source ↗ |
| 07 | GPT-5 | unverified | 89 | 2026 | Source ↗ |
| 08 | Qwen3.5-397B-A17B | unverified | 88.4 | 2026 | Paper ↗ Code ↗ |
| 09 | DeepSeek-V4-Flash Max | unverified | 88.1 | 2026 | Paper ↗ Code ↗ |
| 10 | Grok 4 | unverified | 88 | 2026 | Source ↗ |
| 11 | Qwen3.6-27B | unverified | 87.8 | 2026 | Paper ↗ Code ↗ |
| 12 | Kimi-K2.5 | unverified | 87.6 | 2026 | Paper ↗ Code ↗ |
| 13 | Qwen3.5-122B-A10B | unverified | 86.6 | 2026 | Paper ↗ Code ↗ Source ↗ |
| 14 | Gemini 2.5 Pro | unverified | 86.4 | 2025 | Paper ↗ |
| 15 | GLM-5.1 | unverified | 86.2 | 2026 | Paper ↗ Code ↗ |
| 16 | Qwen3.6-35B-A3B | unverified | 86 | 2026 | Paper ↗ Code ↗ |
| 17 | GLM-5 | unverified | 86 | 2026 | Paper ↗ Code ↗ Source ↗ |
| 18 | GLM-4.7 | unverified | 85.7 | 2025 | Paper ↗ Code ↗ Source ↗ |
| 19 | DeepSeek-V3.2-Speciale | unverified | 85.7 | 2025 | Paper ↗ Source ↗ |
| 20 | Qwen3.5-27B | unverified | 85.5 | 2026 | Paper ↗ Code ↗ Source ↗ |
| 21 | MiniMax-M2.5 | unverified | 85.2 | 2026 | Paper ↗ Code ↗ |
| 22 | Step-3.5-Flash PaCoRe | unverified | 85 | 2026 | Paper ↗ Code ↗ |
| 23 | Gemma 4 31B | unverified | 84.3 | 2026 | Paper ↗ |
| 24 | Qwen3.5-35B-A3B | unverified | 84.2 | 2026 | Paper ↗ Code ↗ Source ↗ |
| 25 | Qwen3.5-Omni-Plus | unverified | 83.9 | 2026 | Paper ↗ |
| 26 | Step-3.5-Flash | unverified | 83.5 | 2026 | Paper ↗ Code ↗ |
| 27 | o3 | paper | 82.8 | 2026 | Source ↗ |
| 28 | Gemini 2.5 Flash | unverified | 82.8 | 2026 | Source ↗ |
| 29 | DeepSeek-V3.2 | unverified | 82.4 | 2025 | Paper ↗ Source ↗ |
| 30 | NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | unverified | 79.23 | 2025 | Paper ↗ Source ↗ |
| 31 | GLM-4.5 | unverified | 79.1 | 2025 | Paper ↗ Code ↗ |
| 32 | o4-mini | paper | 77.6 | 2026 | Source ↗ |
| 33 | Qwen3-VL-235B-A22B-Thinking | unverified | 77.1 | 2025 | Paper ↗ Code ↗ |
| 34 | Claude Opus 4 (GPQA Diamond, 0-shot CoT; source: Claude Opus 4 model card, Anthropic, 2025) | verified | 76.7 | 2026 | Source ↗ |
| 35 | o1 | paper | 75.7 | 2026 | Source ↗ |
| 36 | GLM-4.5-Air | unverified | 75 | 2025 | Paper ↗ Code ↗ Source ↗ |
| 37 | o3-mini (zero-shot CoT, pass@1; default reasoning effort) | unverified | 74.9 | 2026 | Source ↗ |
| 38 | Claude Opus 4.5 (GPQA Diamond, 0-shot CoT; source: Claude Opus 4.5 model card, Anthropic, 2025) | verified | 74.9 | 2026 | Source ↗ |
| 39 | Qwen3-Coder-Next | unverified | 74.49 | 2026 | Paper ↗ Code ↗ |
| 40 | Qwen3-VL-235B-A22B-Instruct | unverified | 74.3 | 2025 | Paper ↗ Code ↗ |
| 41 | o1-preview | paper | 73.3 | 2026 | Source ↗ |
| 42 | Qwen3-Omni-Flash-Thinking | unverified | 73.1 | 2025 | Paper ↗ Code ↗ |
| 43 | NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | unverified | 73 | 2025 | Paper ↗ Code ↗ Source ↗ |
| 44 | DeepSeek R1 (GPQA Diamond, 0-shot CoT; source: DeepSeek-R1 paper, Table 3, arXiv:2501.12948, Jan 2025) | verified | 71.5 | 2026 | Source ↗ |
| 45 | Qwen3-235B-A22B | unverified | 71.1 | 2025 | Paper ↗ Code ↗ |
| 46 | ZAYA1-8B | unverified | 71 | 2026 | Paper ↗ Source ↗ |
| 47 | Claude Sonnet 4 (GPQA Diamond, 0-shot CoT; source: Claude Sonnet 4 model card, Anthropic, 2025) | verified | 70 | 2026 | Source ↗ |
| 48 | Llama-4-Maverick (GPQA Diamond, 0-shot CoT; source: Meta Llama 4 blog post, April 2025) | verified | 69.8 | 2026 | Source ↗ |
| 49 | gpt-45-preview | paper | 69.5 | 2026 | Source ↗ |
| 50 | GPT-4.5 Preview (zero-shot CoT) | unverified | 69.5 | 2026 | Source ↗ |
| 51 | MiMo-V2.5-Pro | unverified | 66.7 | 2026 | Paper ↗ |
| 52 | GPT-4.1 mini | unverified | 66.4 | 2026 | Source ↗ |
| 53 | gpt-41 | paper | 66.3 | 2026 | Source ↗ |
| 54 | GPT-4.1 (zero-shot CoT) | unverified | 66.3 | 2026 | Source ↗ |
| 55 | Trinity Large Preview | unverified | 63.32 | 2026 | Paper ↗ Code ↗ |
| 56 | o1-mini (zero-shot CoT, pass@1) | unverified | 60 | 2026 | Source ↗ |
| 57 | claude-35-sonnet | paper | 59.4 | 2026 | Source ↗ |
| 58 | Claude 3.5 Sonnet (third-party reported) | unverified | 59.4 | 2026 | Source ↗ |
| 59 | grok-2 | paper | 56 | 2026 | Source ↗ |
| 60 | Grok 2 (third-party reported) | unverified | 56 | 2026 | Source ↗ |
| 61 | MiniMax-Text-01 | unverified | 54.4 | 2025 | Paper ↗ Code ↗ |
| 62 | Llama 3 (405B, Instruct) | unverified | 51.1 | 2024 | Paper ↗ Code ↗ |
| 63 | llama-31-405b | paper | 50.7 | 2026 | Source ↗ |
| 64 | Llama 3.1 405B (third-party reported) | unverified | 50.7 | 2026 | Source ↗ |
| 65 | Claude 3 Opus (third-party reported) | unverified | 50.4 | 2026 | Source ↗ |
| 66 | claude-3-opus | paper | 50.4 | 2026 | Source ↗ |
| 67 | GPT-4o (zero-shot CoT; gpt-4o-2024-05-13) | unverified | 49.9 | 2026 | Source ↗ |
| 68 | Qwen2.5-Plus | unverified | 49.7 | 2024 | Paper ↗ Code ↗ |
| 69 | GPT-4 Turbo (zero-shot CoT) | unverified | 49.3 | 2026 | Source ↗ |
| 70 | gpt-4-turbo | paper | 49.3 | 2026 | Source ↗ |
| 71 | Qwen2.5-72B-Instruct (GPQA Diamond; Table 6 in Qwen2.5 Technical Report) | verified | 49 | 2026 | Source ↗ |
| 72 | Qwen2.5-VL-72B | unverified | 49 | 2025 | Paper ↗ Code ↗ |
| 73 | Gemini 1.5 Pro (from Google blog) | unverified | 46.2 | 2026 | Source ↗ |
| 74 | gemini-15-pro | paper | 46.2 | 2026 | Source ↗ |
| 75 | Gemma 3 (27B, IT) | unverified | 42.4 | 2025 | Paper ↗ Code ↗ |
| 76 | llama-31-70b | paper | 41.7 | 2026 | Source ↗ |
| 77 | Step-3.5-Flash Base | unverified | 41.7 | 2026 | Paper ↗ Code ↗ |
| 78 | Llama 3.1 70B (third-party reported) | unverified | 41.7 | 2026 | Source ↗ |
| 79 | GPT-4o mini (zero-shot CoT) | unverified | 40.2 | 2026 | Source ↗ |
| 80 | gpt-4o-mini | paper | 40.2 | 2026 | Source ↗ |
| 81 | Qwen3-VL-8B-Instruct | unverified | 34.7 | 2025 | Paper ↗ Code ↗ |