| 01 | Gemini 3 Pro (API) | Google | Apr 2026 | google-blog | 91.90 |
| 02 | Claude Opus 4.6 | Anthropic | Apr 2026 | anthropic-opus-4-6-announcement | 91.30 |
| 03 | Kimi K2.6 | — | Apr 2026 | pwc-dump | 90.50 |
| 04 | Gemini 3 Flash (API) | Google | Apr 2026 | google-blog | 90.40 |
| 05 | DeepSeek-V4-Pro Max | DeepSeek | Apr 2026 | pwc-dump · code | 90.10 |
| 06 | Claude Sonnet 4.6 (API) | Anthropic | Apr 2026 | anthropic-sonnet-4-6-system-card | 89.90 |
| 07 | GPT-5 | OpenAI | Apr 2026 | openai-gpt-5-launch | 89 |
| 08 | Qwen3.5-397B-A17B (Open) | Alibaba | Feb 2026 | pwc-dump · code | 88.40 |
| 09 | DeepSeek-V4-Flash Max | DeepSeek | Apr 2026 | pwc-dump · code | 88.10 |
| 10 | Grok 4 (API) | xAI | Apr 2026 | xai-grok-4-announcement | 88 |
| 11 | Qwen3.6-27B | — | Apr 2026 | pwc-dump · code | 87.80 |
| 12 | Kimi-K2.5 (Open) | Moonshot.AI | Feb 2026 | Kimi K2.5: Visual Agentic Intelligence · code | 87.60 |
| 13 | Qwen3.5-122B-A10B (Open) | Alibaba | Feb 2026 | pwc-dump · code | 86.60 |
| 14 | Gemini 2.5 Pro | — | Jul 2025 | Gemini 2.5: Pushing the Frontier with Advanced Reasoning… | 86.40 |
| 15 | GLM-5.1 | — | Feb 2026 | GLM-5: from Vibe Coding to Agentic Engineering · code | 86.20 |
| 16 | Qwen3.6-35B-A3B | — | Apr 2026 | pwc-dump · code | 86 |
| 17 | GLM-5 (Open) | Zhipu AI | Feb 2026 | GLM-5: from Vibe Coding to Agentic Engineering · code | 86 |
| 18 | GLM-4.7 (Open) | Zhipu AI | Aug 2025 | GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation… · code | 85.70 |
| 19 | DeepSeek-V3.2-Speciale (Open) | DeepSeek | Dec 2025 | DeepSeek-V3.2: Pushing the Frontier of Open Large Langua… | 85.70 |
| 20 | Qwen3.5-27B (Open) | Alibaba | Feb 2026 | pwc-dump · code | 85.50 |
| 21 | MiniMax-M2.5 (Open) | MiniMaxAI | Feb 2026 | pwc-dump · code | 85.20 |
| 22 | Step-3.5-Flash PaCoRe | — | Feb 2026 | Step 3.5 Flash: Open Frontier-Level Intelligence with 11… · code | 85 |
| 23 | Gemma 4 31B | Google | Apr 2026 | pwc-dump | 84.30 |
| 24 | Qwen3.5-35B-A3B (Open) | Alibaba | Feb 2026 | pwc-dump · code | 84.20 |
| 25 | Gemini 2.5 Pro (API) | Google | Mar 2026 | google-technical-report | 84 |
| 26 | Qwen3.5-Omni-Plus | — | Apr 2026 | Qwen3.5-Omni Technical Report | 83.90 |
| 27 | Step-3.5-Flash | — | Feb 2026 | Step 3.5 Flash: Open Frontier-Level Intelligence with 11… · code | 83.50 |
| 28 | Gemini 2.5 Flash | Google | Apr 2026 | google-model-card | 82.80 |
| 29 | o3 | OpenAI | Mar 2026 | openai-simple-evals | 82.80 |
| 30 | Gemini 2.5 Flash | — | Jul 2025 | Gemini 2.5: Pushing the Frontier with Advanced Reasoning… | 82.80 |
| 31 | DeepSeek-V3.2 (Open) | DeepSeek | Dec 2025 | DeepSeek-V3.2: Pushing the Frontier of Open Large Langua… | 82.40 |
| 32 | NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | — | Dec 2025 | NVIDIA Nemotron 3: Efficient and Open Intelligence | 79.23 |
| 33 | GLM-4.5 (Open) | Zhipu AI | Aug 2025 | GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation… · code | 79.10 |
| 34 | o4-mini | OpenAI | Mar 2026 | openai-simple-evals | 77.60 |
| 35 | Qwen3-VL-235B-A22B-Thinking | Qwen | Nov 2025 | Qwen3-VL Technical Report · code | 77.10 |
| 36 | Claude Opus 4 | Anthropic | Mar 2026 | anthropic-model-card | 76.70 |
| 37 | o1 (API) | OpenAI | Mar 2026 | openai-simple-evals | 75.70 |
| 38 | GLM-4.5-Air (Open) | Zhipu AI | Aug 2025 | GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation… · code | 75 |
| 39 | Claude Opus 4.5 (API) | Anthropic | Mar 2026 | anthropic-model-card | 74.90 |
| 40 | o3-mini (API) | OpenAI | Mar 2026 | openai-simple-evals | 74.90 |
| 41 | Qwen3-Coder-Next | Qwen | Feb 2026 | Qwen3-Coder-Next Technical Report · code | 74.49 |
| 42 | Qwen3-VL-235B-A22B-Instruct | Qwen | Nov 2025 | Qwen3-VL Technical Report · code | 74.30 |
| 43 | o1-preview (API) | OpenAI | Mar 2026 | openai-simple-evals | 73.30 |
| 44 | Qwen3-Omni-Flash-Thinking | — | Sep 2025 | Qwen3-Omni Technical Report · code | 73.10 |
| 45 | NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | — | Dec 2025 | Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybr… · code | 73 |
| 46 | DeepSeek R1 (Open) | DeepSeek | Mar 2026 | arxiv | 71.50 |
| 47 | Qwen3-235B-A22B (Open) | Alibaba | Apr 2026 | qwen-model-card | 71.10 |
| 48 | Qwen3-235B-A22B (Open) | Alibaba | May 2025 | Qwen3 Technical Report · code | 71.10 |
| 49 | ZAYA1-8B | Z.ai | May 2026 | ZAYA1-8B Technical Report | 71 |
| 50 | Claude Sonnet 4 | Anthropic | Mar 2026 | anthropic-model-card | 70 |
| 51 | Llama 4 Maverick (Open) | Meta | Mar 2026 | meta-blog | 69.80 |
| 52 | GPT-4.5 Preview (API) | OpenAI | Mar 2026 | openai-simple-evals | 69.50 |
| 53 | MiMo-V2.5-Pro | — | Apr 2026 | pwc-dump | 66.70 |
| 54 | GPT-4.1 mini (API) | OpenAI | Apr 2026 | pricepertoken-leaderboard | 66.40 |
| 55 | GPT-4.1 | OpenAI | Mar 2026 | openai-simple-evals | 66.30 |
| 56 | Trinity Large Preview | Arcee AI | Feb 2026 | Arcee Trinity Large Technical Report · code | 63.32 |
| 57 | o1-mini (API) | OpenAI | Mar 2026 | openai-simple-evals | 60 |
| 58 | Claude 3.5 Sonnet (API) | Anthropic | Mar 2026 | openai-simple-evals | 59.40 |
| 59 | Grok 2 (API) | xAI | Mar 2026 | openai-simple-evals | 56 |
| 60 | MiniMax-Text-01 | MiniMax | Jan 2025 | MiniMax-01: Scaling Foundation Models with Lightning Att… · code | 54.40 |
| 61 | Llama 3 (405B, Instruct) | Meta | Jul 2024 | The Llama 3 Herd of Models · code | 51.10 |
| 62 | Llama 3.1 405B (Open) | Meta | Mar 2026 | openai-simple-evals | 50.70 |
| 63 | Claude 3 Opus (API) | Anthropic | Mar 2026 | openai-simple-evals | 50.40 |
| 64 | GPT-4o (API) | OpenAI | Mar 2026 | openai-simple-evals | 49.90 |
| 65 | Qwen2.5-Plus | — | Dec 2024 | Qwen2.5 Technical Report · code | 49.70 |
| 66 | GPT-4 Turbo (API) | OpenAI | Mar 2026 | openai-simple-evals | 49.30 |
| 67 | Qwen2.5-VL-72B | — | Feb 2025 | Qwen2.5-VL Technical Report · code | 49 |
| 68 | Qwen2.5-72B-Instruct (Open) | Alibaba | Mar 2026 | qwen25-tech-report | 49 |
| 69 | Gemini 1.5 Pro (API) | Google | Mar 2026 | openai-simple-evals | 46.20 |
| 70 | Gemma 3 (27B, IT) | — | Mar 2025 | Gemma 3 Technical Report · code | 42.40 |
| 71 | Step-3.5-Flash Base | — | Feb 2026 | Step 3.5 Flash: Open Frontier-Level Intelligence with 11… · code | 41.70 |
| 72 | Llama 3.1 70B (Open) | Meta | Mar 2026 | openai-simple-evals | 41.70 |
| 73 | GPT-4o mini | OpenAI | Mar 2026 | openai-simple-evals | 40.20 |
| 74 | Qwen3-VL-8B-Instruct | Qwen | Nov 2025 | Qwen3-VL Technical Report · code | 34.70 |