BrowseComp is a hard web-browsing QA benchmark whose questions have short factual answers and require persistent search across many online sources.
Accuracy is the reported evaluation metric for BrowseComp. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher scores are better.
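As a rough illustration of what an accuracy score means for short factual answers, the sketch below computes the percentage of predictions that match their reference answers. The benchmark's actual grading procedure is not described on this page, so the normalized string match used here is an assumption swapped in for illustration. The helper names `normalize` and `accuracy` and the example data are hypothetical.

```python
import re
import string


def normalize(answer: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    answer = answer.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", answer).strip()


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Percentage of predictions equal to their reference after normalization.

    Assumption: a simple normalized string match stands in for the
    benchmark's real grader, which this page does not specify.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Made-up example data, not drawn from the benchmark.
preds = ["Paris", "1969", "blue whale"]
refs = ["paris.", "1968", "Blue Whale"]
score = accuracy(preds, refs)
print(f"{score:.1f}")
```

Two of the three made-up predictions match, so the sketch reports roughly two thirds as a percentage, on the same 0 to 100 scale as the table.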
| Rank | Model | Trust | Accuracy (%) | Year | Links |
|---|---|---|---|---|---|
| 01 | DeepSeek-V4-Pro Max | unverified | 83.4 | 2026 | Paper, Code |
| 02 | Kimi K2.6 | unverified | 83.2 | 2026 | Paper |
| 03 | MiniMax-M2.5 | unverified | 76.3 | 2026 | Paper, Code |
| 04 | DeepSeek-V4-Flash Max | unverified | 73.2 | 2026 | Paper, Code |
| 05 | Qwen3.5-397B-A17B | unverified | 69 | 2026 | Paper, Code |
| 06 | GLM-5.1 | unverified | 68 | 2026 | Paper, Code |
| 07 | Qwen3.5-122B-A10B | unverified | 63.8 | 2026 | Paper, Code, Source |
| 08 | GLM-5 | unverified | 62 | 2026 | Paper, Code, Source |
| 09 | Qwen3.5-35B-A3B | unverified | 61 | 2026 | Paper, Code, Source |
| 10 | Qwen3.5-27B | unverified | 61 | 2026 | Paper, Code, Source |
| 11 | Kimi-K2.5 | unverified | 60.6 | 2026 | Paper, Code |
| 12 | Step-3.5-Flash | unverified | 51.6 | 2026 | Paper, Code |
| 13 | DeepSeek-V3.2 | unverified | 51.4 | 2025 | Paper, Source |
| 14 | NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | unverified | 31.28 | 2025 | Paper, Source |
| 15 | GLM-4.5 | unverified | 26.4 | 2025 | Paper, Code |
| 16 | GLM-4.5-Air | unverified | 21.3 | 2025 | Paper, Code, Source |