Codesota · Benchmark · BrowseComp

BrowseComp.

Hard web-browsing QA benchmark with short factual answers that require persistent search over many online sources.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


Accuracy

Accuracy is the reported evaluation metric for BrowseComp. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
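As a rough illustration of how an accuracy score on this scale can be read, the sketch below computes the percentage of questions judged correct. It assumes each question already has a true/false correctness verdict; how those verdicts are produced is not described on this page, and the function name and sample data are hypothetical.

```python
from typing import Sequence


def accuracy_percent(verdicts: Sequence[bool]) -> float:
    """Return the share of correct answers as a percentage (0-100)."""
    if not verdicts:
        raise ValueError("need at least one graded question")
    # True counts as 1 and False as 0, so sum() gives the number correct.
    return 100.0 * sum(verdicts) / len(verdicts)


# Hypothetical verdicts for five questions: three judged correct.
example_verdicts = [True, False, True, True, False]
print(accuracy_percent(example_verdicts))  # 60.0
```

Under this reading, a leaderboard score of 83.4 would correspond to roughly 83 of every 100 questions being answered correctly.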

Trust tiers for Accuracy: verified, paper, vendor, community, unverified
Rank  Model                                   Trust       Score  Year  Links
01    DeepSeek-V4-Pro Max                     unverified  83.4   2026  Paper, Code
02    Kimi K2.6                               unverified  83.2   2026  Paper
03    MiniMax-M2.5                            unverified  76.3   2026  Paper, Code
04    DeepSeek-V4-Flash Max                   unverified  73.2   2026  Paper, Code
05    Qwen3.5-397B-A17B                       unverified  69     2026  Paper, Code
06    GLM-5.1                                 unverified  68     2026  Paper, Code
07    Qwen3.5-122B-A10B                       unverified  63.8   2026  Paper, Code, Source
08    GLM-5                                   unverified  62     2026  Paper, Code, Source
09    Qwen3.5-35B-A3B                         unverified  61     2026  Paper, Code, Source
10    Qwen3.5-27B                             unverified  61     2026  Paper, Code, Source
11    Kimi-K2.5                               unverified  60.6   2026  Paper, Code
12    Step-3.5-Flash                          unverified  51.6   2026  Paper, Code
13    DeepSeek-V3.2                           unverified  51.4   2025  Paper, Source
14    NVIDIA-Nemotron-3-Super-120B-A12B-BF16  unverified  31.28  2025  Paper, Source
15    GLM-4.5                                 unverified  26.4   2025  Paper, Code
16    GLM-4.5-Air                             unverified  21.3   2025  Paper, Code, Source
