Codesota · Benchmark · Collider-Bench

Collider-Bench.

Benchmark for autonomous coding/scientific agents reproducing Large Hadron Collider analyses. The public CodeSOTA score is Acc_tau at tau = 0.33: the percentage of simulation tasks whose relative-L2 error is below 0.33, derived from Table 2 and Eq. 4 of arXiv:2605.13950.
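The score described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's or Codesota's implementation: the function names `relative_l2` and `acc_tau` are hypothetical, the relative-L2 form ||pred - ref|| / ||ref|| is an assumption (Eq. 4 is not reproduced on this page), and the error values in the usage example are invented for illustration only.

```python
import math
from typing import Sequence


def relative_l2(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Relative L2 error, assumed here to be ||pred - ref||_2 / ||ref||_2."""
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same length")
    num = math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)))
    den = math.sqrt(sum(r ** 2 for r in ref))
    if den == 0.0:
        raise ValueError("reference vector has zero norm")
    return num / den


def acc_tau(errors: Sequence[float], tau: float = 0.33) -> float:
    """Percent of tasks whose error is strictly below tau."""
    if not errors:
        raise ValueError("need at least one task error")
    passed = sum(1 for e in errors if e < tau)
    return 100.0 * passed / len(errors)


# Hypothetical per-task errors for ten simulation tasks (not from the paper).
hypothetical_errors = [0.10, 0.25, 0.32, 0.33, 0.50, 0.80, 1.20, 2.00, 0.45, 0.90]
score = acc_tau(hypothetical_errors)  # 3 of 10 tasks pass, so 30.0
```

The comparison is strict, matching the "relative-L2 < 0.33" wording in the row notes, so a task with an error of exactly 0.33 does not count. With ten tasks, each passing task contributes 10 percentage points, which is why the leaderboard values move in steps of 10.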

§ 01 · Leaderboard

Results by metric.


Acc_tau (tau = 0.33)

Acc_tau at tau = 0.33 is the reported evaluation metric for Collider-Bench. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Acc_tau (tau = 0.33): verified, paper, vendor, community, unverified
| Rank | Model | Trust | Score (%) | Year | Tasks with relative-L2 < 0.33 |
|------|-------|-------|-----------|------|-------------------------------|
| 01 | Codex CLI (GPT-5.5) | verified | 30 | 2026 | 3 of 10 |
| 02 | Claude Code (Opus 4.7) | verified | 20 | 2026 | 2 of 10 |
| 03 | Claude Code (Sonnet 4.6) | verified | 10 | 2026 | 1 of 10 |
| 04 | Claude Code (Haiku 4.5) | verified | 0 | 2026 | 0 of 10 |
| 05 | Codex CLI (GPT-5.4-mini) | verified | 0 | 2026 | 0 of 10 |
| 06 | ForgeCode (DeepSeek-V4) | verified | 0 | 2026 | 0 of 10 |

All scores are derived from Table 2 of the paper: the count of the 10 simulation tasks with relative-L2 < 0.33, stored as a percent.