6 results · 6 benchmarks
Model card

Qwen2.5-Plus.

unknown
§ 02 · Benchmarks

Every benchmark Qwen2.5-Plus has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 01 | HumanEval | Computer Code · Code Generation | pass@1 | 87.8% | #2/3 | | source ↗ |
| 02 | HumanEval+ | Computer Code · Code Generation | pass@1 | 87.8% | #2/12 | | source ↗ |
| 03 | GSM8K | Reasoning · Mathematical Reasoning | accuracy | 96.0% | #20/48 | | source ↗ |
| 04 | MATH | Reasoning · Mathematical Reasoning | accuracy | 84.7% | #24/46 | | source ↗ |
| 05 | MMLU-Pro | Reasoning · Commonsense Reasoning | accuracy | 72.5% | #57/73 | | source ↗ |
| 06 | GPQA Diamond | Reasoning · Multi-step Reasoning | accuracy | 49.7% | #65/74 | | source ↗ |
Rank column shows this model’s position vs all other models scored on the same benchmark + metric (competitors after the slash). #1 in red means current SOTA. Sorted by rank, then newest result.
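The rank notation can be read mechanically. The following is a minimal sketch, assuming each rank is written as `#position/N` exactly as in the rows above; the names `Rank`, `parse_rank`, and `is_sota` are hypothetical helpers for illustration, not part of any Codesota API.

```python
import re
from typing import NamedTuple


class Rank(NamedTuple):
    position: int  # number before the slash: this model's position
    field: int     # number after the slash, as printed in the Rank column


def parse_rank(text: str) -> Rank:
    """Parse a rank string such as '#20/48' into its two integers."""
    match = re.fullmatch(r"#(\d+)/(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised rank format: {text!r}")
    return Rank(int(match.group(1)), int(match.group(2)))


def is_sota(rank: Rank) -> bool:
    """Position #1 is what the page marks as current SOTA."""
    return rank.position == 1


# Rank strings copied from the benchmark rows on this page.
ROWS = [
    ("HumanEval", "#2/3"),
    ("HumanEval+", "#2/12"),
    ("GSM8K", "#20/48"),
    ("MATH", "#24/46"),
    ("MMLU-Pro", "#57/73"),
    ("GPQA Diamond", "#65/74"),
]

parsed = {name: parse_rank(text) for name, text in ROWS}

# Sorting by position reproduces the row order shown on the page.
ordered = sorted(ROWS, key=lambda row: parse_rank(row[1]).position)

for name, rank in parsed.items():
    print(f"{name}: position {rank.position}, slash value {rank.field}, SOTA={is_sota(rank)}")
```

None of the six rows parses to position 1, so under this reading the model holds no SOTA result on the listed benchmarks.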
§ 03 · Strengths by area

Where Qwen2.5-Plus actually performs.

Computer Code: 2 benchmarks, avg rank #2.0
Reasoning: 4 benchmarks, avg rank #41.5
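The per-area averages can be checked by hand. The sketch below assumes "avg rank" is the plain arithmetic mean of the model's positions on the benchmarks in that area; the page does not state the formula, but this assumption reproduces both figures. The names `RESULTS` and `average_rank_by_area` are hypothetical.

```python
from collections import defaultdict

# (benchmark, area, position) taken from the benchmark rows on this page.
RESULTS = [
    ("HumanEval", "Computer Code", 2),
    ("HumanEval+", "Computer Code", 2),
    ("GSM8K", "Reasoning", 20),
    ("MATH", "Reasoning", 24),
    ("MMLU-Pro", "Reasoning", 57),
    ("GPQA Diamond", "Reasoning", 65),
]


def average_rank_by_area(results):
    """Group positions by area and return the arithmetic mean per area."""
    positions = defaultdict(list)
    for _benchmark, area, position in results:
        positions[area].append(position)
    return {area: sum(vals) / len(vals) for area, vals in positions.items()}


summary = average_rank_by_area(RESULTS)
for area, avg in summary.items():
    print(f"{area}: avg rank #{avg:.1f}")
```

For Reasoning this is (20 + 24 + 57 + 65) / 4 = 41.5, and for Computer Code (2 + 2) / 2 = 2.0. The mean ignores how many models were scored on each benchmark, so position 2 of 3 and position 2 of 12 count the same.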
§ 04 · Papers

1 paper with results for Qwen2.5-Plus.

  1. Qwen2.5 Technical Report · 2024-12-19 · 6 results

§ 06 · Sources & freshness

Where these numbers come from.

pwc-dump: 6 results
0 of 6 rows marked verified.