Polish Text Understanding · benchmark dataset · 2025 · PL

Complex Polish Text Understanding Benchmark.

Evaluates LLMs on understanding Polish text across four dimensions: sentiment analysis, language understanding (implicatures, author intent), phraseology (idioms, phraseological compounds), and tricky questions (logic, ambiguity, hallucination resistance). Each category is scored on a scale from 0 to 5. Created by SpeakLeash/Spichlerz.
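The leaderboard's "average" metric appears to be the unweighted mean of the four category scores; the published rows are consistent with that reading (for example, 4.21, 4.11, 4.42 and 4.61 average to 4.34). The sketch below assumes exactly that, and the function name `cptu_average` and the two-decimal rounding are illustrative choices, not part of the benchmark's own tooling.

```python
from statistics import mean

# Category names as they appear on the leaderboard.
CATEGORIES = (
    "language-understanding",
    "phraseology",
    "sentiment",
    "tricky-questions",
)


def cptu_average(scores: dict[str, float]) -> float:
    """Unweighted mean of the four category scores (assumed aggregation)."""
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise KeyError(f"missing categories: {missing}")
    for category in CATEGORIES:
        # Each category is scored on a 0-5 scale.
        if not 0.0 <= scores[category] <= 5.0:
            raise ValueError(f"{category} score out of range: {scores[category]}")
    return round(mean(scores[c] for c in CATEGORIES), 2)


if __name__ == "__main__":
    example = {
        "language-understanding": 4.21,
        "phraseology": 4.11,
        "sentiment": 4.42,
        "tricky-questions": 4.61,
    }
    print(cptu_average(example))
```

Because the mean is unweighted, a model that is weak in one category (phraseology, say) can still rank highly on the primary metric if it is strong in the other three.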

§ 01 · Leaderboard

Best published scores.

465 results indexed across 5 metrics. Shaded row marks current SOTA; ties broken by submission date.
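The ordering rule can be sketched as a two-key sort: score descending, then submission date. The page does not say which direction the date tie-break runs, so the sketch assumes the earlier submission ranks first; the `Result` type, the model names, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Result:
    model: str
    score: float
    submitted: date


def rank(results: list[Result]) -> list[Result]:
    """Sort by score (higher is better), then by submission date.

    Assumption: on equal scores, the earlier submission ranks first.
    """
    return sorted(results, key=lambda r: (-r.score, r.submitted))


if __name__ == "__main__":
    # Hypothetical entries, not taken from the leaderboard.
    entries = [
        Result("model-b", 4.14, date(2026, 5, 20)),
        Result("model-c", 4.29, date(2026, 5, 25)),
        Result("model-a", 4.14, date(2026, 5, 10)),
    ]
    for position, result in enumerate(rank(entries), start=1):
        print(f"{position:02d} {result.model} {result.score:.2f}")
```

Python's `sorted` is stable, so entries that tie on both score and date keep their input order.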


Primary metric: average (higher is better)
All metrics: average, language-understanding, phraseology, sentiment, tricky-questions

average · primary
93 rows
# | Model | Access | Org | Submitted | Paper / code | average
01 | Qwen/Qwen3.5-27B thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.34
02 | gemini-2.0-flash-001 | API | Google | May 2026 | SpeakLeash/CPTU-Bench | 4.29
03 | Qwen/Qwen3.5-27B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.27
04 | Qwen/Qwen3.5-35B-A3B thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.22
05 | Qwen/Qwen3.5-35B-A3B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.18
06 | deepseek-ai/DeepSeek-V3.2 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.14
07 | deepseek-ai/DeepSeek-R1 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.14
08 | gemini-2.0-flash-lite-001 | API | Google | May 2026 | SpeakLeash/CPTU-Bench | 4.09
09 | deepseek-ai/DeepSeek-V3-0324 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.03
10 | deepseek-ai/DeepSeek-V3.1 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.03
11 | deepseek-ai/DeepSeek-V3 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.02
12 | mistralai/Mistral-Large-Instruct-2411 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 4.00
13 | moonshotai/Kimi-K2-Instruct-0905 (API) | API | moonshotai | May 2026 | SpeakLeash/CPTU-Bench | 3.98
14 | Qwen/Qwen2.5-72B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.95
15 | mistralai/Mistral-Large-Instruct-2407 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.93
16 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API) | API | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.93
17 | Qwen/Qwen3-235B-A22B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.91
18 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 (API FP8) | API | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.90
19 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 (API FP8) | API | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.83
20 | openai/gpt-oss-120b (API) | API | openai | May 2026 | SpeakLeash/CPTU-Bench | 3.82
21 | google/gemma-3-27b-it (API) | API | google | May 2026 | SpeakLeash/CPTU-Bench | 3.81
22 | meta-llama/Meta-Llama-3-70B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.78
23 | Qwen/Qwen2.5-32B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.75
24 | meta-llama/Llama-4-Scout-17B-16E-Instruct (API) | API | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.75
25 | speakleash/Bielik-11B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.73
26 | Qwen/Qwen3-32B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.71
27 | mistralai/Mistral-Small-24B-Instruct-2501 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.71
28 | alpindale/WizardLM-2-8x22B (API) | API | alpindale | May 2026 | SpeakLeash/CPTU-Bench | 3.70
29 | CYFRAGOVPL/pllum-12b-nc-chat-250715 | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.67
30 | Qwen/Qwen2-72B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.65
31 | meta-llama/Llama-3.3-70B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.64
32 | speakleash/Bielik-11B-v2.6-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.64
33 | speakleash/Bielik-11B-v2.3-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.63
34 | meta-llama/Meta-Llama-3.1-70B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.62
35 | speakleash/Bielik-11B-v2.1-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.61
36 | mistralai/Mixtral-8x22B-Instruct-v0.1 (API) | API | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.56
37 | Qwen/Qwen2.5-14B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.55
38 | Qwen/Qwen3-30B-A3B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.54
39 | CYFRAGOVPL/Llama-PLLuM-70B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.53
40 | Qwen/Qwen3-14B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.51
41 | speakleash/Bielik-11B-v2.5-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.48
42 | speakleash/Bielik-11B-v2.2-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.46
43 | speakleash/Bielik-Minitron-7B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.38
44 | speakleash/Bielik-4.5B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.38
45 | CYFRAGOVPL/Llama-PLLuM-70B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.33
46 | CYFRAGOVPL/pllum-12b-nc-instruct-250715 | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.33
47 | microsoft/phi-4 | Open | microsoft | May 2026 | SpeakLeash/CPTU-Bench | 3.30
48 | Qwen/Qwen3.5-9B non-thinking (API, FP8) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.28
49 | speakleash/Bielik-11B-v2.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.26
50 | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | Open | nvidia | May 2026 | SpeakLeash/CPTU-Bench | 3.25
51 | Qwen/Qwen1.5-72B-Chat | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.16
52 | CYFRAGOVPL/PLLuM-12B-nc-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.15
53 | utter-project/EuroLLM-9B-Instruct | Open | utter-project | May 2026 | SpeakLeash/CPTU-Bench | 3.15
54 | CYFRAGOVPL/PLLuM-12B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.14
55 | CYFRAGOVPL/PLLuM-8x7B-nc-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.11
56 | CYFRAGOVPL/PLLuM-12B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.09
57 | Qwen/Qwen2.5-7B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.07
58 | Qwen/Qwen3-8B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.06
59 | CYFRAGOVPL/PLLuM-8x7B-nc-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.03
60 | meta-llama/Meta-Llama-3.1-8B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.01
61 | CYFRAGOVPL/PLLuM-8x7B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.01
62 | CYFRAGOVPL/PLLuM-8x7B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.01
63 | meta-llama/Meta-Llama-3-8B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.00
64 | CYFRAGOVPL/PLLuM-12B-nc-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 2.96
65 | THUDM/glm-4-9b-chat | Open | THUDM | May 2026 | SpeakLeash/CPTU-Bench | 2.95
66 | mistralai/Mistral-Nemo-Instruct-2407 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 2.94
67 | CYFRAGOVPL/Llama-PLLuM-8B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 2.92
68 | speakleash/Bielik-7B-Instruct-v0.1 | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 2.88
69 | upstage/SOLAR-10.7B-Instruct-v1.0 | Open | upstage | May 2026 | SpeakLeash/CPTU-Bench | 2.88
70 | CYFRAGOVPL/Llama-PLLuM-8B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 2.82
71 | mistralai/Mistral-7B-Instruct-v0.3 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 2.76
72 | openchat/openchat-3.5-0106-gemma | Open | openchat | May 2026 | SpeakLeash/CPTU-Bench | 2.73
73 | mistralai/Mixtral-8x7B-Instruct-v0.1 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 2.73
74 | google/gemma-2-2b-it | Open | google | May 2026 | SpeakLeash/CPTU-Bench | 2.65
75 | berkeley-nest/Starling-LM-7B-alpha | Open | berkeley-nest | May 2026 | SpeakLeash/CPTU-Bench | 2.63
76 | openchat/openchat-3.5-0106 | Open | openchat | May 2026 | SpeakLeash/CPTU-Bench | 2.63
77 | Qwen/Qwen2.5-3B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 2.50
78 | speakleash/Bielik-1.5B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 2.36
79 | 01-ai/Yi-1.5-34B-Chat | Open | 01-ai | May 2026 | SpeakLeash/CPTU-Bench | 2.33
80 | Voicelab/trurl-2-13b-academic | Open | Voicelab | May 2026 | SpeakLeash/CPTU-Bench | 2.31
81 | NousResearch/Hermes-3-Llama-3.2-3B | Open | NousResearch | May 2026 | SpeakLeash/CPTU-Bench | 2.31
82 | microsoft/Phi-4-mini-instruct | Open | microsoft | May 2026 | SpeakLeash/CPTU-Bench | 2.17
83 | internlm/internlm2-chat-20b | Open | internlm | May 2026 | SpeakLeash/CPTU-Bench | 2.15
84 | microsoft/Phi-3.5-mini-instruct | Open | microsoft | May 2026 | SpeakLeash/CPTU-Bench | 2.01
85 | meta-llama/Llama-3.2-3B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 2.00
86 | ibm-granite/granite-3.1-2b-instruct | Open | ibm-granite | May 2026 | SpeakLeash/CPTU-Bench | 1.95
87 | meta-llama/Llama-3.2-1B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 1.92
88 | utter-project/EuroLLM-1.7B-Instruct | Open | utter-project | May 2026 | SpeakLeash/CPTU-Bench | 1.76
89 | Qwen/Qwen2.5-1.5B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 1.76
90 | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | Open | LGAI-EXAONE | May 2026 | SpeakLeash/CPTU-Bench | 1.67
91 | h2oai/h2o-danube2-1.8b-chat | Open | h2oai | May 2026 | SpeakLeash/CPTU-Bench | 1.64
92 | HuggingFaceTB/SmolLM2-1.7B-Instruct | Open | HuggingFaceTB | May 2026 | SpeakLeash/CPTU-Bench | 1.50
93 | Qwen/Qwen2.5-0.5B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 1.40

language-understanding
93 rows
# | Model | Access | Org | Submitted | Paper / code | language-understanding
01 | deepseek-ai/DeepSeek-V3.2 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.36
02 | deepseek-ai/DeepSeek-R1 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.34
03 | deepseek-ai/DeepSeek-V3.1 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.33
04 | gemini-2.0-flash-001 | API | Google | May 2026 | SpeakLeash/CPTU-Bench | 4.32
05 | deepseek-ai/DeepSeek-V3 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.22
06 | Qwen/Qwen3.5-27B thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.21
07 | deepseek-ai/DeepSeek-V3-0324 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.20
08 | moonshotai/Kimi-K2-Instruct-0905 (API) | API | moonshotai | May 2026 | SpeakLeash/CPTU-Bench | 4.18
09 | Qwen/Qwen3.5-27B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.17
10 | Qwen/Qwen3-235B-A22B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.16
11 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API) | API | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 4.11
12 | gemini-2.0-flash-lite-001 | API | Google | May 2026 | SpeakLeash/CPTU-Bench | 4.05
13 | Qwen/Qwen3.5-35B-A3B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.05
14 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 (API FP8) | API | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 4.00
15 | mistralai/Mistral-Large-Instruct-2407 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 4.00
16 | mistralai/Mistral-Large-Instruct-2411 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.98
17 | openai/gpt-oss-120b (API) | API | openai | May 2026 | SpeakLeash/CPTU-Bench | 3.97
18 | Qwen/Qwen2.5-72B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.97
19 | CYFRAGOVPL/pllum-12b-nc-chat-250715 | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.96
20 | speakleash/Bielik-11B-v2.6-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.94
21 | Qwen/Qwen3.5-35B-A3B thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.94
22 | speakleash/Bielik-11B-v2.1-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.92
23 | Qwen/Qwen3-32B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.91
24 | speakleash/Bielik-11B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.91
25 | meta-llama/Meta-Llama-3.1-70B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.91
26 | Qwen/Qwen2-72B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.89
27 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 (API FP8) | API | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.88
28 | meta-llama/Llama-3.3-70B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.87
29 | speakleash/Bielik-11B-v2.5-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.86
30 | speakleash/Bielik-Minitron-7B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.83
31 | meta-llama/Meta-Llama-3-70B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.82
32 | alpindale/WizardLM-2-8x22B (API) | API | alpindale | May 2026 | SpeakLeash/CPTU-Bench | 3.81
33 | meta-llama/Llama-4-Scout-17B-16E-Instruct (API) | API | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.81
34 | google/gemma-3-27b-it (API) | API | google | May 2026 | SpeakLeash/CPTU-Bench | 3.79
35 | speakleash/Bielik-11B-v2.3-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.79
36 | speakleash/Bielik-11B-v2.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.75
37 | speakleash/Bielik-11B-v2.2-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.73
38 | CYFRAGOVPL/pllum-12b-nc-instruct-250715 | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.73
39 | mistralai/Mixtral-8x22B-Instruct-v0.1 (API) | API | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.67
40 | CYFRAGOVPL/Llama-PLLuM-70B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.63
41 | speakleash/Bielik-4.5B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.61
42 | CYFRAGOVPL/Llama-PLLuM-70B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.61
43 | mistralai/Mistral-Small-24B-Instruct-2501 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.60
44 | CYFRAGOVPL/PLLuM-8x7B-nc-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.59
45 | Qwen/Qwen2.5-32B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.56
46 | Qwen/Qwen2.5-14B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.56
47 | Qwen/Qwen3-14B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.56
48 | microsoft/phi-4 | Open | microsoft | May 2026 | SpeakLeash/CPTU-Bench | 3.54
49 | Qwen/Qwen1.5-72B-Chat | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.52
50 | CYFRAGOVPL/PLLuM-8x7B-nc-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.48
51 | speakleash/Bielik-7B-Instruct-v0.1 | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.48
52 | CYFRAGOVPL/PLLuM-8x7B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.47
53 | THUDM/glm-4-9b-chat | Open | THUDM | May 2026 | SpeakLeash/CPTU-Bench | 3.46
54 | CYFRAGOVPL/PLLuM-8x7B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.45
55 | Qwen/Qwen3-30B-A3B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.39
56 | meta-llama/Meta-Llama-3.1-8B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.38
57 | CYFRAGOVPL/PLLuM-12B-nc-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.31
58 | utter-project/EuroLLM-9B-Instruct | Open | utter-project | May 2026 | SpeakLeash/CPTU-Bench | 3.30
59 | mistralai/Mistral-Nemo-Instruct-2407 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.29
60 | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | Open | nvidia | May 2026 | SpeakLeash/CPTU-Bench | 3.27
61 | CYFRAGOVPL/PLLuM-12B-nc-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.23
62 | Qwen/Qwen3-8B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.23
63 | CYFRAGOVPL/PLLuM-12B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.21
64 | upstage/SOLAR-10.7B-Instruct-v1.0 | Open | upstage | May 2026 | SpeakLeash/CPTU-Bench | 3.18
65 | mistralai/Mixtral-8x7B-Instruct-v0.1 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.17
66 | CYFRAGOVPL/PLLuM-12B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.17
67 | meta-llama/Meta-Llama-3-8B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.15
68 | openchat/openchat-3.5-0106-gemma | Open | openchat | May 2026 | SpeakLeash/CPTU-Bench | 3.08
69 | mistralai/Mistral-7B-Instruct-v0.3 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.06
70 | Qwen/Qwen2.5-7B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.02
71 | Qwen/Qwen3.5-9B non-thinking (API, FP8) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 2.98
72 | CYFRAGOVPL/Llama-PLLuM-8B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 2.93
73 | berkeley-nest/Starling-LM-7B-alpha | Open | berkeley-nest | May 2026 | SpeakLeash/CPTU-Bench | 2.92
74 | CYFRAGOVPL/Llama-PLLuM-8B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 2.90
75 | google/gemma-2-2b-it | Open | google | May 2026 | SpeakLeash/CPTU-Bench | 2.90
76 | 01-ai/Yi-1.5-34B-Chat | Open | 01-ai | May 2026 | SpeakLeash/CPTU-Bench | 2.87
77 | openchat/openchat-3.5-0106 | Open | openchat | May 2026 | SpeakLeash/CPTU-Bench | 2.83
78 | internlm/internlm2-chat-20b | Open | internlm | May 2026 | SpeakLeash/CPTU-Bench | 2.79
79 | Voicelab/trurl-2-13b-academic | Open | Voicelab | May 2026 | SpeakLeash/CPTU-Bench | 2.75
80 | NousResearch/Hermes-3-Llama-3.2-3B | Open | NousResearch | May 2026 | SpeakLeash/CPTU-Bench | 2.71
81 | Qwen/Qwen2.5-3B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 2.46
82 | microsoft/Phi-4-mini-instruct | Open | microsoft | May 2026 | SpeakLeash/CPTU-Bench | 2.43
83 | speakleash/Bielik-1.5B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 2.33
84 | meta-llama/Llama-3.2-3B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 2.29
85 | ibm-granite/granite-3.1-2b-instruct | Open | ibm-granite | May 2026 | SpeakLeash/CPTU-Bench | 2.23
86 | microsoft/Phi-3.5-mini-instruct | Open | microsoft | May 2026 | SpeakLeash/CPTU-Bench | 2.13
87 | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | Open | LGAI-EXAONE | May 2026 | SpeakLeash/CPTU-Bench | 2.12
88 | utter-project/EuroLLM-1.7B-Instruct | Open | utter-project | May 2026 | SpeakLeash/CPTU-Bench | 1.79
89 | meta-llama/Llama-3.2-1B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 1.74
90 | h2oai/h2o-danube2-1.8b-chat | Open | h2oai | May 2026 | SpeakLeash/CPTU-Bench | 1.59
91 | Qwen/Qwen2.5-1.5B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 1.35
92 | HuggingFaceTB/SmolLM2-1.7B-Instruct | Open | HuggingFaceTB | May 2026 | SpeakLeash/CPTU-Bench | 1.10
93 | Qwen/Qwen2.5-0.5B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 0.835

phraseology
93 rows
# | Model | Access | Org | Submitted | Paper / code | phraseology
01 | gemini-2.0-flash-001 | API | Google | May 2026 | SpeakLeash/CPTU-Bench | 4.34
02 | gemini-2.0-flash-lite-001 | API | Google | May 2026 | SpeakLeash/CPTU-Bench | 4.24
03 | Qwen/Qwen3.5-35B-A3B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.23
04 | alpindale/WizardLM-2-8x22B (API) | API | alpindale | May 2026 | SpeakLeash/CPTU-Bench | 4.22
05 | Qwen/Qwen3.5-27B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.20
06 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 (API FP8) | API | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 4.15
07 | Qwen/Qwen3.5-35B-A3B thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.15
08 | Qwen/Qwen3.5-27B thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.11
09 | Qwen/Qwen2.5-32B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.04
10 | google/gemma-3-27b-it (API) | API | google | May 2026 | SpeakLeash/CPTU-Bench | 4.03
11 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 (API FP8) | API | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 4.00
12 | mistralai/Mistral-Large-Instruct-2411 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.99
13 | speakleash/Bielik-11B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.96
14 | Qwen/Qwen2.5-72B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.93
15 | meta-llama/Llama-4-Scout-17B-16E-Instruct (API) | API | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.90
16 | mistralai/Mistral-Small-24B-Instruct-2501 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.88
17 | mistralai/Mistral-Large-Instruct-2407 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.86
18 | speakleash/Bielik-4.5B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.67
19 | deepseek-ai/DeepSeek-R1 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 3.60
20 | CYFRAGOVPL/PLLuM-12B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.59
21 | speakleash/Bielik-11B-v2.3-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.55
22 | mistralai/Mixtral-8x22B-Instruct-v0.1 (API) | API | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.55
23 | deepseek-ai/DeepSeek-V3.2 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 3.54
24 | deepseek-ai/DeepSeek-V3-0324 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 3.54
25 | CYFRAGOVPL/PLLuM-12B-nc-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.54
26 | deepseek-ai/DeepSeek-V3 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 3.52
27 | Qwen/Qwen3-30B-A3B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.50
28 | openai/gpt-oss-120b (API) | API | openai | May 2026 | SpeakLeash/CPTU-Bench | 3.49
29 | Qwen/Qwen3-235B-A22B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.48
30 | Qwen/Qwen3.5-9B non-thinking (API, FP8) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.48
31 | deepseek-ai/DeepSeek-V3.1 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 3.48
32 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API) | API | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.48
33 | meta-llama/Meta-Llama-3-70B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.46
34 | CYFRAGOVPL/PLLuM-8x7B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.46
35 | CYFRAGOVPL/Llama-PLLuM-8B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.46
36 | CYFRAGOVPL/pllum-12b-nc-chat-250715 | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.46
37 | moonshotai/Kimi-K2-Instruct-0905 (API) | API | moonshotai | May 2026 | SpeakLeash/CPTU-Bench | 3.43
38 | CYFRAGOVPL/PLLuM-12B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.43
39 | speakleash/Bielik-11B-v2.6-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.41
40 | Qwen/Qwen2.5-14B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.37
41 | CYFRAGOVPL/Llama-PLLuM-8B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.36
42 | CYFRAGOVPL/Llama-PLLuM-70B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.35
43 | CYFRAGOVPL/PLLuM-8x7B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.35
44 | CYFRAGOVPL/PLLuM-12B-nc-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.32
45 | CYFRAGOVPL/pllum-12b-nc-instruct-250715 | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.29
46 | Qwen/Qwen2-72B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.28
47 | CYFRAGOVPL/Llama-PLLuM-70B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.26
48 | upstage/SOLAR-10.7B-Instruct-v1.0 | Open | upstage | May 2026 | SpeakLeash/CPTU-Bench | 3.25
49 | speakleash/Bielik-11B-v2.2-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.25
50 | meta-llama/Meta-Llama-3.1-70B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.25
51 | Qwen/Qwen3-14B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.25
52 | Qwen/Qwen3-32B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.23
53 | microsoft/phi-4 | Open | microsoft | May 2026 | SpeakLeash/CPTU-Bench | 3.23
54 | speakleash/Bielik-Minitron-7B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.23
55 | CYFRAGOVPL/PLLuM-8x7B-nc-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.22
56 | utter-project/EuroLLM-9B-Instruct | Open | utter-project | May 2026 | SpeakLeash/CPTU-Bench | 3.17
57 | speakleash/Bielik-11B-v2.5-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.13
58 | speakleash/Bielik-11B-v2.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.13
59 | speakleash/Bielik-11B-v2.1-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.10
60 | Qwen/Qwen2.5-7B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.10
61 | CYFRAGOVPL/PLLuM-8x7B-nc-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.08
62 | meta-llama/Llama-3.3-70B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.04
63 | meta-llama/Meta-Llama-3-8B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.04
64 | Qwen/Qwen1.5-72B-Chat | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 2.98
65 | mistralai/Mixtral-8x7B-Instruct-v0.1 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 2.88
66 | berkeley-nest/Starling-LM-7B-alpha | Open | berkeley-nest | May 2026 | SpeakLeash/CPTU-Bench | 2.85
67 | Qwen/Qwen2.5-3B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 2.80
68 | THUDM/glm-4-9b-chat | Open | THUDM | May 2026 | SpeakLeash/CPTU-Bench | 2.78
69 | NousResearch/Hermes-3-Llama-3.2-3B | Open | NousResearch | May 2026 | SpeakLeash/CPTU-Bench | 2.77
70 | Qwen/Qwen3-8B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 2.77
71 | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | Open | nvidia | May 2026 | SpeakLeash/CPTU-Bench | 2.76
72 | mistralai/Mistral-Nemo-Instruct-2407 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 2.74
73 | mistralai/Mistral-7B-Instruct-v0.3 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 2.68
74 | Qwen/Qwen2.5-0.5B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 2.60
75 | meta-llama/Meta-Llama-3.1-8B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 2.58
76 | openchat/openchat-3.5-0106 | Open | openchat | May 2026 | SpeakLeash/CPTU-Bench | 2.56
77 | h2oai/h2o-danube2-1.8b-chat | Open | h2oai | May 2026 | SpeakLeash/CPTU-Bench | 2.47
78 | openchat/openchat-3.5-0106-gemma | Open | openchat | May 2026 | SpeakLeash/CPTU-Bench | 2.44
79 | microsoft/Phi-3.5-mini-instruct | Open | microsoft | May 2026 | SpeakLeash/CPTU-Bench | 2.42
80 | internlm/internlm2-chat-20b | Open | internlm | May 2026 | SpeakLeash/CPTU-Bench | 2.38
81 | 01-ai/Yi-1.5-34B-Chat | Open | 01-ai | May 2026 | SpeakLeash/CPTU-Bench | 2.38
82 | speakleash/Bielik-1.5B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 2.38
83 | HuggingFaceTB/SmolLM2-1.7B-Instruct | Open | HuggingFaceTB | May 2026 | SpeakLeash/CPTU-Bench | 2.35
84 | meta-llama/Llama-3.2-1B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 2.34
85 | speakleash/Bielik-7B-Instruct-v0.1 | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 2.31
86 | utter-project/EuroLLM-1.7B-Instruct | Open | utter-project | May 2026 | SpeakLeash/CPTU-Bench | 2.26
87 | microsoft/Phi-4-mini-instruct | Open | microsoft | May 2026 | SpeakLeash/CPTU-Bench | 2.25
88 | Qwen/Qwen2.5-1.5B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 2.23
89 | Voicelab/trurl-2-13b-academic | Open | Voicelab | May 2026 | SpeakLeash/CPTU-Bench | 2.17
90 | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | Open | LGAI-EXAONE | May 2026 | SpeakLeash/CPTU-Bench | 2.13
91 | google/gemma-2-2b-it | Open | google | May 2026 | SpeakLeash/CPTU-Bench | 2.10
92 | ibm-granite/granite-3.1-2b-instruct | Open | ibm-granite | May 2026 | SpeakLeash/CPTU-Bench | 1.88
93 | meta-llama/Llama-3.2-3B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 1.72

sentiment
93 rows
# | Model | Access | Org | Submitted | Paper / code | sentiment
01 | gemini-2.0-flash-001 | API | Google | May 2026 | SpeakLeash/CPTU-Bench | 4.52
02 | deepseek-ai/DeepSeek-R1 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.49
03 | deepseek-ai/DeepSeek-V3.2 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.46
04 | deepseek-ai/DeepSeek-V3.1 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.42
05 | Qwen/Qwen3.5-27B thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.42
06 | moonshotai/Kimi-K2-Instruct-0905 (API) | API | moonshotai | May 2026 | SpeakLeash/CPTU-Bench | 4.39
07 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API) | API | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 4.39
08 | CYFRAGOVPL/pllum-12b-nc-chat-250715 | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 4.36
09 | deepseek-ai/DeepSeek-V3 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.36
10 | deepseek-ai/DeepSeek-V3-0324 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.36
11 | meta-llama/Meta-Llama-3.1-70B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 4.33
12 | mistralai/Mistral-Large-Instruct-2411 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 4.33
13 | meta-llama/Llama-3.3-70B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 4.29
14 | Qwen/Qwen3.5-27B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.29
15 | gemini-2.0-flash-lite-001 | API | Google | May 2026 | SpeakLeash/CPTU-Bench | 4.23
16 | mistralai/Mistral-Large-Instruct-2407 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 4.23
17 | Qwen/Qwen3.5-35B-A3B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.23
18 | Qwen/Qwen3-235B-A22B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.17
19 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 (API FP8) | API | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 4.13
20 | meta-llama/Meta-Llama-3-70B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 4.13
21 | Qwen/Qwen3-32B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.13
22 | meta-llama/Llama-4-Scout-17B-16E-Instruct (API) | API | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 4.10
23 | speakleash/Bielik-11B-v2.6-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 4.10
24 | Qwen/Qwen3.5-35B-A3B thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.10
25 | Qwen/Qwen2.5-72B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.08
26 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 (API FP8) | API | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 4.01
27 | speakleash/Bielik-11B-v2.5-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 4.01
28 | meta-llama/Meta-Llama-3.1-8B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.97
29 | speakleash/Bielik-11B-v2.3-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.97
30 | speakleash/Bielik-11B-v2.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.97
31 | speakleash/Bielik-11B-v2.1-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.96
32 | openai/gpt-oss-120b (API) | API | openai | May 2026 | SpeakLeash/CPTU-Bench | 3.94
33 | CYFRAGOVPL/Llama-PLLuM-70B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.94
34 | Qwen/Qwen3-14B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.91
35 | CYFRAGOVPL/pllum-12b-nc-instruct-250715 | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.91
36 | mistralai/Mistral-Small-24B-Instruct-2501 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.91
37 | Qwen/Qwen2.5-14B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.91
38 | CYFRAGOVPL/PLLuM-8x7B-nc-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.88
39 | google/gemma-3-27b-it (API) | API | google | May 2026 | SpeakLeash/CPTU-Bench | 3.88
40 | speakleash/Bielik-11B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.88
41 | Qwen/Qwen2.5-32B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.81
42 | mistralai/Mixtral-8x22B-Instruct-v0.1 (API) | API | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.78
43 | CYFRAGOVPL/Llama-PLLuM-70B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.78
44 | Qwen/Qwen2-72B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.76
45 | speakleash/Bielik-4.5B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.76
46 | CYFRAGOVPL/PLLuM-8x7B-nc-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.76
47 | openchat/openchat-3.5-0106-gemma | Open | openchat | May 2026 | SpeakLeash/CPTU-Bench | 3.73
48 | speakleash/Bielik-11B-v2.2-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.72
49 | microsoft/phi-4 | Open | microsoft | May 2026 | SpeakLeash/CPTU-Bench | 3.72
50 | Qwen/Qwen3-30B-A3B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.72
51 | speakleash/Bielik-Minitron-7B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.72
52 | CYFRAGOVPL/PLLuM-12B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.71
53 | alpindale/WizardLM-2-8x22B (API) | API | alpindale | May 2026 | SpeakLeash/CPTU-Bench | 3.71
54 | mistralai/Mistral-Nemo-Instruct-2407 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.64
55 | CYFRAGOVPL/PLLuM-8x7B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.59
56 | speakleash/Bielik-7B-Instruct-v0.1 | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.59
57 | THUDM/glm-4-9b-chat | Open | THUDM | May 2026 | SpeakLeash/CPTU-Bench | 3.59
58 | Qwen/Qwen2.5-7B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.56
59 | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | Open | nvidia | May 2026 | SpeakLeash/CPTU-Bench | 3.53
60 | speakleash/Bielik-1.5B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.53
61 | Qwen/Qwen3-8B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.49
62 | Qwen/Qwen1.5-72B-Chat | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.47
63 | CYFRAGOVPL/PLLuM-8x7B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.44
64 | google/gemma-2-2b-it | Open | google | May 2026 | SpeakLeash/CPTU-Bench | 3.40
65 | utter-project/EuroLLM-9B-Instruct | Open | utter-project | May 2026 | SpeakLeash/CPTU-Bench | 3.37
66 | meta-llama/Meta-Llama-3-8B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.33
67 | mistralai/Mistral-7B-Instruct-v0.3 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.33
68 | CYFRAGOVPL/PLLuM-12B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.32
69 | internlm/internlm2-chat-20b | Open | internlm | May 2026 | SpeakLeash/CPTU-Bench | 3.30
70 | Voicelab/trurl-2-13b-academic | Open | Voicelab | May 2026 | SpeakLeash/CPTU-Bench | 3.30
71 | CYFRAGOVPL/Llama-PLLuM-8B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.24
72 | CYFRAGOVPL/PLLuM-12B-nc-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.24
73 | CYFRAGOVPL/PLLuM-12B-nc-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.22
74 | openchat/openchat-3.5-0106 | Open | openchat | May 2026 | SpeakLeash/CPTU-Bench | 3.16
75 | CYFRAGOVPL/Llama-PLLuM-8B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.13
76 | 01-ai/Yi-1.5-34B-Chat | Open | 01-ai | May 2026 | SpeakLeash/CPTU-Bench | 3.08
77 | meta-llama/Llama-3.2-1B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.08
78 | ibm-granite/granite-3.1-2b-instruct | Open | ibm-granite | May 2026 | SpeakLeash/CPTU-Bench | 3.08
79 | berkeley-nest/Starling-LM-7B-alpha | Open | berkeley-nest | May 2026 | SpeakLeash/CPTU-Bench | 3.06
80 | mistralai/Mixtral-8x7B-Instruct-v0.1 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.06
81 | Qwen/Qwen3.5-9B non-thinking (API, FP8) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.01
82 | upstage/SOLAR-10.7B-Instruct-v1.0 | Open | upstage | May 2026 | SpeakLeash/CPTU-Bench | 2.97
83 | Qwen/Qwen2.5-3B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 2.95
84 | Qwen/Qwen2.5-1.5B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 2.79
85 | meta-llama/Llama-3.2-3B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 2.76
86 | microsoft/Phi-4-mini-instruct | Open | microsoft | May 2026 | SpeakLeash/CPTU-Bench | 2.69
87 | NousResearch/Hermes-3-Llama-3.2-3B | Open | NousResearch | May 2026 | SpeakLeash/CPTU-Bench | 2.62
88 | microsoft/Phi-3.5-mini-instruct | Open | microsoft | May 2026 | SpeakLeash/CPTU-Bench | 2.44
89 | h2oai/h2o-danube2-1.8b-chat | Open | h2oai | May 2026 | SpeakLeash/CPTU-Bench | 2.37
90 | HuggingFaceTB/SmolLM2-1.7B-Instruct | Open | HuggingFaceTB | May 2026 | SpeakLeash/CPTU-Bench | 2.28
91 | utter-project/EuroLLM-1.7B-Instruct | Open | utter-project | May 2026 | SpeakLeash/CPTU-Bench | 2.24
92 | Qwen/Qwen2.5-0.5B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 1.96
93 | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | Open | LGAI-EXAONE | May 2026 | SpeakLeash/CPTU-Bench | 1.94

tricky-questions
93 rows
# | Model | Access | Org | Submitted | Paper / code | tricky-questions
01 | Qwen/Qwen3.5-35B-A3B thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.70
02 | Qwen/Qwen3.5-27B thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.61
03 | Qwen/Qwen3.5-27B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.43
04 | deepseek-ai/DeepSeek-V3.2 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.20
05 | Qwen/Qwen3.5-35B-A3B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 4.19
06 | deepseek-ai/DeepSeek-R1 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.12
07 | deepseek-ai/DeepSeek-V3-0324 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 4.02
08 | gemini-2.0-flash-001 | API | Google | May 2026 | SpeakLeash/CPTU-Bench | 3.99
09 | deepseek-ai/DeepSeek-V3 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 3.99
10 | moonshotai/Kimi-K2-Instruct-0905 (API) | API | moonshotai | May 2026 | SpeakLeash/CPTU-Bench | 3.93
11 | openai/gpt-oss-120b (API) | API | openai | May 2026 | SpeakLeash/CPTU-Bench | 3.89
12 | deepseek-ai/DeepSeek-V3.1 (API) | API | deepseek-ai | May 2026 | SpeakLeash/CPTU-Bench | 3.87
13 | gemini-2.0-flash-lite-001 | API | Google | May 2026 | SpeakLeash/CPTU-Bench | 3.85
14 | Qwen/Qwen3-235B-A22B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.84
15 | Qwen/Qwen2.5-72B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.81
16 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API) | API | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.76
17 | mistralai/Mistral-Large-Instruct-2411 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.72
18 | meta-llama/Meta-Llama-3-70B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.71
19 | Qwen/Qwen2-72B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.68
20 | mistralai/Mistral-Large-Instruct-2407 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.65
21 | Qwen/Qwen3.5-9B non-thinking (API, FP8) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.64
22 | Qwen/Qwen2.5-32B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.59
23 | Qwen/Qwen3-32B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.56
24 | Qwen/Qwen3-30B-A3B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.54
25 | google/gemma-3-27b-it (API) | API | google | May 2026 | SpeakLeash/CPTU-Bench | 3.53
26 | speakleash/Bielik-11B-v2.1-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.47
27 | mistralai/Mistral-Small-24B-Instruct-2501 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.45
28 | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | Open | nvidia | May 2026 | SpeakLeash/CPTU-Bench | 3.43
29 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 (API FP8) | API | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.42
30 | meta-llama/Llama-3.3-70B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.38
31 | Qwen/Qwen2.5-14B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.34
32 | Qwen/Qwen3-14B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 3.33
33 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 (API FP8) | API | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.30
34 | mistralai/Mixtral-8x22B-Instruct-v0.1 (API) | API | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 3.24
35 | speakleash/Bielik-11B-v2.3-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.22
36 | CYFRAGOVPL/Llama-PLLuM-70B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 3.21
37 | meta-llama/Llama-4-Scout-17B-16E-Instruct (API) | API | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.19
38 | speakleash/Bielik-11B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.19
39 | speakleash/Bielik-11B-v2.2-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.12
40 | speakleash/Bielik-11B-v2.6-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 3.10
41 | alpindale/WizardLM-2-8x22B (API) | API | alpindale | May 2026 | SpeakLeash/CPTU-Bench | 3.06
42 | meta-llama/Meta-Llama-3.1-70B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 3.01
43 | speakleash/Bielik-11B-v2.5-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 2.91
44 | CYFRAGOVPL/pllum-12b-nc-chat-250715 | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 2.90
45 | Qwen/Qwen3-8B non-thinking (API) | API | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 2.76
46 | utter-project/EuroLLM-9B-Instruct | Open | utter-project | May 2026 | SpeakLeash/CPTU-Bench | 2.75
47 | speakleash/Bielik-Minitron-7B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 2.74
48 | microsoft/phi-4 | Open | microsoft | May 2026 | SpeakLeash/CPTU-Bench | 2.72
49 | Qwen/Qwen1.5-72B-Chat | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 2.67
50 | CYFRAGOVPL/Llama-PLLuM-70B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 2.63
51 | CYFRAGOVPL/PLLuM-12B-nc-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 2.62
52 | CYFRAGOVPL/PLLuM-12B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 2.59
53 | Qwen/Qwen2.5-7B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 2.58
54 | meta-llama/Meta-Llama-3-8B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 2.48
55 | speakleash/Bielik-4.5B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 2.46
56 | CYFRAGOVPL/pllum-12b-nc-instruct-250715 | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 2.37
57 | CYFRAGOVPL/Llama-PLLuM-8B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 2.25
58 | google/gemma-2-2b-it | Open | google | May 2026 | SpeakLeash/CPTU-Bench | 2.21
59 | speakleash/Bielik-11B-v2.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 2.20
60 | speakleash/Bielik-7B-Instruct-v0.1 | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 2.16
61 | upstage/SOLAR-10.7B-Instruct-v1.0 | Open | upstage | May 2026 | SpeakLeash/CPTU-Bench | 2.12
62 | meta-llama/Meta-Llama-3.1-8B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 2.11
63 | mistralai/Mistral-Nemo-Instruct-2407 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 2.09
64 | mistralai/Mistral-7B-Instruct-v0.3 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 1.99
65 | THUDM/glm-4-9b-chat | Open | THUDM | May 2026 | SpeakLeash/CPTU-Bench | 1.98
66 | CYFRAGOVPL/PLLuM-12B-nc-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 1.98
67 | openchat/openchat-3.5-0106 | Open | openchat | May 2026 | SpeakLeash/CPTU-Bench | 1.96
68 | CYFRAGOVPL/PLLuM-12B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 1.90
69 | Qwen/Qwen2.5-3B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 1.81
70 | mistralai/Mixtral-8x7B-Instruct-v0.1 | Open | mistralai | May 2026 | SpeakLeash/CPTU-Bench | 1.80
71 | CYFRAGOVPL/PLLuM-8x7B-nc-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 1.80
72 | CYFRAGOVPL/PLLuM-8x7B-chat | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 1.78
73 | CYFRAGOVPL/PLLuM-8x7B-nc-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 1.76
74 | berkeley-nest/Starling-LM-7B-alpha | Open | berkeley-nest | May 2026 | SpeakLeash/CPTU-Bench | 1.68
75 | openchat/openchat-3.5-0106-gemma | Open | openchat | May 2026 | SpeakLeash/CPTU-Bench | 1.68
76 | CYFRAGOVPL/Llama-PLLuM-8B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 1.66
77 | CYFRAGOVPL/PLLuM-8x7B-instruct | Open | CYFRAGOVPL | May 2026 | SpeakLeash/CPTU-Bench | 1.51
78 | microsoft/Phi-4-mini-instruct | Open | microsoft | May 2026 | SpeakLeash/CPTU-Bench | 1.30
79 | speakleash/Bielik-1.5B-v3.0-Instruct | Open | speakleash | May 2026 | SpeakLeash/CPTU-Bench | 1.22
80 | meta-llama/Llama-3.2-3B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 1.22
81 | NousResearch/Hermes-3-Llama-3.2-3B | Open | NousResearch | May 2026 | SpeakLeash/CPTU-Bench | 1.14
82 | microsoft/Phi-3.5-mini-instruct | Open | microsoft | May 2026 | SpeakLeash/CPTU-Bench | 1.04
83 | Voicelab/trurl-2-13b-academic | Open | Voicelab | May 2026 | SpeakLeash/CPTU-Bench | 1.02
84 | 01-ai/Yi-1.5-34B-Chat | Open | 01-ai | May 2026 | SpeakLeash/CPTU-Bench | 1.00
85 | utter-project/EuroLLM-1.7B-Instruct | Open | utter-project | May 2026 | SpeakLeash/CPTU-Bench | 0.758
86 | Qwen/Qwen2.5-1.5B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 0.663
87 | ibm-granite/granite-3.1-2b-instruct | Open | ibm-granite | May 2026 | SpeakLeash/CPTU-Bench | 0.590
88 | meta-llama/Llama-3.2-1B-Instruct | Open | meta-llama | May 2026 | SpeakLeash/CPTU-Bench | 0.522
89 | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | Open | LGAI-EXAONE | May 2026 | SpeakLeash/CPTU-Bench | 0.489
90 | HuggingFaceTB/SmolLM2-1.7B-Instruct | Open | HuggingFaceTB | May 2026 | SpeakLeash/CPTU-Bench | 0.253
91 | Qwen/Qwen2.5-0.5B-Instruct | Open | Qwen | May 2026 | SpeakLeash/CPTU-Bench | 0.219
92 | h2oai/h2o-danube2-1.8b-chat | Open | h2oai | May 2026 | SpeakLeash/CPTU-Bench | 0.129
93 | internlm/internlm2-chat-20b | Open | internlm | May 2026 | SpeakLeash/CPTU-Bench | 0.124
Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
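The ordering described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not Codesota's implementation: `Row` and `rank_rows` are hypothetical names, the tie-break on submission date follows the leaderboard note, and treating the earlier submission as the winner of a tie is an assumption. The four sample rows are copied from the tricky-questions table; because the table only shows "May 2026", the day of the month is a placeholder.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Row:
    model: str
    metric: str
    score: float
    submitted: date  # the table shows month and year only; day 1 is a placeholder


def rank_rows(rows, metric):
    """Return the rows for one metric, best score first.

    Ties fall back to submission date (earlier first, an assumption);
    rows that tie on both keep their input order because sorted() is stable.
    """
    selected = [r for r in rows if r.metric == metric]
    return sorted(selected, key=lambda r: (-r.score, r.submitted))


rows = [
    Row("gemini-2.0-flash-001", "tricky-questions", 3.99, date(2026, 5, 1)),
    Row("Qwen/Qwen3.5-35B-A3B thinking (API)", "tricky-questions", 4.70, date(2026, 5, 1)),
    Row("deepseek-ai/DeepSeek-V3 (API)", "tricky-questions", 3.99, date(2026, 5, 1)),
    Row("Qwen/Qwen3.5-27B thinking (API)", "tricky-questions", 4.61, date(2026, 5, 1)),
]

ranked = rank_rows(rows, "tricky-questions")
for position, row in enumerate(ranked, start=1):
    print(f"{position:02d} {row.model} {row.score:.2f}")
```

Since every row here carries the same date, the two 3.99 entries stay in the order they were supplied, which matches their relative order in the table.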
§ 03 · Progress

1 step of state of the art.

Each row below marks a model that broke the previous record on the average metric. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · average
  1. May 22, 2026 · Qwen/Qwen3.5-27B thinking (API) · Qwen · 4.34
Fig 3 · SOTA-setting models only. 1 entry, dated May 2026.
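A SOTA line of this kind can be derived from a dated list of submissions by keeping only the entries that beat the running best. The sketch below is illustrative only: `sota_steps` is a hypothetical helper, requiring a strict improvement is an assumption, and the sample submissions are invented placeholders rather than leaderboard data.

```python
from datetime import date


def sota_steps(submissions):
    """Keep only entries that strictly beat the best score seen so far, in date order."""
    steps = []
    best = float("-inf")
    for when, model, score in sorted(submissions, key=lambda s: s[0]):
        if score > best:  # higher scores win; equal scores do not count as a new record
            steps.append((when, model, score))
            best = score
    return steps


# Placeholder submissions: names, dates, and scores are invented for illustration.
submissions = [
    (date(2025, 6, 15), "model-c", 4.00),
    (date(2025, 1, 10), "model-a", 3.50),
    (date(2025, 9, 1), "model-d", 4.00),
    (date(2025, 3, 2), "model-b", 3.20),
]

steps = sota_steps(submissions)
for when, model, score in steps:
    print(when.isoformat(), model, f"{score:.2f}")
```

With a single record-setting entry, as in the line above, the function would return a list of length one.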
§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

Submit a result · Read submission guide
What a submission needs
  • 01 A public checkpoint or API endpoint
  • 02 A reproduction script with frozen commit + seed
  • 03 Declared evaluation environment (Python, deps)
  • 04 One row per metric declared by this dataset
  • 05 A contact so we can follow up on discrepancies
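Before submitting, the "one row per metric" requirement can be checked locally. The sketch below is a hypothetical pre-flight check, not part of the Codesota submission tooling: the row layout (a dict with `metric` and `score` keys) and the `validate_rows` name are assumptions, the five metric names are those listed for this dataset, and the 0 to 5 bound follows the stated per-category score range (applying it to `average` as well is an assumption). The sample scores are placeholders.

```python
from collections import Counter

# Metric names as listed for this dataset.
METRICS = (
    "average",
    "language-understanding",
    "phraseology",
    "sentiment",
    "tricky-questions",
)


def validate_rows(rows):
    """Return a list of problems; an empty list means the rows look complete."""
    problems = []
    counts = Counter(row["metric"] for row in rows)
    for metric in METRICS:
        if counts[metric] != 1:
            problems.append(f"expected exactly one row for {metric}, found {counts[metric]}")
    for metric in counts:
        if metric not in METRICS:
            problems.append(f"unknown metric: {metric}")
    for row in rows:
        if not 0.0 <= row["score"] <= 5.0:
            problems.append(f"score out of range for {row['metric']}: {row['score']}")
    return problems


# Placeholder scores for illustration only.
complete = [{"metric": m, "score": 3.0} for m in METRICS]
incomplete = complete[:-1]

print(validate_rows(complete))
print(validate_rows(incomplete))
```

Returning a list of messages instead of raising on the first problem lets a submitter fix everything in one pass.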