Who leads the Polish MT-Bench benchmark?

Gemma 3 (27B, IT) currently leads Polish MT-Bench with a score of 9.28 on pl-score.

What is the state-of-the-art score on Polish MT-Bench?

The state-of-the-art result on Polish MT-Bench is 9.28 (pl-score), achieved by Gemma 3 (27B, IT) as of 2026.

How many models are tracked on Polish MT-Bench?

Codesota tracks 50 models on Polish MT-Bench across 9 metrics.

When was the Polish MT-Bench leaderboard last updated?

The Polish MT-Bench leaderboard on Codesota includes results through April 2026.


Polish Conversation Quality · benchmark dataset · 2025 · PL

Polish Multi-Turn Benchmark.

Name: Polish Multi-Turn Benchmark Results
Creator: Codesota
Published: 2026-01-01
License: https://creativecommons.org/licenses/by/4.0/

Polish adaptation of MT-Bench evaluating LLMs on multi-turn conversation quality across 8 categories: coding, extraction, humanities, math, reasoning, roleplay, STEM, and writing. Responses are scored on a 1-10 scale by a GPT-4 judge. Created by SpeakLeash.
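This page does not state how pl-score is derived from the eight category scores. As a sanity check only, the sketch below averages the category scores listed in the tables on this page and compares the result with the listed pl-score. The unweighted mean is an assumption, not a documented formula: it reproduces the listed 9.28 for Gemma 3 (27B, IT), but not the listed 2.05 for Mistral-7B-Instruct-v0.2, so pl-score evidently involves more than a plain average.

```python
from statistics import mean

# Category scores copied from the leaderboard tables on this page.
GEMMA_3_27B = {
    "coding": 8.10, "extraction": 9.90, "humanities": 10.0, "math": 8.25,
    "reasoning": 8.40, "roleplay": 9.95, "stem": 9.95, "writing": 9.70,
}
MISTRAL_7B_V02 = {
    "coding": 4.25, "extraction": 7.40, "humanities": 8.40, "math": 3.20,
    "reasoning": 5.00, "roleplay": 8.65, "stem": 7.85, "writing": 7.70,
}


def category_mean(scores: dict[str, float]) -> float:
    """Unweighted mean of the eight category scores (an assumed aggregate)."""
    if len(scores) != 8:
        raise ValueError("expected exactly 8 category scores")
    return mean(scores.values())


if __name__ == "__main__":
    # Listed pl-score values on this page: 9.28 and 2.05 respectively.
    print(round(category_mean(GEMMA_3_27B), 2))
    print(round(category_mean(MISTRAL_7B_V02), 2))
```

The mismatch on the second model is a reason to treat pl-score as its own metric when comparing models, rather than recomputing it from the category columns.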


§ 01 · Leaderboard

Best published scores.

450 results indexed across 9 metrics. Shaded row marks current SOTA; ties broken by submission date.
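The ranking rule stated above (higher is better, ties broken by submission date) can be sketched as a two-key sort. This is a minimal illustration, not Codesota's implementation: the `Result` class, the placeholder model names, the day-level dates, and the choice that the earlier submission wins a tie are all assumptions, since the page shows only month-level dates and does not say which direction the tie-break runs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Result:
    model: str        # placeholder names below, not real leaderboard rows
    score: float      # metric value, higher is better
    submitted: date   # hypothetical day-level date


def rank(results: list[Result]) -> list[Result]:
    """Sort by score descending; among equal scores, earlier submission first (assumed)."""
    return sorted(results, key=lambda r: (-r.score, r.submitted))


# Illustrative data only.
EXAMPLE = [
    Result("model-a", 9.90, date(2026, 4, 10)),
    Result("model-b", 9.90, date(2026, 4, 3)),
    Result("model-c", 9.95, date(2026, 4, 20)),
]

if __name__ == "__main__":
    for position, r in enumerate(rank(EXAMPLE), start=1):
        marker = " (SOTA)" if position == 1 else ""
        print(f"{position:02d} {r.model} {r.score:.2f}{marker}")
```

With 50 models and 9 metrics, running such a sort once per metric yields the 450 indexed rows shown in the tables below.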

Primary: pl-score · higher is better
All metrics: coding, extraction, humanities, math, pl-score, reasoning, roleplay, stem, writing

coding

50 rows

#	Model	Org	Submitted	Paper / code	coding
01	Mistral-Small-3.1-24B-Instruct-2503Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.30
02	gemma-3-12b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	8.25
03	Gemma 3 (27B, IT)Open	Google	Apr 2026	SpeakLeash/MT-Bench-PL	8.10
04	Qwen2.5-32B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	7.95
05	Mistral-Small-24B-Instruct-2501Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.95
06	Qwen2-72B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	7.80
07	Phi-4Open	Microsoft	Apr 2026	SpeakLeash/MT-Bench-PL	7.60
08	Gemma-2-27b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	7.45
09	Meta-Llama-3.1-405B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	7.25
10	Mistral-Small-Instruct-2409Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.10
11	Mistral-Large-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	6.75
12	Qwen2.5-14B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	6.70
13	Mixtral-8x22bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	6.45
14	Meta-Llama-3.1-70B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	6.25
15	Bielik-11B-v2.3-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.25
16	GPT-3.5-turboOpen	OpenAI	Apr 2026	SpeakLeash/MT-Bench-PL	6.00
17	Mistral-Nemo-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	5.85
18	aya-expanse-32bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.75
19	Bielik-11B-v2.0-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.60
20	gemma-3-4b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	5.40
21	Bielik-11B-v2.1-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.40
22	openchat-3.5-0106-gemmaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.35
23	Mixtral-8x7bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	5.20
24	openchat-3.5-0106Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.05
25	Bielik-11B-v2.2-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.05
26	Qwen2.5-3B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	5.00
27	aya-expanse-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.90
28	Llama-PLLuM-70B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	4.80
29	Starling-LM-7B-alphaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.75
30	Meta-Llama-3.1-8B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	4.60
31	dolphin-2.9.1-llama-3-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.60
32	PLLuM-12B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	4.55
33	PLLuM-8x7B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	4.55
34	Hermes-3-Llama-3.2-3BOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.45
35	Llama-3.2-3B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	4.40
36	Mistral-7B-Instruct-v0.3Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	4.30
37	Mistral-7B-Instruct-v0.2Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	4.25
38	Phi-3.5-mini-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.20
39	PLLuM-8x7B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	4.10
40	Qwen2.5-1.5B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	3.95
41	Llama-PLLuM-8B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	3.65
42	gemma-3-1b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	3.35
43	PLLuM-12B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	3.05
44	granite-3.0-2b-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.05
45	Bielik-7B-Instruct-v0.1Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.00
46	Polka-Mistral-7B-SFTOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	2.95
47	trurl-2-7bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.80
48	SmolLM2-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.75
49	EuroLLM-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.70
50	Llama-3.2-1B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	1.65

extraction

50 rows

#	Model	Org	Submitted	Paper / code	extraction
01	Mistral-Large-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.90
02	Qwen2.5-32B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	9.90
03	Gemma 3 (27B, IT)Open	Google	Apr 2026	SpeakLeash/MT-Bench-PL	9.90
04	Mistral-Small-24B-Instruct-2501Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.90
05	Meta-Llama-3.1-70B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	9.85
06	Meta-Llama-3.1-405B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	9.85
07	Qwen2-72B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	9.80
08	Mistral-Small-3.1-24B-Instruct-2503Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.80
09	Gemma-2-27b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	9.60
10	gemma-3-12b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	9.55
11	Mixtral-8x22bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.55
12	Llama-PLLuM-70B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	9.45
13	Bielik-11B-v2.3-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.43
14	Phi-4Open	Microsoft	Apr 2026	SpeakLeash/MT-Bench-PL	9.30
15	Bielik-11B-v2.2-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.30
16	Qwen2.5-14B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	9.25
17	Mistral-Small-Instruct-2409Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.15
18	Bielik-11B-v2.1-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.13
19	Meta-Llama-3.1-8B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	9.10
20	Mistral-Nemo-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.95
21	Bielik-11B-v2.0-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.65
22	Qwen2.5-3B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	8.45
23	gemma-3-4b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	8.40
24	PLLuM-8x7B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	8.40
25	aya-expanse-32bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.40
26	Mixtral-8x7bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.15
27	GPT-3.5-turboOpen	OpenAI	Apr 2026	SpeakLeash/MT-Bench-PL	8.15
28	aya-expanse-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.05
29	PLLuM-8x7B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	8.00
30	Mistral-7B-Instruct-v0.2Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.40
31	Starling-LM-7B-alphaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	7.35
32	Mistral-7B-Instruct-v0.3Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.30
33	PLLuM-12B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	7.20
34	openchat-3.5-0106-gemmaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.90
35	openchat-3.5-0106Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.90
36	Phi-3.5-mini-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.80
37	PLLuM-12B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	6.55
38	Llama-PLLuM-8B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	6.30
39	Llama-3.2-3B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	6.22
40	dolphin-2.9.1-llama-3-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.15
41	Qwen2.5-1.5B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	5.75
42	Hermes-3-Llama-3.2-3BOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.30
43	Polka-Mistral-7B-SFTOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.25
44	gemma-3-1b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	4.87
45	Bielik-7B-Instruct-v0.1Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.35
46	trurl-2-7bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.50
47	granite-3.0-2b-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.45
48	SmolLM2-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	2.75
49	EuroLLM-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	2.25
50	Llama-3.2-1B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	1.60

humanities

50 rows

#	Model	Org	Submitted	Paper / code	humanities
01	gemma-3-12b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	10
02	Mistral-Small-Instruct-2409Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	10
03	aya-expanse-32bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	10
04	Gemma-2-27b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	10
05	Gemma 3 (27B, IT)Open	Google	Apr 2026	SpeakLeash/MT-Bench-PL	10
06	Mistral-Small-3.1-24B-Instruct-2503Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	10
07	Phi-4Open	Microsoft	Apr 2026	SpeakLeash/MT-Bench-PL	9.95
08	gemma-3-4b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	9.90
09	Qwen2-72B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	9.75
10	GPT-3.5-turboOpen	OpenAI	Apr 2026	SpeakLeash/MT-Bench-PL	9.75
11	Mistral-Small-24B-Instruct-2501Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.70
12	aya-expanse-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.65
13	Qwen2.5-32B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	9.65
14	Meta-Llama-3.1-405B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	9.65
15	Mistral-Nemo-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.50
16	PLLuM-12B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	9.50
17	Bielik-11B-v2.3-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.50
18	Llama-PLLuM-8B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	9.50
19	Meta-Llama-3.1-70B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	9.50
20	Mixtral-8x7bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.45
21	Bielik-11B-v2.0-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.43
22	Mistral-Large-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.40
23	Bielik-11B-v2.2-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.40
24	openchat-3.5-0106Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.30
25	PLLuM-12B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	9.30
26	Bielik-11B-v2.1-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.20
27	Qwen2.5-14B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	9.18
28	Mixtral-8x22bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.10
29	Meta-Llama-3.1-8B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	8.82
30	dolphin-2.9.1-llama-3-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.80
31	openchat-3.5-0106-gemmaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.80
32	Llama-PLLuM-70B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	8.80
33	PLLuM-8x7B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	8.60
34	gemma-3-1b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	8.50
35	Starling-LM-7B-alphaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.50
36	Bielik-7B-Instruct-v0.1Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.47
37	Mistral-7B-Instruct-v0.2Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.40
38	Hermes-3-Llama-3.2-3BOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.05
39	Phi-3.5-mini-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	7.90
40	Qwen2.5-3B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	7.85
41	PLLuM-8x7B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	7.47
42	Llama-3.2-3B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	7.15
43	Mistral-7B-Instruct-v0.3Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	6.75
44	Polka-Mistral-7B-SFTOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.60
45	trurl-2-7bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.95
46	Qwen2.5-1.5B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	3.45
47	EuroLLM-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.25
48	SmolLM2-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.85
49	granite-3.0-2b-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.45
50	Llama-3.2-1B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	1.40

math

50 rows

#	Model	Org	Submitted	Paper / code	math
01	Gemma 3 (27B, IT)Open	Google	Apr 2026	SpeakLeash/MT-Bench-PL	8.25
02	Qwen2.5-14B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	8.10
03	Mistral-Small-3.1-24B-Instruct-2503Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.85
04	Mistral-Small-24B-Instruct-2501Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.83
05	Gemma-2-27b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	7.80
06	Mistral-Large-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.80
07	Phi-4Open	Microsoft	Apr 2026	SpeakLeash/MT-Bench-PL	7.70
08	Bielik-11B-v2.3-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	7.70
09	Qwen2.5-32B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	7.60
10	gemma-3-12b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	7.45
11	gemma-3-4b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	7.40
12	Mistral-Small-Instruct-2409Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.00
13	Mixtral-8x22bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	6.90
14	GPT-3.5-turboOpen	OpenAI	Apr 2026	SpeakLeash/MT-Bench-PL	6.85
15	Mistral-Nemo-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	6.70
16	aya-expanse-32bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.60
17	Qwen2-72B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	6.50
18	Bielik-11B-v2.2-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.45
19	Qwen2.5-3B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	6.40
20	Meta-Llama-3.1-405B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	6.25
21	Bielik-11B-v2.1-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.15
22	Meta-Llama-3.1-70B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	6.00
23	Mixtral-8x7bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	5.65
24	Bielik-11B-v2.0-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.50
25	Meta-Llama-3.1-8B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	5.30
26	dolphin-2.9.1-llama-3-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.80
27	openchat-3.5-0106-gemmaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.55
28	Phi-3.5-mini-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.50
29	Llama-3.2-3B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	4.50
30	aya-expanse-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.35
31	Starling-LM-7B-alphaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.15
32	Bielik-7B-Instruct-v0.1Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.10
33	gemma-3-1b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	4.05
34	openchat-3.5-0106Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.80
35	Hermes-3-Llama-3.2-3BOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.70
36	PLLuM-8x7B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	3.45
37	Qwen2.5-1.5B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	3.45
38	PLLuM-8x7B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	3.35
39	Mistral-7B-Instruct-v0.2Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	3.20
40	Polka-Mistral-7B-SFTOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	2.95
41	Llama-PLLuM-70B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	2.90
42	Llama-PLLuM-8B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	2.75
43	PLLuM-12B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	2.65
44	Llama-3.2-1B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	2.60
45	Mistral-7B-Instruct-v0.3Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	2.35
46	PLLuM-12B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	2.30
47	granite-3.0-2b-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.95
48	SmolLM2-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.80
49	trurl-2-7bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.70
50	EuroLLM-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.10

pl-score · primary

50 rows

#	Model	Org	Submitted	Paper / code	pl-score
01	Gemma 3 (27B, IT)Open	Google	Apr 2026	SpeakLeash/MT-Bench-PL	9.28
02	Mistral-Small-3.1-24B-Instruct-2503Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.18
03	Phi-4Open	Microsoft	Apr 2026	SpeakLeash/MT-Bench-PL	9.07
04	gemma-3-12b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	8.97
05	Qwen2.5-32B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	8.86
06	Qwen2-72B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	8.78
07	Mistral-Small-24B-Instruct-2501Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.72
08	Mistral-Large-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.66
09	Gemma-2-27b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	8.62
10	aya-expanse-32bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.62
11	Mistral-Small-Instruct-2409Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.56
12	Bielik-11B-v2.3-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.56
13	Qwen2.5-14B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	8.33
14	Mixtral-8x22bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.23
15	gemma-3-4b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	8.22
16	Meta-Llama-3.1-405B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	8.17
17	Meta-Llama-3.1-70B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	8.15
18	Bielik-11B-v2.2-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.12
19	Bielik-11B-v2.1-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.00
20	aya-expanse-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	7.76
21	GPT-3.5-turboOpen	OpenAI	Apr 2026	SpeakLeash/MT-Bench-PL	7.72
22	Mixtral-8x7bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.64
23	Bielik-11B-v2.0-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	7.56
24	Mistral-Nemo-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.37
25	Llama-PLLuM-70B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	6.75
26	openchat-3.5-0106-gemmaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.51
27	PLLuM-12B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	6.47
28	PLLuM-8x7B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	6.43
29	Qwen2.5-3B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	6.35
30	PLLuM-8x7B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	6.30
31	Meta-Llama-3.1-8B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	6.24
32	Llama-PLLuM-8B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	6.05
33	Starling-LM-7B-alphaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.05
34	openchat-3.5-0106Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.03
35	PLLuM-12B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	5.81
36	Mistral-7B-Instruct-v0.3Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	5.75
37	Phi-3.5-mini-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.56
38	Hermes-3-Llama-3.2-3BOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.54
39	gemma-3-1b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	5.46
40	Bielik-7B-Instruct-v0.1Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.40
41	dolphin-2.9.1-llama-3-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.24
42	Llama-3.2-3B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	4.95
43	Polka-Mistral-7B-SFTOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.43
44	Qwen2.5-1.5B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	3.30
45	EuroLLM-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.01
46	trurl-2-7bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	2.75
47	Mistral-7B-Instruct-v0.2Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	2.05
48	granite-3.0-2b-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	2.03
49	Llama-3.2-1B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	1.61
50	SmolLM2-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.58

reasoning

50 rows

#	Model	Org	Submitted	Paper / code	reasoning
01	Phi-4Open	Microsoft	Apr 2026	SpeakLeash/MT-Bench-PL	9.55
02	Qwen2.5-32B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	9.10
03	Mistral-Small-3.1-24B-Instruct-2503Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.00
04	aya-expanse-32bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.95
05	Qwen2-72B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	8.85
06	Mistral-Large-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.70
07	Gemma 3 (27B, IT)Open	Google	Apr 2026	SpeakLeash/MT-Bench-PL	8.40
08	Bielik-11B-v2.3-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.35
09	Mistral-Small-Instruct-2409Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.90
10	Mistral-Small-24B-Instruct-2501Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.90
11	gemma-3-12b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	7.75
12	Qwen2.5-14B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	7.55
13	Bielik-11B-v2.2-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.90
14	aya-expanse-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.85
15	Gemma-2-27b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	6.85
16	Mixtral-8x22bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	6.30
17	Bielik-11B-v2.1-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.25
18	gemma-3-4b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	6.25
19	Bielik-7B-Instruct-v0.1Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.15
20	Meta-Llama-3.1-70B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	6.15
21	Bielik-11B-v2.0-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.05
22	Mistral-Nemo-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	5.80
23	Mixtral-8x7bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	5.80
24	Meta-Llama-3.1-405B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	5.80
25	openchat-3.5-0106-gemmaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.40
26	Llama-PLLuM-8B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	5.35
27	Llama-PLLuM-70B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	5.20
28	GPT-3.5-turboOpen	OpenAI	Apr 2026	SpeakLeash/MT-Bench-PL	5.20
29	Mistral-7B-Instruct-v0.2Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	5.00
30	PLLuM-8x7B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	4.95
31	Phi-3.5-mini-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.95
32	PLLuM-8x7B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	4.90
33	PLLuM-12B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	4.80
34	Qwen2.5-3B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	4.25
35	PLLuM-12B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	3.90
36	openchat-3.5-0106Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.90
37	Starling-LM-7B-alphaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.90
38	Mistral-7B-Instruct-v0.3Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	3.80
39	gemma-3-1b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	3.50
40	dolphin-2.9.1-llama-3-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.30
41	Hermes-3-Llama-3.2-3BOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.10
42	Llama-3.2-3B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	2.70
43	EuroLLM-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	2.65
44	Qwen2.5-1.5B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	2.60
45	Meta-Llama-3.1-8B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	2.50
46	Polka-Mistral-7B-SFTOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	2.45
47	trurl-2-7bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	2.05
48	granite-3.0-2b-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.55
49	Llama-3.2-1B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	1.30
50	SmolLM2-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.10

roleplay

50 rows

#	Model	Org	Submitted	Paper / code	roleplay
01	Gemma 3 (27B, IT)Open	Google	Apr 2026	SpeakLeash/MT-Bench-PL	9.95
02	aya-expanse-32bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.70
03	gemma-3-4b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	9.45
04	gemma-3-12b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	9.45
05	Bielik-11B-v2.1-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.45
06	Mistral-Small-3.1-24B-Instruct-2503Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.40
07	aya-expanse-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.25
08	Qwen2-72B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	9.20
09	Phi-4Open	Microsoft	Apr 2026	SpeakLeash/MT-Bench-PL	9.20
10	Mixtral-8x22bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.05
11	Mistral-Small-24B-Instruct-2501Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.05
12	Bielik-11B-v2.2-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.03
13	Mixtral-8x7bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.95
14	Mistral-Small-Instruct-2409Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.90
15	Meta-Llama-3.1-70B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	8.80
16	Bielik-11B-v2.3-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.75
17	Meta-Llama-3.1-405B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	8.70
18	Gemma-2-27b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	8.70
19	Mistral-Large-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.70
20	GPT-3.5-turboOpen	OpenAI	Apr 2026	SpeakLeash/MT-Bench-PL	8.65
21	Mistral-7B-Instruct-v0.2Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.65
22	Qwen2.5-14B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	8.50
23	Qwen2.5-32B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	8.30
24	openchat-3.5-0106-gemmaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	7.97
25	Bielik-7B-Instruct-v0.1Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	7.83
26	Bielik-11B-v2.0-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	7.75
27	Mistral-Nemo-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.45
28	dolphin-2.9.1-llama-3-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	7.40
29	Mistral-7B-Instruct-v0.3Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.25
30	PLLuM-8x7B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	6.90
31	Starling-LM-7B-alphaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.90
32	PLLuM-12B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	6.75
33	Hermes-3-Llama-3.2-3BOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.75
34	Llama-PLLuM-70B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	6.60
35	Qwen2.5-3B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	6.55
36	gemma-3-1b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	6.25
37	PLLuM-8x7B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	6.25
38	Llama-PLLuM-8B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	6.15
39	openchat-3.5-0106Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.00
40	Meta-Llama-3.1-8B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	5.60
41	Llama-3.2-3B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	5.30
42	PLLuM-12B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	5.00
43	Polka-Mistral-7B-SFTOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.90
44	Phi-3.5-mini-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.65
45	EuroLLM-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.60
46	trurl-2-7bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.30
47	Qwen2.5-1.5B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	2.55
48	Llama-3.2-1B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	1.65
49	granite-3.0-2b-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.30
50	SmolLM2-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.00

stem

50 rows

#	Model	Org	Submitted	Paper / code	stem
01	gemma-3-12b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	10
02	Phi-4Open	Microsoft	Apr 2026	SpeakLeash/MT-Bench-PL	10
03	aya-expanse-32bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.95
04	Gemma 3 (27B, IT)Open	Google	Apr 2026	SpeakLeash/MT-Bench-PL	9.95
05	Mistral-Small-3.1-24B-Instruct-2503Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.90
06	Gemma-2-27b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	9.80
07	aya-expanse-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.75
08	Qwen2.5-32B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	9.70
09	gemma-3-4b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	9.65
10	Mistral-Small-Instruct-2409Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.65
11	Qwen2.5-14B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	9.60
12	Meta-Llama-3.1-70B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	9.55
13	Qwen2-72B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	9.55
14	Mistral-Small-24B-Instruct-2501Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.50
15	Bielik-11B-v2.2-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.45
16	Mistral-Large-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.35
17	GPT-3.5-turboOpen	OpenAI	Apr 2026	SpeakLeash/MT-Bench-PL	9.25
18	Mixtral-8x22bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.25
19	PLLuM-12B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	9.10
20	Bielik-11B-v2.3-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.97
21	PLLuM-8x7B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	8.90
22	Bielik-11B-v2.1-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.90
23	Starling-LM-7B-alphaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.85
24	Bielik-11B-v2.0-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.78
25	Meta-Llama-3.1-405B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	8.65
26	Mixtral-8x7bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.55
27	openchat-3.5-0106-gemmaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.47
28	openchat-3.5-0106Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.40
29	Mistral-Nemo-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.30
30	Llama-PLLuM-70B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	8.20
31	PLLuM-8x7B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	8.20
32	PLLuM-12B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	8.00
33	Mistral-7B-Instruct-v0.2Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.85
34	Llama-PLLuM-8B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	7.50
35	Mistral-7B-Instruct-v0.3Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.45
36	gemma-3-1b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	7.10
37	Hermes-3-Llama-3.2-3BOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.95
38	Bielik-7B-Instruct-v0.1Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.90
39	Phi-3.5-mini-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.85
40	Polka-Mistral-7B-SFTOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.80
41	Qwen2.5-3B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	6.75
42	dolphin-2.9.1-llama-3-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.35
43	Meta-Llama-3.1-8B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	6.30
44	Llama-3.2-3B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	4.85
45	EuroLLM-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.65
46	trurl-2-7bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	2.65
47	Qwen2.5-1.5B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	2.15
48	granite-3.0-2b-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.45
49	SmolLM2-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.35
50	Llama-3.2-1B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	1.30

writing

50 rows

#	Model	Org	Submitted	Paper / code	writing
01	Gemma 3 (27B, IT)Open	Google	Apr 2026	SpeakLeash/MT-Bench-PL	9.70
02	aya-expanse-32bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.60
03	Bielik-11B-v2.3-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.50
04	Bielik-11B-v2.1-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.50
05	Mixtral-8x7bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.35
06	Bielik-11B-v2.2-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.35
07	gemma-3-12b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	9.30
08	gemma-3-4b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	9.30
09	aya-expanse-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	9.30
10	Mixtral-8x22bOpen	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.25
11	Phi-4Open	Microsoft	Apr 2026	SpeakLeash/MT-Bench-PL	9.25
12	Meta-Llama-3.1-405B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	9.20
13	Mistral-Small-3.1-24B-Instruct-2503Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	9.15
14	GPT-3.5-turboOpen	OpenAI	Apr 2026	SpeakLeash/MT-Bench-PL	9.10
15	Meta-Llama-3.1-70B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	9.10
16	Mistral-Small-Instruct-2409Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.80
17	Bielik-11B-v2.0-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	8.75
18	Gemma-2-27b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	8.75
19	Qwen2-72B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	8.75
20	Mistral-Large-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	8.70
21	Qwen2.5-32B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	8.65
22	Llama-PLLuM-70B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	8.05
23	PLLuM-12B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	8.00
24	Mistral-Small-24B-Instruct-2501Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.95
25	Bielik-7B-Instruct-v0.1Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	7.85
26	Qwen2.5-14B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	7.75
27	openchat-3.5-0106Open	—	Apr 2026	SpeakLeash/MT-Bench-PL	7.75
28	Meta-Llama-3.1-8B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	7.70
29	Mistral-7B-Instruct-v0.2Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.70
30	PLLuM-12B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	7.55
31	Starling-LM-7B-alphaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	7.55
32	PLLuM-8x7B-nc-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	7.40
33	Mistral-7B-Instruct-v0.3Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	7.35
34	Llama-PLLuM-8B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	7.20
35	PLLuM-8x7B-chatOpen	PLLuM	Apr 2026	SpeakLeash/MT-Bench-PL	7.10
36	openchat-3.5-0106-gemmaOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	7.05
37	Mistral-Nemo-Instruct-2407Open	Mistral	Apr 2026	SpeakLeash/MT-Bench-PL	6.40
38	gemma-3-1b-itOpen	Google	Apr 2026	SpeakLeash/MT-Bench-PL	6.05
39	Hermes-3-Llama-3.2-3BOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	6.00
40	Qwen2.5-3B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	5.55
41	dolphin-2.9.1-llama-3-8bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.50
42	Polka-Mistral-7B-SFTOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	5.25
43	Phi-3.5-mini-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	4.65
44	Llama-3.2-3B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	4.45
45	EuroLLM-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.90
46	trurl-2-7bOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	3.15
47	Qwen2.5-1.5B-InstructOpen	Alibaba	Apr 2026	SpeakLeash/MT-Bench-PL	2.70
48	granite-3.0-2b-instructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	2.10
49	Llama-3.2-1B-InstructOpen	Meta	Apr 2026	SpeakLeash/MT-Bench-PL	1.40
50	SmolLM2-1.7B-InstructOpen	—	Apr 2026	SpeakLeash/MT-Bench-PL	1.05

Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.

§ 03 · Progress

1 step of state of the art.

Each row below marks a model that broke the previous record on pl-score. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · pl-score

Apr 2, 2026	Gemma 3 (27B, IT)	Google	9.28

Fig 3 · SOTA-setting models only. 1 entry, dated Apr 2026.
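The progress list described in this section amounts to a running maximum over dated submissions. A minimal sketch follows, under stated assumptions: the `Entry` type and the sample data are hypothetical, and treating only the best score on a given date as a candidate step is a guess that happens to be consistent with 50 same-month rows producing a single step here.

```python
from datetime import date
from typing import NamedTuple


class Entry(NamedTuple):
    submitted: date
    model: str      # placeholder names below, not real leaderboard rows
    score: float    # primary metric, higher is better


def sota_steps(entries: list[Entry]) -> list[Entry]:
    """Return the entries that strictly beat every earlier best score."""
    steps: list[Entry] = []
    best = float("-inf")
    # Within one date the highest score is visited first, so a date adds at most one step.
    for entry in sorted(entries, key=lambda e: (e.submitted, -e.score)):
        if entry.score > best:
            steps.append(entry)
            best = entry.score
    return steps


# Illustrative data only.
EXAMPLE = [
    Entry(date(2026, 1, 5), "model-a", 7.0),
    Entry(date(2026, 2, 1), "model-b", 6.5),
    Entry(date(2026, 2, 1), "model-c", 8.0),
    Entry(date(2026, 3, 1), "model-d", 8.0),  # equals the record, so not a step
    Entry(date(2026, 4, 2), "model-e", 9.0),
]

if __name__ == "__main__":
    for step in sota_steps(EXAMPLE):
        print(step.submitted.isoformat(), step.model, step.score)
```

Entries that do not set a record stay in the full leaderboard; only the returned steps would be re-listed in a progress view like the one above.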

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.


What a submission needs

01	A public checkpoint or API endpoint
02	A reproduction script with frozen commit + seed
03	Declared evaluation environment (Python, deps)
04	One row per metric declared by this dataset
05	A contact so we can follow up on discrepancies
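The checklist above could be captured as a small record with a pre-submission check. This is a sketch only: the `Submission` class and every field name are hypothetical, and the page does not publish an actual submission schema. The metric names come from the "All metrics" line of this page, and the 1-10 bound from the dataset description.

```python
from dataclasses import dataclass

# The nine metrics declared by this dataset.
METRICS = ("coding", "extraction", "humanities", "math", "pl-score",
           "reasoning", "roleplay", "stem", "writing")


@dataclass
class Submission:
    checkpoint_or_endpoint: str   # 01: public checkpoint or API endpoint
    repro_commit: str             # 02: frozen commit of the reproduction script
    seed: int                     # 02: random seed used for the run
    python_version: str           # 03: declared evaluation environment
    dependencies: list[str]       # 03: pinned dependencies
    scores: dict[str, float]      # 04: one row per declared metric
    contact: str                  # 05: contact for follow-up

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the record looks complete."""
        issues = []
        if not self.checkpoint_or_endpoint:
            issues.append("missing checkpoint or endpoint")
        if not self.repro_commit:
            issues.append("missing frozen commit")
        if not self.python_version or not self.dependencies:
            issues.append("evaluation environment not declared")
        if not self.contact:
            issues.append("missing contact")
        missing = [m for m in METRICS if m not in self.scores]
        if missing:
            issues.append("missing metric rows: " + ", ".join(missing))
        unknown = sorted(set(self.scores) - set(METRICS))
        if unknown:
            issues.append("unknown metrics: " + ", ".join(unknown))
        for metric, value in self.scores.items():
            if not 1.0 <= value <= 10.0:
                issues.append(f"{metric} score {value} is outside the 1-10 scale")
        return issues
```

Calling `problems()` before submitting gives a quick list of what is still missing; an empty list means all five checklist items are present in some form.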
