Polish Conversation Quality · benchmark dataset · 2025 · PL

Polish Multi-Turn Benchmark.

Polish adaptation of MT-Bench evaluating LLMs on multi-turn conversation quality across 8 categories: coding, extraction, humanities, math, reasoning, roleplay, STEM, and writing. Responses are scored on a 1-10 scale by a GPT-4 judge. Created by SpeakLeash.

§ 01 · Leaderboard

Best published scores.

450 results indexed across 9 metrics. Shaded row marks current SOTA; ties broken by submission date.
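The ordering rule can be sketched as a two-key sort. This is a minimal illustration, assuming that on equal scores the earlier submission ranks first (the page does not say which direction the tie-break runs); the rows and model names are hypothetical and not taken from the leaderboard.

```python
from datetime import date

# Hypothetical rows: (model, score, submitted). Not leaderboard data.
rows = [
    ("model-a", 9.90, date(2026, 4, 3)),
    ("model-b", 9.90, date(2026, 4, 1)),
    ("model-c", 9.85, date(2026, 4, 2)),
]


def rank(rows):
    """Sort by score descending; on ties, earlier submission first (assumed)."""
    return sorted(rows, key=lambda r: (-r[1], r[2]))


ranked = rank(rows)
for position, (model, score, submitted) in enumerate(ranked, start=1):
    print(f"{position:02d} {model} {score:.2f} {submitted.isoformat()}")
```

Negating the score in the sort key keeps a single ascending sort while still placing higher scores first.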


Primary metric: pl-score · higher is better
All metrics: coding, extraction, humanities, math, pl-score, reasoning, roleplay, stem, writing
coding
50 rows
| # | Model | OSS | Org | Submitted | Paper / code | coding |
|---|---|---|---|---|---|---|
| 01 | Mistral-Small-3.1-24B-Instruct-2503 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.30 |
| 02 | gemma-3-12b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.25 |
| 03 | Gemma 3 (27B, IT) | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.10 |
| 04 | Qwen2.5-32B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.95 |
| 05 | Mistral-Small-24B-Instruct-2501 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.95 |
| 06 | Qwen2-72B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.80 |
| 07 | Phi-4 | | Microsoft | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.60 |
| 08 | Gemma-2-27b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.45 |
| 09 | Meta-Llama-3.1-405B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.25 |
| 10 | Mistral-Small-Instruct-2409 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.10 |
| 11 | Mistral-Large-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.75 |
| 12 | Qwen2.5-14B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.70 |
| 13 | Mixtral-8x22b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.45 |
| 14 | Meta-Llama-3.1-70B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.25 |
| 15 | Bielik-11B-v2.3-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.25 |
| 16 | GPT-3.5-turbo | OSS | OpenAI | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.00 |
| 17 | Mistral-Nemo-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.85 |
| 18 | aya-expanse-32b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.75 |
| 19 | Bielik-11B-v2.0-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.60 |
| 20 | gemma-3-4b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.40 |
| 21 | Bielik-11B-v2.1-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.40 |
| 22 | openchat-3.5-0106-gemma | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.35 |
| 23 | Mixtral-8x7b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.20 |
| 24 | openchat-3.5-0106 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.05 |
| 25 | Bielik-11B-v2.2-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.05 |
| 26 | Qwen2.5-3B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.00 |
| 27 | aya-expanse-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.90 |
| 28 | Llama-PLLuM-70B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.80 |
| 29 | Starling-LM-7B-alpha | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.75 |
| 30 | Meta-Llama-3.1-8B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.60 |
| 31 | dolphin-2.9.1-llama-3-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.60 |
| 32 | PLLuM-12B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.55 |
| 33 | PLLuM-8x7B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.55 |
| 34 | Hermes-3-Llama-3.2-3B | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.45 |
| 35 | Llama-3.2-3B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.40 |
| 36 | Mistral-7B-Instruct-v0.3 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.30 |
| 37 | Mistral-7B-Instruct-v0.2 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.25 |
| 38 | Phi-3.5-mini-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.20 |
| 39 | PLLuM-8x7B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.10 |
| 40 | Qwen2.5-1.5B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.95 |
| 41 | Llama-PLLuM-8B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.65 |
| 42 | gemma-3-1b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.35 |
| 43 | PLLuM-12B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.05 |
| 44 | granite-3.0-2b-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.05 |
| 45 | Bielik-7B-Instruct-v0.1 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.00 |
| 46 | Polka-Mistral-7B-SFT | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.95 |
| 47 | trurl-2-7b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.80 |
| 48 | SmolLM2-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.75 |
| 49 | EuroLLM-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.70 |
| 50 | Llama-3.2-1B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.65 |
extraction
50 rows
| # | Model | OSS | Org | Submitted | Paper / code | extraction |
|---|---|---|---|---|---|---|
| 01 | Mistral-Large-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.90 |
| 02 | Qwen2.5-32B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.90 |
| 03 | Gemma 3 (27B, IT) | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.90 |
| 04 | Mistral-Small-24B-Instruct-2501 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.90 |
| 05 | Meta-Llama-3.1-70B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.85 |
| 06 | Meta-Llama-3.1-405B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.85 |
| 07 | Qwen2-72B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.80 |
| 08 | Mistral-Small-3.1-24B-Instruct-2503 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.80 |
| 09 | Gemma-2-27b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.60 |
| 10 | gemma-3-12b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.55 |
| 11 | Mixtral-8x22b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.55 |
| 12 | Llama-PLLuM-70B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.45 |
| 13 | Bielik-11B-v2.3-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.43 |
| 14 | Phi-4 | | Microsoft | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.30 |
| 15 | Bielik-11B-v2.2-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.30 |
| 16 | Qwen2.5-14B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.25 |
| 17 | Mistral-Small-Instruct-2409 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.15 |
| 18 | Bielik-11B-v2.1-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.13 |
| 19 | Meta-Llama-3.1-8B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.10 |
| 20 | Mistral-Nemo-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.95 |
| 21 | Bielik-11B-v2.0-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.65 |
| 22 | Qwen2.5-3B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.45 |
| 23 | gemma-3-4b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.40 |
| 24 | PLLuM-8x7B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.40 |
| 25 | aya-expanse-32b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.40 |
| 26 | Mixtral-8x7b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.15 |
| 27 | GPT-3.5-turbo | OSS | OpenAI | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.15 |
| 28 | aya-expanse-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.05 |
| 29 | PLLuM-8x7B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.00 |
| 30 | Mistral-7B-Instruct-v0.2 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.40 |
| 31 | Starling-LM-7B-alpha | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.35 |
| 32 | Mistral-7B-Instruct-v0.3 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.30 |
| 33 | PLLuM-12B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.20 |
| 34 | openchat-3.5-0106-gemma | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.90 |
| 35 | openchat-3.5-0106 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.90 |
| 36 | Phi-3.5-mini-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.80 |
| 37 | PLLuM-12B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.55 |
| 38 | Llama-PLLuM-8B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.30 |
| 39 | Llama-3.2-3B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.22 |
| 40 | dolphin-2.9.1-llama-3-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.15 |
| 41 | Qwen2.5-1.5B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.75 |
| 42 | Hermes-3-Llama-3.2-3B | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.30 |
| 43 | Polka-Mistral-7B-SFT | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.25 |
| 44 | gemma-3-1b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.87 |
| 45 | Bielik-7B-Instruct-v0.1 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.35 |
| 46 | trurl-2-7b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.50 |
| 47 | granite-3.0-2b-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.45 |
| 48 | SmolLM2-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.75 |
| 49 | EuroLLM-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.25 |
| 50 | Llama-3.2-1B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.60 |
humanities
50 rows
| # | Model | OSS | Org | Submitted | Paper / code | humanities |
|---|---|---|---|---|---|---|
| 01 | gemma-3-12b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 10 |
| 02 | Mistral-Small-Instruct-2409 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 10 |
| 03 | aya-expanse-32b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 10 |
| 04 | Gemma-2-27b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 10 |
| 05 | Gemma 3 (27B, IT) | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 10 |
| 06 | Mistral-Small-3.1-24B-Instruct-2503 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 10 |
| 07 | Phi-4 | | Microsoft | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.95 |
| 08 | gemma-3-4b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.90 |
| 09 | Qwen2-72B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.75 |
| 10 | GPT-3.5-turbo | OSS | OpenAI | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.75 |
| 11 | Mistral-Small-24B-Instruct-2501 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.70 |
| 12 | aya-expanse-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.65 |
| 13 | Qwen2.5-32B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.65 |
| 14 | Meta-Llama-3.1-405B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.65 |
| 15 | Mistral-Nemo-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.50 |
| 16 | PLLuM-12B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.50 |
| 17 | Bielik-11B-v2.3-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.50 |
| 18 | Llama-PLLuM-8B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.50 |
| 19 | Meta-Llama-3.1-70B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.50 |
| 20 | Mixtral-8x7b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.45 |
| 21 | Bielik-11B-v2.0-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.43 |
| 22 | Mistral-Large-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.40 |
| 23 | Bielik-11B-v2.2-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.40 |
| 24 | openchat-3.5-0106 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.30 |
| 25 | PLLuM-12B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.30 |
| 26 | Bielik-11B-v2.1-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.20 |
| 27 | Qwen2.5-14B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.18 |
| 28 | Mixtral-8x22b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.10 |
| 29 | Meta-Llama-3.1-8B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.82 |
| 30 | dolphin-2.9.1-llama-3-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.80 |
| 31 | openchat-3.5-0106-gemma | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.80 |
| 32 | Llama-PLLuM-70B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.80 |
| 33 | PLLuM-8x7B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.60 |
| 34 | gemma-3-1b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.50 |
| 35 | Starling-LM-7B-alpha | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.50 |
| 36 | Bielik-7B-Instruct-v0.1 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.47 |
| 37 | Mistral-7B-Instruct-v0.2 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.40 |
| 38 | Hermes-3-Llama-3.2-3B | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.05 |
| 39 | Phi-3.5-mini-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.90 |
| 40 | Qwen2.5-3B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.85 |
| 41 | PLLuM-8x7B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.47 |
| 42 | Llama-3.2-3B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.15 |
| 43 | Mistral-7B-Instruct-v0.3 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.75 |
| 44 | Polka-Mistral-7B-SFT | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.60 |
| 45 | trurl-2-7b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.95 |
| 46 | Qwen2.5-1.5B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.45 |
| 47 | EuroLLM-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.25 |
| 48 | SmolLM2-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.85 |
| 49 | granite-3.0-2b-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.45 |
| 50 | Llama-3.2-1B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.40 |
math
50 rows
| # | Model | OSS | Org | Submitted | Paper / code | math |
|---|---|---|---|---|---|---|
| 01 | Gemma 3 (27B, IT) | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.25 |
| 02 | Qwen2.5-14B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.10 |
| 03 | Mistral-Small-3.1-24B-Instruct-2503 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.85 |
| 04 | Mistral-Small-24B-Instruct-2501 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.83 |
| 05 | Gemma-2-27b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.80 |
| 06 | Mistral-Large-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.80 |
| 07 | Phi-4 | | Microsoft | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.70 |
| 08 | Bielik-11B-v2.3-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.70 |
| 09 | Qwen2.5-32B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.60 |
| 10 | gemma-3-12b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.45 |
| 11 | gemma-3-4b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.40 |
| 12 | Mistral-Small-Instruct-2409 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.00 |
| 13 | Mixtral-8x22b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.90 |
| 14 | GPT-3.5-turbo | OSS | OpenAI | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.85 |
| 15 | Mistral-Nemo-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.70 |
| 16 | aya-expanse-32b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.60 |
| 17 | Qwen2-72B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.50 |
| 18 | Bielik-11B-v2.2-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.45 |
| 19 | Qwen2.5-3B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.40 |
| 20 | Meta-Llama-3.1-405B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.25 |
| 21 | Bielik-11B-v2.1-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.15 |
| 22 | Meta-Llama-3.1-70B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.00 |
| 23 | Mixtral-8x7b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.65 |
| 24 | Bielik-11B-v2.0-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.50 |
| 25 | Meta-Llama-3.1-8B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.30 |
| 26 | dolphin-2.9.1-llama-3-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.80 |
| 27 | openchat-3.5-0106-gemma | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.55 |
| 28 | Phi-3.5-mini-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.50 |
| 29 | Llama-3.2-3B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.50 |
| 30 | aya-expanse-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.35 |
| 31 | Starling-LM-7B-alpha | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.15 |
| 32 | Bielik-7B-Instruct-v0.1 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.10 |
| 33 | gemma-3-1b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.05 |
| 34 | openchat-3.5-0106 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.80 |
| 35 | Hermes-3-Llama-3.2-3B | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.70 |
| 36 | PLLuM-8x7B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.45 |
| 37 | Qwen2.5-1.5B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.45 |
| 38 | PLLuM-8x7B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.35 |
| 39 | Mistral-7B-Instruct-v0.2 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.20 |
| 40 | Polka-Mistral-7B-SFT | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.95 |
| 41 | Llama-PLLuM-70B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.90 |
| 42 | Llama-PLLuM-8B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.75 |
| 43 | PLLuM-12B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.65 |
| 44 | Llama-3.2-1B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.60 |
| 45 | Mistral-7B-Instruct-v0.3 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.35 |
| 46 | PLLuM-12B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.30 |
| 47 | granite-3.0-2b-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.95 |
| 48 | SmolLM2-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.80 |
| 49 | trurl-2-7b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.70 |
| 50 | EuroLLM-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.10 |
pl-score · primary
50 rows
| # | Model | OSS | Org | Submitted | Paper / code | pl-score |
|---|---|---|---|---|---|---|
| 01 | Gemma 3 (27B, IT) | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.28 |
| 02 | Mistral-Small-3.1-24B-Instruct-2503 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.18 |
| 03 | Phi-4 | | Microsoft | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.07 |
| 04 | gemma-3-12b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.97 |
| 05 | Qwen2.5-32B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.86 |
| 06 | Qwen2-72B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.78 |
| 07 | Mistral-Small-24B-Instruct-2501 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.72 |
| 08 | Mistral-Large-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.66 |
| 09 | Gemma-2-27b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.62 |
| 10 | aya-expanse-32b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.62 |
| 11 | Mistral-Small-Instruct-2409 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.56 |
| 12 | Bielik-11B-v2.3-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.56 |
| 13 | Qwen2.5-14B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.33 |
| 14 | Mixtral-8x22b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.23 |
| 15 | gemma-3-4b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.22 |
| 16 | Meta-Llama-3.1-405B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.17 |
| 17 | Meta-Llama-3.1-70B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.15 |
| 18 | Bielik-11B-v2.2-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.12 |
| 19 | Bielik-11B-v2.1-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.00 |
| 20 | aya-expanse-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.76 |
| 21 | GPT-3.5-turbo | OSS | OpenAI | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.72 |
| 22 | Mixtral-8x7b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.64 |
| 23 | Bielik-11B-v2.0-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.56 |
| 24 | Mistral-Nemo-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.37 |
| 25 | Llama-PLLuM-70B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.75 |
| 26 | openchat-3.5-0106-gemma | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.51 |
| 27 | PLLuM-12B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.47 |
| 28 | PLLuM-8x7B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.43 |
| 29 | Qwen2.5-3B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.35 |
| 30 | PLLuM-8x7B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.30 |
| 31 | Meta-Llama-3.1-8B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.24 |
| 32 | Llama-PLLuM-8B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.05 |
| 33 | Starling-LM-7B-alpha | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.05 |
| 34 | openchat-3.5-0106 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.03 |
| 35 | PLLuM-12B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.81 |
| 36 | Mistral-7B-Instruct-v0.3 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.75 |
| 37 | Phi-3.5-mini-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.56 |
| 38 | Hermes-3-Llama-3.2-3B | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.54 |
| 39 | gemma-3-1b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.46 |
| 40 | Bielik-7B-Instruct-v0.1 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.40 |
| 41 | dolphin-2.9.1-llama-3-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.24 |
| 42 | Llama-3.2-3B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.95 |
| 43 | Polka-Mistral-7B-SFT | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.43 |
| 44 | Qwen2.5-1.5B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.30 |
| 45 | EuroLLM-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.01 |
| 46 | trurl-2-7b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.75 |
| 47 | Mistral-7B-Instruct-v0.2 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.05 |
| 48 | granite-3.0-2b-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.03 |
| 49 | Llama-3.2-1B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.61 |
| 50 | SmolLM2-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.58 |
reasoning
50 rows
| # | Model | OSS | Org | Submitted | Paper / code | reasoning |
|---|---|---|---|---|---|---|
| 01 | Phi-4 | | Microsoft | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.55 |
| 02 | Qwen2.5-32B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.10 |
| 03 | Mistral-Small-3.1-24B-Instruct-2503 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.00 |
| 04 | aya-expanse-32b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.95 |
| 05 | Qwen2-72B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.85 |
| 06 | Mistral-Large-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.70 |
| 07 | Gemma 3 (27B, IT) | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.40 |
| 08 | Bielik-11B-v2.3-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.35 |
| 09 | Mistral-Small-Instruct-2409 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.90 |
| 10 | Mistral-Small-24B-Instruct-2501 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.90 |
| 11 | gemma-3-12b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.75 |
| 12 | Qwen2.5-14B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.55 |
| 13 | Bielik-11B-v2.2-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.90 |
| 14 | aya-expanse-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.85 |
| 15 | Gemma-2-27b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.85 |
| 16 | Mixtral-8x22b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.30 |
| 17 | Bielik-11B-v2.1-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.25 |
| 18 | gemma-3-4b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.25 |
| 19 | Bielik-7B-Instruct-v0.1 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.15 |
| 20 | Meta-Llama-3.1-70B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.15 |
| 21 | Bielik-11B-v2.0-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.05 |
| 22 | Mistral-Nemo-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.80 |
| 23 | Mixtral-8x7b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.80 |
| 24 | Meta-Llama-3.1-405B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.80 |
| 25 | openchat-3.5-0106-gemma | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.40 |
| 26 | Llama-PLLuM-8B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.35 |
| 27 | Llama-PLLuM-70B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.20 |
| 28 | GPT-3.5-turbo | OSS | OpenAI | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.20 |
| 29 | Mistral-7B-Instruct-v0.2 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.00 |
| 30 | PLLuM-8x7B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.95 |
| 31 | Phi-3.5-mini-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.95 |
| 32 | PLLuM-8x7B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.90 |
| 33 | PLLuM-12B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.80 |
| 34 | Qwen2.5-3B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.25 |
| 35 | PLLuM-12B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.90 |
| 36 | openchat-3.5-0106 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.90 |
| 37 | Starling-LM-7B-alpha | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.90 |
| 38 | Mistral-7B-Instruct-v0.3 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.80 |
| 39 | gemma-3-1b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.50 |
| 40 | dolphin-2.9.1-llama-3-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.30 |
| 41 | Hermes-3-Llama-3.2-3B | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.10 |
| 42 | Llama-3.2-3B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.70 |
| 43 | EuroLLM-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.65 |
| 44 | Qwen2.5-1.5B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.60 |
| 45 | Meta-Llama-3.1-8B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.50 |
| 46 | Polka-Mistral-7B-SFT | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.45 |
| 47 | trurl-2-7b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.05 |
| 48 | granite-3.0-2b-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.55 |
| 49 | Llama-3.2-1B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.30 |
| 50 | SmolLM2-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.10 |
roleplay
50 rows
| # | Model | OSS | Org | Submitted | Paper / code | roleplay |
|---|---|---|---|---|---|---|
| 01 | Gemma 3 (27B, IT) | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.95 |
| 02 | aya-expanse-32b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.70 |
| 03 | gemma-3-4b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.45 |
| 04 | gemma-3-12b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.45 |
| 05 | Bielik-11B-v2.1-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.45 |
| 06 | Mistral-Small-3.1-24B-Instruct-2503 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.40 |
| 07 | aya-expanse-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.25 |
| 08 | Qwen2-72B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.20 |
| 09 | Phi-4 | | Microsoft | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.20 |
| 10 | Mixtral-8x22b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.05 |
| 11 | Mistral-Small-24B-Instruct-2501 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.05 |
| 12 | Bielik-11B-v2.2-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.03 |
| 13 | Mixtral-8x7b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.95 |
| 14 | Mistral-Small-Instruct-2409 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.90 |
| 15 | Meta-Llama-3.1-70B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.80 |
| 16 | Bielik-11B-v2.3-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.75 |
| 17 | Meta-Llama-3.1-405B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.70 |
| 18 | Gemma-2-27b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.70 |
| 19 | Mistral-Large-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.70 |
| 20 | GPT-3.5-turbo | OSS | OpenAI | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.65 |
| 21 | Mistral-7B-Instruct-v0.2 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.65 |
| 22 | Qwen2.5-14B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.50 |
| 23 | Qwen2.5-32B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.30 |
| 24 | openchat-3.5-0106-gemma | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.97 |
| 25 | Bielik-7B-Instruct-v0.1 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.83 |
| 26 | Bielik-11B-v2.0-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.75 |
| 27 | Mistral-Nemo-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.45 |
| 28 | dolphin-2.9.1-llama-3-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.40 |
| 29 | Mistral-7B-Instruct-v0.3 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.25 |
| 30 | PLLuM-8x7B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.90 |
| 31 | Starling-LM-7B-alpha | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.90 |
| 32 | PLLuM-12B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.75 |
| 33 | Hermes-3-Llama-3.2-3B | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.75 |
| 34 | Llama-PLLuM-70B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.60 |
| 35 | Qwen2.5-3B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.55 |
| 36 | gemma-3-1b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.25 |
| 37 | PLLuM-8x7B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.25 |
| 38 | Llama-PLLuM-8B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.15 |
| 39 | openchat-3.5-0106 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.00 |
| 40 | Meta-Llama-3.1-8B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.60 |
| 41 | Llama-3.2-3B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.30 |
| 42 | PLLuM-12B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.00 |
| 43 | Polka-Mistral-7B-SFT | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.90 |
| 44 | Phi-3.5-mini-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.65 |
| 45 | EuroLLM-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.60 |
| 46 | trurl-2-7b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.30 |
| 47 | Qwen2.5-1.5B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.55 |
| 48 | Llama-3.2-1B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.65 |
| 49 | granite-3.0-2b-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.30 |
| 50 | SmolLM2-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.00 |
stem
50 rows
| # | Model | OSS | Org | Submitted | Paper / code | stem |
|---|---|---|---|---|---|---|
| 01 | gemma-3-12b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 10 |
| 02 | Phi-4 | | Microsoft | Apr 2026 | SpeakLeash/MT-Bench-PL | 10 |
| 03 | aya-expanse-32b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.95 |
| 04 | Gemma 3 (27B, IT) | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.95 |
| 05 | Mistral-Small-3.1-24B-Instruct-2503 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.90 |
| 06 | Gemma-2-27b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.80 |
| 07 | aya-expanse-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.75 |
| 08 | Qwen2.5-32B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.70 |
| 09 | gemma-3-4b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.65 |
| 10 | Mistral-Small-Instruct-2409 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.65 |
| 11 | Qwen2.5-14B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.60 |
| 12 | Meta-Llama-3.1-70B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.55 |
| 13 | Qwen2-72B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.55 |
| 14 | Mistral-Small-24B-Instruct-2501 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.50 |
| 15 | Bielik-11B-v2.2-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.45 |
| 16 | Mistral-Large-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.35 |
| 17 | GPT-3.5-turbo | OSS | OpenAI | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.25 |
| 18 | Mixtral-8x22b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.25 |
| 19 | PLLuM-12B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.10 |
| 20 | Bielik-11B-v2.3-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.97 |
| 21 | PLLuM-8x7B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.90 |
| 22 | Bielik-11B-v2.1-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.90 |
| 23 | Starling-LM-7B-alpha | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.85 |
| 24 | Bielik-11B-v2.0-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.78 |
| 25 | Meta-Llama-3.1-405B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.65 |
| 26 | Mixtral-8x7b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.55 |
| 27 | openchat-3.5-0106-gemma | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.47 |
| 28 | openchat-3.5-0106 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.40 |
| 29 | Mistral-Nemo-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.30 |
| 30 | Llama-PLLuM-70B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.20 |
| 31 | PLLuM-8x7B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.20 |
| 32 | PLLuM-12B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.00 |
| 33 | Mistral-7B-Instruct-v0.2 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.85 |
| 34 | Llama-PLLuM-8B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.50 |
| 35 | Mistral-7B-Instruct-v0.3 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.45 |
| 36 | gemma-3-1b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.10 |
| 37 | Hermes-3-Llama-3.2-3B | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.95 |
| 38 | Bielik-7B-Instruct-v0.1 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.90 |
| 39 | Phi-3.5-mini-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.85 |
| 40 | Polka-Mistral-7B-SFT | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.80 |
| 41 | Qwen2.5-3B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.75 |
| 42 | dolphin-2.9.1-llama-3-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.35 |
| 43 | Meta-Llama-3.1-8B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.30 |
| 44 | Llama-3.2-3B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.85 |
| 45 | EuroLLM-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.65 |
| 46 | trurl-2-7b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.65 |
| 47 | Qwen2.5-1.5B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.15 |
| 48 | granite-3.0-2b-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.45 |
| 49 | SmolLM2-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.35 |
| 50 | Llama-3.2-1B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.30 |
writing
50 rows
| # | Model | OSS | Org | Submitted | Paper / code | writing |
|---|---|---|---|---|---|---|
| 01 | Gemma 3 (27B, IT) | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.70 |
| 02 | aya-expanse-32b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.60 |
| 03 | Bielik-11B-v2.3-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.50 |
| 04 | Bielik-11B-v2.1-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.50 |
| 05 | Mixtral-8x7b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.35 |
| 06 | Bielik-11B-v2.2-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.35 |
| 07 | gemma-3-12b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.30 |
| 08 | gemma-3-4b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.30 |
| 09 | aya-expanse-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.30 |
| 10 | Mixtral-8x22b | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.25 |
| 11 | Phi-4 | | Microsoft | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.25 |
| 12 | Meta-Llama-3.1-405B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.20 |
| 13 | Mistral-Small-3.1-24B-Instruct-2503 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.15 |
| 14 | GPT-3.5-turbo | OSS | OpenAI | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.10 |
| 15 | Meta-Llama-3.1-70B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 9.10 |
| 16 | Mistral-Small-Instruct-2409 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.80 |
| 17 | Bielik-11B-v2.0-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.75 |
| 18 | Gemma-2-27b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.75 |
| 19 | Qwen2-72B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.75 |
| 20 | Mistral-Large-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.70 |
| 21 | Qwen2.5-32B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.65 |
| 22 | Llama-PLLuM-70B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.05 |
| 23 | PLLuM-12B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 8.00 |
| 24 | Mistral-Small-24B-Instruct-2501 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.95 |
| 25 | Bielik-7B-Instruct-v0.1 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.85 |
| 26 | Qwen2.5-14B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.75 |
| 27 | openchat-3.5-0106 | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.75 |
| 28 | Meta-Llama-3.1-8B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.70 |
| 29 | Mistral-7B-Instruct-v0.2 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.70 |
| 30 | PLLuM-12B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.55 |
| 31 | Starling-LM-7B-alpha | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.55 |
| 32 | PLLuM-8x7B-nc-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.40 |
| 33 | Mistral-7B-Instruct-v0.3 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.35 |
| 34 | Llama-PLLuM-8B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.20 |
| 35 | PLLuM-8x7B-chat | OSS | PLLuM | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.10 |
| 36 | openchat-3.5-0106-gemma | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 7.05 |
| 37 | Mistral-Nemo-Instruct-2407 | OSS | Mistral | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.40 |
| 38 | gemma-3-1b-it | OSS | Google | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.05 |
| 39 | Hermes-3-Llama-3.2-3B | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 6.00 |
| 40 | Qwen2.5-3B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.55 |
| 41 | dolphin-2.9.1-llama-3-8b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.50 |
| 42 | Polka-Mistral-7B-SFT | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 5.25 |
| 43 | Phi-3.5-mini-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.65 |
| 44 | Llama-3.2-3B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 4.45 |
| 45 | EuroLLM-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.90 |
| 46 | trurl-2-7b | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 3.15 |
| 47 | Qwen2.5-1.5B-Instruct | OSS | Alibaba | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.70 |
| 48 | granite-3.0-2b-instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 2.10 |
| 49 | Llama-3.2-1B-Instruct | OSS | Meta | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.40 |
| 50 | SmolLM2-1.7B-Instruct | OSS | | Apr 2026 | SpeakLeash/MT-Bench-PL | 1.05 |
Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
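This page does not say how pl-score is derived from the category metrics. For most rows spot-checked in the tables above, it lands close to the unweighted mean of the eight category scores. The sketch below is a consistency check under that assumption, not the benchmark's documented formula; the tolerance value is also an assumption. It flags one row, Mistral-7B-Instruct-v0.2, whose published pl-score (2.05) sits far from its category mean, which may point to a data issue in the source or to an aggregation rule this page does not describe.

```python
from statistics import mean

CATEGORIES = ["coding", "extraction", "humanities", "math",
              "reasoning", "roleplay", "stem", "writing"]

# Category scores (in CATEGORIES order) and reported pl-score,
# copied from the leaderboard tables above.
rows = {
    "Gemma 3 (27B, IT)": ([8.10, 9.90, 10.0, 8.25, 8.40, 9.95, 9.95, 9.70], 9.28),
    "Phi-4": ([7.60, 9.30, 9.95, 7.70, 9.55, 9.20, 10.0, 9.25], 9.07),
    "Mistral-7B-Instruct-v0.2": ([4.25, 7.40, 8.40, 3.20, 5.00, 8.65, 7.85, 7.70], 2.05),
}

TOLERANCE = 0.05  # assumed slack for rounding in the published figures


def check(rows, tolerance=TOLERANCE):
    """Return {model: (category_mean, reported, agrees)}."""
    report = {}
    for model, (scores, reported) in rows.items():
        assert len(scores) == len(CATEGORIES)
        avg = mean(scores)
        report[model] = (round(avg, 2), reported, abs(avg - reported) <= tolerance)
    return report


report = check(rows)
for model, (avg, reported, agrees) in report.items():
    status = "ok" if agrees else "MISMATCH"
    print(f"{model}: mean={avg:.2f} reported={reported:.2f} {status}")
```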
§ 03 · Progress

1 step of state of the art.

Each row below marks a model that broke the previous record on pl-score. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · pl-score
  1. Apr 2, 2026 · Gemma 3 (27B, IT) · Google · 9.28
Fig 3 · SOTA-setting models only. 1 entry, dated Apr 2026.
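The selection rule for the progress list (keep only entries that beat the previous record) can be sketched as a running maximum over submissions in date order. This is an illustration only: the submissions, model names, and the choice of a strict `>` comparison (so a later tie does not count as a new step) are assumptions, not data or rules taken from this page.

```python
from datetime import date

# Hypothetical submissions: (submitted, model, pl_score). Illustrative only.
submissions = [
    (date(2026, 1, 10), "model-a", 7.90),
    (date(2026, 2, 5), "model-b", 7.40),
    (date(2026, 3, 1), "model-c", 8.60),
    (date(2026, 4, 2), "model-d", 8.60),
    (date(2026, 4, 20), "model-e", 9.10),
]


def sota_steps(submissions):
    """Keep only entries that strictly beat the best score seen so far."""
    steps = []
    best = float("-inf")
    for submitted, model, score in sorted(submissions, key=lambda s: s[0]):
        if score > best:
            steps.append((submitted, model, score))
            best = score
    return steps


steps = sota_steps(submissions)
for submitted, model, score in steps:
    print(f"{submitted.isoformat()} {model} {score:.2f}")
```

With a single SOTA-setting entry, as on this page, the function returns a one-element list.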
§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

What a submission needs
  • 01 A public checkpoint or API endpoint
  • 02 A reproduction script with frozen commit + seed
  • 03 Declared evaluation environment (Python, deps)
  • 04 One row per metric declared by this dataset
  • 05 A contact so we can follow up on discrepancies
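The checklist above can be expressed as a small pre-submission check. This is a sketch under stated assumptions: the `Submission` class and all of its field names are hypothetical and do not describe the site's actual submission format. The metric names come from this dataset's metric list, and the 1-10 range from the benchmark description.

```python
from dataclasses import dataclass, field

# Metric names as listed for this dataset.
DECLARED_METRICS = {"coding", "extraction", "humanities", "math", "pl-score",
                    "reasoning", "roleplay", "stem", "writing"}


@dataclass
class Submission:
    # All field names are hypothetical; the real submission form may differ.
    checkpoint: str    # public checkpoint or API endpoint
    commit: str        # frozen commit of the reproduction script
    seed: int
    environment: dict  # e.g. Python version and dependency pins
    contact: str
    scores: dict = field(default_factory=dict)  # metric name -> score


def problems(sub):
    """Return a list of human-readable issues; an empty list means it looks complete."""
    issues = []
    if not sub.checkpoint:
        issues.append("missing checkpoint or API endpoint")
    if not sub.commit:
        issues.append("missing frozen commit")
    if not sub.environment:
        issues.append("missing evaluation environment")
    if not sub.contact:
        issues.append("missing contact")
    reported = set(sub.scores)
    for name in sorted(DECLARED_METRICS - reported):
        issues.append(f"no row for metric: {name}")
    for name in sorted(reported - DECLARED_METRICS):
        issues.append(f"undeclared metric: {name}")
    for name, value in sub.scores.items():
        if not 1 <= value <= 10:
            issues.append(f"score out of 1-10 range: {name}={value}")
    return issues


complete = Submission(
    checkpoint="example-org/example-model",  # placeholder identifier
    commit="0123abc",
    seed=0,
    environment={"python": "3.11"},
    contact="maintainer@example.com",
    scores={name: 5.0 for name in DECLARED_METRICS},
)
print(problems(complete))
```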