Polish Conversation Quality · 2025 · en
Polish Multi-Turn Benchmark
Polish adaptation of MT-Bench that evaluates LLMs on multi-turn conversation quality across 8 categories: coding, extraction, humanities, math, reasoning, roleplay, STEM, and writing. Responses are scored on a 1-10 scale by a GPT-4 judge. Created by SpeakLeash.
Samples: 50
Metrics: pl-score, coding, extraction, humanities, math, reasoning, roleplay, stem, writing
Current State of the Art
gemma-3-27b-it: 9.28 (pl-score)
Polish MT-Bench — pl-score
50 results · 1 SOTA advance · higher is better
Top Models Performance Comparison
Top 10 models ranked by pl-score
Best Score: 9.3 · Top Model: gemma-3-27b-it · Models Compared: 10 · Score Range: 0.660
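The summary figures above can be reproduced from the pl-score table on this page. The following is a minimal sketch, assuming the score range is the difference between the highest and lowest pl-score among the ten compared models; the values are copied by hand from that table, and the variable names are illustrative only.

```python
# Top-10 pl-scores, copied by hand from the pl-score table on this page.
top10 = {
    "gemma-3-27b-it": 9.28,
    "Mistral-Small-3.1-24B-Instruct-2503": 9.18,
    "Phi-4": 9.07,
    "gemma-3-12b-it": 8.97,
    "Qwen2.5-32B-Instruct": 8.86,
    "Qwen2-72B-Instruct": 8.78,
    "Mistral-Small-24B-Instruct-2501": 8.72,
    "Mistral-Large-Instruct-2407": 8.66,
    "Gemma-2-27b-it": 8.62,
    "aya-expanse-32b": 8.62,
}

# Higher is better, so the top model is the one with the maximum score.
top_model = max(top10, key=top10.get)
best_score = top10[top_model]

# Assumption: "Score Range" means best minus worst within the compared set.
score_range = round(best_score - min(top10.values()), 2)

print(top_model, best_score, score_range)
```

Under that assumption the computed range agrees with the 0.660 shown in the summary, and the best score of 9.28 rounds to the displayed 9.3.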
coding
| # | Model | Score | Date | Paper / Code |
|---|---|---|---|---|
| 1 | Mistral-Small-3.1-24B-Instruct-2503Open Source Mistral | 8.3 | Apr 2026 | |
| 2 | gemma-3-12b-itOpen Source Google | 8.25 | Apr 2026 | |
| 3 | gemma-3-27b-itOpen Source Google | 8.1 | Apr 2026 | |
| 4 | Qwen2.5-32B-InstructOpen Source Alibaba | 7.95 | Apr 2026 | |
| 5 | Mistral-Small-24B-Instruct-2501Open Source Mistral | 7.95 | Apr 2026 | |
| 6 | Qwen2-72B-InstructOpen Source Alibaba | 7.8 | Apr 2026 | |
| 7 | Phi-4 Microsoft | 7.6 | Apr 2026 | |
| 8 | Gemma-2-27b-itOpen Source Google | 7.45 | Apr 2026 | |
| 9 | Meta-Llama-3.1-405B-InstructOpen Source Meta | 7.25 | Apr 2026 | |
| 10 | Mistral-Small-Instruct-2409Open Source Mistral | 7.1 | Apr 2026 | |
| 11 | Mistral-Large-Instruct-2407Open Source Mistral | 6.75 | Apr 2026 | |
| 12 | Qwen2.5-14B-InstructOpen Source Alibaba | 6.7 | Apr 2026 | |
| 13 | Mixtral-8x22bOpen Source Mistral | 6.45 | Apr 2026 | |
| 14 | Meta-Llama-3.1-70B-InstructOpen Source Meta | 6.25 | Apr 2026 | |
| 15 | Bielik-11B-v2.3-InstructOpen Source | 6.25 | Apr 2026 | |
| 16 | GPT-3.5-turboOpen Source OpenAI | 6 | Apr 2026 | |
| 17 | Mistral-Nemo-Instruct-2407Open Source Mistral | 5.85 | Apr 2026 | |
| 18 | aya-expanse-32bOpen Source | 5.75 | Apr 2026 | |
| 19 | Bielik-11B-v2.0-InstructOpen Source | 5.6 | Apr 2026 | |
| 20 | gemma-3-4b-itOpen Source Google | 5.4 | Apr 2026 | |
| 21 | Bielik-11B-v2.1-InstructOpen Source | 5.4 | Apr 2026 | |
| 22 | openchat-3.5-0106-gemmaOpen Source | 5.35 | Apr 2026 | |
| 23 | Mixtral-8x7bOpen Source Mistral | 5.2 | Apr 2026 | |
| 24 | openchat-3.5-0106Open Source | 5.05 | Apr 2026 | |
| 25 | Bielik-11B-v2.2-InstructOpen Source | 5.05 | Apr 2026 | |
| 26 | Qwen2.5-3B-InstructOpen Source Alibaba | 5 | Apr 2026 | |
| 27 | aya-expanse-8bOpen Source | 4.9 | Apr 2026 | |
| 28 | Llama-PLLuM-70B-chatOpen Source PLLuM | 4.8 | Apr 2026 | |
| 29 | Starling-LM-7B-alphaOpen Source | 4.75 | Apr 2026 | |
| 30 | Meta-Llama-3.1-8B-InstructOpen Source Meta | 4.6 | Apr 2026 | |
| 31 | dolphin-2.9.1-llama-3-8bOpen Source | 4.6 | Apr 2026 | |
| 32 | PLLuM-12B-nc-chatOpen Source PLLuM | 4.55 | Apr 2026 | |
| 33 | PLLuM-8x7B-chatOpen Source PLLuM | 4.55 | Apr 2026 | |
| 34 | Hermes-3-Llama-3.2-3BOpen Source | 4.45 | Apr 2026 | |
| 35 | Llama-3.2-3B-InstructOpen Source Meta | 4.4 | Apr 2026 | |
| 36 | Mistral-7B-Instruct-v0.3Open Source Mistral | 4.3 | Apr 2026 | |
| 37 | Mistral-7B-Instruct-v0.2Open Source Mistral | 4.25 | Apr 2026 | |
| 38 | Phi-3.5-mini-instructOpen Source | 4.2 | Apr 2026 | |
| 39 | PLLuM-8x7B-nc-chatOpen Source PLLuM | 4.1 | Apr 2026 | |
| 40 | Qwen2.5-1.5B-InstructOpen Source Alibaba | 3.95 | Apr 2026 | |
| 41 | Llama-PLLuM-8B-chatOpen Source PLLuM | 3.65 | Apr 2026 | |
| 42 | gemma-3-1b-itOpen Source Google | 3.35 | Apr 2026 | |
| 43 | PLLuM-12B-chatOpen Source PLLuM | 3.05 | Apr 2026 | |
| 44 | granite-3.0-2b-instructOpen Source | 3.05 | Apr 2026 | |
| 45 | Bielik-7B-Instruct-v0.1Open Source | 3 | Apr 2026 | |
| 46 | Polka-Mistral-7B-SFTOpen Source | 2.95 | Apr 2026 | |
| 47 | trurl-2-7bOpen Source | 1.8 | Apr 2026 | |
| 48 | SmolLM2-1.7B-InstructOpen Source | 1.75 | Apr 2026 | |
| 49 | EuroLLM-1.7B-InstructOpen Source | 1.7 | Apr 2026 | |
| 50 | Llama-3.2-1B-InstructOpen Source Meta | 1.65 | Apr 2026 | |
extraction
| # | Model | Score | Date | Paper / Code |
|---|---|---|---|---|
| 1 | Mistral-Large-Instruct-2407Open Source Mistral | 9.9 | Apr 2026 | |
| 2 | Qwen2.5-32B-InstructOpen Source Alibaba | 9.9 | Apr 2026 | |
| 3 | gemma-3-27b-itOpen Source Google | 9.9 | Apr 2026 | |
| 4 | Mistral-Small-24B-Instruct-2501Open Source Mistral | 9.9 | Apr 2026 | |
| 5 | Meta-Llama-3.1-70B-InstructOpen Source Meta | 9.85 | Apr 2026 | |
| 6 | Meta-Llama-3.1-405B-InstructOpen Source Meta | 9.85 | Apr 2026 | |
| 7 | Qwen2-72B-InstructOpen Source Alibaba | 9.8 | Apr 2026 | |
| 8 | Mistral-Small-3.1-24B-Instruct-2503Open Source Mistral | 9.8 | Apr 2026 | |
| 9 | Gemma-2-27b-itOpen Source Google | 9.6 | Apr 2026 | |
| 10 | gemma-3-12b-itOpen Source Google | 9.55 | Apr 2026 | |
| 11 | Mixtral-8x22bOpen Source Mistral | 9.55 | Apr 2026 | |
| 12 | Llama-PLLuM-70B-chatOpen Source PLLuM | 9.45 | Apr 2026 | |
| 13 | Bielik-11B-v2.3-InstructOpen Source | 9.43 | Apr 2026 | |
| 14 | Phi-4 Microsoft | 9.3 | Apr 2026 | |
| 15 | Bielik-11B-v2.2-InstructOpen Source | 9.3 | Apr 2026 | |
| 16 | Qwen2.5-14B-InstructOpen Source Alibaba | 9.25 | Apr 2026 | |
| 17 | Mistral-Small-Instruct-2409Open Source Mistral | 9.15 | Apr 2026 | |
| 18 | Bielik-11B-v2.1-InstructOpen Source | 9.125 | Apr 2026 | |
| 19 | Meta-Llama-3.1-8B-InstructOpen Source Meta | 9.1 | Apr 2026 | |
| 20 | Mistral-Nemo-Instruct-2407Open Source Mistral | 8.95 | Apr 2026 | |
| 21 | Bielik-11B-v2.0-InstructOpen Source | 8.65 | Apr 2026 | |
| 22 | Qwen2.5-3B-InstructOpen Source Alibaba | 8.45 | Apr 2026 | |
| 23 | gemma-3-4b-itOpen Source Google | 8.4 | Apr 2026 | |
| 24 | PLLuM-8x7B-nc-chatOpen Source PLLuM | 8.4 | Apr 2026 | |
| 25 | aya-expanse-32bOpen Source | 8.4 | Apr 2026 | |
| 26 | Mixtral-8x7bOpen Source Mistral | 8.15 | Apr 2026 | |
| 27 | GPT-3.5-turboOpen Source OpenAI | 8.15 | Apr 2026 | |
| 28 | aya-expanse-8bOpen Source | 8.05 | Apr 2026 | |
| 29 | PLLuM-8x7B-chatOpen Source PLLuM | 8 | Apr 2026 | |
| 30 | Mistral-7B-Instruct-v0.2Open Source Mistral | 7.4 | Apr 2026 | |
| 31 | Starling-LM-7B-alphaOpen Source | 7.35 | Apr 2026 | |
| 32 | Mistral-7B-Instruct-v0.3Open Source Mistral | 7.3 | Apr 2026 | |
| 33 | PLLuM-12B-nc-chatOpen Source PLLuM | 7.2 | Apr 2026 | |
| 34 | openchat-3.5-0106-gemmaOpen Source | 6.9 | Apr 2026 | |
| 35 | openchat-3.5-0106Open Source | 6.9 | Apr 2026 | |
| 36 | Phi-3.5-mini-instructOpen Source | 6.8 | Apr 2026 | |
| 37 | PLLuM-12B-chatOpen Source PLLuM | 6.55 | Apr 2026 | |
| 38 | Llama-PLLuM-8B-chatOpen Source PLLuM | 6.3 | Apr 2026 | |
| 39 | Llama-3.2-3B-InstructOpen Source Meta | 6.225 | Apr 2026 | |
| 40 | dolphin-2.9.1-llama-3-8bOpen Source | 6.15 | Apr 2026 | |
| 41 | Qwen2.5-1.5B-InstructOpen Source Alibaba | 5.75 | Apr 2026 | |
| 42 | Hermes-3-Llama-3.2-3BOpen Source | 5.3 | Apr 2026 | |
| 43 | Polka-Mistral-7B-SFTOpen Source | 5.25 | Apr 2026 | |
| 44 | gemma-3-1b-itOpen Source Google | 4.87 | Apr 2026 | |
| 45 | Bielik-7B-Instruct-v0.1Open Source | 4.35 | Apr 2026 | |
| 46 | trurl-2-7bOpen Source | 3.5 | Apr 2026 | |
| 47 | granite-3.0-2b-instructOpen Source | 3.45 | Apr 2026 | |
| 48 | SmolLM2-1.7B-InstructOpen Source | 2.75 | Apr 2026 | |
| 49 | EuroLLM-1.7B-InstructOpen Source | 2.25 | Apr 2026 | |
| 50 | Llama-3.2-1B-InstructOpen Source Meta | 1.6 | Apr 2026 | |
humanities
| # | Model | Score | Date | Paper / Code |
|---|---|---|---|---|
| 1 | gemma-3-12b-itOpen Source Google | 10 | Apr 2026 | |
| 2 | Mistral-Small-Instruct-2409Open Source Mistral | 10 | Apr 2026 | |
| 3 | aya-expanse-32bOpen Source | 10 | Apr 2026 | |
| 4 | Gemma-2-27b-itOpen Source Google | 10 | Apr 2026 | |
| 5 | gemma-3-27b-itOpen Source Google | 10 | Apr 2026 | |
| 6 | Mistral-Small-3.1-24B-Instruct-2503Open Source Mistral | 10 | Apr 2026 | |
| 7 | Phi-4 Microsoft | 9.95 | Apr 2026 | |
| 8 | gemma-3-4b-itOpen Source Google | 9.9 | Apr 2026 | |
| 9 | Qwen2-72B-InstructOpen Source Alibaba | 9.75 | Apr 2026 | |
| 10 | GPT-3.5-turboOpen Source OpenAI | 9.75 | Apr 2026 | |
| 11 | Mistral-Small-24B-Instruct-2501Open Source Mistral | 9.7 | Apr 2026 | |
| 12 | aya-expanse-8bOpen Source | 9.65 | Apr 2026 | |
| 13 | Qwen2.5-32B-InstructOpen Source Alibaba | 9.65 | Apr 2026 | |
| 14 | Meta-Llama-3.1-405B-InstructOpen Source Meta | 9.65 | Apr 2026 | |
| 15 | Mistral-Nemo-Instruct-2407Open Source Mistral | 9.5 | Apr 2026 | |
| 16 | PLLuM-12B-nc-chatOpen Source PLLuM | 9.5 | Apr 2026 | |
| 17 | Bielik-11B-v2.3-InstructOpen Source | 9.5 | Apr 2026 | |
| 18 | Llama-PLLuM-8B-chatOpen Source PLLuM | 9.5 | Apr 2026 | |
| 19 | Meta-Llama-3.1-70B-InstructOpen Source Meta | 9.5 | Apr 2026 | |
| 20 | Mixtral-8x7bOpen Source Mistral | 9.45 | Apr 2026 | |
| 21 | Bielik-11B-v2.0-InstructOpen Source | 9.425 | Apr 2026 | |
| 22 | Mistral-Large-Instruct-2407Open Source Mistral | 9.4 | Apr 2026 | |
| 23 | Bielik-11B-v2.2-InstructOpen Source | 9.4 | Apr 2026 | |
| 24 | openchat-3.5-0106Open Source | 9.3 | Apr 2026 | |
| 25 | PLLuM-12B-chatOpen Source PLLuM | 9.3 | Apr 2026 | |
| 26 | Bielik-11B-v2.1-InstructOpen Source | 9.2 | Apr 2026 | |
| 27 | Qwen2.5-14B-InstructOpen Source Alibaba | 9.175 | Apr 2026 | |
| 28 | Mixtral-8x22bOpen Source Mistral | 9.1 | Apr 2026 | |
| 29 | Meta-Llama-3.1-8B-InstructOpen Source Meta | 8.825 | Apr 2026 | |
| 30 | dolphin-2.9.1-llama-3-8bOpen Source | 8.8 | Apr 2026 | |
| 31 | openchat-3.5-0106-gemmaOpen Source | 8.8 | Apr 2026 | |
| 32 | Llama-PLLuM-70B-chatOpen Source PLLuM | 8.8 | Apr 2026 | |
| 33 | PLLuM-8x7B-chatOpen Source PLLuM | 8.6 | Apr 2026 | |
| 34 | gemma-3-1b-itOpen Source Google | 8.5 | Apr 2026 | |
| 35 | Starling-LM-7B-alphaOpen Source | 8.5 | Apr 2026 | |
| 36 | Bielik-7B-Instruct-v0.1Open Source | 8.475 | Apr 2026 | |
| 37 | Mistral-7B-Instruct-v0.2Open Source Mistral | 8.4 | Apr 2026 | |
| 38 | Hermes-3-Llama-3.2-3BOpen Source | 8.05 | Apr 2026 | |
| 39 | Phi-3.5-mini-instructOpen Source | 7.9 | Apr 2026 | |
| 40 | Qwen2.5-3B-InstructOpen Source Alibaba | 7.85 | Apr 2026 | |
| 41 | PLLuM-8x7B-nc-chatOpen Source PLLuM | 7.475 | Apr 2026 | |
| 42 | Llama-3.2-3B-InstructOpen Source Meta | 7.15 | Apr 2026 | |
| 43 | Mistral-7B-Instruct-v0.3Open Source Mistral | 6.75 | Apr 2026 | |
| 44 | Polka-Mistral-7B-SFTOpen Source | 5.6 | Apr 2026 | |
| 45 | trurl-2-7bOpen Source | 3.95 | Apr 2026 | |
| 46 | Qwen2.5-1.5B-InstructOpen Source Alibaba | 3.45 | Apr 2026 | |
| 47 | EuroLLM-1.7B-InstructOpen Source | 3.25 | Apr 2026 | |
| 48 | SmolLM2-1.7B-InstructOpen Source | 1.85 | Apr 2026 | |
| 49 | granite-3.0-2b-instructOpen Source | 1.45 | Apr 2026 | |
| 50 | Llama-3.2-1B-InstructOpen Source Meta | 1.4 | Apr 2026 | |
math
| # | Model | Score | Date | Paper / Code |
|---|---|---|---|---|
| 1 | gemma-3-27b-itOpen Source Google | 8.25 | Apr 2026 | |
| 2 | Qwen2.5-14B-InstructOpen Source Alibaba | 8.1 | Apr 2026 | |
| 3 | Mistral-Small-3.1-24B-Instruct-2503Open Source Mistral | 7.85 | Apr 2026 | |
| 4 | Mistral-Small-24B-Instruct-2501Open Source Mistral | 7.825 | Apr 2026 | |
| 5 | Gemma-2-27b-itOpen Source Google | 7.8 | Apr 2026 | |
| 6 | Mistral-Large-Instruct-2407Open Source Mistral | 7.8 | Apr 2026 | |
| 7 | Phi-4 Microsoft | 7.7 | Apr 2026 | |
| 8 | Bielik-11B-v2.3-InstructOpen Source | 7.7 | Apr 2026 | |
| 9 | Qwen2.5-32B-InstructOpen Source Alibaba | 7.6 | Apr 2026 | |
| 10 | gemma-3-12b-itOpen Source Google | 7.45 | Apr 2026 | |
| 11 | gemma-3-4b-itOpen Source Google | 7.4 | Apr 2026 | |
| 12 | Mistral-Small-Instruct-2409Open Source Mistral | 7 | Apr 2026 | |
| 13 | Mixtral-8x22bOpen Source Mistral | 6.9 | Apr 2026 | |
| 14 | GPT-3.5-turboOpen Source OpenAI | 6.85 | Apr 2026 | |
| 15 | Mistral-Nemo-Instruct-2407Open Source Mistral | 6.7 | Apr 2026 | |
| 16 | aya-expanse-32bOpen Source | 6.6 | Apr 2026 | |
| 17 | Qwen2-72B-InstructOpen Source Alibaba | 6.5 | Apr 2026 | |
| 18 | Bielik-11B-v2.2-InstructOpen Source | 6.45 | Apr 2026 | |
| 19 | Qwen2.5-3B-InstructOpen Source Alibaba | 6.4 | Apr 2026 | |
| 20 | Meta-Llama-3.1-405B-InstructOpen Source Meta | 6.25 | Apr 2026 | |
| 21 | Bielik-11B-v2.1-InstructOpen Source | 6.15 | Apr 2026 | |
| 22 | Meta-Llama-3.1-70B-InstructOpen Source Meta | 6 | Apr 2026 | |
| 23 | Mixtral-8x7bOpen Source Mistral | 5.65 | Apr 2026 | |
| 24 | Bielik-11B-v2.0-InstructOpen Source | 5.5 | Apr 2026 | |
| 25 | Meta-Llama-3.1-8B-InstructOpen Source Meta | 5.3 | Apr 2026 | |
| 26 | dolphin-2.9.1-llama-3-8bOpen Source | 4.8 | Apr 2026 | |
| 27 | openchat-3.5-0106-gemmaOpen Source | 4.55 | Apr 2026 | |
| 28 | Phi-3.5-mini-instructOpen Source | 4.5 | Apr 2026 | |
| 29 | Llama-3.2-3B-InstructOpen Source Meta | 4.5 | Apr 2026 | |
| 30 | aya-expanse-8bOpen Source | 4.35 | Apr 2026 | |
| 31 | Starling-LM-7B-alphaOpen Source | 4.15 | Apr 2026 | |
| 32 | Bielik-7B-Instruct-v0.1Open Source | 4.1 | Apr 2026 | |
| 33 | gemma-3-1b-itOpen Source Google | 4.05 | Apr 2026 | |
| 34 | openchat-3.5-0106Open Source | 3.8 | Apr 2026 | |
| 35 | Hermes-3-Llama-3.2-3BOpen Source | 3.7 | Apr 2026 | |
| 36 | PLLuM-8x7B-chatOpen Source PLLuM | 3.45 | Apr 2026 | |
| 37 | Qwen2.5-1.5B-InstructOpen Source Alibaba | 3.45 | Apr 2026 | |
| 38 | PLLuM-8x7B-nc-chatOpen Source PLLuM | 3.35 | Apr 2026 | |
| 39 | Mistral-7B-Instruct-v0.2Open Source Mistral | 3.2 | Apr 2026 | |
| 40 | Polka-Mistral-7B-SFTOpen Source | 2.95 | Apr 2026 | |
| 41 | Llama-PLLuM-70B-chatOpen Source PLLuM | 2.9 | Apr 2026 | |
| 42 | Llama-PLLuM-8B-chatOpen Source PLLuM | 2.75 | Apr 2026 | |
| 43 | PLLuM-12B-chatOpen Source PLLuM | 2.65 | Apr 2026 | |
| 44 | Llama-3.2-1B-InstructOpen Source Meta | 2.6 | Apr 2026 | |
| 45 | Mistral-7B-Instruct-v0.3Open Source Mistral | 2.35 | Apr 2026 | |
| 46 | PLLuM-12B-nc-chatOpen Source PLLuM | 2.3 | Apr 2026 | |
| 47 | granite-3.0-2b-instructOpen Source | 1.95 | Apr 2026 | |
| 48 | SmolLM2-1.7B-InstructOpen Source | 1.8 | Apr 2026 | |
| 49 | trurl-2-7bOpen Source | 1.7 | Apr 2026 | |
| 50 | EuroLLM-1.7B-InstructOpen Source | 1.1 | Apr 2026 | |
pl-score (primary)
| # | Model | Score | Date | Paper / Code |
|---|---|---|---|---|
| 1 | gemma-3-27b-itOpen Source Google | 9.28 | Apr 2026 | |
| 2 | Mistral-Small-3.1-24B-Instruct-2503Open Source Mistral | 9.18 | Apr 2026 | |
| 3 | Phi-4 Microsoft | 9.07 | Apr 2026 | |
| 4 | gemma-3-12b-itOpen Source Google | 8.97 | Apr 2026 | |
| 5 | Qwen2.5-32B-InstructOpen Source Alibaba | 8.86 | Apr 2026 | |
| 6 | Qwen2-72B-InstructOpen Source Alibaba | 8.78 | Apr 2026 | |
| 7 | Mistral-Small-24B-Instruct-2501Open Source Mistral | 8.72 | Apr 2026 | |
| 8 | Mistral-Large-Instruct-2407Open Source Mistral | 8.66 | Apr 2026 | |
| 9 | Gemma-2-27b-itOpen Source Google | 8.62 | Apr 2026 | |
| 10 | aya-expanse-32bOpen Source | 8.62 | Apr 2026 | |
| 11 | Mistral-Small-Instruct-2409Open Source Mistral | 8.56 | Apr 2026 | |
| 12 | Bielik-11B-v2.3-InstructOpen Source | 8.56 | Apr 2026 | |
| 13 | Qwen2.5-14B-InstructOpen Source Alibaba | 8.33 | Apr 2026 | |
| 14 | Mixtral-8x22bOpen Source Mistral | 8.23 | Apr 2026 | |
| 15 | gemma-3-4b-itOpen Source Google | 8.22 | Apr 2026 | |
| 16 | Meta-Llama-3.1-405B-InstructOpen Source Meta | 8.17 | Apr 2026 | |
| 17 | Meta-Llama-3.1-70B-InstructOpen Source Meta | 8.15 | Apr 2026 | |
| 18 | Bielik-11B-v2.2-InstructOpen Source | 8.12 | Apr 2026 | |
| 19 | Bielik-11B-v2.1-InstructOpen Source | 8 | Apr 2026 | |
| 20 | aya-expanse-8bOpen Source | 7.7625 | Apr 2026 | |
| 21 | GPT-3.5-turboOpen Source OpenAI | 7.72 | Apr 2026 | |
| 22 | Mixtral-8x7bOpen Source Mistral | 7.64 | Apr 2026 | |
| 23 | Bielik-11B-v2.0-InstructOpen Source | 7.56 | Apr 2026 | |
| 24 | Mistral-Nemo-Instruct-2407Open Source Mistral | 7.37 | Apr 2026 | |
| 25 | Llama-PLLuM-70B-chatOpen Source PLLuM | 6.75 | Apr 2026 | |
| 26 | openchat-3.5-0106-gemmaOpen Source | 6.51 | Apr 2026 | |
| 27 | PLLuM-12B-nc-chatOpen Source PLLuM | 6.47 | Apr 2026 | |
| 28 | PLLuM-8x7B-nc-chatOpen Source PLLuM | 6.43 | Apr 2026 | |
| 29 | Qwen2.5-3B-InstructOpen Source Alibaba | 6.35 | Apr 2026 | |
| 30 | PLLuM-8x7B-chatOpen Source PLLuM | 6.3 | Apr 2026 | |
| 31 | Meta-Llama-3.1-8B-InstructOpen Source Meta | 6.24 | Apr 2026 | |
| 32 | Llama-PLLuM-8B-chatOpen Source PLLuM | 6.05 | Apr 2026 | |
| 33 | Starling-LM-7B-alphaOpen Source | 6.05 | Apr 2026 | |
| 34 | openchat-3.5-0106Open Source | 6.03 | Apr 2026 | |
| 35 | PLLuM-12B-chatOpen Source PLLuM | 5.81 | Apr 2026 | |
| 36 | Mistral-7B-Instruct-v0.3Open Source Mistral | 5.75 | Apr 2026 | |
| 37 | Phi-3.5-mini-instructOpen Source | 5.56 | Apr 2026 | |
| 38 | Hermes-3-Llama-3.2-3BOpen Source | 5.54 | Apr 2026 | |
| 39 | gemma-3-1b-itOpen Source Google | 5.46 | Apr 2026 | |
| 40 | Bielik-7B-Instruct-v0.1Open Source | 5.4 | Apr 2026 | |
| 41 | dolphin-2.9.1-llama-3-8bOpen Source | 5.24 | Apr 2026 | |
| 42 | Llama-3.2-3B-InstructOpen Source Meta | 4.95 | Apr 2026 | |
| 43 | Polka-Mistral-7B-SFTOpen Source | 4.43 | Apr 2026 | |
| 44 | Qwen2.5-1.5B-InstructOpen Source Alibaba | 3.3 | Apr 2026 | |
| 45 | EuroLLM-1.7B-InstructOpen Source | 3.01 | Apr 2026 | |
| 46 | trurl-2-7bOpen Source | 2.75 | Apr 2026 | |
| 47 | Mistral-7B-Instruct-v0.2Open Source Mistral | 2.05 | Apr 2026 | |
| 48 | granite-3.0-2b-instructOpen Source | 2.03 | Apr 2026 | |
| 49 | Llama-3.2-1B-InstructOpen Source Meta | 1.61 | Apr 2026 | |
| 50 | SmolLM2-1.7B-InstructOpen Source | 1.58 | Apr 2026 | |
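The page does not state how pl-score is derived from the eight category metrics. For several rows it agrees, to two decimals, with the unweighted mean of the category scores, so the sketch below treats that as a working assumption and not as the benchmark's documented definition. The scores are copied by hand from the tables on this page; the function name and the tolerance are illustrative.

```python
from statistics import mean

# Category scores copied by hand from the tables on this page, in the order:
# coding, extraction, humanities, math, reasoning, roleplay, stem, writing.
CATEGORY_SCORES = {
    "gemma-3-27b-it": [8.1, 9.9, 10, 8.25, 8.4, 9.95, 9.95, 9.7],
    "Phi-4": [7.6, 9.3, 9.95, 7.7, 9.55, 9.2, 10, 9.25],
    "aya-expanse-8b": [4.9, 8.05, 9.65, 4.35, 6.85, 9.25, 9.75, 9.3],
    "Mistral-7B-Instruct-v0.2": [4.25, 7.4, 8.4, 3.2, 5, 8.65, 7.85, 7.7],
}

# pl-score values as reported in the pl-score table.
REPORTED_PL_SCORE = {
    "gemma-3-27b-it": 9.28,
    "Phi-4": 9.07,
    "aya-expanse-8b": 7.7625,
    "Mistral-7B-Instruct-v0.2": 2.05,
}


def compare_to_mean(tolerance=0.01):
    """Return {model: (category_mean, reported, within_tolerance)}.

    Assumption: pl-score is the unweighted mean of the eight categories.
    """
    report = {}
    for model, scores in CATEGORY_SCORES.items():
        category_mean = mean(scores)
        reported = REPORTED_PL_SCORE[model]
        within = abs(category_mean - reported) <= tolerance
        report[model] = (round(category_mean, 4), reported, within)
    return report


report = compare_to_mean()
for model, (category_mean, reported, ok) in report.items():
    print(f"{model}: mean={category_mean} reported={reported} match={ok}")
```

With these inputs the first three models fall within the tolerance. Mistral-7B-Instruct-v0.2 does not: its category scores average about 6.56 against a reported pl-score of 2.05, so that row is worth checking against the upstream leaderboard before relying on it.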
reasoning
| # | Model | Score | Date | Paper / Code |
|---|---|---|---|---|
| 1 | Phi-4 Microsoft | 9.55 | Apr 2026 | |
| 2 | Qwen2.5-32B-InstructOpen Source Alibaba | 9.1 | Apr 2026 | |
| 3 | Mistral-Small-3.1-24B-Instruct-2503Open Source Mistral | 9 | Apr 2026 | |
| 4 | aya-expanse-32bOpen Source | 8.95 | Apr 2026 | |
| 5 | Qwen2-72B-InstructOpen Source Alibaba | 8.85 | Apr 2026 | |
| 6 | Mistral-Large-Instruct-2407Open Source Mistral | 8.7 | Apr 2026 | |
| 7 | gemma-3-27b-itOpen Source Google | 8.4 | Apr 2026 | |
| 8 | Bielik-11B-v2.3-InstructOpen Source | 8.35 | Apr 2026 | |
| 9 | Mistral-Small-Instruct-2409Open Source Mistral | 7.9 | Apr 2026 | |
| 10 | Mistral-Small-24B-Instruct-2501Open Source Mistral | 7.9 | Apr 2026 | |
| 11 | gemma-3-12b-itOpen Source Google | 7.75 | Apr 2026 | |
| 12 | Qwen2.5-14B-InstructOpen Source Alibaba | 7.55 | Apr 2026 | |
| 13 | Bielik-11B-v2.2-InstructOpen Source | 6.9 | Apr 2026 | |
| 14 | aya-expanse-8bOpen Source | 6.85 | Apr 2026 | |
| 15 | Gemma-2-27b-itOpen Source Google | 6.85 | Apr 2026 | |
| 16 | Mixtral-8x22bOpen Source Mistral | 6.3 | Apr 2026 | |
| 17 | Bielik-11B-v2.1-InstructOpen Source | 6.25 | Apr 2026 | |
| 18 | gemma-3-4b-itOpen Source Google | 6.25 | Apr 2026 | |
| 19 | Bielik-7B-Instruct-v0.1Open Source | 6.15 | Apr 2026 | |
| 20 | Meta-Llama-3.1-70B-InstructOpen Source Meta | 6.15 | Apr 2026 | |
| 21 | Bielik-11B-v2.0-InstructOpen Source | 6.05 | Apr 2026 | |
| 22 | Mistral-Nemo-Instruct-2407Open Source Mistral | 5.8 | Apr 2026 | |
| 23 | Mixtral-8x7bOpen Source Mistral | 5.8 | Apr 2026 | |
| 24 | Meta-Llama-3.1-405B-InstructOpen Source Meta | 5.8 | Apr 2026 | |
| 25 | openchat-3.5-0106-gemmaOpen Source | 5.4 | Apr 2026 | |
| 26 | Llama-PLLuM-8B-chatOpen Source PLLuM | 5.35 | Apr 2026 | |
| 27 | Llama-PLLuM-70B-chatOpen Source PLLuM | 5.2 | Apr 2026 | |
| 28 | GPT-3.5-turboOpen Source OpenAI | 5.2 | Apr 2026 | |
| 29 | Mistral-7B-Instruct-v0.2Open Source Mistral | 5 | Apr 2026 | |
| 30 | PLLuM-8x7B-nc-chatOpen Source PLLuM | 4.95 | Apr 2026 | |
| 31 | Phi-3.5-mini-instructOpen Source | 4.95 | Apr 2026 | |
| 32 | PLLuM-8x7B-chatOpen Source PLLuM | 4.9 | Apr 2026 | |
| 33 | PLLuM-12B-nc-chatOpen Source PLLuM | 4.8 | Apr 2026 | |
| 34 | Qwen2.5-3B-InstructOpen Source Alibaba | 4.25 | Apr 2026 | |
| 35 | PLLuM-12B-chatOpen Source PLLuM | 3.9 | Apr 2026 | |
| 36 | openchat-3.5-0106Open Source | 3.9 | Apr 2026 | |
| 37 | Starling-LM-7B-alphaOpen Source | 3.9 | Apr 2026 | |
| 38 | Mistral-7B-Instruct-v0.3Open Source Mistral | 3.8 | Apr 2026 | |
| 39 | gemma-3-1b-itOpen Source Google | 3.5 | Apr 2026 | |
| 40 | dolphin-2.9.1-llama-3-8bOpen Source | 3.3 | Apr 2026 | |
| 41 | Hermes-3-Llama-3.2-3BOpen Source | 3.1 | Apr 2026 | |
| 42 | Llama-3.2-3B-InstructOpen Source Meta | 2.7 | Apr 2026 | |
| 43 | EuroLLM-1.7B-InstructOpen Source | 2.65 | Apr 2026 | |
| 44 | Qwen2.5-1.5B-InstructOpen Source Alibaba | 2.6 | Apr 2026 | |
| 45 | Meta-Llama-3.1-8B-InstructOpen Source Meta | 2.5 | Apr 2026 | |
| 46 | Polka-Mistral-7B-SFTOpen Source | 2.45 | Apr 2026 | |
| 47 | trurl-2-7bOpen Source | 2.05 | Apr 2026 | |
| 48 | granite-3.0-2b-instructOpen Source | 1.55 | Apr 2026 | |
| 49 | Llama-3.2-1B-InstructOpen Source Meta | 1.3 | Apr 2026 | |
| 50 | SmolLM2-1.7B-InstructOpen Source | 1.1 | Apr 2026 | |
roleplay
| # | Model | Score | Date | Paper / Code |
|---|---|---|---|---|
| 1 | gemma-3-27b-itOpen Source Google | 9.95 | Apr 2026 | |
| 2 | aya-expanse-32bOpen Source | 9.7 | Apr 2026 | |
| 3 | gemma-3-4b-itOpen Source Google | 9.45 | Apr 2026 | |
| 4 | gemma-3-12b-itOpen Source Google | 9.45 | Apr 2026 | |
| 5 | Bielik-11B-v2.1-InstructOpen Source | 9.45 | Apr 2026 | |
| 6 | Mistral-Small-3.1-24B-Instruct-2503Open Source Mistral | 9.4 | Apr 2026 | |
| 7 | aya-expanse-8bOpen Source | 9.25 | Apr 2026 | |
| 8 | Qwen2-72B-InstructOpen Source Alibaba | 9.2 | Apr 2026 | |
| 9 | Phi-4 Microsoft | 9.2 | Apr 2026 | |
| 10 | Mixtral-8x22bOpen Source Mistral | 9.05 | Apr 2026 | |
| 11 | Mistral-Small-24B-Instruct-2501Open Source Mistral | 9.05 | Apr 2026 | |
| 12 | Bielik-11B-v2.2-InstructOpen Source | 9.025 | Apr 2026 | |
| 13 | Mixtral-8x7bOpen Source Mistral | 8.95 | Apr 2026 | |
| 14 | Mistral-Small-Instruct-2409Open Source Mistral | 8.9 | Apr 2026 | |
| 15 | Meta-Llama-3.1-70B-InstructOpen Source Meta | 8.8 | Apr 2026 | |
| 16 | Bielik-11B-v2.3-InstructOpen Source | 8.75 | Apr 2026 | |
| 17 | Meta-Llama-3.1-405B-InstructOpen Source Meta | 8.7 | Apr 2026 | |
| 18 | Gemma-2-27b-itOpen Source Google | 8.7 | Apr 2026 | |
| 19 | Mistral-Large-Instruct-2407Open Source Mistral | 8.7 | Apr 2026 | |
| 20 | GPT-3.5-turboOpen Source OpenAI | 8.65 | Apr 2026 | |
| 21 | Mistral-7B-Instruct-v0.2Open Source Mistral | 8.65 | Apr 2026 | |
| 22 | Qwen2.5-14B-InstructOpen Source Alibaba | 8.5 | Apr 2026 | |
| 23 | Qwen2.5-32B-InstructOpen Source Alibaba | 8.3 | Apr 2026 | |
| 24 | openchat-3.5-0106-gemmaOpen Source | 7.975 | Apr 2026 | |
| 25 | Bielik-7B-Instruct-v0.1Open Source | 7.825 | Apr 2026 | |
| 26 | Bielik-11B-v2.0-InstructOpen Source | 7.75 | Apr 2026 | |
| 27 | Mistral-Nemo-Instruct-2407Open Source Mistral | 7.45 | Apr 2026 | |
| 28 | dolphin-2.9.1-llama-3-8bOpen Source | 7.4 | Apr 2026 | |
| 29 | Mistral-7B-Instruct-v0.3Open Source Mistral | 7.25 | Apr 2026 | |
| 30 | PLLuM-8x7B-nc-chatOpen Source PLLuM | 6.9 | Apr 2026 | |
| 31 | Starling-LM-7B-alphaOpen Source | 6.9 | Apr 2026 | |
| 32 | PLLuM-12B-nc-chatOpen Source PLLuM | 6.75 | Apr 2026 | |
| 33 | Hermes-3-Llama-3.2-3BOpen Source | 6.75 | Apr 2026 | |
| 34 | Llama-PLLuM-70B-chatOpen Source PLLuM | 6.6 | Apr 2026 | |
| 35 | Qwen2.5-3B-InstructOpen Source Alibaba | 6.55 | Apr 2026 | |
| 36 | gemma-3-1b-itOpen Source Google | 6.25 | Apr 2026 | |
| 37 | PLLuM-8x7B-chatOpen Source PLLuM | 6.25 | Apr 2026 | |
| 38 | Llama-PLLuM-8B-chatOpen Source PLLuM | 6.15 | Apr 2026 | |
| 39 | openchat-3.5-0106Open Source | 6 | Apr 2026 | |
| 40 | Meta-Llama-3.1-8B-InstructOpen Source Meta | 5.6 | Apr 2026 | |
| 41 | Llama-3.2-3B-InstructOpen Source Meta | 5.3 | Apr 2026 | |
| 42 | PLLuM-12B-chatOpen Source PLLuM | 5 | Apr 2026 | |
| 43 | Polka-Mistral-7B-SFTOpen Source | 4.9 | Apr 2026 | |
| 44 | Phi-3.5-mini-instructOpen Source | 4.65 | Apr 2026 | |
| 45 | EuroLLM-1.7B-InstructOpen Source | 4.6 | Apr 2026 | |
| 46 | trurl-2-7bOpen Source | 3.3 | Apr 2026 | |
| 47 | Qwen2.5-1.5B-InstructOpen Source Alibaba | 2.55 | Apr 2026 | |
| 48 | Llama-3.2-1B-InstructOpen Source Meta | 1.65 | Apr 2026 | |
| 49 | granite-3.0-2b-instructOpen Source | 1.3 | Apr 2026 | |
| 50 | SmolLM2-1.7B-InstructOpen Source | 1 | Apr 2026 | |
stem
| # | Model | Score | Date | Paper / Code |
|---|---|---|---|---|
| 1 | gemma-3-12b-itOpen Source Google | 10 | Apr 2026 | |
| 2 | Phi-4 Microsoft | 10 | Apr 2026 | |
| 3 | aya-expanse-32bOpen Source | 9.95 | Apr 2026 | |
| 4 | gemma-3-27b-itOpen Source Google | 9.95 | Apr 2026 | |
| 5 | Mistral-Small-3.1-24B-Instruct-2503Open Source Mistral | 9.9 | Apr 2026 | |
| 6 | Gemma-2-27b-itOpen Source Google | 9.8 | Apr 2026 | |
| 7 | aya-expanse-8bOpen Source | 9.75 | Apr 2026 | |
| 8 | Qwen2.5-32B-InstructOpen Source Alibaba | 9.7 | Apr 2026 | |
| 9 | gemma-3-4b-itOpen Source Google | 9.65 | Apr 2026 | |
| 10 | Mistral-Small-Instruct-2409Open Source Mistral | 9.65 | Apr 2026 | |
| 11 | Qwen2.5-14B-InstructOpen Source Alibaba | 9.6 | Apr 2026 | |
| 12 | Meta-Llama-3.1-70B-InstructOpen Source Meta | 9.55 | Apr 2026 | |
| 13 | Qwen2-72B-InstructOpen Source Alibaba | 9.55 | Apr 2026 | |
| 14 | Mistral-Small-24B-Instruct-2501Open Source Mistral | 9.5 | Apr 2026 | |
| 15 | Bielik-11B-v2.2-InstructOpen Source | 9.45 | Apr 2026 | |
| 16 | Mistral-Large-Instruct-2407Open Source Mistral | 9.35 | Apr 2026 | |
| 17 | GPT-3.5-turboOpen Source OpenAI | 9.25 | Apr 2026 | |
| 18 | Mixtral-8x22bOpen Source Mistral | 9.25 | Apr 2026 | |
| 19 | PLLuM-12B-nc-chatOpen Source PLLuM | 9.1 | Apr 2026 | |
| 20 | Bielik-11B-v2.3-InstructOpen Source | 8.975 | Apr 2026 | |
| 21 | PLLuM-8x7B-nc-chatOpen Source PLLuM | 8.9 | Apr 2026 | |
| 22 | Bielik-11B-v2.1-InstructOpen Source | 8.9 | Apr 2026 | |
| 23 | Starling-LM-7B-alphaOpen Source | 8.85 | Apr 2026 | |
| 24 | Bielik-11B-v2.0-InstructOpen Source | 8.775 | Apr 2026 | |
| 25 | Meta-Llama-3.1-405B-InstructOpen Source Meta | 8.65 | Apr 2026 | |
| 26 | Mixtral-8x7bOpen Source Mistral | 8.55 | Apr 2026 | |
| 27 | openchat-3.5-0106-gemmaOpen Source | 8.475 | Apr 2026 | |
| 28 | openchat-3.5-0106Open Source | 8.4 | Apr 2026 | |
| 29 | Mistral-Nemo-Instruct-2407Open Source Mistral | 8.3 | Apr 2026 | |
| 30 | Llama-PLLuM-70B-chatOpen Source PLLuM | 8.2 | Apr 2026 | |
| 31 | PLLuM-8x7B-chatOpen Source PLLuM | 8.2 | Apr 2026 | |
| 32 | PLLuM-12B-chatOpen Source PLLuM | 8 | Apr 2026 | |
| 33 | Mistral-7B-Instruct-v0.2Open Source Mistral | 7.85 | Apr 2026 | |
| 34 | Llama-PLLuM-8B-chatOpen Source PLLuM | 7.5 | Apr 2026 | |
| 35 | Mistral-7B-Instruct-v0.3Open Source Mistral | 7.45 | Apr 2026 | |
| 36 | gemma-3-1b-itOpen Source Google | 7.1 | Apr 2026 | |
| 37 | Hermes-3-Llama-3.2-3BOpen Source | 6.95 | Apr 2026 | |
| 38 | Bielik-7B-Instruct-v0.1Open Source | 6.9 | Apr 2026 | |
| 39 | Phi-3.5-mini-instructOpen Source | 6.85 | Apr 2026 | |
| 40 | Polka-Mistral-7B-SFTOpen Source | 6.8 | Apr 2026 | |
| 41 | Qwen2.5-3B-InstructOpen Source Alibaba | 6.75 | Apr 2026 | |
| 42 | dolphin-2.9.1-llama-3-8bOpen Source | 6.35 | Apr 2026 | |
| 43 | Meta-Llama-3.1-8B-InstructOpen Source Meta | 6.3 | Apr 2026 | |
| 44 | Llama-3.2-3B-InstructOpen Source Meta | 4.85 | Apr 2026 | |
| 45 | EuroLLM-1.7B-InstructOpen Source | 4.65 | Apr 2026 | |
| 46 | trurl-2-7bOpen Source | 2.65 | Apr 2026 | |
| 47 | Qwen2.5-1.5B-InstructOpen Source Alibaba | 2.15 | Apr 2026 | |
| 48 | granite-3.0-2b-instructOpen Source | 1.45 | Apr 2026 | |
| 49 | SmolLM2-1.7B-InstructOpen Source | 1.35 | Apr 2026 | |
| 50 | Llama-3.2-1B-InstructOpen Source Meta | 1.3 | Apr 2026 | |
writing
| # | Model | Score | Date | Paper / Code |
|---|---|---|---|---|
| 1 | gemma-3-27b-itOpen Source Google | 9.7 | Apr 2026 | |
| 2 | aya-expanse-32bOpen Source | 9.6 | Apr 2026 | |
| 3 | Bielik-11B-v2.3-InstructOpen Source | 9.5 | Apr 2026 | |
| 4 | Bielik-11B-v2.1-InstructOpen Source | 9.5 | Apr 2026 | |
| 5 | Mixtral-8x7bOpen Source Mistral | 9.35 | Apr 2026 | |
| 6 | Bielik-11B-v2.2-InstructOpen Source | 9.35 | Apr 2026 | |
| 7 | gemma-3-12b-itOpen Source Google | 9.3 | Apr 2026 | |
| 8 | gemma-3-4b-itOpen Source Google | 9.3 | Apr 2026 | |
| 9 | aya-expanse-8bOpen Source | 9.3 | Apr 2026 | |
| 10 | Mixtral-8x22bOpen Source Mistral | 9.25 | Apr 2026 | |
| 11 | Phi-4 Microsoft | 9.25 | Apr 2026 | |
| 12 | Meta-Llama-3.1-405B-InstructOpen Source Meta | 9.2 | Apr 2026 | |
| 13 | Mistral-Small-3.1-24B-Instruct-2503Open Source Mistral | 9.15 | Apr 2026 | |
| 14 | GPT-3.5-turboOpen Source OpenAI | 9.1 | Apr 2026 | |
| 15 | Meta-Llama-3.1-70B-InstructOpen Source Meta | 9.1 | Apr 2026 | |
| 16 | Mistral-Small-Instruct-2409Open Source Mistral | 8.8 | Apr 2026 | |
| 17 | Bielik-11B-v2.0-InstructOpen Source | 8.75 | Apr 2026 | |
| 18 | Gemma-2-27b-itOpen Source Google | 8.75 | Apr 2026 | |
| 19 | Qwen2-72B-InstructOpen Source Alibaba | 8.75 | Apr 2026 | |
| 20 | Mistral-Large-Instruct-2407Open Source Mistral | 8.7 | Apr 2026 | |
| 21 | Qwen2.5-32B-InstructOpen Source Alibaba | 8.65 | Apr 2026 | |
| 22 | Llama-PLLuM-70B-chatOpen Source PLLuM | 8.05 | Apr 2026 | |
| 23 | PLLuM-12B-chatOpen Source PLLuM | 8 | Apr 2026 | |
| 24 | Mistral-Small-24B-Instruct-2501Open Source Mistral | 7.95 | Apr 2026 | |
| 25 | Bielik-7B-Instruct-v0.1Open Source | 7.85 | Apr 2026 | |
| 26 | Qwen2.5-14B-InstructOpen Source Alibaba | 7.75 | Apr 2026 | |
| 27 | openchat-3.5-0106Open Source | 7.75 | Apr 2026 | |
| 28 | Meta-Llama-3.1-8B-InstructOpen Source Meta | 7.7 | Apr 2026 | |
| 29 | Mistral-7B-Instruct-v0.2Open Source Mistral | 7.7 | Apr 2026 | |
| 30 | PLLuM-12B-nc-chatOpen Source PLLuM | 7.55 | Apr 2026 | |
| 31 | Starling-LM-7B-alphaOpen Source | 7.55 | Apr 2026 | |
| 32 | PLLuM-8x7B-nc-chatOpen Source PLLuM | 7.4 | Apr 2026 | |
| 33 | Mistral-7B-Instruct-v0.3Open Source Mistral | 7.35 | Apr 2026 | |
| 34 | Llama-PLLuM-8B-chatOpen Source PLLuM | 7.2 | Apr 2026 | |
| 35 | PLLuM-8x7B-chatOpen Source PLLuM | 7.1 | Apr 2026 | |
| 36 | openchat-3.5-0106-gemmaOpen Source | 7.05 | Apr 2026 | |
| 37 | Mistral-Nemo-Instruct-2407Open Source Mistral | 6.4 | Apr 2026 | |
| 38 | gemma-3-1b-itOpen Source Google | 6.05 | Apr 2026 | |
| 39 | Hermes-3-Llama-3.2-3BOpen Source | 6 | Apr 2026 | |
| 40 | Qwen2.5-3B-InstructOpen Source Alibaba | 5.55 | Apr 2026 | |
| 41 | dolphin-2.9.1-llama-3-8bOpen Source | 5.5 | Apr 2026 | |
| 42 | Polka-Mistral-7B-SFTOpen Source | 5.25 | Apr 2026 | |
| 43 | Phi-3.5-mini-instructOpen Source | 4.65 | Apr 2026 | |
| 44 | Llama-3.2-3B-InstructOpen Source Meta | 4.45 | Apr 2026 | |
| 45 | EuroLLM-1.7B-InstructOpen Source | 3.9 | Apr 2026 | |
| 46 | trurl-2-7bOpen Source | 3.15 | Apr 2026 | |
| 47 | Qwen2.5-1.5B-InstructOpen Source Alibaba | 2.7 | Apr 2026 | |
| 48 | granite-3.0-2b-instructOpen Source | 2.1 | Apr 2026 | |
| 49 | Llama-3.2-1B-InstructOpen Source Meta | 1.4 | Apr 2026 | |
| 50 | SmolLM2-1.7B-InstructOpen Source | 1.05 | Apr 2026 | |