Who leads the Open PL LLM Leaderboard benchmark?

internlm2-1_8b is currently ranked first on the Open PL LLM Leaderboard, with a score of 60296.3 on Poleval2018 Task3.

What is the state-of-the-art score on Open PL LLM Leaderboard?

The state-of-the-art result on Open PL LLM Leaderboard is 60296.3 (Poleval2018 Task3), achieved by internlm2-1_8b as of 2026.

How many models are tracked on Open PL LLM Leaderboard?

Codesota tracks 94 models on Open PL LLM Leaderboard across 3 metrics.

When was the Open PL LLM Leaderboard last updated?

The Open PL LLM Leaderboard on Codesota includes results through 2026.

Codesota · Benchmark · Open PL LLM Leaderboard

Open PL LLM Leaderboard.

Name: Open PL LLM Leaderboard Benchmark Results
Creator: Unknown
Published: 2026-01-01
License: https://creativecommons.org/licenses/by/4.0/

Comprehensive evaluation of LLMs on Polish language understanding across 29 benchmarks including sentiment analysis (PolEmo2), reading comprehension (Belebele, DYK), question answering (PolQA, PPC, PoQuAD), cyberbullying detection (CBD), KLEJ NER, PolEval 2018 Task 3, and emotional intelligence (EQ-Bench). Maintained by SpeakLeash. 5-shot evaluation.
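To make "5-shot evaluation" concrete: each test item is presented after five solved examples of the same task. The sketch below only illustrates that idea. The `Text:`/`Label:` template, the function name `build_few_shot_prompt`, and the example sentences are all hypothetical, since the leaderboard's actual prompt templates are not given here.

```python
def build_few_shot_prompt(examples, query, k=5):
    """Join the first k solved examples and one unanswered query into a prompt.

    The "Text:/Label:" layout is an assumed template for illustration only.
    """
    if len(examples) < k:
        raise ValueError(f"need at least {k} examples, got {len(examples)}")
    shots = [f"Text: {text}\nLabel: {label}" for text, label in examples[:k]]
    shots.append(f"Text: {query}\nLabel:")  # the model is expected to continue here
    return "\n\n".join(shots)


# Invented placeholder examples, not taken from any benchmark dataset.
examples = [
    ("Great service.", "positive"),
    ("The room was dirty.", "negative"),
    ("Average experience.", "neutral"),
    ("I would come back.", "positive"),
    ("Never again.", "negative"),
]
prompt = build_few_shot_prompt(examples, "The staff were friendly.")
print(prompt)
```

The model's continuation after the final `Label:` is then compared against the reference answer for that item.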


§ 01 · Leaderboard

Results by metric.

Poleval2018 Task3

Poleval2018 Task3 is one of the three metrics reported for the Open PL LLM Leaderboard. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

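Higher scores rank first in this table. PolEval 2018 Task 3 is a language-modeling task that is conventionally scored by perplexity, where lower is better, so the ranking direction shown here may be inverted.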

Trust tiers for Poleval2018 Task3: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.
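The muting rule above can be sketched as a simple comparison against every result from the same or an earlier year. This is a minimal illustration under assumptions, not Codesota's actual implementation: the `Result` type and `is_muted` function are hypothetical names, and ties are treated as not muted.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    score: float
    year: int


def is_muted(row: Result, all_rows: list[Result], higher_is_better: bool = True) -> bool:
    """Return True if an earlier or same-year result already scored better."""
    for other in all_rows:
        if other is row or other.year > row.year:
            continue  # skip the row itself and any later results
        if higher_is_better:
            better = other.score > row.score
        else:
            better = other.score < row.score
        if better:
            return True
    return False


# Three rows taken from the Belebele table on this page.
rows = [
    Result("Meta-Llama-3.1-405B-Instruct-FP8", 93.44, 2026),
    Result("Qwen/Qwen2.5-72B-Instruct", 93.0, 2026),
    Result("QwQ-32B-Preview", 92.78, 2026),
]
muted = [r.model for r in rows if is_muted(r, rows)]
print(muted)  # ['Qwen/Qwen2.5-72B-Instruct', 'QwQ-32B-Preview']
```

Because every row on this page carries the year 2026, only the top-scoring row of each table would stay unmuted under this reading of the rule.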

Rank	Model	Trust	Score	Year
01	internlm2-1_8b	verified	60296.3	2026
02	🚧Mistral-7B-v0.1	verified	16141.77	2026
03	Mistral-7B-Instruct-v0.1	verified	6909.94	2026
04	OpenChat3.5-0106-Spichlerz-Inst-001	verified	6516.05	2026
05	🚧polish-mistral-7B/epoch_0_hf	verified	6351.83	2026
06	internlm2-7b	verified	5498.23	2026
07	zephyr-7b-alpha	verified	4464.45	2026
08	internlm2-chat-7b-sft	verified	4269.63	2026
09	Llama-2-7b-chat-hf	verified	4018.74	2026
10	internlm2-chat-7b	verified	3892.5	2026
11	zephyr-7b-beta	verified	3613.14	2026
12	internlm2-base-7b	verified	3110.92	2026
13	Mistral-7B-Instruct-v0.2	verified	2088.08	2026
14	zephyr-speakleash-007-pl-8192-32-16-0.05	verified	2032.75	2026
15	openchat-3.5-1210	verified	1923.83	2026
16	gemma-7b	verified	1783.2	2026
17	OpenHermes-2.5-Mistral-7B	verified	1463	2026
18	berkeley-nest/Starling-LM-7B-alpha	verified	1438.04	2026
19	Starling-LM-7B-beta	verified	1161.54	2026
20	openchat/openchat-3.5-0106	verified	1106.56	2026
21	trurl-2-7b	verified	1098.88	2026
22	Mistral-7B-v0.2-hf	verified	932.6	2026
23	OpenChat3.5-0106-Spichlerz-Bocian	verified	920.61	2026
24	Llama-2-7b-hf	verified	850.45	2026
25	upstage/SOLAR-10.7B-Instruct-v1.0	verified	789.58	2026
26	Voicelab/trurl-2-13b-academic	verified	733.91	2026
27	SOLAR-10.7B-v1.0	verified	641.05	2026
28	Qra-1b	verified	398.96	2026
29	Curie-7B-v1	verified	389.17	2026
30	🚧mistral_7B-v2/spkl-all_sft/e1_base/spkl-all-e1_9aee511a	verified	379.79	2026
31	🚧mistral_7B-v2/spkl-only_sft_v2/e1_base/spkl-only-e3_a5833b75	verified	350.97	2026
32	🚧mistral_7B-v2/spkl-all_sft_v2/e1_base/spkl-all_2e6-e1_70c70cc6	verified	279.24	2026
33	speakleash/Bielik-7B-Instruct-v0.1	verified	277.92	2026
34	🚧mistral-apt3-7B/apt3-e0_hf	verified	220.98	2026
35	Qra-7b	verified	203.36	2026
36	🚧mistral-apt3-7B/spkl-all_sft_v3-lr2/e0_base/spkl-all-e0-lr6_376eb1d5	verified	179.39	2026
37	🚧mistral-apt3-7B/spkl-all_sft_v4/e0_base/spkl-all-e0-lr2e6_71659188	verified	177.61	2026
38	🚧llama-apt3-7B/only-spi-e0_hf	verified	176.91	2026
39	speakleash/Bielik-11B-v2.0-Instruct	verified	173.08	2026
40	Qra-13b	verified	168.66	2026
41	🚧mistral-apt3-7B_v2/spkl-only_7e6-e0_8544bbd3	verified	157.27	2026
42	🚧mistral-apt3-7B/only-spi-e0_hf	verified	150.1	2026
43	🚧llama-apt3-13B/spkl-only/e0_cc0931c5	verified	147.36	2026
44	🚧mistral-apt3-7B_v2/spkl-only_2e5-e1_013bd434	verified	145.02	2026
45	🚧mistral-apt3-7B_v2/spkl-only-e1_87bfffac	verified	144.3	2026
46	🚧llama-apt3-13B/spkl-plus/e0_caa5ad79	verified	144.05	2026
47	🚧mistral-PL_7B/epoch_0_hf	verified	143.94	2026
48	🚧mistral-apt3-7B_v2/spkl-only_2e5-e0_116fa2bc	verified	140.58	2026
49	🚧mistral-apt3-7B/spi-e0_hf	verified	132.78	2026
50	🚧mistral_7B-v2/spkl-only-e2_5dac700d	verified	124.78	2026
51	🚧mistral_7B-v2/spkl-only-e0_ef715d74	verified	124.31	2026
52	Bielik-7B-v0.1	verified	123.31	2026
53	🚧mistral_7B-v2/spkl-all-e0_8cf0987d	verified	120.41	2026
54	🚧mistral_7B-v2/spkl-all-e2_5bd6027d	verified	120.39	2026
55	🚧mistral_7B-v2/spkl-all-e1_0b514ce9	verified	119.59	2026
56	Bielik-11B-v2	verified	102.23	2026

Belebele

Belebele is one of the three metrics reported for the Open PL LLM Leaderboard. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Belebele: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.

Rank	Model	Trust	Score	Year
01	Meta-Llama-3.1-405B-Instruct-FP8	verified	93.44	2026
02	Qwen/Qwen2.5-72B-Instruct	verified	93	2026
03	QwQ-32B-Preview	verified	92.78	2026
04	meta-llama/Llama-3.3-70B-Instruct	verified	92.56	2026
05	mistralai/Mistral-Large-Instruct-2411	verified	92.56	2026
06	mistralai/Mistral-Large-Instruct-2407	verified	92.56	2026
07	QwQ-32B	verified	92.56	2026
08	Llama-3.1-Nemotron-70B-Instruct-HF	verified	92.44	2026
09	Qwen2.5-72B	verified	92	2026
10	meta-llama/Meta-Llama-3.1-70B-Instruct	verified	92	2026
11	mistralai/Mistral-Small-24B-Instruct-2501	verified	91.89	2026
12	Qwen2.5-32B	verified	91.89	2026
13	Meta-Llama-3-70B	verified	91.89	2026
14	Qwen/Qwen2.5-32B-Instruct	verified	91.89	2026
15	Mistral-Small-24B-Base-2501	verified	91.78	2026
16	Qwen3-32B	verified	91.67	2026
17	microsoft/phi-4	verified	91.56	2026
18	Mixtral-8x22B-v0.1	verified	91.33	2026
19	Athene-70B	verified	91.33	2026
20	meta-llama/Llama-4-Scout-17B-16E-Instruct (API)	verified	91.22	2026

Polqa Open Book

Polqa Open Book is one of the three metrics reported for the Open PL LLM Leaderboard. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Polqa Open Book: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.

Rank	Model	Trust	Score	Year
01	b11t2	verified	93.18	2026
02	b11p	verified	92.85	2026
03	MSH-v1-Bielik-v2.3-Instruct-MedIT-merge	verified	92.65	2026
04	Bielik-11B-v2.2-M-1.2	verified	92.32	2026
05	Bielik-11B-v2.4-Instruct-MS	verified	92.31	2026
06	Bielik-11B-v2.4-Instruct-SL	verified	92.31	2026
07	Bielik-11B-v2.4-Instruct-TI	verified	92.29	2026
08	🚧Test-v02-ep3	verified	92.26	2026
09	Mixtral-8x22B-v0.1	verified	92.21	2026
10	speakleash/Bielik-11B-v2.3-Instruct	verified	92.19	2026
11	Meta-Llama-3-70B	verified	92.09	2026
12	Llama-4-Scout-17B-16E	verified	91.92	2026
13	Bielik-11B-v3-Base-20250730	verified	91.75	2026
14	Meta-Llama-3.1-70B	verified	91.68	2026
15	🚧mistral_7B-v2/spkl-all_sft_v2/e1_base/spkl-all_2e6-e2_db0cd739	verified	91.57	2026
16	🚧mistral_7B-v2/spkl-all_sft_v2/e1_base/spkl-all_2e6-e1_70c70cc6	verified	91.5	2026
17	Bielik-11B-v2	verified	91.46	2026
18	mistralai/Mistral-Large-Instruct-2411	verified	91.45	2026
19	🚧mistral_7B-v2/spkl-all_sft_v2/e1_base/spkl-all_2e6-e3_4960543c	verified	91.26	2026
20	remek/v2/dpo/rel/D-G-PL-110	verified	91.23	2026
21	🚧mistral_7B-v2/spkl-only_sft_v2/e1_base/spkl-only_9e7-e1_561ac4bb	verified	91.18	2026
22	🚧mistral_7B-v2/spkl-all_sft_v2/e1_base/spkl-all_2e6-e0_1b65c3ac	verified	91.16	2026
23	speakleash/Bielik-11B-v2.0-Instruct	verified	91.16	2026
24	speakleash/Bielik-11B-v2.6-Instruct	verified	91.16	2026
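Some models appear under more than one metric, so a count of tracked models should be a count of distinct names rather than a sum of table lengths. The sketch below shows one way to parse the tab-separated rows and count distinct models across metrics. The `parse_row` helper is a hypothetical name, and the sample contains only four rows copied from the tables on this page; any columns after the first five are ignored.

```python
def parse_row(line):
    """Split one tab-separated leaderboard row into its first five fields."""
    rank, model, trust, score, year = line.split("\t")[:5]
    return {
        "rank": int(rank),
        "model": model,
        "trust": trust,
        "score": float(score),
        "year": int(year),
    }


# A small sample of rows from two of the metric tables.
tables = {
    "Belebele": [
        "18\tMixtral-8x22B-v0.1\tverified\t91.33\t2026",
        "19\tAthene-70B\tverified\t91.33\t2026",
    ],
    "Polqa Open Book": [
        "09\tMixtral-8x22B-v0.1\tverified\t92.21\t2026",
        "11\tMeta-Llama-3-70B\tverified\t92.09\t2026",
    ],
}

distinct_models = {
    parse_row(line)["model"] for rows in tables.values() for line in rows
}
print(len(distinct_models))  # 3: Mixtral-8x22B-v0.1 is counted once
```

Run over the full tables, the same set-based count gives the number of distinct models tracked across all metrics.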
