Open PL LLM Leaderboard.

Comprehensive evaluation of LLMs on Polish language understanding across 29 benchmarks including sentiment analysis (PolEmo2), reading comprehension (Belebele, DYK), question answering (PolQA, PPC, PoQuAD), cyberbullying detection (CBD), KLEJ NER, PolEval 2018 Task 3, and emotional intelligence (EQ-Bench). Maintained by SpeakLeash. 5-shot evaluation.
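
To illustrate what "5-shot evaluation" means in practice, here is a minimal sketch of how a 5-shot prompt could be assembled: five solved examples are placed before the unanswered test question. The function name `build_few_shot_prompt`, the `Question:`/`Answer:` template, and the placeholder examples are all assumptions made for illustration; the leaderboard's actual prompt templates are not given on this page.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]], query: str, k: int = 5
) -> str:
    """Build a k-shot prompt: k solved examples followed by the open query.

    The "Question:/Answer:" layout is a hypothetical template, not the
    format used by the leaderboard.
    """
    if len(examples) < k:
        raise ValueError(f"need at least {k} examples, got {len(examples)}")
    # Use the first k solved examples as in-context demonstrations.
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in examples[:k]]
    # The test item is appended with its answer left blank for the model.
    parts.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(parts)


# Placeholder data, purely illustrative.
demo_examples = [(f"Example question {i}", f"Example answer {i}") for i in range(1, 6)]
prompt = build_few_shot_prompt(demo_examples, "Test question")
print(prompt)
```

The model's continuation after the final `Answer:` is what gets scored against the reference answer.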

§ 01 · Leaderboard

Results by metric.

PolEval 2018 Task 3

PolEval 2018 Task 3 is one of the metrics reported on the Open PL LLM Leaderboard. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

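Lower is better. PolEval 2018 Task 3 is a language-modelling task scored by perplexity, so the table below, which is sorted by descending score, lists the weakest results first.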

Trust tiers: verified, paper, vendor, community, unverified

Rank | Model | Trust | Score | Year
01 | internlm2-1_8b | verified | 60296.3 | 2026
02 | 🚧 Mistral-7B-v0.1 | verified | 16141.77 | 2026
03 | Mistral-7B-Instruct-v0.1 | verified | 6909.94 | 2026
04 | OpenChat3.5-0106-Spichlerz-Inst-001 | verified | 6516.05 | 2026
05 | 🚧 polish-mistral-7B/epoch_0_hf | verified | 6351.83 | 2026
06 | internlm2-7b | verified | 5498.23 | 2026
07 | zephyr-7b-alpha | verified | 4464.45 | 2026
08 | internlm2-chat-7b-sft | verified | 4269.63 | 2026
09 | Llama-2-7b-chat-hf | verified | 4018.74 | 2026
10 | internlm2-chat-7b | verified | 3892.5 | 2026
11 | zephyr-7b-beta | verified | 3613.14 | 2026
12 | internlm2-base-7b | verified | 3110.92 | 2026
13 | Mistral-7B-Instruct-v0.2 | verified | 2088.08 | 2026
14 | zephyr-speakleash-007-pl-8192-32-16-0.05 | verified | 2032.75 | 2026
15 | openchat-3.5-1210 | verified | 1923.83 | 2026
16 | gemma-7b | verified | 1783.2 | 2026
17 | OpenHermes-2.5-Mistral-7B | verified | 1463 | 2026
18 | berkeley-nest/Starling-LM-7B-alpha | verified | 1438.04 | 2026
19 | Starling-LM-7B-beta | verified | 1161.54 | 2026
20 | openchat/openchat-3.5-0106 | verified | 1106.56 | 2026
21 | trurl-2-7b | verified | 1098.88 | 2026
22 | Mistral-7B-v0.2-hf | verified | 932.6 | 2026
23 | OpenChat3.5-0106-Spichlerz-Bocian | verified | 920.61 | 2026
24 | Llama-2-7b-hf | verified | 850.45 | 2026
25 | upstage/SOLAR-10.7B-Instruct-v1.0 | verified | 789.58 | 2026
26 | Voicelab/trurl-2-13b-academic | verified | 733.91 | 2026
27 | SOLAR-10.7B-v1.0 | verified | 641.05 | 2026
28 | Qra-1b | verified | 398.96 | 2026
29 | Curie-7B-v1 | verified | 389.17 | 2026
30 | 🚧 mistral_7B-v2/spkl-all_sft/e1_base/spkl-all-e1_9aee511a | verified | 379.79 | 2026
31 | 🚧 mistral_7B-v2/spkl-only_sft_v2/e1_base/spkl-only-e3_a5833b75 | verified | 350.97 | 2026
32 | 🚧 mistral_7B-v2/spkl-all_sft_v2/e1_base/spkl-all_2e6-e1_70c70cc6 | verified | 279.24 | 2026
33 | speakleash/Bielik-7B-Instruct-v0.1 | verified | 277.92 | 2026
34 | 🚧 mistral-apt3-7B/apt3-e0_hf | verified | 220.98 | 2026
35 | Qra-7b | verified | 203.36 | 2026
36 | 🚧 mistral-apt3-7B/spkl-all_sft_v3-lr2/e0_base/spkl-all-e0-lr6_376eb1d5 | verified | 179.39 | 2026
37 | 🚧 mistral-apt3-7B/spkl-all_sft_v4/e0_base/spkl-all-e0-lr2e6_71659188 | verified | 177.61 | 2026
38 | 🚧 llama-apt3-7B/only-spi-e0_hf | verified | 176.91 | 2026
39 | speakleash/Bielik-11B-v2.0-Instruct | verified | 173.08 | 2026
40 | Qra-13b | verified | 168.66 | 2026
41 | 🚧 mistral-apt3-7B_v2/spkl-only_7e6-e0_8544bbd3 | verified | 157.27 | 2026
42 | 🚧 mistral-apt3-7B/only-spi-e0_hf | verified | 150.1 | 2026
43 | 🚧 llama-apt3-13B/spkl-only/e0_cc0931c5 | verified | 147.36 | 2026
44 | 🚧 mistral-apt3-7B_v2/spkl-only_2e5-e1_013bd434 | verified | 145.02 | 2026
45 | 🚧 mistral-apt3-7B_v2/spkl-only-e1_87bfffac | verified | 144.3 | 2026
46 | 🚧 llama-apt3-13B/spkl-plus/e0_caa5ad79 | verified | 144.05 | 2026
47 | 🚧 mistral-PL_7B/epoch_0_hf | verified | 143.94 | 2026
48 | 🚧 mistral-apt3-7B_v2/spkl-only_2e5-e0_116fa2bc | verified | 140.58 | 2026
49 | 🚧 mistral-apt3-7B/spi-e0_hf | verified | 132.78 | 2026
50 | 🚧 mistral_7B-v2/spkl-only-e2_5dac700d | verified | 124.78 | 2026
51 | 🚧 mistral_7B-v2/spkl-only-e0_ef715d74 | verified | 124.31 | 2026
52 | Bielik-7B-v0.1 | verified | 123.31 | 2026
53 | 🚧 mistral_7B-v2/spkl-all-e0_8cf0987d | verified | 120.41 | 2026
54 | 🚧 mistral_7B-v2/spkl-all-e2_5bd6027d | verified | 120.39 | 2026
55 | 🚧 mistral_7B-v2/spkl-all-e1_0b514ce9 | verified | 119.59 | 2026
56 | Bielik-11B-v2 | verified | 102.23 | 2026

Belebele

Belebele is one of the metrics reported on the Open PL LLM Leaderboard. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
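
As a minimal sketch of what "higher is better" implies for the ordering, the snippet below sorts a few rows by descending score and assigns sequential, zero-padded ranks. It assumes that tied scores receive consecutive ranks in their input order, which matches how the Belebele table appears but is not documented by the source; the `Row` type and `rank_rows` function are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    model: str
    score: float


def rank_rows(rows: List[Row], higher_is_better: bool = True) -> List[Tuple[str, str, float]]:
    """Return (rank, model, score) tuples ordered from best to worst.

    Python's sort is stable even with reverse=True, so rows with equal
    scores keep their input order and get consecutive ranks (assumption).
    """
    ordered = sorted(rows, key=lambda r: r.score, reverse=higher_is_better)
    return [(f"{i:02d}", r.model, r.score) for i, r in enumerate(ordered, start=1)]


# A few Belebele rows from this page, deliberately given out of order.
sample = [
    Row("QwQ-32B-Preview", 92.78),
    Row("meta-llama/Llama-3.3-70B-Instruct", 92.56),
    Row("Qwen/Qwen2.5-72B-Instruct", 93.0),
    Row("mistralai/Mistral-Large-Instruct-2411", 92.56),
]
ranked = rank_rows(sample)
for rank, model, score in ranked:
    print(rank, model, score)
```

Passing `higher_is_better=False` reverses the ordering for metrics where a smaller value is the better result.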

Trust tiers: verified, paper, vendor, community, unverified

Rank | Model | Trust | Score | Year
01 | Meta-Llama-3.1-405B-Instruct-FP8 | verified | 93.44 | 2026
02 | Qwen/Qwen2.5-72B-Instruct | verified | 93 | 2026
03 | QwQ-32B-Preview | verified | 92.78 | 2026
04 | meta-llama/Llama-3.3-70B-Instruct | verified | 92.56 | 2026
05 | mistralai/Mistral-Large-Instruct-2411 | verified | 92.56 | 2026
06 | mistralai/Mistral-Large-Instruct-2407 | verified | 92.56 | 2026
07 | QwQ-32B | verified | 92.56 | 2026
08 | Llama-3.1-Nemotron-70B-Instruct-HF | verified | 92.44 | 2026
09 | Qwen2.5-72B | verified | 92 | 2026
10 | meta-llama/Meta-Llama-3.1-70B-Instruct | verified | 92 | 2026
11 | mistralai/Mistral-Small-24B-Instruct-2501 | verified | 91.89 | 2026
12 | Qwen2.5-32B | verified | 91.89 | 2026
13 | Meta-Llama-3-70B | verified | 91.89 | 2026
14 | Qwen/Qwen2.5-32B-Instruct | verified | 91.89 | 2026
15 | Mistral-Small-24B-Base-2501 | verified | 91.78 | 2026
16 | Qwen3-32B | verified | 91.67 | 2026
17 | microsoft/phi-4 | verified | 91.56 | 2026
18 | Mixtral-8x22B-v0.1 | verified | 91.33 | 2026
19 | Athene-70B | verified | 91.33 | 2026
20 | meta-llama/Llama-4-Scout-17B-16E-Instruct (API) | verified | 91.22 | 2026

PolQA Open Book

PolQA Open Book is one of the metrics reported on the Open PL LLM Leaderboard. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers: verified, paper, vendor, community, unverified

Rank | Model | Trust | Score | Year
01 | b11t2 | verified | 93.18 | 2026
02 | b11p | verified | 92.85 | 2026
03 | MSH-v1-Bielik-v2.3-Instruct-MedIT-merge | verified | 92.65 | 2026
04 | Bielik-11B-v2.2-M-1.2 | verified | 92.32 | 2026
05 | Bielik-11B-v2.4-Instruct-MS | verified | 92.31 | 2026
06 | Bielik-11B-v2.4-Instruct-SL | verified | 92.31 | 2026
07 | Bielik-11B-v2.4-Instruct-TI | verified | 92.29 | 2026
08 | 🚧 Test-v02-ep3 | verified | 92.26 | 2026
09 | Mixtral-8x22B-v0.1 | verified | 92.21 | 2026
10 | speakleash/Bielik-11B-v2.3-Instruct | verified | 92.19 | 2026
11 | Meta-Llama-3-70B | verified | 92.09 | 2026
12 | Llama-4-Scout-17B-16E | verified | 91.92 | 2026
13 | Bielik-11B-v3-Base-20250730 | verified | 91.75 | 2026
14 | Meta-Llama-3.1-70B | verified | 91.68 | 2026
15 | 🚧 mistral_7B-v2/spkl-all_sft_v2/e1_base/spkl-all_2e6-e2_db0cd739 | verified | 91.57 | 2026
16 | 🚧 mistral_7B-v2/spkl-all_sft_v2/e1_base/spkl-all_2e6-e1_70c70cc6 | verified | 91.5 | 2026
17 | Bielik-11B-v2 | verified | 91.46 | 2026
18 | mistralai/Mistral-Large-Instruct-2411 | verified | 91.45 | 2026
19 | 🚧 mistral_7B-v2/spkl-all_sft_v2/e1_base/spkl-all_2e6-e3_4960543c | verified | 91.26 | 2026
20 | remek/v2/dpo/rel/D-G-PL-110 | verified | 91.23 | 2026
21 | 🚧 mistral_7B-v2/spkl-only_sft_v2/e1_base/spkl-only_9e7-e1_561ac4bb | verified | 91.18 | 2026
22 | 🚧 mistral_7B-v2/spkl-all_sft_v2/e1_base/spkl-all_2e6-e0_1b65c3ac | verified | 91.16 | 2026
23 | speakleash/Bielik-11B-v2.0-Instruct | verified | 91.16 | 2026
24 | speakleash/Bielik-11B-v2.6-Instruct | verified | 91.16 | 2026