
PLCC.

PLCC evaluates LLMs on Polish linguistic and cultural knowledge across 6 categories: art & entertainment, culture & tradition, geography, grammar, history, and vocabulary. Scores are reported as accuracy (0-100) per category. Created by Dadas et al. (2025).

§ 01 · Leaderboard

Results by metric.


Geography

Geography is one of the six PLCC categories, scored as accuracy (0-100). Codesota tracks published model scores for this category so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Geography: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year |
|------|-------|-------|-------|------|
| 01 | Gemini-3.1-Pro-Preview | verified | 100 | 2026 |
| 02 | Gemini-3.0-Pro-Preview | verified | 100 | 2026 |
| 03 | Gemini-2.5-Pro-Preview-06-05 | verified | 98 | 2026 |
| 04 | Gemini-2.5-Pro-Exp-03-25 | verified | 97 | 2026 |
| 05 | GPT-5.4-2026-03-05 (low reasoning) | verified | 97 | 2026 |
| 06 | GPT-5-2025-08-07 | verified | 97 | 2026 |
| 07 | GPT-5.1-2025-11-13 (high reasoning) | verified | 97 | 2026 |
| 08 | O3-2025-04-16 | verified | 97 | 2026 |
| 09 | Gemini-3-Flash-Preview | verified | 96 | 2026 |
| 10 | GPT-5.4-2026-03-05 (high reasoning) | verified | 96 | 2026 |
| 11 | GPT-5-Pro-2025-10-06 (high reasoning) | verified | 96 | 2026 |
| 12 | O1-2024-12-17 | verified | 95 | 2026 |
| 13 | GPT-5.2-2025-12-11 (high reasoning) | verified | 95 | 2026 |
| 14 | DeepSeek-V3.2-Speciale | verified | 94 | 2026 |
| 15 | Gemini-2.5-Flash-Preview-04-17 | verified | 94 | 2026 |
| 16 | GPT-5-mini-2025-08-07 | verified | 94 | 2026 |
| 17 | GPT-5.2-2025-12-11 (medium reasoning) | verified | 94 | 2026 |
| 18 | GPT-5.2-2025-12-11 (xhigh reasoning) | verified | 94 | 2026 |
| 19 | Grok 4 | verified | 94 | 2026 |
| 20 | GPT-5.4-mini-2026-03-17 (high reasoning) | verified | 92 | 2026 |
| 21 | GLM-5 | verified | 91 | 2026 |
| 22 | GPT-4.5-preview-2025-02-27 | verified | 90 | 2026 |
| 23 | DeepSeek-v3.1 (thinking) | verified | 89 | 2026 |

Culture And Tradition

Culture And Tradition is one of the six PLCC categories, scored as accuracy (0-100). Codesota tracks published model scores for this category so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Culture And Tradition: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year |
|------|-------|-------|-------|------|
| 01 | Gemini-3.1-Pro-Preview | verified | 100 | 2026 |
| 02 | Gemini-3.0-Pro-Preview | verified | 99 | 2026 |
| 03 | Gemini-3-Flash-Preview | verified | 98 | 2026 |
| 04 | Gemini-2.5-Pro-Preview-06-05 | verified | 96 | 2026 |
| 05 | Grok 4 | verified | 95 | 2026 |
| 06 | GPT-5-Pro-2025-10-06 (high reasoning) | verified | 94 | 2026 |
| 07 | GPT-5.2-2025-12-11 (xhigh reasoning) | verified | 93 | 2026 |
| 08 | GPT-5.4-2026-03-05 (high reasoning) | verified | 93 | 2026 |
| 09 | GPT-5.4-2026-03-05 (low reasoning) | verified | 93 | 2026 |
| 10 | O1-2024-12-17 | verified | 92 | 2026 |
| 11 | GPT-4o-2024-05-13 | verified | 92 | 2026 |
| 12 | GPT-4.5-preview-2025-02-27 | verified | 92 | 2026 |
| 13 | O3-2025-04-16 | verified | 91 | 2026 |
| 14 | Gemini-2.5-Pro-Exp-03-25 | verified | 91 | 2026 |
| 15 | GPT-5.1-2025-11-13 (high reasoning) | verified | 90 | 2026 |
| 16 | Grok-3-Beta | verified | 90 | 2026 |
| 17 | Gemini-Exp-1206 | verified | 90 | 2026 |
| 18 | GPT-4o-2024-11-20 | verified | 89 | 2026 |
| 19 | GPT-5-2025-08-07 | verified | 89 | 2026 |
| 20 | GPT-4o-2024-08-06 | verified | 89 | 2026 |

History

History is one of the six PLCC categories, scored as accuracy (0-100). Codesota tracks published model scores for this category so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for History: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year |
|------|-------|-------|-------|------|
| 01 | Gemini-3.1-Pro-Preview | verified | 98 | 2026 |
| 02 | Gemini-3.0-Pro-Preview | verified | 95 | 2026 |
| 03 | Grok 4 | verified | 94 | 2026 |
| 04 | GPT-5.2-2025-12-11 (xhigh reasoning) | verified | 94 | 2026 |
| 05 | GPT-5.4-2026-03-05 (low reasoning) | verified | 93 | 2026 |
| 06 | Gemini-3-Flash-Preview | verified | 92 | 2026 |
| 07 | GPT-5.4-2026-03-05 (high reasoning) | verified | 92 | 2026 |
| 08 | Gemini-2.5-Pro-Exp-03-25 | verified | 92 | 2026 |
| 09 | Claude-3.7-Sonnet-Thinking | verified | 92 | 2026 |
| 10 | Gemini-2.5-Pro-Preview-06-05 | verified | 92 | 2026 |
| 11 | Claude-Opus-4.1 | verified | 91 | 2026 |
| 12 | DeepSeek-R1-0528 | verified | 91 | 2026 |
| 13 | Claude-3.5-Sonnet-20241022 | verified | 91 | 2026 |
| 14 | GPT-5-2025-08-07 | verified | 91 | 2026 |
| 15 | GPT-5-Pro-2025-10-06 (high reasoning) | verified | 91 | 2026 |
| 16 | GPT-5.2-2025-12-11 (medium reasoning) | verified | 90 | 2026 |
| 17 | DeepSeek-V3.2-Speciale | verified | 90 | 2026 |
| 18 | GPT-4.5-preview-2025-02-27 | verified | 90 | 2026 |
| 19 | O1-2024-12-17 | verified | 90 | 2026 |
| 20 | Claude-3.7-Sonnet | verified | 90 | 2026 |
| 21 | GPT-5.2-2025-12-11 (high reasoning) | verified | 90 | 2026 |
| 22 | O3-2025-04-16 | verified | 89 | 2026 |
| 23 | DeepSeek-v3.1 (thinking) | verified | 89 | 2026 |
| 24 | Kimi-K2.5 | verified | 89 | 2026 |

Average

Average is the aggregate score reported for PLCC across its categories, on the same 0-100 scale. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Average: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year |
|------|-------|-------|-------|------|
| 01 | Gemini-3.1-Pro-Preview | verified | 97 | 2026 |
| 02 | Gemini-3.0-Pro-Preview | verified | 95.833333 | 2026 |
| 03 | GPT-5.4-2026-03-05 (high reasoning) | verified | 92.166667 | 2026 |
| 04 | Gemini-2.5-Pro-Preview-06-05 | verified | 92.166667 | 2026 |
| 05 | Gemini-3-Flash-Preview | verified | 91.666667 | 2026 |
| 06 | GPT-5-Pro-2025-10-06 (high reasoning) | verified | 91 | 2026 |
| 07 | Grok 4 | verified | 90.5 | 2026 |
| 08 | GPT-5.4-2026-03-05 (low reasoning) | verified | 90.5 | 2026 |
| 09 | GPT-5-2025-08-07 | verified | 89.5 | 2026 |
| 10 | Gemini-2.5-Pro-Exp-03-25 | verified | 89.5 | 2026 |
| 11 | GPT-5.2-2025-12-11 (xhigh reasoning) | verified | 89.333333 | 2026 |
| 12 | O3-2025-04-16 | verified | 89.166667 | 2026 |
| 13 | O1-2024-12-17 | verified | 89.166667 | 2026 |
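
The fractional Average scores listed here (for example 95.833333 and 92.166667) are multiples of 1/6, which is consistent with an unweighted mean over the six PLCC categories. The page does not state the formula, so the following is only a minimal sketch under that assumption. The function name `plcc_average`, the category keys, and the example scores are hypothetical and are not taken from the leaderboard.

```python
from statistics import mean

# The six PLCC categories named in the benchmark description.
# The key spellings are an illustrative choice, not an official schema.
CATEGORIES = (
    "art_and_entertainment",
    "culture_and_tradition",
    "geography",
    "grammar",
    "history",
    "vocabulary",
)


def plcc_average(scores: dict[str, float]) -> float:
    """Unweighted mean of the six category accuracies (0-100).

    Assumption: the leaderboard's Average is a plain mean rounded to
    six decimal places; the source does not confirm this.
    """
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    return round(mean(scores[c] for c in CATEGORIES), 6)


# Hypothetical per-category scores for illustration only.
example_scores = {
    "art_and_entertainment": 90,
    "culture_and_tradition": 95,
    "geography": 98,
    "grammar": 85,
    "history": 92,
    "vocabulary": 88,
}

print(plcc_average(example_scores))  # 548 / 6, rounded to 6 decimals
```

Requiring all six categories, rather than averaging whatever is present, keeps a model with a missing category from receiving an average that is not comparable with the others.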

Vocabulary

Vocabulary is one of the six PLCC categories, scored as accuracy (0-100). Codesota tracks published model scores for this category so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Vocabulary: verified, paper, vendor, community, unverified

Art And Entertainment

Art And Entertainment is one of the six PLCC categories, scored as accuracy (0-100). Codesota tracks published model scores for this category so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Art And Entertainment: verified, paper, vendor, community, unverified

Grammar

Grammar is one of the six PLCC categories, scored as accuracy (0-100). Codesota tracks published model scores for this category so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Grammar: verified, paper, vendor, community, unverified