Polish Cultural Competency.

Evaluating language models on Polish linguistic and cultural knowledge across art & entertainment, culture & tradition, geography, grammar, history, and vocabulary.

Datasets: 1 · Results: 1155 · Canonical metric: average
§ 02 · Canonical benchmark

The reference dataset.

PLCC

Evaluates LLMs on Polish linguistic and cultural knowledge across 6 categories: art & entertainment, culture & tradition, geography, grammar, history, and vocabulary. Accuracy (0-100) per category. Created by Dadas et al. (2025).

Primary metric: average
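
As a rough illustration of how an "average" score could be derived from the per-category accuracies, here is a minimal Python sketch. It assumes the average is the unweighted mean over the six categories; the page does not state the exact aggregation, so treat that as an assumption. The function name `average_score` and the example scores are hypothetical and are not taken from the leaderboard.

```python
from statistics import mean
from typing import Mapping

# The six PLCC categories named in the dataset description.
CATEGORIES = (
    "art & entertainment",
    "culture & tradition",
    "geography",
    "grammar",
    "history",
    "vocabulary",
)


def average_score(per_category: Mapping[str, float]) -> float:
    """Unweighted mean of per-category accuracies (0-100).

    Assumption: 'average' means a simple mean over the six categories.
    """
    missing = [c for c in CATEGORIES if c not in per_category]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    return mean(per_category[c] for c in CATEGORIES)


# Made-up scores for illustration only; not real leaderboard results.
example = {
    "art & entertainment": 90.0,
    "culture & tradition": 80.0,
    "geography": 100.0,
    "grammar": 70.0,
    "history": 85.0,
    "vocabulary": 75.0,
}
print(average_score(example))
```

If the benchmark weights categories by question count instead, the mean would need per-category weights; the sketch deliberately leaves that out.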
§ 03 · Top 10

Leading models.

Leading models on PLCC.

#   Model                                geography  Year  Source
1   Gemini-3.1-Pro-Preview               100        2026  paper
2   Gemini-3.0-Pro-Preview               100        2026  paper
3   Gemini-3.1-Pro-Preview               100        2026  paper
4   Gemini-3.0-Pro-Preview               99.0       2026  paper
5   Gemini-2.5-Pro-Preview-06-05         98.0       2026  paper
6   Gemini-3-Flash-Preview               98.0       2026  paper
7   Gemini-3.1-Pro-Preview               98.0       2026  paper
8   GPT-5.1-2025-11-13 (high reasoning)  97.0       2026  paper
9   Gemini-3.1-Pro-Preview               97.0       2026  paper
10  Gemini-2.5-Pro-Exp-03-25             97.0       2026  paper

§ 04 · All datasets

Tracked datasets.

1 dataset tracked for this task.

PLCC (canonical) · 1155 results · average
Top: Gemini-3.1-Pro-Preview 100
§ 05 · Related tasks

Other tasks in Natural Language Processing.

Feature Extraction · Fill-Mask · Named Entity Recognition · Natural Language Inference · Polish Conversation Quality · Polish Emotional Intelligence · Polish LLM General · Polish Text Understanding