Polish Cultural Competency.

Evaluating language models on Polish linguistic and cultural knowledge across art & entertainment, culture & tradition, geography, grammar, history, and vocabulary.

Datasets: 1 · Results: 1155 · Canonical metric: average

§ 02 · Canonical benchmark

The reference dataset.

PLCC

Evaluates LLMs on Polish linguistic and cultural knowledge across 6 categories: art & entertainment, culture & tradition, geography, grammar, history, and vocabulary. Scores are reported as accuracy (0-100) per category. Created by Dadas et al. (2025).

Primary metric: average
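
As a rough illustration of how a single "average" figure could be derived from the six per-category accuracies, here is a minimal Python sketch. It assumes the average is the unweighted mean of the category scores; this page does not confirm the exact aggregation. The function name `plcc_average`, the category keys, and the example scores are all hypothetical and are not real leaderboard results.

```python
# Hypothetical category keys mirroring the six PLCC categories named above.
CATEGORIES = (
    "art_entertainment",
    "culture_tradition",
    "geography",
    "grammar",
    "history",
    "vocabulary",
)


def plcc_average(scores: dict[str, float]) -> float:
    """Return the unweighted mean of per-category accuracies (0-100).

    Assumption: every category counts equally. The benchmark's actual
    aggregation may differ.
    """
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    for category in CATEGORIES:
        if not 0 <= scores[category] <= 100:
            raise ValueError(f"{category} score must be within 0-100")
    return sum(scores[c] for c in CATEGORIES) / len(CATEGORIES)


# Made-up scores for illustration only; not taken from the leaderboard.
example_scores = {
    "art_entertainment": 90.0,
    "culture_tradition": 80.0,
    "geography": 100.0,
    "grammar": 70.0,
    "history": 85.0,
    "vocabulary": 85.0,
}

if __name__ == "__main__":
    print(plcc_average(example_scores))
```

Under this assumption, a model can reach 100 in one category, such as geography, while its average stays well below that.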

§ 03 · Top 10

Leading models.

Leading models on PLCC.

#	Model	geography	Year	Source
★	Gemini-3.1-Pro-Preview✓	100	2026	paper ↗
2	Gemini-3.0-Pro-Preview✓	100	2026	paper ↗
3	Gemini-3.1-Pro-Preview✓	100	2026	paper ↗
4	Gemini-3.0-Pro-Preview✓	99.0	2026	paper ↗
5	Gemini-2.5-Pro-Preview-06-05✓	98.0	2026	paper ↗
6	Gemini-3-Flash-Preview✓	98.0	2026	paper ↗
7	Gemini-3.1-Pro-Preview✓	98.0	2026	paper ↗
8	GPT-5.1-2025-11-13 (high reasoning)✓	97.0	2026	paper ↗
9	Gemini-3.1-Pro-Preview✓	97.0	2026	paper ↗
10	Gemini-2.5-Pro-Exp-03-25✓	97.0	2026	paper ↗

§ 04 · All datasets

Tracked datasets.

1 dataset tracked for this task.

PLCC

CANONICAL

1155 results · average

Top: Gemini-3.1-Pro-Preview — 100

§ 05 · Related tasks

Other tasks in Natural Language Processing.

Feature Extraction · Fill-Mask · Named Entity Recognition · Natural Language Inference · Polish Conversation Quality · Polish Emotional Intelligence · Polish LLM General · Polish Text Understanding
