
Polish Text Understanding.

Evaluating language models on understanding Polish text: sentiment, implicatures, phraseology, tricky questions, and hallucination resistance.

Datasets: 1 · Results: 465 · Canonical metric: average
§ 02 · Canonical benchmark

The reference dataset.

CPTU-Bench

Evaluates LLMs on understanding Polish text across 4 dimensions: sentiment analysis, language understanding (implicatures, author intent), phraseology (idioms, phraseological compounds), and tricky questions (logic, ambiguity, hallucination resistance). Each category is scored from 0 to 5. The benchmark contains 378 hand-written examples and was created by SpeakLeash/Spichlerz.

Primary metric: average
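
As a rough illustration of how a primary metric named "average" could be derived from the four per-category scores, the sketch below takes an unweighted mean. This is an assumption: the page does not state how the average is computed or weighted. The category keys and the example scores are hypothetical and are not taken from the leaderboard.

```python
from statistics import mean

# Hypothetical keys for the four CPTU-Bench dimensions described above.
CATEGORIES = ("sentiment", "understanding", "phraseology", "tricky_questions")


def average_score(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the four category scores.

    Assumption: every category counts equally; the benchmark itself
    may aggregate differently.
    """
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    for c in CATEGORIES:
        # The page gives a score range of 0-5 per category.
        if not 0.0 <= scores[c] <= 5.0:
            raise ValueError(f"{c} score {scores[c]} is outside 0-5")
    return mean(scores[c] for c in CATEGORIES)


# Made-up scores, for illustration only.
example = {
    "sentiment": 4.0,
    "understanding": 4.5,
    "phraseology": 3.5,
    "tricky_questions": 5.0,
}
print(average_score(example))  # 4.25
```

Validating the range up front keeps a mis-scaled input (for example a percentage) from silently inflating the mean.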
§ 03 · Top 10

Leading models.

Leading models on CPTU-Bench.

| # | Model | tricky-questions | Year | Source |
|---|-------|------------------|------|--------|
| 1 | Qwen/Qwen3.5-35B-A3B thinking (API) | 4.70 | 2025 | paper |
| 2 | Qwen/Qwen3.5-27B thinking (API) | 4.61 | 2025 | paper |
| 3 | gemini-2.0-flash-001 | 4.52 | 2025 | paper |
| 4 | deepseek-ai/DeepSeek-R1 (API) | 4.49 | 2025 | paper |
| 5 | deepseek-ai/DeepSeek-V3.2 (API) | 4.46 | 2025 | paper |
| 6 | Qwen/Qwen3.5-27B non-thinking (API) | 4.43 | 2025 | paper |
| 7 | deepseek-ai/DeepSeek-V3.1 (API) | 4.42 | 2025 | paper |
| 8 | Qwen/Qwen3.5-27B thinking (API) | 4.42 | 2025 | paper |
| 9 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API) | 4.39 | 2025 | paper |
| 10 | moonshotai/Kimi-K2-Instruct-0905 (API) | 4.39 | 2025 | paper |

§ 04 · All datasets

Tracked datasets.

1 dataset tracked for this task.

CPTU-Bench (canonical) · 465 results · metric: average
Top: Qwen/Qwen3.5-35B-A3B thinking (API), 4.70
§ 05 · Related tasks

Other tasks in Natural Language Processing.

Feature Extraction · Fill-Mask · Named Entity Recognition · Natural Language Inference · Polish Conversation Quality · Polish Cultural Competency · Polish Emotional Intelligence · Polish LLM General