Polish LLM General

General-purpose evaluation of language models on Polish language tasks: sentiment, reading comprehension, question answering, cyberbullying detection, and emotional intelligence.

Datasets: 1
Results: 0
Canonical metric: average
Canonical Benchmark

Open PL LLM Leaderboard

Comprehensive evaluation of LLMs on Polish language understanding across 29 benchmarks, including sentiment analysis (PolEmo2), reading comprehension (Belebele, DYK), question answering (PolQA, PPC), cyberbullying detection (CBD), and emotional intelligence (EQ-Bench). Maintained by SpeakLeash. Models are evaluated in a 5-shot setting.
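
As a rough illustration of what a 5-shot setting means, the sketch below builds a prompt from five labeled examples followed by one unlabeled query. The function name `build_few_shot_prompt`, the `Text:`/`Label:` template, and the sample data are hypothetical; the leaderboard's actual prompt templates are not described here and may differ.

```python
def build_few_shot_prompt(examples, query, k=5):
    """Build a k-shot prompt from (text, label) pairs plus one unlabeled query.

    The "Text:/Label:" template is an assumption made for illustration only.
    """
    if len(examples) < k:
        raise ValueError(f"need at least {k} examples, got {len(examples)}")
    # Use the first k labeled examples as demonstrations.
    shots = [f"Text: {text}\nLabel: {label}" for text, label in examples[:k]]
    # The query is appended without a label so the model has to complete it.
    shots.append(f"Text: {query}\nLabel:")
    return "\n\n".join(shots)


if __name__ == "__main__":
    # Made-up sentiment examples, not taken from any benchmark.
    demo = [
        ("Świetny hotel.", "positive"),
        ("Okropna obsługa.", "negative"),
        ("Polecam wszystkim.", "positive"),
        ("Nigdy więcej.", "negative"),
        ("Było w porządku.", "neutral"),
    ]
    print(build_few_shot_prompt(demo, "Bardzo dobry lekarz."))
```

The model's completion after the final `Label:` would then be compared with the gold label for that item.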

Primary metric: average
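
A minimal sketch of how an "average" metric could be computed, assuming it is a plain unweighted mean of per-benchmark scores. That assumption is not confirmed by this page: the leaderboard may normalize or weight scores before averaging. The function name `average_score` and the scores below are made up for illustration.

```python
from statistics import mean


def average_score(task_scores):
    """Return the unweighted mean of per-benchmark scores.

    task_scores maps a benchmark name to a score on a shared scale.
    Assumption: every benchmark counts equally, with no normalization.
    """
    if not task_scores:
        raise ValueError("task_scores must not be empty")
    return mean(task_scores.values())


if __name__ == "__main__":
    # Hypothetical scores, not real leaderboard results.
    scores = {"PolEmo2": 0.5, "Belebele": 1.0}
    print(average_score(scores))
```

An unweighted mean treats every benchmark equally, so a small dataset moves the headline number as much as a large one.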

Top 10

Leading models on Open PL LLM Leaderboard.

No results yet.

All datasets

1 dataset tracked for this task.
