Polish LLM General
General-purpose evaluation of language models on Polish language tasks: sentiment, reading comprehension, question answering, cyberbullying detection, and emotional intelligence.
Datasets: 1
Results: 0
Canonical metric: average
Canonical Benchmark
Open PL LLM Leaderboard
Comprehensive evaluation of LLMs on Polish language understanding across 29 benchmarks, including sentiment analysis (PolEmo2), reading comprehension (Belebele, DYK), question answering (PolQA, PPC), cyberbullying detection (CBD), and emotional intelligence (EQ-Bench). Maintained by SpeakLeash. Evaluation is 5-shot.
Primary metric: average
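As a rough illustration of what an "average" primary metric can mean, the sketch below aggregates per-benchmark scores into one number. It assumes an unweighted arithmetic mean over scores already on a common 0 to 1 scale; the leaderboard itself may normalize or weight scores differently. The function name `leaderboard_average`, the task keys, and the score values are hypothetical placeholders, not real results.

```python
from statistics import mean


def leaderboard_average(task_scores: dict[str, float]) -> float:
    """Return the unweighted mean of per-benchmark scores.

    Assumption: every score is already on the same scale (e.g. 0 to 1).
    The real leaderboard may apply normalization before averaging.
    """
    if not task_scores:
        raise ValueError("no task scores to average")
    return mean(task_scores.values())


# Placeholder scores for illustration only; these are not real results.
example_scores = {"polemo2": 0.5, "belebele": 0.75, "cbd": 0.25}
print(leaderboard_average(example_scores))
```

A plain mean treats every benchmark equally, so a model that is strong on a few tasks and weak on many will still score low overall.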
Top 10
Leading models on Open PL LLM Leaderboard.
No results yet.
All datasets
1 dataset tracked for this task.