Comprehensive evaluation of LLMs on Polish language understanding across 29 benchmarks, including sentiment analysis (PolEmo2), reading comprehension (Belebele, DYK), question answering (PolQA, PPC, PoQuAD), cyberbullying detection (CBD), KLEJ NER, PolEval 2018 Task 3, and emotional intelligence (EQ-Bench). Maintained by SpeakLeash. 5-shot evaluation.
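To make "5-shot evaluation" concrete, here is a minimal sketch of how a few-shot prompt is typically assembled: five solved examples are placed before the unsolved query, and the model's continuation is scored. The function name `build_few_shot_prompt` and the "Question:/Answer:" template are assumptions for illustration only; they are not taken from the leaderboard's own harness, which may format prompts differently for each task.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    examples: Sequence[Tuple[str, str]],
    query: str,
    k: int = 5,
) -> str:
    """Return a prompt with k solved examples followed by the unsolved query.

    The template below is hypothetical; real benchmarks define their own.
    """
    if len(examples) < k:
        raise ValueError(f"need at least {k} examples, got {len(examples)}")

    # Each shot shows the model a complete question/answer pair.
    shots = [f"Question: {q}\nAnswer: {a}" for q, a in examples[:k]]

    # The final block leaves the answer open for the model to complete.
    shots.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(shots)
```

Only the first `k` examples are used, so passing a larger pool does not change the number of shots in the prompt.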
PolEval 2018 Task 3 is one of the evaluation metrics reported for the Open PL LLM Leaderboard. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
Belebele is one of the evaluation metrics reported for the Open PL LLM Leaderboard. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
PolQA Open Book is one of the evaluation metrics reported for the Open PL LLM Leaderboard. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better