okapi MMLU (translated) is a multilingual adaptation of the MMLU (Measuring Massive Multitask Language Understanding) benchmark. MMLU is a 57-task, multiple-choice benchmark spanning the humanities, social sciences, and STEM, and it requires both broad world knowledge and problem-solving ability. The okapi MMLU (translated) assets on Hugging Face provide the MMLU questions and answer options translated into many languages (including id, vi, ar, bn, de, es, and fr). These translated variants are commonly used for multilingual few-shot evaluation; the Okapi paper reports 5-shot results on translated MMLU. The Hugging Face repositories list the license as CC-BY-NC-4.0. Source references: the original MMLU paper (Hendrycks et al., arXiv:2009.03300), the Okapi project (arXiv:2307.16039), and the Hugging Face dataset pages (e.g., jon-tow/okapi_mmlu and SEACrowd/okapi_m_mmlu).
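For quick inspection, the translated splits can be loaded with the Hugging Face `datasets` library. The sketch below is illustrative only: the config name (`"fr"`), the split name, and the field names (`question`, `choices`, `answer`) are assumptions, so check the dataset card (e.g., jon-tow/okapi_mmlu) for the actual schema.

```python
# Minimal sketch: loading a translated MMLU subset for inspection.
# Assumptions (verify on the dataset card): per-language configs such as "fr",
# a "test" split, and fields named "question", "choices", and "answer".
from datasets import load_dataset

ds = load_dataset("jon-tow/okapi_mmlu", "fr", split="test")

for example in ds.select(range(3)):  # peek at the first three questions
    print(example["question"])
    for letter, choice in zip("ABCD", example["choices"]):
        print(f"  {letter}. {choice}")
    print("gold answer:", example["answer"], "\n")
```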
Accuracy is the reported evaluation metric for okapi MMLU (translated). Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
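As a reference for how the metric is computed, accuracy on a multiple-choice benchmark is simply the fraction of questions where the model's selected option matches the gold label. A minimal sketch (the function name and letter-label convention are illustrative, not tied to any particular harness):

```python
def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of predicted answer labels that exactly match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Example: 3 of 4 answers correct -> 0.75 (reported as 75.0 when scaled to %)
print(accuracy(["A", "C", "B", "D"], ["A", "C", "D", "D"]))  # 0.75
```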
| Rank | Model | Trust | Score (accuracy) | Year | Source |
|---|---|---|---|---|---|
| 1 | Qwen2.5-72B-Instruct | paper | 79.97 | N/A | Source |