Codesota · Benchmark · MGS

MGS.

Multilingual Grade School Math (MGSM) is a benchmark of grade-school math word problems introduced in the paper “Language Models are Multilingual Chain-of-Thought Reasoners” (arXiv:2210.03057). It takes 250 problems from GSM8K and provides each one in English plus manual translations into 10 typologically diverse languages (Spanish, French, German, Russian, Chinese, Japanese, Thai, Swahili, Bengali, Telugu). MGSM is used to evaluate the multilingual reasoning and chain-of-thought capabilities of language models; the release includes inputs, targets, and manually translated few-shot exemplars. License: CC BY-SA 4.0. Size: 250 problems × 11 languages = 2,750 problems (1K<n<10K overall). Note: some papers refer to the benchmark as MGS or MGSM (reported in pre-training comparisons).
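The size figure follows directly from the language list, and a minimal sketch of that arithmetic is given here. The two-letter language codes and the names `MGSM_LANGUAGES` and `total_problems` are assumptions for illustration; they are not taken from the benchmark's own files.

```python
# Languages named in the description above. The ISO 639-1 style codes
# are an illustrative assumption, not identifiers from the dataset.
MGSM_LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "ru": "Russian",
    "zh": "Chinese",
    "ja": "Japanese",
    "th": "Thai",
    "sw": "Swahili",
    "bn": "Bengali",
    "te": "Telugu",
}

# Each language carries the same 250 GSM8K-derived problems.
PROBLEMS_PER_LANGUAGE = 250


def total_problems(languages=MGSM_LANGUAGES, per_language=PROBLEMS_PER_LANGUAGE):
    """Return the overall number of problems across all languages."""
    return len(languages) * per_language
```

With the 11 languages listed, `total_problems()` falls inside the stated 1K<n<10K range.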

§ 01 · Leaderboard

Results by metric.

Only 1 model on this benchmark

Accuracy

Accuracy is the reported evaluation metric for MGS. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
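As a rough illustration of how an accuracy score of this kind might be computed, the sketch below scores responses by exact match on the final number in each response and reports a percentage. The extraction rule (take the last number in the response, ignoring thousands separators) and the function names are assumptions; the page does not state how submitted scores were produced.

```python
import re


def extract_final_number(text):
    """Return the last number found in a response as a float, or None.

    Assumption: the final number in the response is the model's answer.
    """
    cleaned = text.replace(",", "")  # drop thousands separators like 1,200
    matches = re.findall(r"-?\d+(?:\.\d+)?", cleaned)
    return float(matches[-1]) if matches else None


def accuracy(predictions, targets):
    """Percentage of responses whose final number equals the target."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = 0
    for prediction, target in zip(predictions, targets):
        answer = extract_final_number(prediction)
        if answer is not None and abs(answer - float(target)) < 1e-6:
            correct += 1
    return 100.0 * correct / len(targets)


def macro_average(per_language_scores):
    """Unweighted mean of per-language accuracy scores (dict of floats)."""
    return sum(per_language_scores.values()) / len(per_language_scores)
```

A single headline number across languages could then be obtained with `macro_average`, though whether a given published score is a per-language or averaged figure depends on the source.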

Trust tiers for Accuracy: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.
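One way to read the muting rule is sketched below: a row is muted when some other row from the same or an earlier year has a strictly higher score. The `Row` type, the `muted_models` function, and the handling of rows with an unknown year (they are neither muted nor used to mute others) are assumptions for illustration, not Codesota's actual logic.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str
    score: float
    year: Optional[int]  # None stands for a year shown as N/A


def muted_models(rows):
    """Return the set of model names that were not state of the art
    when published, under the assumed reading of the rule."""
    muted = set()
    for row in rows:
        if row.year is None:
            continue  # assumption: unknown year cannot be judged
        for other in rows:
            if other is row or other.year is None:
                continue
            # An earlier or same-year result already scored better.
            if other.year <= row.year and other.score > row.score:
                muted.add(row.model)
                break
    return muted
```

With a single row on the leaderboard, as is currently the case, nothing is muted.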

Rank  Model                 Details                 Trust  Score  Year
01    Qwen2.5-72B-Instruct  dataset: MGS; task: 5   paper  88.16  N/A
