GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems.
Accuracy is the reported evaluation metric for GSM8K. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better.
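As a rough illustration of what an accuracy score on GSM8K measures, the sketch below computes exact-match accuracy over final numeric answers. It relies on the fact that GSM8K reference solutions end with a line of the form `#### <number>`. Everything else is an assumption for illustration: the helper names `extract_final_number` and `accuracy` are hypothetical, taking the last number in a model's output as its answer is just one common convention, and the sample strings are made up rather than taken from the dataset. Published scores may use different prompting and answer-extraction protocols, so this is not necessarily how any listed result was produced.

```python
import re

# Matches integers and decimals, optionally negative, with thousands commas.
NUMBER = r"-?[\d,]*\.?\d+"


def extract_final_number(text):
    """Return the final numeric answer found in `text`, or None."""
    # GSM8K reference solutions mark the final answer as "#### <number>".
    marked = re.search(r"####\s*(" + NUMBER + ")", text)
    if marked:
        raw = marked.group(1)
    else:
        # Assumption: for free-form model output, use the last number seen.
        found = re.findall(NUMBER, text)
        if not found:
            return None
        raw = found[-1]
    return float(raw.replace(",", ""))


def accuracy(predictions, references):
    """Percentage of predictions whose final number matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        p = extract_final_number(pred)
        r = extract_final_number(ref)
        if p is not None and r is not None and abs(p - r) < 1e-6:
            correct += 1
    return 100.0 * correct / len(references)


# Made-up examples in the GSM8K answer format (not real dataset items).
references = ["48 + 24 = 72\n#### 72", "10 * 100 = 1000\n#### 1,000"]
predictions = ["The answer is 72.", "I think the total is 999"]
score = accuracy(predictions, references)
```

Here `score` is a percentage between 0 and 100, matching the scale of the table below; an unparseable prediction simply counts as wrong.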
Muted rows were not state of the art when published: an earlier or same-year result already scored better.
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | Qwen2.5-Plus | paper | 96 | N/A | Paper, Code, Source |