Codesota · Natural Language Processing · Language Modeling · AIME 2024
Language Modeling · benchmark dataset · EN

AIME 2024

The AIME 2024 dataset contains problems from the 2024 American Invitational Mathematics Examination (AIME). It is primarily used to evaluate the mathematical reasoning and problem-solving capabilities of large language models (LLMs) on hard competition problems. Each record includes an ID, the problem statement, a detailed solution process, and the final numerical answer; AIME answers are always integers from 0 to 999. The dataset covers several mathematical domains (geometry, algebra, number theory, etc.) and is known for its high difficulty level.
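The record layout and scoring described above can be sketched as follows. The field names (`id`, `problem`, `solution`, `answer`) are assumptions taken from the description, not necessarily the dataset's actual column names, and the sample IDs and answers are illustrative:

```python
from dataclasses import dataclass

@dataclass
class AimeRecord:
    id: str        # problem identifier
    problem: str   # problem statement
    solution: str  # detailed solution process
    answer: int    # final numerical answer (AIME answers are integers 0-999)

def score(predictions: dict[str, int], records: list[AimeRecord]) -> float:
    """Exact-match accuracy: an AIME answer is either right or wrong."""
    correct = sum(1 for r in records if predictions.get(r.id) == r.answer)
    return correct / len(records)

records = [
    AimeRecord("2024-I-1", "…problem text…", "…worked solution…", 204),
    AimeRecord("2024-I-2", "…problem text…", "…worked solution…", 25),
]
print(score({"2024-I-1": 204, "2024-I-2": 113}, records))  # 0.5
```

Because answers are small integers, exact match is the natural metric here; there is no partial credit.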

§ 01 · Leaderboard

Best published scores.

No results indexed yet — be the first to submit a score.

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

Submit a result · Read submission guide
What a submission needs
  • 01 · A public checkpoint or API endpoint
  • 02 · A reproduction script with frozen commit + seed
  • 03 · Declared evaluation environment (Python, deps)
  • 04 · One row per metric declared by this dataset
  • 05 · A contact so we can follow up on discrepancies
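The checklist above can be sketched as a minimal reproduction-script skeleton: a frozen seed, a declared environment, and one output row per metric. All names here (`run_model`, `COMMIT`, the sample problem ID) are placeholders for illustration, not a Codesota API:

```python
import json
import platform
import random

COMMIT = "abcdef0"  # frozen commit of the evaluation code (placeholder)
SEED = 42           # frozen seed so the run is reproducible

def run_model(problem: str, seed: int) -> int:
    # Placeholder for the actual model call (public checkpoint or API endpoint).
    random.seed(seed)
    return random.randint(0, 999)  # AIME answers are integers 0-999

def main() -> None:
    # Declared evaluation environment, recorded alongside the result.
    env = {"python": platform.python_version(), "commit": COMMIT, "seed": SEED}
    problems = {"2024-I-1": 204}  # id -> reference answer (illustrative)
    correct = sum(run_model(pid, SEED) == ans for pid, ans in problems.items())
    # One row per metric declared by the dataset:
    print(json.dumps({"metric": "accuracy",
                      "value": correct / len(problems),
                      "env": env}))

if __name__ == "__main__":
    main()
```

Freezing the seed and recording the environment is what makes a discrepancy traceable: re-running the script with the same commit and seed should reproduce the submitted row exactly.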