The MATH 500 dataset is an academic math benchmark designed to evaluate language models on their ability to solve mathematical problems. Its questions span seven subjects (Algebra, Intermediate Algebra, Precalculus, Geometry, Number Theory, Prealgebra, and Counting & Probability) and five difficulty levels (1 to 5).
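Because each question carries a subject and a difficulty level, a score can be broken down along either axis. The sketch below is one minimal way to do that, not the scoring method used here: the record fields `subject`, `level`, and `answer`, the helper `accuracy_by_group`, and the exact-string-match rule are all assumptions made for illustration, and the three records are made-up examples rather than real benchmark items.

```python
from collections import defaultdict


def accuracy_by_group(records, predictions, key):
    """Return {group: fraction correct}, grouping records by `key`.

    Assumption: an answer counts as correct when it equals the reference
    string exactly after stripping surrounding whitespace. Real graders
    usually normalize mathematically equivalent forms as well.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec, pred in zip(records, predictions):
        group = rec[key]
        total[group] += 1
        if pred.strip() == rec["answer"].strip():
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical records; the field names are assumed, not taken from the dataset.
records = [
    {"subject": "Algebra", "level": 1, "answer": "4"},
    {"subject": "Algebra", "level": 3, "answer": "-2"},
    {"subject": "Geometry", "level": 5, "answer": "\\frac{1}{2}"},
]
# Hypothetical model outputs, one per record, in the same order.
predictions = ["4", "2", "\\frac{1}{2}"]

by_subject = accuracy_by_group(records, predictions, "subject")
by_level = accuracy_by_group(records, predictions, "level")
print(by_subject)
print(by_level)
```

Exact matching is the simplest possible rule and will undercount answers written in an equivalent form (for example `0.5` versus `\frac{1}{2}`), so a real reproduction script would likely need an answer-normalization step before comparing.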
No results indexed yet — be the first to submit a score.
Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.