Olympiad-style, short-answer math benchmark commonly reported in reasoning-model releases. The test set is small, so a single problem moves the score noticeably, and score swings should be read with caution.
Accuracy is the reported evaluation metric for AIME 2025. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better.
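To make the caution about the small test set concrete, here is a minimal Python sketch of how exact-match accuracy could be computed and how coarse it is on a small set. It is an illustration, not Codesota's scoring code: the function names are hypothetical, and the problem count of 30 is an assumption (the two 2025 AIME papers combined), not something stated above.

```python
import math


def accuracy(predictions, answers):
    """Fraction of problems whose predicted answer exactly matches the reference."""
    if not answers:
        raise ValueError("answers must not be empty")
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def binomial_standard_error(acc, n):
    """Standard error of an accuracy estimate over n independent problems."""
    return math.sqrt(acc * (1.0 - acc) / n)


# Assumption: 30 problems in the test set (not stated in the text above).
N_PROBLEMS = 30

# Each problem is worth 1/N of the score, so accuracy moves in fixed steps.
per_problem_step = 1.0 / N_PROBLEMS

# Sampling noise is largest near 50% accuracy.
worst_case_se = binomial_standard_error(0.5, N_PROBLEMS)

if __name__ == "__main__":
    print(f"one problem = {per_problem_step * 100:.1f} percentage points")
    print(f"standard error at 50% accuracy = {worst_case_se * 100:.1f} points")
```

Under that 30-problem assumption, one problem is worth about 3.3 percentage points and the standard error near 50% accuracy is about 9 points, so gaps of a few points between models on a single run may not be meaningful.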