Reading-comprehension benchmark requiring arithmetic, counting, sorting, comparison, and other discrete reasoning over paragraphs.
F1 is the reported evaluation metric for DROP. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
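To give a rough idea of what an F1 score on DROP measures, the sketch below computes a SQuAD-style bag-of-tokens F1 between a predicted answer string and the gold answers. It takes the maximum over gold answers for each question and averages over questions on a 0 to 100 scale, which is the scale used in the table. This is a simplified assumption and not the official DROP evaluator. The official script also aligns multi-span answers and applies stricter matching when answers contain numbers. The function names (`normalize_answer`, `token_f1`, `best_f1`, `dataset_f1`) are hypothetical.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Bag-of-tokens F1 between one prediction and one gold answer (simplified, not official DROP)."""
    pred_tokens = normalize_answer(prediction)
    gold_tokens = normalize_answer(gold)
    if not pred_tokens and not gold_tokens:
        return 1.0
    if not pred_tokens or not gold_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count_pred, count_gold) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, gold_answers: list[str]) -> float:
    """Score a prediction against every acceptable gold answer and keep the best match."""
    return max(token_f1(prediction, gold) for gold in gold_answers)


def dataset_f1(predictions: list[str], golds: list[list[str]]) -> float:
    """Mean per-question F1, reported as a percentage (0 to 100)."""
    scores = [best_f1(pred, gold) for pred, gold in zip(predictions, golds)]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    print(token_f1("John Smith", "Smith"))  # partial credit: precision 0.5, recall 1.0
    print(dataset_f1(["John Smith", "7"], [["Smith"], ["7", "seven"]]))
```

Partial credit is the reason F1 is used in place of exact match. In the example, "John Smith" against the gold "Smith" scores about 0.67 rather than 0, so the dataset-level score comes out near 83.3.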
Higher is better
| Rank | Model | Trust | F1 | Year | Links |
|---|---|---|---|---|---|
| 01 | MiniMax-Text-01 | unverified | 87.8 | 2025 | Paper ↗, Code ↗ |
| 02 | HRM-Text-1B | unverified | 82.3 | 2026 | Paper ↗, Code ↗ |
| 03 | ByT5 XXL | unverified | 80 | 2021 | Paper ↗, Code ↗ |
| 04 | Apertus-70B-Instruct | unverified | 50.8 | 2025 | Paper ↗, Code ↗ |
| 05 | OLMo-2-7B-1124 (olmOCR-peS2o) | unverified | 43.7 | 2025 | Paper ↗, Code ↗ |