Help Prioritize Research

246 benchmarks need research to determine if they're still relevant. Vote on which ones should be updated first.

How this works

1. Vote for benchmarks
   Click the upvote button on benchmarks you want us to research.

2. We research the top-voted benchmarks
   We use Exa and manual review to find the latest SOTA results.

3. Benchmarks get updated
   Fresh data gets added, or the benchmark is marked as saturated/legacy.
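
The prioritization described in the steps above can be sketched as a simple ranking. This is a minimal illustration, not the site's actual implementation: the `Benchmark` class, the `research_queue` function, the vote counts, and the tie-breaking rule (stalest data first) are all assumptions made for the example. Only the result counts and paper dates come from the listing on this page.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Benchmark:
    name: str
    votes: int                    # hypothetical upvote count (not shown on this page)
    results: int                  # number of recorded results
    latest_paper: Optional[date]  # None corresponds to "No paper dates"


def research_queue(benchmarks: List[Benchmark], top_n: int = 3) -> List[Benchmark]:
    """Return the top_n benchmarks by votes; ties go to the stalest data (assumed rule)."""
    def key(b: Benchmark):
        # A benchmark with no paper date is treated as the stalest possible.
        staleness = b.latest_paper or date.min
        return (-b.votes, staleness)
    return sorted(benchmarks, key=key)[:top_n]


# Vote counts below are made up for illustration.
benchmarks = [
    Benchmark("SuperGLUE", votes=2, results=7, latest_paper=date(2024, 7, 1)),
    Benchmark("coco-text", votes=5, results=33, latest_paper=date(2023, 5, 1)),
    Benchmark("Cityscapes Dataset", votes=2, results=0, latest_paper=None),
]
queue = [b.name for b in research_queue(benchmarks)]
print(queue)
```

Sorting on a `(-votes, staleness)` tuple keeps the vote count as the primary signal, as the steps describe, and uses the age of the latest paper only to order benchmarks with equal votes.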

Top Voted

OmniDocBench v1.5 - Computer Vision/Document Parsing
28 results, no paper dates
inverse-text - Computer Vision/Optical Character Recognition
18 results, latest: May 2023
SWE-bench: Software Engineering Benchmark - Computer Code/Code Generation
0 results, no paper dates
X-Ray Weld Defect Detection Dataset - Industrial Inspection/Anomaly Detection
1 result, no paper dates
Mathematics Aptitude Test of Heuristics - Reasoning/Mathematical Reasoning
34 results, no paper dates
coco-text - Computer Vision/Scene Text Detection
33 results, latest: May 2023
videodb's-ocr-benchmark-public-collection - Computer Vision/Optical Character Recognition
15 results, latest: Feb 2025
SuperGLUE - Natural Language Processing/Text Classification
7 results, latest: Jul 2024
Cityscapes Dataset - Computer Vision/Semantic Segmentation
0 results, no paper dates
Open Graph Benchmark - Graphs/Node Classification
0 results, no paper dates

Computer Vision (174)

OmniDocBench v1.5/Document Parsing
28 results, no paper dates
inverse-text/Optical Character Recognition
18 results, latest: May 2023
coco-text/Scene Text Detection
33 results, latest: May 2023
videodb's-ocr-benchmark-public-collection/Optical Character Recognition
15 results, latest: Feb 2025
Cityscapes Dataset/Semantic Segmentation
0 results, no paper dates
Total-Text/Scene Text Detection
108 results, latest: Aug 2023
msra-td500/Scene Text Detection
61 results, latest: Aug 2023
icdar-2017-mlt/Scene Text Detection
42 results, latest: Dec 2019
dart/Optical Character Recognition
32 results, latest: Oct 2023
icdar2015/Optical Character Recognition
26 results, latest: Aug 2023
tabfact/Optical Character Recognition
23 results, latest: Dec 2024
iiit5k/Scene Text Recognition
21 results, latest: Aug 2023
cute80/Scene Text Recognition
20 results, latest: Aug 2023
sun-rgb-d/Optical Character Recognition
19 results, latest: Jun 2021
svtp/Scene Text Recognition
19 results, latest: Aug 2023
Curved Text in the Wild 1500/Scene Text Detection
18 results, latest: Feb 2022
18 results, no paper dates
pendigits/Optical Character Recognition
15 results, latest: May 2021
tobacco-3482/Document Image Classification
14 results, latest: Jan 2023
CodeSearchNet/Optical Character Recognition
14 results, latest: Sep 2024

+154 more...

Multimodal (14)

MVBench/Video Understanding
0 results, no paper dates
MJHQ-30K FID/Text-to-Image Generation
0 results, no paper dates
InstructPix2Pix/Image-Text-to-Image
0 results, no paper dates
ViDoRe/Cross-Modal Retrieval
0 results, no paper dates
DEMON Bench/Any-to-Any
0 results, no paper dates
MMBench/Image-Text-to-Text
0 results, no paper dates
MMMU/Image-Text-to-Text
0 results, no paper dates
MMStar/Image-Text-to-Text
0 results, no paper dates
DPG-Bench/Text-to-Image Generation
0 results, no paper dates
AudioBench/Audio-Text-to-Text
0 results, no paper dates
MagicBrush/Image-Text-to-Image
0 results, no paper dates
VideoBench/Image-Text-to-Video
0 results, no paper dates
Video-MME/Video Understanding
0 results, no paper dates
GenEval/Text-to-Image Generation
0 results, no paper dates

Natural Language Processing (12)

SuperGLUE/Text Classification
7 results, latest: Jul 2024
MTEB Leaderboard/Feature Extraction
6 results, latest: Sep 2024
MS MARCO/Text Ranking
4 results, latest: Oct 2023
WMT'23/Machine Translation
4 results, no paper dates
BEIR/Text Ranking
4 results, latest: Sep 2024
STS Benchmark/Semantic Textual Similarity
3 results, latest: Jan 2024
XNLI/Zero-Shot Classification
3 results, latest: Jan 2023
WikiTableQuestions/Table Question Answering
3 results, latest: Apr 2020
GLUE/Fill-Mask
3 results, latest: Jan 2023
FLORES-200/Machine Translation
0 results, no paper dates
WikiText Perplexity/Language Modeling
0 results, no paper dates
SQA/Table Question Answering
0 results, no paper dates

Reasoning (10)

34 results, no paper dates
AI2 Reasoning Challenge/Commonsense Reasoning
10 results, no paper dates
8 results, no paper dates
HellaSwag/Commonsense Reasoning
5 results, no paper dates
3 results, no paper dates
CommonsenseQA/Commonsense Reasoning
3 results, no paper dates
WinoGrande/Commonsense Reasoning
3 results, no paper dates
StrategyQA/Multi-step Reasoning
2 results, no paper dates
2 results, no paper dates
HotpotQA/Multi-step Reasoning
2 results, no paper dates

Medical (8)

7 results, no paper dates
4 results, no paper dates
RSNA Pneumonia Detection Challenge/Disease Classification
3 results, latest: Jan 2024
3 results, no paper dates
2 results, no paper dates
2 results, no paper dates
COVID-19 Image Data Collection/Disease Classification
2 results, no paper dates
1 result, no paper dates

Computer Code (7)

0 results, no paper dates
38 results, no paper dates
19 results, latest: Sep 2024
0 results, no paper dates
0 results, no paper dates
0 results, no paper dates
MBPP+ Extended Version/Code Generation
0 results, no paper dates

Audio (7)

DNS Challenge/Audio-to-Audio
0 results, no paper dates
0 results, no paper dates
MusicCaps/Text-to-Audio
0 results, no paper dates
AudioCaps/Text-to-Audio
0 results, no paper dates
0 results, no paper dates
DIHARD/Voice Activity Detection
0 results, no paper dates
AVA-Speech/Voice Activity Detection
0 results, no paper dates

Industrial Inspection (6)

1 result, no paper dates
6 results, latest: Aug 2024
6 results, latest: Mar 2025
Visual Anomaly Dataset/Anomaly Detection
3 results, no paper dates
1 result, no paper dates
NEU Surface Defect Database/Anomaly Detection
1 result, no paper dates

Graphs (2)

Open Graph Benchmark/Node Classification
0 results, no paper dates
OGB (Open Graph Benchmark)/Graph Classification
0 results, no paper dates

Speech (2)

The LJ Speech Dataset/Text-to-Speech
5 results, latest: Jun 2024
Mozilla Common Voice/Speech Recognition
3 results, latest: Dec 2022

Time Series (2)

OpenML-CC18/Tabular Classification
5 results, latest: Jun 2025
California Housing/Tabular Regression
2 results, no paper dates

Robots (2)

SIMPLER/Robotics
0 results, no paper dates
RLBench/Robotics
0 results, no paper dates