Benchmarks
326 benchmarks across 82 tasks in 14 research areas.
75 active · 2469 results · 82 tasks · 14 areas
Top by area
Ranked by results, recency, and community interest.
Computer Vision · 27 tasks · 21 of 199 benchmarks active
1. ICDAR 2015 Incidental Scene Text · Scene Text Detection
2. Total-Text · Scene Text Detection
3. publaynet-val · Document Layout Analysis
Computer Code · 5 tasks · 6 of 13 benchmarks active
1. HumanEval: Hand-Written Evaluation Set · Code Generation
2. SWE-bench Verified Subset · Code Generation
3. LiveCodeBench · Code Generation
Reasoning · 5 tasks · 9 of 19 benchmarks active
1. Mathematics Aptitude Test of Heuristics · Mathematical Reasoning
2. Massive Multitask Language Understanding · Commonsense Reasoning
3. Graduate-Level Google-Proof Q&A · Multi-step Reasoning
Time Series · 3 tasks · 6 of 8 benchmarks active
1. M4 Forecasting Competition · Time Series Forecasting
2. Weather Time Series Benchmark · Time Series Forecasting
3. Electricity Transformer Temperature - hourly 2 (ETTh2) · Time Series Forecasting
Agentic AI · 5 tasks · 6 of 6 benchmarks active
1. SWE-bench Verified — Agentic Leaderboard · SWE-bench
2. Human-Calibrated Autonomy Software Tasks · HCAST
3. WebArena: A Realistic Web Environment for Building Autonomous Agents · Web & Desktop Agents
Medical · 2 tasks · 5 of 13 benchmarks active
1. Autism Brain Imaging Data Exchange I · Disease Classification
2. Synapse Multi-Organ Abdominal CT Segmentation · Medical Image Segmentation
3. Brain Tumor Segmentation Challenge 2023 · Medical Image Segmentation
Multimodal · 10 tasks · 9 of 23 benchmarks active
1. Massive Multidiscipline Multimodal Understanding · Visual Question Answering
2. TextVQA: Towards VQA Models That Can Read · Visual Question Answering
3. MMBench: Is Your Multi-modal Model an All-around Player? · Visual Question Answering
Natural Language Processing · 13 tasks · 4 of 17 benchmarks active
1. Stanford Question Answering Dataset v2.0 · Question Answering
2. CNN/DailyMail Summarization · Text Summarization
3. Stanford Natural Language Inference · Natural Language Inference
Speech · 2 tasks · 4 of 6 benchmarks active
1. LibriSpeech ASR Corpus · Speech Recognition
2. CSTR VCTK Corpus · Text-to-Speech
3. The LJ Speech Dataset · Text-to-Speech
Industrial Inspection · 1 task · 1 of 7 benchmarks active
1. MVTec Anomaly Detection Dataset · Anomaly Detection
2. Kolektor Surface Defect Dataset 2 · Anomaly Detection
3. MVTec 3D Anomaly Detection Dataset · Anomaly Detection
Reinforcement Learning · 2 tasks · 2 of 2 benchmarks active
Graphs · 2 tasks · 1 of 3 benchmarks active
Audio · 4 tasks · 1 of 8 benchmarks active
1. AudioSet · Audio Classification
2. Environmental Sound Classification 50 · Audio Classification
3. MusicCaps · Text-to-Audio
Robots · 1 task · 0 of 2 benchmarks active
5 results · accuracy · Est. 2019 · Latest: Jun 2025
6 results · success-rate · Est. 2024 · Latest: Apr 2025
6 results · success-rate · Est. 2023 · Latest: Apr 2025
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments · Active · Web & Desktop Agents
5 results · success-rate · Est. 2024 · Latest: Apr 2025
5 results · normalized-score · Est. 2024 · Latest: Apr 2025
5 results · task-horizon-minutes · Est. 2024 · Latest: Apr 2025
6 results · auroc · Est. 2021 · Latest: Mar 2025
15 results · resolve-rate · Est. 2024 · Latest: Feb 2025
11 results · accuracy · Est. 2024 · Latest: Feb 2025
9 results · accuracy · Est. 2019 · Latest: Feb 2025
8 results · accuracy · Est. 2023 · Latest: Feb 2025
15 results · accuracy · Est. 2020 · Latest: Feb 2025
12 results · mse · Est. 2021 · Latest: Feb 2025
6 results · mse · Est. 2021 · Latest: Feb 2025
6 results · mse · Est. 2021 · Latest: Feb 2025
All benchmarks
scut-ctw1500 · Active
82 results · accuracy · Est. 2020 · Latest: Dec 2024
cnn-/-daily-mail · Saturated
80 results · accuracy · Est. 2020 · Latest: May 2023
icdar2013 · Legacy
39 results · accuracy · Est. 2020 · Latest: Aug 2023
dart · Needs Research
32 results · accuracy · Est. 2020 · Latest: Oct 2023
tabfact · Needs Research
23 results · accuracy · Est. 2020 · Latest: Dec 2024
icdar2015 · Needs Research
26 results · accuracy · Est. 2020 · Latest: Aug 2023
inverse-text · Needs Research
18 results · accuracy · Est. 2020 · Latest: May 2023
videodb's-ocr-benchmark-public-collection · Needs Research
15 results · accuracy · Est. 2020 · Latest: Feb 2025
sun-rgb-d · Needs Research
19 results · accuracy · Est. 2020 · Latest: Jun 2021
CodeSearchNet · Needs Research
14 results · accuracy · Est. 2020 · Latest: Sep 2024
pendigits · Needs Research
15 results · accuracy · Est. 2020 · Latest: May 2021
lam(line-level) · Needs Research
12 results · accuracy · Est. 2020 · Latest: Sep 2024
read2016(line-level) · Needs Research
9 results · accuracy · Est. 2020 · Latest: Sep 2024
iam(line-level) · Needs Research
9 results · accuracy · Est. 2020 · Latest: Sep 2024
howsumm-step · Needs Research
11 results · accuracy · Est. 2020 · Latest: Oct 2021
e2e · Needs Research
10 results · accuracy · Est. 2020 · Latest: Jul 2021
howsumm-method · Needs Research
9 results · accuracy · Est. 2020 · Latest: Oct 2021
urdudoc · Needs Research
9 results · accuracy · Est. 2020 · Latest: Jun 2023
wikibio · Needs Research
8 results · accuracy · Est. 2020 · Latest: Feb 2021
KITAB Arabic OCR Benchmark · Needs Research
8 results · cer · Est. 2024
codesearchnet---php · Needs Research
8 results · accuracy · Est. 2020 · Latest: Apr 2021
codesearchnet---javascript · Needs Research
8 results · accuracy · Est. 2020 · Latest: Apr 2021
belfort · Needs Research
8 results · accuracy · Est. 2020 · Latest: Jun 2023
reuters-21578 · Needs Research
8 results · accuracy · Est. 2020 · Latest: Mar 2020
codesearchnet---java · Needs Research
8 results · accuracy · Est. 2020 · Latest: Apr 2021
read-2016 · Needs Research
4 results · accuracy · Est. 2020 · Latest: Sep 2024
CodeSOTA Verification · Active
codesearchnet---go · Needs Research
7 results · accuracy · Est. 2020 · Latest: Apr 2021
benchmarking-chinese-text-recognition:-datasets,-b · Needs Research
7 results · accuracy · Est. 2020 · Latest: Aug 2023
codesearchnet---python · Needs Research
7 results · accuracy · Est. 2020 · Latest: Apr 2021
codesearchnet---ruby · Needs Research
7 results · accuracy · Est. 2020 · Latest: Apr 2021
OCR WER Benchmark · Active
OCR CER Benchmark · Active
webnlg-(all) · Needs Research
6 results · accuracy · Est. 2020 · Latest: Jul 2021
mldoc-zero-shot-english-to-spanish · Needs Research
6 results · accuracy · Est. 2020 · Latest: Sep 2019
hoc · Needs Research
6 results · accuracy · Est. 2020 · Latest: Oct 2022
tobacco-small-3482 · Needs Research
6 results · accuracy · Est. 2020 · Latest: Apr 2020
webnlg-(seen) · Needs Research
6 results · accuracy · Est. 2020 · Latest: Jul 2021
mldoc-zero-shot-english-to-french · Needs Research
6 results · accuracy · Est. 2020 · Latest: Sep 2019
webnlg-(unseen) · Needs Research
6 results · accuracy · Est. 2020 · Latest: Jul 2021
Thai OCR Benchmark · Needs Research
5 results · ted-score · Est. 2024
mldoc-zero-shot-english-to-russian · Needs Research
5 results · accuracy · Est. 2020 · Latest: Sep 2019
wikipedia-person-and-animal-dataset · Needs Research
5 results · accuracy · Est. 2020 · Latest: Feb 2020
mldoc-zero-shot-english-to-german · Needs Research
5 results · accuracy · Est. 2020 · Latest: Sep 2019
mldoc-zero-shot-english-to-chinese · Needs Research
5 results · accuracy · Est. 2020 · Latest: Sep 2019
dwie · Needs Research
1 result · accuracy · Est. 2020 · Latest: Dec 2024
re-docred · Needs Research
1 result · accuracy · Est. 2020 · Latest: Dec 2024
bc8 · Needs Research
1 result · accuracy · Est. 2020 · Latest: Jan 2025
hyperpartisan-news-detection · Needs Research
1 result · accuracy · Est. 2020 · Latest: Oct 2024
docred-ie · Needs Research
1 result · accuracy · Est. 2020 · Latest: Apr 2024
lun · Needs Research
1 result · accuracy · Est. 2020 · Latest: Oct 2024
mldoc-zero-shot-english-to-italian · Needs Research
4 results · accuracy · Est. 2020 · Latest: Sep 2019
bbcsport · Needs Research
4 results · accuracy · Est. 2020 · Latest: Mar 2020
stdw · Needs Research
4 results · accuracy · Est. 2020 · Latest: Aug 2022
twitter · Needs Research
3 results · accuracy · Est. 2020 · Latest: Mar 2020
cub-200-2011 · Needs Research
3 results · accuracy · Est. 2020 · Latest: Dec 2023
sut · Needs Research
3 results · accuracy · Est. 2020 · Latest: Nov 2023
rotowire · Needs Research
3 results · accuracy · Est. 2020 · Latest: Aug 2021
fsns---test · Needs Research
3 results · accuracy · Est. 2020 · Latest: Dec 2017
bbc-xsum · Needs Research
3 results · accuracy · Est. 2020 · Latest: Jul 2020
dareczech · Needs Research
3 results · accuracy · Est. 2020 · Latest: Dec 2021
reuters-rcv1/rcv2-german-to-english · Needs Research
3 results · accuracy · Est. 2020 · Latest: Dec 2014
amazon · Needs Research
3 results · accuracy · Est. 2020 · Latest: Mar 2020
mldoc-zero-shot-english-to-japanese · Needs Research
3 results · accuracy · Est. 2020 · Latest: Sep 2019
reuters-rcv1/rcv2-english-to-german · Needs Research
3 results · accuracy · Est. 2020 · Latest: Dec 2014
recipe · Needs Research
2 results · accuracy · Est. 2020 · Latest: Dec 2019
simara · Needs Research
2 results · accuracy · Est. 2020 · Latest: Apr 2023
wos-5736 · Needs Research
2 results · accuracy · Est. 2020 · Latest: Sep 2017
i2l-140k · Needs Research
2 results · accuracy · Est. 2020 · Latest: Feb 2018
classic · Needs Research
2 results · accuracy · Est. 2020 · Latest: Dec 2019
scidocs-(mag) · Needs Research
2 results · accuracy · Est. 2020 · Latest: Feb 2022
aapd · Needs Research
2 results · accuracy · Est. 2020 · Latest: Feb 2020
icdar-2019 · Needs Research
2 results · accuracy · Est. 2020 · Latest: Mar 2022
scidocs-(mesh) · Needs Research
2 results · accuracy · Est. 2020 · Latest: Feb 2022
dise-2021-dataset · Needs Research
2 results · accuracy · Est. 2020 · Latest: Oct 2022
imdb-m · Needs Research
2 results · accuracy · Est. 2020 · Latest: Mar 2021
cedar-signature · Needs Research
2 results · accuracy · Est. 2020 · Latest: Sep 2020
textzoom · Needs Research
2 results · accuracy · Est. 2020 · Latest: Nov 2022
clueweb09-b · Needs Research
2 results · accuracy · Est. 2020 · Latest: Jun 2019
australian · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2020
warppie10p · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2020
wos-11967 · Needs Research
1 result · accuracy · Est. 2020 · Latest: Sep 2017
digital-peter · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2021
hkr · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2021
food-101 · Needs Research
1 result · accuracy · Est. 2020 · Latest: Dec 2020
ba · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2020
arxiv-summarization-dataset · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2021
iam-b · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2021
mldoc-zero-shot-german-to-french · Needs Research
1 result · accuracy · Est. 2020 · Latest: May 2018
iam-d · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2021
saint-gall · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2021
reuters-de-en · Needs Research
1 result · accuracy · Est. 2020 · Latest: Oct 2014
mpqa · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2019
wine · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2020
textseg · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2022
pixraw10p · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2020
im2latex-100k · Needs Research
1 result · accuracy · Est. 2020 · Latest: Feb 2018
jaffe · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2020
arxiv-hep-th-citation-graph · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2021
iris · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2020
and-dataset · Needs Research
1 result · accuracy · Est. 2020 · Latest: Sep 2020
reuters-en-de · Needs Research
1 result · accuracy · Est. 2020 · Latest: Oct 2014
yelp-14 · Needs Research
1 result · accuracy · Est. 2020 · Latest: Apr 2019
wikilingua-(tr->en) · Needs Research
1 result · accuracy · Est. 2020 · Latest: Dec 2021
wos-46985 · Needs Research
1 result · accuracy · Est. 2020 · Latest: Sep 2017
bentham · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2021
scene-text-recognition-benchmarks · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2022
ephoie · Needs Research
1 result · accuracy · Est. 2020 · Latest: Apr 2022
cl-scisumm · Needs Research
1 result · accuracy · Est. 2020 · Latest: Sep 2019
CodeSOTA Polish OCR Benchmark · Needs Research
cer · Est. 2025
IMPACT Polish Digital Libraries Ground Truth · Needs Research
cer · Est. 2012
Scanned Receipts OCR and Information Extraction · Needs Research
f1 · Est. 2019
PolEval 2021 OCR Post-Correction Task · Needs Research
cer · Est. 2021
188 results · f1 · Est. 2015 · Latest: Apr 2023
Total-Text · Needs Research
108 results · f1 · Est. 2017 · Latest: Aug 2023
msra-td500 · Needs Research
61 results · accuracy · Est. 2020 · Latest: Aug 2023
icdar-2013 · Legacy
49 results · accuracy · Est. 2020 · Latest: Jul 2022
icdar-2017-mlt · Needs Research
42 results · accuracy · Est. 2020 · Latest: Dec 2019
coco-text · Needs Research
33 results · accuracy · Est. 2020 · Latest: May 2023
Curved Text in the Wild 1500 · Needs Research
18 results · f1 · Est. 2019 · Latest: Feb 2022
8 results · accuracy · Est. 2023
ic19-art · Needs Research
8 results · accuracy · Est. 2020 · Latest: Aug 2023
ICDAR 2019 Arbitrary-Shaped Text · Needs Research
4 results · f1 · Est. 2019 · Latest: Sep 2019
ic19-rects · Needs Research
1 result · accuracy · Est. 2020 · Latest: Jun 2019
publaynet-val · Active
85 results · accuracy · Est. 2020 · Latest: Dec 2024
document-layout-recognition-challenge-test · Needs Research
18 results · accuracy · Est. 2020
document-layout-recognition-challenge-mini-dev · Needs Research
12 results · accuracy · Est. 2020
u-diads-bib · Needs Research
8 results · accuracy · Est. 2020 · Latest: Sep 2024
d4la · Needs Research
3 results · accuracy · Est. 2020 · Latest: Dec 2024
svt · Active
40 results · accuracy · Est. 2020 · Latest: Aug 2023
iiit5k · Needs Research
21 results · accuracy · Est. 2020 · Latest: Aug 2023
cute80 · Needs Research
20 results · accuracy · Est. 2020 · Latest: Aug 2023
svtp · Needs Research
19 results · accuracy · Est. 2020 · Latest: Aug 2023
icdar-2003 · Needs Research
12 results · accuracy · Est. 2020 · Latest: Mar 2022
wost · Needs Research
5 results · accuracy · Est. 2020 · Latest: May 2023
uber-text · Needs Research
3 results · accuracy · Est. 2020 · Latest: May 2023
host · Needs Research
3 results · accuracy · Est. 2020 · Latest: May 2023
msda · Needs Research
2 results · accuracy · Est. 2020 · Latest: Aug 2021
svt-p · Needs Research
1 result · accuracy · Est. 2020 · Latest: May 2023
ic13 · Needs Research
1 result · accuracy · Est. 2020 · Latest: May 2023
OmniDocBench v1.5 · Needs Research
28 results · composite · Est. 2024
olmOCR-Bench · Active
28 results · pass-rate · Est. 2024
rvl-cdip · Active
33 results · accuracy · Est. 2020 · Latest: Dec 2024
tobacco-3482 · Needs Research
14 results · accuracy · Est. 2020 · Latest: Jan 2023
noisy-bangla-numeral · Needs Research
2 results · accuracy · Est. 2020 · Latest: Aug 2019
noisy-bangla-characters · Needs Research
2 results · accuracy · Est. 2020 · Latest: Aug 2019
noisy-mnist · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2019
n-mnist · Needs Research
1 result · accuracy · Est. 2020 · Latest: Jun 2018
aip · Needs Research
1 result · accuracy · Est. 2020 · Latest: Mar 2021
13 results · mAP · Est. 2014
16 results · mask-ap · Est. 2019 · Latest: Nov 2024
Pascal Visual Object Classes Challenge 2012 · Needs Research
6 results · mAP · Est. 2012 · Latest: Dec 2015
OCRBench v2 · Active
32 results · overall-en-private · Est. 2024
Comprehensive Challenge OCR · Needs Research
12 results · multi-scene-f1 · Est. 2024
MME Video OCR Benchmark · Needs Research
6 results · total-accuracy · Est. 2024
reVISION Polish Vision-Language Benchmark · Needs Research
accuracy · Est. 2025
IAM Handwriting Database · Active
22 results · cer · Est. 1999 · Latest: Sep 2024
8 results · handwritten-levenshtein · Est. 2024
cer · Est. 2011
kohtd · Needs Research
4 results · accuracy · Est. 2020 · Latest: Sep 2021
banglalekha-isolated-dataset · Needs Research
3 results · accuracy · Est. 2020 · Latest: Aug 2020
an-extensive-dataset-of-handwritten-central-kurdis · Needs Research
1 result · accuracy · Est. 2020 · Latest: Oct 2022
EMNIST Extended with Polish Diacritics · Needs Research
accuracy · Est. 2020
pubtabnet · Active
18 results · accuracy · Est. 2020 · Latest: Apr 2024
table-recognition-challenge-mini-test · Needs Research
12 results · accuracy · Est. 2020
table-recognition-challenge-test · Needs Research
6 results · accuracy · Est. 2020
wtw · Needs Research
1 result · accuracy · Est. 2020 · Latest: Mar 2023
1 result · accuracy · Est. 2020 · Latest: Apr 2021
16 results · top-1-accuracy · Est. 2012
Canadian Institute for Advanced Research 100 · Needs Research
4 results · accuracy · Est. 2009
Canadian Institute for Advanced Research 10 · Needs Research
3 results · accuracy · Est. 2009
ImageNet-V2 Matched Frequency · Needs Research
2 results · top-1-accuracy · Est. 2019
7 results · mAP · Est. 2022
f1 · Est. 2019
6 results · mIoU · Est. 2016
Cityscapes Dataset · Needs Research
mIoU · Est. 2016
NYU Depth V2 · Needs Research
abs-rel · Est. 2012
KITTI Depth · Needs Research
abs-rel · Est. 2012
VBench · Needs Research
composite · Est. 2023
EvalCrafter · Needs Research
composite · Est. 2023
LVIS Zero-Shot · Needs Research
mAP · Est. 2019
OmniLabel · Needs Research
mAP · Est. 2023
ImageNet Zero-Shot · Needs Research
top-1-accuracy · Est. 2009
Something-Something V2 · Needs Research
top-1-accuracy · Est. 2017
UCF-101 · Needs Research
top-1-accuracy · Est. 2012
Kinetics-400 · Needs Research
top-1-accuracy · Est. 2017
SA-1B · Needs Research
iou · Est. 2023
MPII Human Pose · Needs Research
accuracy · Est. 2014
COCO Keypoints · Needs Research
mAP · Est. 2014
LSUN Bedroom FID · Needs Research
fid · Est. 2015
CIFAR-10 FID · Needs Research
fid · Est. 2009
ImageNet Linear Probe · Needs Research
top-1-accuracy · Est. 2009
GSO (Google Scanned Objects) · Needs Research
composite · Est. 2022
I2VBench · Needs Research
composite · Est. 2024
DAVIS · Needs Research
j-and-f · Est. 2016
T3Bench · Needs Research
composite · Est. 2023
Mathematics Aptitude Test of Heuristics · Needs Research
34 results · accuracy · Est. 2021
Grade School Math 8K · Active
15 results · accuracy · Est. 2021
5 results · accuracy · Est. 2025
American Invitational Mathematics Examination 2024 · Needs Research
8 results · accuracy · Est. 2024
24 results · accuracy · Est. 2021
AI2 Reasoning Challenge · Needs Research
10 results · accuracy · Est. 2018
HellaSwag · Needs Research
5 results · accuracy · Est. 2019
CommonsenseQA · Needs Research
3 results · accuracy · Est. 2019
WinoGrande · Needs Research
3 results · accuracy · Est. 2019
24 results · accuracy · Est. 2024
BIG-Bench Hard (BBH) · Active
5 results · accuracy · Est. 2022
StrategyQA · Needs Research
2 results · accuracy · Est. 2021
HotpotQA · Needs Research
2 results · f1 · Est. 2018
5 results · accuracy · Est. 2024
3 results · accuracy · Est. 2025
LogiQA · Active
2 results · accuracy · Est. 2020
2 results · accuracy · Est. 2020
3 results · accuracy · Est. 2016
Simple Variations on Arithmetic Math Word Problems · Needs Research
3 results · accuracy · Est. 2021
30 results · pass@1 · Est. 2021 · Latest: Sep 2024
SWE-bench Verified Subset · Needs Research
38 results · resolve-rate · Est. 2024
LiveCodeBench · Active
25 results · pass@1 · Est. 2024 · Latest: Mar 2024
Mostly Basic Python Problems · Needs Research
19 results · pass@1 · Est. 2021 · Latest: Sep 2024
SWE-bench: Software Engineering Benchmark · Needs Research
resolve-rate · Est. 2023
MBPP+ Extended Version · Needs Research
pass@1 · Est. 2023
Automated Programming Progress Standard · Needs Research
pass@1 · Est. 2021
CodeContests Competitive Programming · Needs Research
pass@1 · Est. 2022
HumanEval+ Extended Version · Needs Research
pass@1 · Est. 2023
7 results · computational-accuracy · Est. 2020 · Latest: Sep 2024
6 results · accuracy · Est. 2019 · Latest: Sep 2024
6 results · exact-match · Est. 2023 · Latest: Sep 2024
5 results · correct-patches · Est. 2014 · Latest: Apr 2024
39 results · smape · Est. 2018 · Latest: Dec 2024
12 results · mse · Est. 2021 · Latest: Feb 2025
6 results · mse · Est. 2021 · Latest: Feb 2025
6 results · mse · Est. 2021 · Latest: Feb 2025
6 results · mse · Est. 2021 · Latest: Feb 2025
6 results · mse · Est. 2021 · Latest: Feb 2025
OpenML-CC18 · Needs Research
5 results · accuracy · Est. 2019 · Latest: Jun 2025
California Housing · Needs Research
2 results · rmse · Est. 1997
7 results · average-score · Est. 2018 · Latest: Jul 2024
SuperGLUE · Needs Research
7 results · average-score · Est. 2019 · Latest: Jul 2024
9 results · f1 · Est. 2018 · Latest: Jul 2024
CNN/DailyMail Summarization · Saturated
15 results · rouge-1 · Est. 2015 · Latest: Jul 2024
8 results · accuracy · Est. 2015 · Latest: Jul 2024
7 results · f1 · Est. 2003 · Latest: Jul 2024
MTEB Leaderboard · Needs Research
6 results · accuracy · Est. 2022 · Latest: Sep 2024
WMT'23 · Needs Research
4 results · bleu · Est. 2023
FLORES-200 · Needs Research
bleu · Est. 2022
XNLI · Needs Research
3 results · accuracy · Est. 2018 · Latest: Jan 2023
WikiTableQuestions · Needs Research
3 results · accuracy · Est. 2015 · Latest: Apr 2020
SQA · Needs Research
accuracy · Est. 2017
STS Benchmark · Needs Research
3 results · spearman · Est. 2017 · Latest: Jan 2024
WikiText Perplexity · Needs Research
perplexity · Est. 2016
6 results · success-rate · Est. 2023 · Latest: Apr 2025
5 results · success-rate · Est. 2024 · Latest: Apr 2025
15 results · resolve-rate · Est. 2024 · Latest: Feb 2025
6 results · success-rate · Est. 2024 · Latest: Apr 2025
5 results · normalized-score · Est. 2024 · Latest: Apr 2025
5 results · task-horizon-minutes · Est. 2024 · Latest: Apr 2025
11 results · mean-dsc · Est. 2015 · Latest: Jan 2024
3 results · mean-dice-wt-tc-et · Est. 2023 · Latest: Jun 2024
6 results · mean-dsc · Est. 2015 · Latest: Aug 2023
6 results · mean-dsc · Est. 2017 · Latest: Mar 2024
21 results · accuracy · Est. 2012
7 results · auroc · Est. 2019
NIH Clinical Center Chest X-ray Dataset · Needs Research
4 results · auroc · Est. 2017
RSNA Pneumonia Detection Challenge · Needs Research
3 results · mAP · Est. 2018 · Latest: Jan 2024
3 results · auroc · Est. 2019
Autism Brain Imaging Data Exchange II · Needs Research
2 results · accuracy · Est. 2017
VinDr-CXR: Vietnamese Dataset for Chest Radiograph · Needs Research
2 results · auroc · Est. 2022
COVID-19 Image Data Collection · Needs Research
2 results · auroc · Est. 2020
PadChest: A Large Chest X-ray Image Dataset · Needs Research
1 result · auroc · Est. 2020
11 results · accuracy · Est. 2024 · Latest: Feb 2025
9 results · accuracy · Est. 2019 · Latest: Feb 2025
8 results · accuracy · Est. 2023 · Latest: Feb 2025
7 results · accuracy · Est. 2017 · Latest: Oct 2024
accuracy · Est. 2019
accuracy · Est. 2019
COCO Captions · Active
2 results · cider · Est. 2015 · Latest: Jan 2023
cider · Est. 2019
accuracy · Est. 2024
AudioBench · Needs Research
accuracy · Est. 2024
DEMON Bench · Needs Research
accuracy · Est. 2024
DPG-Bench · Needs Research
composite · Est. 2024
GenEval · Needs Research
accuracy · Est. 2023
MJHQ-30K FID · Needs Research
fid · Est. 2024
MagicBrush · Needs Research
clip-score · Est. 2023
InstructPix2Pix · Needs Research
clip-score · Est. 2023
ViDoRe · Needs Research
ndcg-at-5 · Est. 2024
VideoBench · Needs Research
composite · Est. 2024
LibriSpeech ASR Corpus · Active
17 results · wer · Est. 2015 · Latest: Apr 2024
wer · Est. 2025
Mozilla Common Voice · Needs Research
3 results · wer · Est. 2019 · Latest: Dec 2022
CSTR VCTK Corpus · Active
6 results · mos · Est. 2019 · Latest: Jun 2024
The LJ Speech Dataset · Needs Research
5 results · mos · Est. 2017 · Latest: Jun 2024
9 results · auroc · Est. 2019 · Latest: Aug 2023
Kolektor Surface Defect Dataset 2 · Needs Research
6 results · auroc · Est. 2021 · Latest: Aug 2024
MVTec 3D Anomaly Detection Dataset · Needs Research
6 results · auroc · Est. 2021 · Latest: Mar 2025
X-Ray Weld Defect Detection Dataset · Needs Research
1 result · mAP · Est. 2021
Visual Anomaly Dataset · Needs Research
3 results · auroc · Est. 2022
NEU Surface Defect Database · Needs Research
1 result · mAP · Est. 2013
Severstal Steel Defect Detection · Needs Research
1 result · dice · Est. 2019
9 results · human-normalized-score · Est. 2013
9 results · average-return · Est. 2012
Cora Citation Network · Active
6 results · accuracy · Est. 2000 · Latest: Apr 2019
Open Graph Benchmark · Needs Research
accuracy · Est. 2020
OGB (Open Graph Benchmark) · Needs Research
accuracy · Est. 2020
AudioSet · Active
mAP · Est. 2017
Environmental Sound Classification 50 · Needs Research
accuracy · Est. 2015
DIHARD · Needs Research
der · Est. 2018
AVA-Speech · Needs Research
accuracy · Est. 2018
VCTK (Voice Conversion) · Needs Research
pesq · Est. 2019
DNS Challenge · Needs Research
si-snr · Est. 2020