Benchmarks
327 benchmarks across 83 tasks in 15 research areas.
76 Active
2509 Results
83 Tasks
15 Areas
Top by area
Ranked by results, recency, and community interest.
Computer Vision · 27 tasks · 21 of 199 benchmarks active
1. ICDAR 2015 Incidental Scene Text · Scene Text Detection
2. Total-Text · Scene Text Detection
3. publaynet-val · Document Layout Analysis
Computer Code · 5 tasks · 6 of 13 benchmarks active
1. SWE-bench Verified Subset · Code Generation
2. HumanEval: Hand-Written Evaluation Set · Code Generation
3. LiveCodeBench · Code Generation
Reasoning · 5 tasks · 9 of 19 benchmarks active
1. Mathematics Aptitude Test of Heuristics · Mathematical Reasoning
2. Massive Multitask Language Understanding · Commonsense Reasoning
3. Graduate-Level Google-Proof Q&A · Multi-step Reasoning
Time Series · 3 tasks · 6 of 8 benchmarks active
1. M4 Forecasting Competition · Time Series Forecasting
2. Weather Time Series Benchmark · Time Series Forecasting
3. Electricity Transformer Temperature - 15-minute (ETTm1) · Time Series Forecasting
Agentic AI · 5 tasks · 6 of 6 benchmarks active
1. SWE-bench Verified - Agentic Leaderboard · SWE-bench
2. WebArena: A Realistic Web Environment for Building Autonomous Agents · Web & Desktop Agents
3. Human-Calibrated Autonomy Software Tasks · HCAST
Medical · 2 tasks · 5 of 13 benchmarks active
1. Autism Brain Imaging Data Exchange I · Disease Classification
2. Synapse Multi-Organ Abdominal CT Segmentation · Medical Image Segmentation
3. Brain Tumor Segmentation Challenge 2023 · Medical Image Segmentation
Multimodal · 10 tasks · 9 of 23 benchmarks active
1. Massive Multidiscipline Multimodal Understanding · Visual Question Answering
2. TextVQA: Towards VQA Models That Can Read · Visual Question Answering
3. MMBench: Is Your Multi-modal Model an All-around Player? · Visual Question Answering
Natural Language Processing · 13 tasks · 4 of 17 benchmarks active
1. Stanford Question Answering Dataset v2.0 · Question Answering
2. CNN/DailyMail Summarization · Text Summarization
3. Stanford Natural Language Inference · Natural Language Inference
Speech · 2 tasks · 4 of 6 benchmarks active
1. LibriSpeech ASR Corpus · Speech Recognition
2. CSTR VCTK Corpus · Text-to-Speech
3. The LJ Speech Dataset · Text-to-Speech
Mobile Development · 1 task · 1 of 1 benchmark active
Industrial Inspection · 1 task · 1 of 7 benchmarks active
1. MVTec Anomaly Detection Dataset · Anomaly Detection
2. Kolektor Surface Defect Dataset 2 · Anomaly Detection
3. MVTec 3D Anomaly Detection Dataset · Anomaly Detection
Reinforcement Learning · 2 tasks · 2 of 2 benchmarks active
Graphs · 2 tasks · 1 of 3 benchmarks active
Audio · 4 tasks · 1 of 8 benchmarks active
1. AudioSet · Audio Classification
2. Environmental Sound Classification 50 · Audio Classification
3. DIHARD · Voice Activity Detection
Robots · 1 task · 0 of 2 benchmarks active
5 results · accuracy · Est. 2019 · Latest: Jun 2025
6 results · success-rate · Est. 2023 · Latest: Apr 2025
6 results · success-rate · Est. 2024 · Latest: Apr 2025
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments · Active · Web & Desktop Agents
5 results · success-rate · Est. 2024 · Latest: Apr 2025
5 results · normalized-score · Est. 2024 · Latest: Apr 2025
5 results · task-horizon-minutes · Est. 2024 · Latest: Apr 2025
6 results · auroc · Est. 2021 · Latest: Mar 2025
15 results · resolve-rate · Est. 2024 · Latest: Feb 2025
11 results · accuracy · Est. 2024 · Latest: Feb 2025
9 results · accuracy · Est. 2019 · Latest: Feb 2025
8 results · accuracy · Est. 2023 · Latest: Feb 2025
15 results · accuracy · Est. 2020 · Latest: Feb 2025
12 results · mse · Est. 2021 · Latest: Feb 2025
6 results · mse · Est. 2021 · Latest: Feb 2025
6 results · mse · Est. 2021 · Latest: Feb 2025
All benchmarks
scut-ctw1500 · Active
82 results · accuracy · Est. 2020 · Latest: Dec 2024
cnn-/-daily-mail · Saturated
80 results · accuracy · Est. 2020 · Latest: May 2023
icdar2013 · Legacy
39 results · accuracy · Est. 2020 · Latest: Aug 2023
dart · Needs Research
32 results · accuracy · Est. 2020 · Latest: Oct 2023
tabfact · Needs Research
23 results · accuracy · Est. 2020 · Latest: Dec 2024
icdar2015 · Needs Research
26 results · accuracy · Est. 2020 · Latest: Aug 2023
inverse-text · Needs Research
18 results · accuracy · Est. 2020 · Latest: May 2023
videodb's-ocr-benchmark-public-collection · Needs Research
15 results · accuracy · Est. 2020 · Latest: Feb 2025
sun-rgb-d · Needs Research
19 results · accuracy · Est. 2020 · Latest: Jun 2021
CodeSearchNet · Needs Research
14 results · accuracy · Est. 2020 · Latest: Sep 2024
pendigits · Needs Research
15 results · accuracy · Est. 2020 · Latest: May 2021
lam(line-level) · Needs Research
12 results · accuracy · Est. 2020 · Latest: Sep 2024
read2016(line-level) · Needs Research
9 results · accuracy · Est. 2020 · Latest: Sep 2024
iam(line-level) · Needs Research
9 results · accuracy · Est. 2020 · Latest: Sep 2024
howsumm-step · Needs Research
11 results · accuracy · Est. 2020 · Latest: Oct 2021
e2e · Needs Research
10 results · accuracy · Est. 2020 · Latest: Jul 2021
howsumm-method · Needs Research
9 results · accuracy · Est. 2020 · Latest: Oct 2021
urdudoc · Needs Research
9 results · accuracy · Est. 2020 · Latest: Jun 2023
KITAB Arabic OCR Benchmark · Needs Research
8 results · cer · Est. 2024
belfort · Needs Research
8 results · accuracy · Est. 2020 · Latest: Jun 2023
wikibio · Needs Research
8 results · accuracy · Est. 2020 · Latest: Feb 2021
codesearchnet---php · Needs Research
8 results · accuracy · Est. 2020 · Latest: Apr 2021
codesearchnet---javascript · Needs Research
8 results · accuracy · Est. 2020 · Latest: Apr 2021
codesearchnet---java · Needs Research
8 results · accuracy · Est. 2020 · Latest: Apr 2021
reuters-21578 · Needs Research
8 results · accuracy · Est. 2020 · Latest: Mar 2020
read-2016 · Needs Research
4 results · accuracy · Est. 2020 · Latest: Sep 2024
CodeSOTA Verification · Active
benchmarking-chinese-text-recognition:-datasets,-b · Needs Research
7 results · accuracy · Est. 2020 · Latest: Aug 2023
codesearchnet---python · Needs Research
7 results · accuracy · Est. 2020 · Latest: Apr 2021
codesearchnet---ruby · Needs Research
7 results · accuracy · Est. 2020 · Latest: Apr 2021
codesearchnet---go · Needs Research
7 results · accuracy · Est. 2020 · Latest: Apr 2021
OCR CER Benchmark · Active
OCR WER Benchmark · Active
webnlg-(all) · Needs Research
6 results · accuracy · Est. 2020 · Latest: Jul 2021
webnlg-(seen) · Needs Research
6 results · accuracy · Est. 2020 · Latest: Jul 2021
webnlg-(unseen) · Needs Research
6 results · accuracy · Est. 2020 · Latest: Jul 2021
mldoc-zero-shot-english-to-french · Needs Research
6 results · accuracy · Est. 2020 · Latest: Sep 2019
tobacco-small-3482 · Needs Research
6 results · accuracy · Est. 2020 · Latest: Apr 2020
hoc · Needs Research
6 results · accuracy · Est. 2020 · Latest: Oct 2022
mldoc-zero-shot-english-to-spanish · Needs Research
6 results · accuracy · Est. 2020 · Latest: Sep 2019
Thai OCR Benchmark · Needs Research
5 results · ted-score · Est. 2024
mldoc-zero-shot-english-to-german · Needs Research
5 results · accuracy · Est. 2020 · Latest: Sep 2019
wikipedia-person-and-animal-dataset · Needs Research
5 results · accuracy · Est. 2020 · Latest: Feb 2020
mldoc-zero-shot-english-to-russian · Needs Research
5 results · accuracy · Est. 2020 · Latest: Sep 2019
mldoc-zero-shot-english-to-chinese · Needs Research
5 results · accuracy · Est. 2020 · Latest: Sep 2019
re-docred · Needs Research
1 result · accuracy · Est. 2020 · Latest: Dec 2024
bc8 · Needs Research
1 result · accuracy · Est. 2020 · Latest: Jan 2025
dwie · Needs Research
1 result · accuracy · Est. 2020 · Latest: Dec 2024
hyperpartisan-news-detection · Needs Research
1 result · accuracy · Est. 2020 · Latest: Oct 2024
docred-ie · Needs Research
1 result · accuracy · Est. 2020 · Latest: Apr 2024
lun · Needs Research
1 result · accuracy · Est. 2020 · Latest: Oct 2024
stdw · Needs Research
4 results · accuracy · Est. 2020 · Latest: Aug 2022
mldoc-zero-shot-english-to-italian · Needs Research
4 results · accuracy · Est. 2020 · Latest: Sep 2019
bbcsport · Needs Research
4 results · accuracy · Est. 2020 · Latest: Mar 2020
twitter · Needs Research
3 results · accuracy · Est. 2020 · Latest: Mar 2020
cub-200-2011 · Needs Research
3 results · accuracy · Est. 2020 · Latest: Dec 2023
rotowire · Needs Research
3 results · accuracy · Est. 2020 · Latest: Aug 2021
mldoc-zero-shot-english-to-japanese · Needs Research
3 results · accuracy · Est. 2020 · Latest: Sep 2019
dareczech · Needs Research
3 results · accuracy · Est. 2020 · Latest: Dec 2021
reuters-rcv1/rcv2-german-to-english · Needs Research
3 results · accuracy · Est. 2020 · Latest: Dec 2014
sut · Needs Research
3 results · accuracy · Est. 2020 · Latest: Nov 2023
bbc-xsum · Needs Research
3 results · accuracy · Est. 2020 · Latest: Jul 2020
fsns---test · Needs Research
3 results · accuracy · Est. 2020 · Latest: Dec 2017
amazon · Needs Research
3 results · accuracy · Est. 2020 · Latest: Mar 2020
reuters-rcv1/rcv2-english-to-german · Needs Research
3 results · accuracy · Est. 2020 · Latest: Dec 2014
recipe · Needs Research
2 results · accuracy · Est. 2020 · Latest: Dec 2019
i2l-140k · Needs Research
2 results · accuracy · Est. 2020 · Latest: Feb 2018
scidocs-(mesh) · Needs Research
2 results · accuracy · Est. 2020 · Latest: Feb 2022
imdb-m · Needs Research
2 results · accuracy · Est. 2020 · Latest: Mar 2021
wos-5736 · Needs Research
2 results · accuracy · Est. 2020 · Latest: Sep 2017
cedar-signature · Needs Research
2 results · accuracy · Est. 2020 · Latest: Sep 2020
clueweb09-b · Needs Research
2 results · accuracy · Est. 2020 · Latest: Jun 2019
icdar-2019 · Needs Research
2 results · accuracy · Est. 2020 · Latest: Mar 2022
textzoom · Needs Research
2 results · accuracy · Est. 2020 · Latest: Nov 2022
classic · Needs Research
2 results · accuracy · Est. 2020 · Latest: Dec 2019
aapd · Needs Research
2 results · accuracy · Est. 2020 · Latest: Feb 2020
dise-2021-dataset · Needs Research
2 results · accuracy · Est. 2020 · Latest: Oct 2022
simara · Needs Research
2 results · accuracy · Est. 2020 · Latest: Apr 2023
scidocs-(mag) · Needs Research
2 results · accuracy · Est. 2020 · Latest: Feb 2022
warppie10p · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2020
australian · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2020
textseg · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2022
saint-gall · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2021
scene-text-recognition-benchmarks · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2022
reuters-de-en · Needs Research
1 result · accuracy · Est. 2020 · Latest: Oct 2014
food-101 · Needs Research
1 result · accuracy · Est. 2020 · Latest: Dec 2020
hkr · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2021
wos-11967 · Needs Research
1 result · accuracy · Est. 2020 · Latest: Sep 2017
bentham · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2021
pixraw10p · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2020
im2latex-100k · Needs Research
1 result · accuracy · Est. 2020 · Latest: Feb 2018
mpqa · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2019
reuters-en-de · Needs Research
1 result · accuracy · Est. 2020 · Latest: Oct 2014
yelp-14 · Needs Research
1 result · accuracy · Est. 2020 · Latest: Apr 2019
and-dataset · Needs Research
1 result · accuracy · Est. 2020 · Latest: Sep 2020
arxiv-hep-th-citation-graph · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2021
iris · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2020
wos-46985 · Needs Research
1 result · accuracy · Est. 2020 · Latest: Sep 2017
ephoie · Needs Research
1 result · accuracy · Est. 2020 · Latest: Apr 2022
iam-b · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2021
cl-scisumm · Needs Research
1 result · accuracy · Est. 2020 · Latest: Sep 2019
iam-d · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2021
wikilingua-(tr->en) · Needs Research
1 result · accuracy · Est. 2020 · Latest: Dec 2021
mldoc-zero-shot-german-to-french · Needs Research
1 result · accuracy · Est. 2020 · Latest: May 2018
digital-peter · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2021
ba · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2020
arxiv-summarization-dataset · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2021
wine · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2020
jaffe · Needs Research
1 result · accuracy · Est. 2020 · Latest: Nov 2020
CodeSOTA Polish OCR Benchmark · Needs Research
cer · Est. 2025
IMPACT Polish Digital Libraries Ground Truth · Needs Research
cer · Est. 2012
PolEval 2021 OCR Post-Correction Task · Needs Research
cer · Est. 2021
Scanned Receipts OCR and Information Extraction · Needs Research
f1 · Est. 2019
188 results · f1 · Est. 2015 · Latest: Apr 2023
Total-Text · Needs Research
108 results · f1 · Est. 2017 · Latest: Aug 2023
msra-td500 · Needs Research
61 results · accuracy · Est. 2020 · Latest: Aug 2023
icdar-2013 · Legacy
49 results · accuracy · Est. 2020 · Latest: Jul 2022
icdar-2017-mlt · Needs Research
42 results · accuracy · Est. 2020 · Latest: Dec 2019
coco-text · Needs Research
33 results · accuracy · Est. 2020 · Latest: May 2023
Curved Text in the Wild 1500 · Needs Research
18 results · f1 · Est. 2019 · Latest: Feb 2022
8 results · accuracy · Est. 2023
ic19-art · Needs Research
8 results · accuracy · Est. 2020 · Latest: Aug 2023
ICDAR 2019 Arbitrary-Shaped Text · Needs Research
4 results · f1 · Est. 2019 · Latest: Sep 2019
ic19-rects · Needs Research
1 result · accuracy · Est. 2020 · Latest: Jun 2019
publaynet-val · Active
85 results · accuracy · Est. 2020 · Latest: Dec 2024
document-layout-recognition-challenge-test · Needs Research
18 results · accuracy · Est. 2020
document-layout-recognition-challenge-mini-dev · Needs Research
12 results · accuracy · Est. 2020
u-diads-bib · Needs Research
8 results · accuracy · Est. 2020 · Latest: Sep 2024
d4la · Needs Research
3 results · accuracy · Est. 2020 · Latest: Dec 2024
svt · Active
40 results · accuracy · Est. 2020 · Latest: Aug 2023
iiit5k · Needs Research
21 results · accuracy · Est. 2020 · Latest: Aug 2023
cute80 · Needs Research
20 results · accuracy · Est. 2020 · Latest: Aug 2023
svtp · Needs Research
19 results · accuracy · Est. 2020 · Latest: Aug 2023
icdar-2003 · Needs Research
12 results · accuracy · Est. 2020 · Latest: Mar 2022
wost · Needs Research
5 results · accuracy · Est. 2020 · Latest: May 2023
uber-text · Needs Research
3 results · accuracy · Est. 2020 · Latest: May 2023
host · Needs Research
3 results · accuracy · Est. 2020 · Latest: May 2023
msda · Needs Research
2 results · accuracy · Est. 2020 · Latest: Aug 2021
ic13 · Needs Research
1 result · accuracy · Est. 2020 · Latest: May 2023
svt-p · Needs Research
1 result · accuracy · Est. 2020 · Latest: May 2023
OmniDocBench v1.5 · Needs Research
28 results · composite · Est. 2024
olmOCR-Bench · Active
28 results · pass-rate · Est. 2024
rvl-cdip · Active
33 results · accuracy · Est. 2020 · Latest: Dec 2024
tobacco-3482 · Needs Research
14 results · accuracy · Est. 2020 · Latest: Jan 2023
noisy-bangla-numeral · Needs Research
2 results · accuracy · Est. 2020 · Latest: Aug 2019
noisy-bangla-characters · Needs Research
2 results · accuracy · Est. 2020 · Latest: Aug 2019
noisy-mnist · Needs Research
1 result · accuracy · Est. 2020 · Latest: Aug 2019
n-mnist · Needs Research
1 result · accuracy · Est. 2020 · Latest: Jun 2018
aip · Needs Research
1 result · accuracy · Est. 2020 · Latest: Mar 2021
13 results · mAP · Est. 2014
16 results · mask-ap · Est. 2019 · Latest: Nov 2024
Pascal Visual Object Classes Challenge 2012 · Needs Research
6 results · mAP · Est. 2012 · Latest: Dec 2015
OCRBench v2 · Active
32 results · overall-en-private · Est. 2024
Comprehensive Challenge OCR · Needs Research
12 results · multi-scene-f1 · Est. 2024
MME Video OCR Benchmark · Needs Research
6 results · total-accuracy · Est. 2024
reVISION Polish Vision-Language Benchmark · Needs Research
accuracy · Est. 2025
IAM Handwriting Database · Active
22 results · cer · Est. 1999 · Latest: Sep 2024
8 results · handwritten-levenshtein · Est. 2024
cer · Est. 2011
kohtd · Needs Research
4 results · accuracy · Est. 2020 · Latest: Sep 2021
banglalekha-isolated-dataset · Needs Research
3 results · accuracy · Est. 2020 · Latest: Aug 2020
an-extensive-dataset-of-handwritten-central-kurdis · Needs Research
1 result · accuracy · Est. 2020 · Latest: Oct 2022
EMNIST Extended with Polish Diacritics · Needs Research
accuracy · Est. 2020
pubtabnet · Active
18 results · accuracy · Est. 2020 · Latest: Apr 2024
table-recognition-challenge-mini-test · Needs Research
12 results · accuracy · Est. 2020
table-recognition-challenge-test · Needs Research
6 results · accuracy · Est. 2020
wtw · Needs Research
1 result · accuracy · Est. 2020 · Latest: Mar 2023
1 result · accuracy · Est. 2020 · Latest: Apr 2021
16 results · top-1-accuracy · Est. 2012
Canadian Institute for Advanced Research 100 · Needs Research
4 results · accuracy · Est. 2009
Canadian Institute for Advanced Research 10 · Needs Research
3 results · accuracy · Est. 2009
ImageNet-V2 Matched Frequency · Needs Research
2 results · top-1-accuracy · Est. 2019
7 results · mAP · Est. 2022
f1 · Est. 2019
6 results · mIoU · Est. 2016
Cityscapes Dataset · Needs Research
mIoU · Est. 2016
LSUN Bedroom FID · Needs Research
fid · Est. 2015
CIFAR-10 FID · Needs Research
fid · Est. 2009
NYU Depth V2 · Needs Research
abs-rel · Est. 2012
KITTI Depth · Needs Research
abs-rel · Est. 2012
EvalCrafter · Needs Research
composite · Est. 2023
VBench · Needs Research
composite · Est. 2023
ImageNet Zero-Shot · Needs Research
top-1-accuracy · Est. 2009
UCF-101 · Needs Research
top-1-accuracy · Est. 2012
Kinetics-400 · Needs Research
top-1-accuracy · Est. 2017
Something-Something V2 · Needs Research
top-1-accuracy · Est. 2017
OmniLabel · Needs Research
map · Est. 2023
LVIS Zero-Shot · Needs Research
map · Est. 2019
DAVIS · Needs Research
j-and-f · Est. 2016
COCO Keypoints · Needs Research
map · Est. 2014
MPII Human Pose · Needs Research
accuracy · Est. 2014
I2VBench · Needs Research
composite · Est. 2024
GSO (Google Scanned Objects) · Needs Research
composite · Est. 2022
ImageNet Linear Probe · Needs Research
top-1-accuracy · Est. 2009
T3Bench · Needs Research
composite · Est. 2023
SA-1B · Needs Research
iou · Est. 2023
Mathematics Aptitude Test of Heuristics · Needs Research
34 results · accuracy · Est. 2021
Grade School Math 8K · Active
15 results · accuracy · Est. 2021
5 results · accuracy · Est. 2025
American Invitational Mathematics Examination 2024 · Needs Research
8 results · accuracy · Est. 2024
24 results · accuracy · Est. 2021
AI2 Reasoning Challenge · Needs Research
10 results · accuracy · Est. 2018
HellaSwag · Needs Research
5 results · accuracy · Est. 2019
CommonsenseQA · Needs Research
3 results · accuracy · Est. 2019
WinoGrande · Needs Research
3 results · accuracy · Est. 2019
24 results · accuracy · Est. 2024
BIG-Bench Hard (BBH) · Active
5 results · accuracy · Est. 2022
StrategyQA · Needs Research
2 results · accuracy · Est. 2021
HotpotQA · Needs Research
2 results · f1 · Est. 2018
5 results · accuracy · Est. 2024
3 results · accuracy · Est. 2025
LogiQA · Active
2 results · accuracy · Est. 2020
2 results · accuracy · Est. 2020
3 results · accuracy · Est. 2016
Simple Variations on Arithmetic Math Word Problems · Needs Research
3 results · accuracy · Est. 2021
SWE-bench Verified Subset · Needs Research
38 results · resolve-rate · Est. 2024
30 results · pass@1 · Est. 2021 · Latest: Sep 2024
LiveCodeBench · Active
25 results · pass@1 · Est. 2024 · Latest: Mar 2024
Mostly Basic Python Problems · Needs Research
19 results · pass@1 · Est. 2021 · Latest: Sep 2024
SWE-bench: Software Engineering Benchmark · Needs Research
resolve-rate · Est. 2023
MBPP+ Extended Version · Needs Research
pass@1 · Est. 2023
HumanEval+ Extended Version · Needs Research
pass@1 · Est. 2023
Automated Programming Progress Standard · Needs Research
pass@1 · Est. 2021
CodeContests Competitive Programming · Needs Research
pass@1 · Est. 2022
7 results · computational-accuracy · Est. 2020 · Latest: Sep 2024
6 results · exact-match · Est. 2023 · Latest: Sep 2024
6 results · accuracy · Est. 2019 · Latest: Sep 2024
5 results · correct-patches · Est. 2014 · Latest: Apr 2024
39 results · smapi · Est. 2018 · Latest: Dec 2024
12 results · mse · Est. 2021 · Latest: Feb 2025
6 results · mse · Est. 2021 · Latest: Feb 2025
6 results · mse · Est. 2021 · Latest: Feb 2025
6 results · mse · Est. 2021 · Latest: Feb 2025
6 results · mse · Est. 2021 · Latest: Feb 2025
OpenML-CC18 · Needs Research
5 results · accuracy · Est. 2019 · Latest: Jun 2025
California Housing · Needs Research
2 results · rmse · Est. 1997
7 results · average-score · Est. 2018 · Latest: Jul 2024
SuperGLUE · Needs Research
7 results · average-score · Est. 2019 · Latest: Jul 2024
9 results · f1 · Est. 2018 · Latest: Jul 2024
CNN/DailyMail Summarization · Saturated
15 results · rouge-1 · Est. 2015 · Latest: Jul 2024
8 results · accuracy · Est. 2015 · Latest: Jul 2024
7 results · f1 · Est. 2003 · Latest: Jul 2024
MTEB Leaderboard · Needs Research
6 results · accuracy · Est. 2022 · Latest: Sep 2024
WMT'23 · Needs Research
4 results · bleu · Est. 2023
FLORES-200 · Needs Research
bleu · Est. 2022
XNLI · Needs Research
3 results · accuracy · Est. 2018 · Latest: Jan 2023
STS Benchmark · Needs Research
3 results · spearman · Est. 2017 · Latest: Jan 2024
WikiTableQuestions · Needs Research
3 results · accuracy · Est. 2015 · Latest: Apr 2020
SQA · Needs Research
accuracy · Est. 2017
WikiText Perplexity · Needs Research
perplexity · Est. 2016
6 results · success-rate · Est. 2023 · Latest: Apr 2025
5 results · success-rate · Est. 2024 · Latest: Apr 2025
15 results · resolve-rate · Est. 2024 · Latest: Feb 2025
6 results · success-rate · Est. 2024 · Latest: Apr 2025
5 results · normalized-score · Est. 2024 · Latest: Apr 2025
5 results · task-horizon-minutes · Est. 2024 · Latest: Apr 2025
11 results · mean-dsc · Est. 2015 · Latest: Jan 2024
3 results · mean-dice-wt-tc-et · Est. 2023 · Latest: Jun 2024
6 results · mean-dsc · Est. 2017 · Latest: Mar 2024
6 results · mean-dsc · Est. 2015 · Latest: Aug 2023
21 results · accuracy · Est. 2012
7 results · auroc · Est. 2019
NIH Clinical Center Chest X-ray Dataset · Needs Research
4 results · auroc · Est. 2017
RSNA Pneumonia Detection Challenge · Needs Research
3 results · map · Est. 2018 · Latest: Jan 2024
3 results · auroc · Est. 2019
VinDr-CXR: Vietnamese Dataset for Chest Radiograph · Needs Research
2 results · auroc · Est. 2022
Autism Brain Imaging Data Exchange II · Needs Research
2 results · accuracy · Est. 2017
COVID-19 Image Data Collection · Needs Research
2 results · auroc · Est. 2020
PadChest: A Large Chest X-ray Image Dataset · Needs Research
1 result · auroc · Est. 2020
11 results · accuracy · Est. 2024 · Latest: Feb 2025
9 results · accuracy · Est. 2019 · Latest: Feb 2025
8 results · accuracy · Est. 2023 · Latest: Feb 2025
7 results · accuracy · Est. 2017 · Latest: Oct 2024
accuracy · Est. 2019
accuracy · Est. 2019
COCO Captions · Active
2 results · cider · Est. 2015 · Latest: Jan 2023
cider · Est. 2019
accuracy · Est. 2024
AudioBench · Needs Research
accuracy · Est. 2024
DEMON Bench · Needs Research
accuracy · Est. 2024
DPG-Bench · Needs Research
composite · Est. 2024
MJHQ-30K FID · Needs Research
fid · Est. 2024
GenEval · Needs Research
accuracy · Est. 2023
ViDoRe · Needs Research
ndcg-at-5 · Est. 2024
InstructPix2Pix · Needs Research
clip-score · Est. 2023
MagicBrush · Needs Research
clip-score · Est. 2023
VideoBench · Needs Research
composite · Est. 2024
LibriSpeech ASR Corpus · Active
17 results · wer · Est. 2015 · Latest: Apr 2024
wer · Est. 2025
Mozilla Common Voice · Needs Research
3 results · wer · Est. 2019 · Latest: Dec 2022
CSTR VCTK Corpus · Active
6 results · mos · Est. 2019 · Latest: Jun 2024
The LJ Speech Dataset · Needs Research
5 results · mos · Est. 2017 · Latest: Jun 2024
9 results · auroc · Est. 2019 · Latest: Aug 2023
Kolektor Surface Defect Dataset 2 · Needs Research
6 results · auroc · Est. 2021 · Latest: Aug 2024
MVTec 3D Anomaly Detection Dataset · Needs Research
6 results · auroc · Est. 2021 · Latest: Mar 2025
X-Ray Weld Defect Detection Dataset · Needs Research
1 result · map · Est. 2021
Visual Anomaly Dataset · Needs Research
3 results · auroc · Est. 2022
NEU Surface Defect Database · Needs Research
1 result · map · Est. 2013
Severstal Steel Defect Detection · Needs Research
1 result · dice · Est. 2019
40 results · requirement-satisfaction · Est. 2025
9 results · human-normalized-score · Est. 2013
9 results · average-return · Est. 2012
Cora Citation Network · Active
6 results · accuracy · Est. 2000 · Latest: Apr 2019
Open Graph Benchmark · Needs Research
accuracy · Est. 2020
OGB (Open Graph Benchmark) · Needs Research
accuracy · Est. 2020
AudioSet · Active
map · Est. 2017
Environmental Sound Classification 50 · Needs Research
accuracy · Est. 2015
DIHARD · Needs Research
der · Est. 2018
AVA-Speech · Needs Research
accuracy · Est. 2018
VCTK (Voice Conversion) · Needs Research
pesq · Est. 2019
DNS Challenge · Needs Research
si-snr · Est. 2020