Computer Vision

Research focused on enabling computers to interpret and understand visual information from images and videos, including tasks such as image classification, object detection, segmentation, and visual recognition.

15 tasks, 202 datasets, 94 results

Tasks & Benchmarks


Object Detection

COCO (2014)
66.12 (box-map) ScyllaNet
COCO 2014 val2014
COCO test-dev2014
COCO val2017 (2014)
DIOR
ImageNet Detection (ILSVRC DET)
ImageNet Localization (ILSVRC LOC)
LVIS v1.0 (2019)
71.4 (box-ap) DINO-X
PASCAL VOC 2007
Pascal VOC 2012 (2012)
80 (mAP-coco-pretrain) SSD512 (VGG-16)
Roboflow100-VL (RF100-VL)

Image Classification

CIFAR-10 (2009)
99.1 (accuracy) DeiT-B Distilled
CIFAR-10
CIFAR-100
CIFAR-100 (2009)
94.55 (accuracy) ViT-H/14
CUB (CUB-200-2011)
DTD
GEO-Bench (classification suite)
Galaxy10
ImageNet (2009)
97.75 (top-5-accuracy) SENet
ImageNet Real
ImageNet V2
ImageNet-1K (2012)
91 (top-1-accuracy) CoCa (finetuned)
ImageNet-R
ImageNet-S
ImageNet-V2 (2019)
84 (top-1-accuracy) Swin Transformer V2 Large
Met (Metropolitan Museum artworks)
ObjectNet
Oxford Flowers-102
Places205
Places365
Stanford Cars
VTAB (19 tasks)
aircr.
iNat 2017
iNat 2018
iNat 2019
iNaturalist 2021

Image segmentation

ADE20K
BRAVO (OOD)
BSDS500
0.77 (ODS) Segment Anything Model (SAM)
COCO 2017 Instance Segmentation
46.5 (mAP) Segment Anything Model (SAM)
44.7 (mAP) Segment Anything Model (SAM)
LoveDA
Oxford-IIIT Pets
PASCAL VOC 2012

OCR

Fox (English subset, 600-1300 text tokens)
OCRBench
860 (Score) HunyuanOCR (1B)
OmniDocBench v1.0 (2024)
OmniDocBench v1.5 (2024)
olmOCR-Bench (2025)

Image editing

GEdit-Bench
ImgEdit (ImgEdit benchmark)
KRIS-Bench
PICABench
RISEBench

Image generation

CVTG-2K
GenEval
ICE-Bench (Task1-31 Overall)
ImageNet 1024x1024
ImageNet 256x256
ImageNet 512x512
LongText-Bench
OmniContext
OneIG-EN
OneIG-ZH
TIIF-Bench mini

Object counting

Open-Vocabulary Object Detection

ODinW13 (subset of ODinW)

Video classification

COIN
Diving-48
Epic-Kitchens-100 (EK100)
Kinetics-400
Something-Something V2
UCF101

Video generation

No datasets indexed yet.

3D generation

No datasets indexed yet.

Video segmentation

DAVIS
MOSE
YouTube-VOS

3D Understanding

CO3Dv2
DTU
Re10K
ScanNet-1500

Depth estimation

DA-2K
DDAD (relative)
DIODE (relative)
DIODE Outdoor (metric)
ETH3D (relative)
HyperSim (metric)
KITTI (metric)
KITTI (relative)
NYUv2 (metric)
NYUv2 (relative)
SUN RGB-D (metric)
ScanNet
Sintel (relative)
iBims-1 (metric)

Few-Shot Image Classification

COCO 2017 Captions
COCO 2017 Panoptic Segmentation
COCO 2017 Stuff
COCO Captions (2015)
COCO minival (2014)
COCO test-challenge (2014)
COCO val2017 (Instance Segmentation)
COCO-Stuff (2018)
COCO-Text (2016)
COCO-WholeBody (2020)
Crossmodal-3600 (XM3600)
DL3DV-Benchmarks (140)
HELMET
HiRoom
IMC (Image Matching Challenge)
ImageNet-Hard
LOFT
LVD-142M
LVD-1689M
Language benchmarks (overall)
MMVP
MRCR
NTIRE 2024 Transparent Surface Challenge (relative)
OCRBench v2
SAT-493M
SciVideoBench
TAP-Vid (RGB-S)
Tanks and Temples (6)
