Codesota · Computer Vision · Benchmarks hub

Computer vision,
measured in pixels.

From classification on ImageNet to detection on COCO to segmentation on ADE20K — the models that see, and what they see well. Every score dated, every metric defined, every dataset linked.


§ 01 · Metrics

Reading the numbers.

Three metric families cover nearly every computer-vision leaderboard. Each asks a different question of the model: localization, classification, or pixel-level agreement.

Mean Average Precision
mAP

The gold standard for object detection. Measures how well the model both places bounding boxes and classifies the objects inside them; a worked sketch follows this card.

  • AP50: Easy mode. A detection counts if its box overlaps ground truth with IoU ≥ 0.50.
  • AP75: Hard mode. IoU ≥ 0.75, so boxes must be tight.
  • mAP (COCO): AP averaged over IoU thresholds 0.50 to 0.95 in steps of 0.05.
Used in — COCO, Pascal VOC
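
To make the definition concrete, here is a minimal NumPy sketch of AP for a single class at a single IoU threshold, assuming detections have already been matched to ground truth. The function and argument names are illustrative, not COCO's reference implementation:

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP for one class at one IoU threshold (all-point interpolation).

    scores -- confidence of each detection
    is_tp  -- True if the detection matched an unclaimed ground-truth box
              at the chosen IoU threshold (e.g. 0.50 for AP50)
    num_gt -- number of ground-truth boxes for this class
    """
    order = np.argsort(-np.asarray(scores, dtype=float))  # high confidence first
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / np.arange(1, len(tp) + 1)
    recall = cum_tp / num_gt
    # Interpolate: precision at each recall is the max precision to its right.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # Area under the precision-recall curve.
    recall = np.concatenate(([0.0], recall))
    return float(np.sum((recall[1:] - recall[:-1]) * precision))
```

COCO mAP is then the mean of this quantity over the 80 categories and the ten thresholds 0.50, 0.55, …, 0.95.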
Classification Accuracy
Top-1 / Top-5

For image classification. Top-1 requires the single highest-scoring prediction to be correct; Top-5 accepts the answer if the true class is anywhere in the top five. A sketch follows this card.

  • Top-1: Percentage of images where the top prediction is correct.
  • Top-5: Percentage of images where the correct class is among the top 5 predictions.
  • Higher is better: a Top-1 of 90% means nine images in ten get the right label on the first guess.
Used in — ImageNet, CIFAR-10/100
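
A minimal sketch of the computation, assuming logits is an (N, C) score matrix and labels holds integer class ids (the names are illustrative):

```python
import numpy as np

def topk_accuracy(logits, labels, k=5):
    """Fraction of samples whose true label is among the k highest scores."""
    # Indices of the k largest scores per row (order inside the k is irrelevant).
    topk = np.argpartition(-logits, k - 1, axis=1)[:, :k]
    return float(np.mean(np.any(topk == labels[:, None], axis=1)))

# Top-1 is just the k=1 special case: topk_accuracy(logits, labels, k=1)
```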
Mean Intersection over Union
mIoU

For semantic segmentation. Measures pixel-level overlap between the predicted label map and the ground truth; a sketch follows this card.

  • Pixel-level: Evaluates every pixel in the image.
  • IoU per class: Calculated for each semantic class.
  • Mean: Average IoU across all classes.
Used in — ADE20K, Cityscapes
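
The three bullets above translate to a few lines of NumPy via a confusion matrix, a common trick. A minimal sketch; function and argument names are illustrative, and ignore-labels (such as 255), which real benchmarks mask out, are omitted for brevity:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """mIoU from prediction and ground-truth label maps of the same shape."""
    pred, gt = pred.ravel(), gt.ravel()
    # Confusion matrix: rows = ground-truth class, cols = predicted class.
    cm = np.bincount(gt * num_classes + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm).astype(float)             # correctly labeled pixels per class
    union = cm.sum(0) + cm.sum(1) - np.diag(cm)   # predicted + actual - overlap
    iou = inter / np.maximum(union, 1)            # guard against empty classes
    return float(iou[union > 0].mean())           # mean over classes present
```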
Fig 1 · IoU = Area(Overlap) / Area(Union), illustrated with a single detection ("Dog 0.98") on a sample image.
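
The formula in Fig 1 translates directly to code. A minimal sketch for two axis-aligned boxes in (x1, y1, x2, y2) form (the function name is illustrative):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes, each given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```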
§ 02 · Coverage

Three task families.

Detection, classification, and segmentation — the backbone tasks of modern computer vision. Each card links to its leaderboard below.

Task family
Object Detection

Locating and classifying objects with bounding boxes. COCO and Pascal VOC benchmarks.

Metric: mAP (Mean Average Precision)
Task family
Image Classification

Categorizing images into predefined classes. ImageNet and CIFAR benchmarks.

Metric: Top-1 / Top-5 Accuracy
Task family
Semantic Segmentation

Pixel-level classification of images. ADE20K and Cityscapes benchmarks.

Metric: mIoU (Mean IoU)
§ 03 · Detection

Object detection.

Locating and classifying objects with bounding boxes. Higher mAP is better. The row marked ★ holds the current state of the art on COCO.

#     Model               Vendor            COCO mAP   Pascal VOC mAP   Architecture
01    InternImage-H       Shanghai AI Lab   65.4       —                Deformable Convolution
02 ★  Co-DETR (Swin-L)    Research          66.0       —                Transformer Detector
03    DINO (Swin-L)       Research          63.3       —                Transformer Detector
04    YOLOv10-X           Tsinghua          57.4       —                CNN (Real-time)
05    EfficientDet-D7x    Google            55.1       —                EfficientNet+BiFPN
Fig 2 · Em-dash means no result on file for that model × dataset pair — not evidence of weakness.
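
For COCO itself, mAP is almost always computed with the reference pycocotools evaluator rather than hand-rolled code. A minimal sketch, assuming ground truth and detections in the standard COCO JSON formats (the file paths are placeholders):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val2017.json")
# Detections: a JSON list of {image_id, category_id, bbox, score} records.
coco_dt = coco_gt.loadRes("detections.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()   # prints AP @ IoU 0.50:0.95, AP50, AP75, per-size APs
```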
§ 04 · Classification

Image classification.

Categorizing images into predefined classes. Higher accuracy is better.

Coming soon

ImageNet and CIFAR classification benchmarks will be added soon.

§ 05 · Segmentation

Semantic segmentation.

Pixel-level classification of images. Higher mIoU is better.

#    Model                  Vendor            ADE20K mIoU   Cityscapes mIoU   Architecture
01   InternImage-H          Shanghai AI Lab   62.9          —                 Deformable Convolution
02   Mask2Former (Swin-L)   Meta              57.3          —                 Transformer
Fig 3 · Em-dash means no result on file for that model × dataset pair.
§ 06 · Datasets

The benchmarks.

Every canonical computer-vision dataset, grouped by task. Click through for the paper or the dataset download.

Object detection
COCO
2014

330K images, 1.5 million object instances, 80 object categories. Standard benchmark for object detection and segmentation.

Task
object-detection
Images
330,000
Pascal VOC 2012
2012

11,530 images with 27,450 ROI annotated objects and 6,929 segmentations. Classic object detection benchmark.

Task
object-detection
Images
11,530
Image classification
ImageNet-1K
2012

1.28M training images, 50K validation images across 1,000 object classes. The standard benchmark for image classification since 2012.

Task
image-classification
Images
1,281,167
ImageNet Linear Probe
2012

Linear classification on frozen ImageNet-1K features. Used to evaluate the representation quality of self-supervised and contrastive models without fine-tuning the backbone; a sketch of the protocol follows this entry.

Task
image-classification
Images
1,281,167
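
The protocol is simple to state in code. A minimal PyTorch sketch, assuming a torchvision ResNet-50 as the frozen backbone; the backbone choice, optimizer settings, and training loop are illustrative, and papers differ in these details:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

backbone = resnet50(weights=None)        # load self-supervised weights here
backbone.fc = nn.Identity()              # expose the 2048-d pooled features
backbone.requires_grad_(False)           # freeze every backbone parameter
backbone.eval()

probe = nn.Linear(2048, 1000)            # the only trainable module
optimizer = torch.optim.SGD(probe.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, labels):
    with torch.no_grad():                # features are fixed; no backbone grads
        feats = backbone(images)
    loss = loss_fn(probe(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```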
ImageNet-V2
2019

10K new test images collected by following the original ImageNet process. Tests model generalization beyond the original test set.

Task
image-classification
Images
10,000
CIFAR-10
2009

60K 32x32 color images in 10 classes. Classic small-scale image classification benchmark with 50K training and 10K test images.

Task
image-classification
Images
60,000
Semantic segmentation
Cityscapes
2016

5,000 images with fine annotations and 20,000 with coarse annotations of urban street scenes.

Task
semantic-segmentation
Images
25,000
ADE20K
2016

20K training, 2K validation images annotated with 150 object categories. Complex scene parsing benchmark.

Task
semantic-segmentation
Images
22,210
§ 07 · Related

Keep exploring.

Beyond detection, classification, and segmentation — adjacent sections of the vision registry.

Section
All computer-vision tasks
Scene text, OCR, depth, optical flow — the full index.
Section
All modalities
Text, vision, audio, agentic and multimodal combined.
Section
Vision hub
Editorial overview and news across computer vision.