Image Classification Standard

ImageNet (ILSVRC)

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is among the most influential benchmarks in computer vision. Launched in 2010, it became a primary catalyst for the deep learning revolution, evaluating models on over 1.4 million images across 1,000 object categories.

  • SOTA Top-1: 86.60% (MaxViT)
  • Total Images: 14.2M (full dataset)
  • Classes: 1,000 (ILSVRC subset)
  • Citations: 71k+ (original paper)

The Benchmark that Changed Everything

Before ImageNet, computer vision datasets were small and specialized. In 2009, researchers from Stanford and Princeton introduced a dataset of unprecedented scale, organized according to the WordNet hierarchy.

The annual ILSVRC competition (2010–2017) provided a standardized evaluation framework that allowed researchers to compare architectures fairly. The 2012 victory of AlexNet marked the definitive shift from hand-crafted features (like SIFT) to end-to-end learned representations via Convolutional Neural Networks (CNNs).

Key Innovations

  • Standardized 1,000-class subset for reproducible research.
  • Hierarchical structure enabling fine-grained classification.
  • Established Top-1 and Top-5 error as industry-standard metrics.

Visualization 01: ImageNet Category Distribution. Synset hierarchy from "Mammal" to "Golden Retriever".
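
The hierarchy can be pictured as a chain of parent links between synsets. The sketch below is a toy illustration only: the `PARENT` table is a hand-written, simplified fragment (an assumption, not the actual WordNet data), and `path_to_root` and `lowest_common_ancestor` are hypothetical helper names.

```python
# Toy fragment of an is-a hierarchy, hand-written for illustration only.
# It is NOT loaded from WordNet or from the ImageNet metadata.
PARENT = {
    "golden retriever": "retriever",
    "retriever": "sporting dog",
    "sporting dog": "dog",
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "mammal",
    "tabby cat": "domestic cat",
    "domestic cat": "feline",
    "feline": "carnivore",
}

def path_to_root(synset):
    """Return the chain from a synset up to the root of the toy hierarchy."""
    path = [synset]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lowest_common_ancestor(a, b):
    """Return the most specific synset that both a and b descend from."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    return None

# Print the chain from the root down to the leaf class.
print(" > ".join(reversed(path_to_root("golden retriever"))))
print(lowest_common_ancestor("golden retriever", "tabby cat"))
```

A structure like this is what makes coarse-versus-fine evaluation possible: two leaf classes can be compared by how far up the tree one must go before they share an ancestor.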

Error Evolution

The rapid decline of classification error rates over the ILSVRC era.

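Top-5 error of leading ILSVRC entries (human baseline: 5.1%)

Year | Model | Top-5 Error
--- | --- | ---
2010 | NEC-UIUC | 28.2%
2012 | AlexNet | 16.4%
2014 | GoogLeNet | 6.7%
2015 | ResNet | 3.57%
2017 | SENet | 2.25%

The chart also plots ViT-G/14 (2021) at 0.00%. That value is a placeholder rather than a measured error rate: the model postdates the competition, which ended in 2017.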
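
The size of each step in this decline is easier to judge as a relative reduction. The sketch below uses only the five ILSVRC figures charted above (2010 to 2017) and the 5.1% human baseline; the helper names are hypothetical, and the 0.00% entry is left out because it is not a measured error rate.

```python
# Top-5 error rates (%) of the ILSVRC entries charted above.
RESULTS = [
    (2010, "NEC-UIUC", 28.2),
    (2012, "AlexNet", 16.4),
    (2014, "GoogLeNet", 6.7),
    (2015, "ResNet", 3.57),
    (2017, "SENet", 2.25),
]
HUMAN_BASELINE = 5.1  # human baseline (%) quoted in the chart legend

def relative_reduction(old, new):
    """Fraction of the old error that was eliminated."""
    return (old - new) / old

def first_below(results, threshold):
    """Return the first (year, model, error) entry whose error is under threshold."""
    for entry in results:
        if entry[2] < threshold:
            return entry
    return None

# Compare each entry with the one that follows it.
for (y0, m0, e0), (y1, m1, e1) in zip(RESULTS, RESULTS[1:]):
    print(f"{m0} ({y0}) -> {m1} ({y1}): {relative_reduction(e0, e1):.1%} lower error")

print("First below human baseline:", first_below(RESULTS, HUMAN_BASELINE))
```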

Current SOTA Leaderboard

Metric: Top-1 Accuracy (%)
Rank | Model Architecture | Training | Top-1 Acc | Date
--- | --- | --- | --- | ---
1 | maxvit_base_tf_512.in1k | ImageNet-1K Fine-tuned | 86.598% | 2023-04
2 | coatnet_2_rw_224.sw_in12k_ft_in1k | ImageNet-1K Fine-tuned | 86.580% | 2022-09
3 | nextvit_large.bd_ssld_6m_in1k_384 | ImageNet-1K Fine-tuned | 86.542% | 2022-11
4 | coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k | ImageNet-1K Fine-tuned | 86.540% | 2022-09
5 | nextvit_base.bd_ssld_6m_in1k_384 | ImageNet-1K Fine-tuned | 86.364% | 2022-11
6 | swin_large_patch4_window7_224.ms_in22k_ft_in1k | ImageNet-1K Fine-tuned | 86.330% | 2021-03
7 | convnext_base.fb_in22k_ft_in1k | ImageNet-1K Fine-tuned | 86.298% | 2022-01
8 | hgnetv2_b6.ssld_stage1_in22k_in1k | ImageNet-1K Fine-tuned | 86.298% | 2023-05
9 | maxvit_base_tf_384.in1k | ImageNet-1K Fine-tuned | 86.294% | 2023-04
10 | swinv2_base_window12to16_192to256.ms_in22k_ft_in1k | ImageNet-1K Fine-tuned | 86.276% | 2022-06

Note: Many modern models use ImageNet-21K for pre-training before evaluating on ImageNet-1K.
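
Several leaderboard names appear to encode this pre-training in their suffix (for example `ms_in22k_ft_in1k`). The sketch below is a rough heuristic, assuming names follow an `architecture.tag` pattern in which tokens such as `in22k` or `in12k` name ImageNet variants; that reading is inferred from the names listed above rather than from a documented specification, and the function names are hypothetical.

```python
import re

def split_model_name(name):
    """Split 'architecture.tag' into its two parts (tag is '' if there is no dot)."""
    arch, _, tag = name.partition(".")
    return arch, tag

def imagenet_datasets(name):
    """List ImageNet dataset tokens (e.g. 'in22k', 'in1k') found in the tag, in order."""
    _, tag = split_model_name(name)
    return re.findall(r"in\d+k", tag)

def uses_larger_pretraining(name):
    """Heuristic: True if the tag names an ImageNet variant other than in1k."""
    return any(token != "in1k" for token in imagenet_datasets(name))

for model in [
    "maxvit_base_tf_512.in1k",
    "coatnet_2_rw_224.sw_in12k_ft_in1k",
    "swin_large_patch4_window7_224.ms_in22k_ft_in1k",
]:
    print(model, imagenet_datasets(model), uses_larger_pretraining(model))
```

The heuristic only sees ImageNet tokens. A tag such as `bd_ssld_6m_in1k_384` may still involve extra data that this check does not detect.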

Dataset Variants

While ILSVRC 2012 is the "standard" ImageNet, the ecosystem has expanded to address specific challenges like scale, robustness, and distribution shift.

Variant | Images | Classes | Purpose
--- | --- | --- | ---
ImageNet-1K (ILSVRC) | 1.28M | 1,000 | Standard benchmark
ImageNet-21K | 14M | 21,841 | Large-scale pre-training
ImageNet-v2 | 10K | 1,000 | Robustness testing
ImageNet-C / R | N/A | 1,000 (C), 200 (R) | Corruption & rendition

The Evaluation Pipeline

01. Preprocessing: Resizing and center cropping to the model's input size (typically 224x224 or 384x384), followed by normalization.

02. Inference: Forward pass through the model to generate class logits.

03. Softmax: Converting logits to a probability distribution over 1,000 classes.

04. Scoring: Checking if the ground truth label is the top prediction (Top-1).
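
The four steps can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the mean and standard deviation values are the commonly used ImageNet channel statistics (an assumption; check what a given model expects), the three made-up logits stand in for a real model's 1,000 outputs, and all function names are hypothetical.

```python
import math

# Widely used per-channel RGB statistics for ImageNet-trained models.
# Assumption: confirm the values a specific model was trained with.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def center_crop_box(width, height, size):
    """Step 01: return (left, top, right, bottom) of a centered size x size crop."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size

def normalize_pixel(rgb):
    """Step 01: scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, MEAN, STD))

def softmax(logits):
    """Step 03: convert logits to probabilities; subtracting the max avoids overflow."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_top1_correct(probs, label):
    """Step 04: correct only if the most probable class is the ground truth label."""
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted == label

# Stand-in for step 02: a real model would map the preprocessed image
# to 1,000 logits; three made-up values keep the example readable.
logits = [1.2, 3.4, 0.3]
probs = softmax(logits)

print(center_crop_box(256, 256, 224))
print(normalize_pixel((255, 255, 255)))
print(is_top1_correct(probs, label=1))
```

Softmax preserves the ordering of the logits, so the top prediction is the same before and after step 03; the probabilities matter when confidence values are needed, not for Top-1 scoring itself.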

Top-1 Accuracy

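The standard metric for ImageNet. It measures the percentage of evaluation images for which the model's highest-probability prediction exactly matches the ground truth label. Because the test-set labels are not public, results are normally reported on the 50,000-image validation set. As of 2024, the strongest published models exceed 90% Top-1 accuracy on the 1K validation set, whereas the leaderboard above tops out at 86.6%.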

Accuracy = (Correct Predictions) / (Total Images)
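
A minimal sketch of this formula, assuming predictions and labels are given as parallel lists of class indices (the function name and the sample data are hypothetical):

```python
def top1_accuracy(predictions, labels):
    """Fraction of images whose predicted class equals the ground truth label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

predicted = [3, 7, 7, 1, 0]
truth = [3, 7, 2, 1, 4]
print(f"{top1_accuracy(predicted, truth):.1%}")  # 3 of 5 correct -> 60.0%
```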

Top-5 Error

Used historically, when Top-1 classification was much harder. A prediction counts as a success if the correct label is among the model's five highest-scoring classes. Top-5 error was the primary metric of the original ILSVRC competitions.

Error = 1 - (Correct in Top 5) / (Total Images)
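
The same formula as a short sketch, assuming each image comes with a list of per-class scores (logits or probabilities). The function name and the six-class sample data are hypothetical; real ImageNet-1K scores would have 1,000 entries per image.

```python
def topk_error(scores, labels, k=5):
    """Fraction of images whose true label is NOT among the k highest-scoring classes."""
    if not labels:
        raise ValueError("cannot compute error on an empty set")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=row.__getitem__, reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)

scores = [
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # true class 4: inside the top 5
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # true class 5: outside the top 5
    [0.05, 0.60, 0.10, 0.10, 0.10, 0.05],  # true class 1: the top prediction
]
labels = [4, 5, 1]

print(f"Top-5 error: {topk_error(scores, labels):.3f}")
print(f"Top-1 error: {topk_error(scores, labels, k=1):.3f}")
```

Setting `k=1` recovers Top-1 error, so the two metrics differ only in how many of the ranked classes are inspected.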

Related Benchmarks

Benchmark | Focus | Scale | Key Difference
--- | --- | --- | ---
CIFAR-10/100 | Small-scale classification | 60k images (32x32) | Low resolution, toy dataset
COCO | Detection & Segmentation | 330k images | Focus on object localization
PASCAL VOC | Object Recognition | 11k images | Pre-dated ImageNet scale

Access the ImageNet Dataset

Ready to train your own models? The official ImageNet database is available for research and non-commercial use. Access requires registration and an institutional affiliation.