
Image Classification

Image Classification is a fundamental task in computer vision that aims to assign a label or class to an entire image. The goal is to train a model that can recognize and categorize images into predefined classes.


Image classification assigns a single label to an entire image. It's one of the oldest deep learning benchmarks and the task that proved neural networks work: ImageNet top-1 accuracy climbed from around 63% with hand-crafted features in 2011 to 91%+ (SigLIP, 2024). Today it's largely solved for standard benchmarks, but domain-specific classification (medical, satellite, industrial) remains the real deployment challenge.

History

2009

ImageNet dataset (Deng et al.) created with 14M images across 21k categories, establishing the benchmark that would define a decade

2012

AlexNet (Krizhevsky et al.) wins ILSVRC with 16.4% top-5 error, 10+ points ahead of second place — proves deep learning works for vision

2014

VGGNet (19 layers) and GoogLeNet (Inception modules) push top-5 error to 6.7%, showing that depth matters

2015

ResNet introduces skip connections enabling 152-layer networks, achieves 3.57% top-5 error — surpassing human-level (5.1%)

2017

SENet wins last ILSVRC competition with 2.25% top-5 error; channel attention becomes standard

2019

EfficientNet (Tan & Le) uses neural architecture search to optimize width/depth/resolution scaling, sets new efficiency frontier

2020

Vision Transformer (ViT) by Dosovitskiy et al. proves transformers work for images when pretrained on large data (JFT-300M)

2021

CLIP (Radford et al.) and ALIGN show that contrastive language-image pretraining produces classification-capable representations without labeled data

2023

DINOv2 (Meta) achieves strong classification via self-supervised learning on 142M curated images, no labels needed

2024

SigLIP-SO400M achieves 91.1% ImageNet top-1 with sigmoid loss, and open foundation models make linear probing competitive with full fine-tuning

How Image Classification Works

1. Input Preprocessing

Images are resized (typically 224×224 or 384×384), normalized to dataset statistics, and augmented (random crop, flip, RandAugment, CutMix) during training.
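As a concrete illustration, here is a minimal training/eval preprocessing pipeline using torchvision; the normalization statistics and augmentation choices below are the common ImageNet defaults, shown as one reasonable configuration rather than a prescription.

```python
from torchvision import transforms

# Standard ImageNet channel statistics (swap in your own dataset's values)
MEAN, STD = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]

# Training: random crop + flip + RandAugment, then tensor conversion and normalization
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandAugment(),
    transforms.ToTensor(),
    transforms.Normalize(MEAN, STD),
])

# Evaluation: deterministic resize + center crop, no augmentation
eval_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(MEAN, STD),
])
```

(CutMix mixes pairs of images within a batch, so it is typically applied in the training loop rather than in this per-image pipeline.)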

2. Feature Extraction

A backbone network (ResNet, ConvNeXt, ViT, SigLIP encoder) processes the image into a high-dimensional feature map. CNNs use hierarchical convolutions; ViTs split the image into patches and apply self-attention.
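A quick sketch of feature extraction using the timm library (an assumption of this example, not something the page requires); passing num_classes=0 strips the classifier so the model returns pooled features directly.

```python
import timm
import torch

# Any registered timm backbone name works; convnext_base is just one example
backbone = timm.create_model("convnext_base", pretrained=True, num_classes=0)
backbone.eval()

x = torch.randn(1, 3, 224, 224)            # a preprocessed image batch
with torch.no_grad():
    fmap = backbone.forward_features(x)    # spatial feature map, here [1, 1024, 7, 7]
    feats = backbone(x)                    # pooled feature vector, here [1, 1024]
```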

3. Pooling

Spatial features are collapsed into a single vector — global average pooling for CNNs, the [CLS] token or mean pooling for transformers.
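In plain PyTorch, the two pooling conventions look like this (shapes are illustrative placeholders):

```python
import torch

# CNN: collapse a [B, C, H, W] feature map with global average pooling
fmap = torch.randn(8, 2048, 7, 7)
cnn_vec = fmap.mean(dim=(2, 3))        # -> [8, 2048]

# ViT: tokens are [B, N, D]; use the [CLS] token or the mean of the patch tokens
tokens = torch.randn(8, 197, 768)      # 1 [CLS] token + 196 patch tokens
cls_vec = tokens[:, 0]                 # -> [8, 768]
mean_vec = tokens[:, 1:].mean(dim=1)   # -> [8, 768]
```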

4. Classification Head

A linear layer (or small MLP) projects the pooled features to class logits. Softmax converts logits to probabilities, and cross-entropy loss drives training.
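A minimal sketch of the head and its training loss; the dimensions are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_classes = 768, 1000
head = nn.Linear(feat_dim, num_classes)      # the entire classification head

features = torch.randn(32, feat_dim)         # pooled backbone features
targets = torch.randint(0, num_classes, (32,))

logits = head(features)                      # [32, 1000]
loss = F.cross_entropy(logits, targets)      # softmax + negative log-likelihood in one call
probs = logits.softmax(dim=-1)               # probabilities for inference
```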

5. Inference

At test time, optional techniques like test-time augmentation (TTA) and model ensembling can boost accuracy by 0.5-1%. Top-1 and top-5 accuracy are the standard metrics.
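A minimal sketch of both ideas, horizontal-flip TTA and top-k accuracy, for any model that maps images to logits:

```python
import torch

def tta_predict(model, x):
    """Average softmax probabilities over the image and its horizontal flip."""
    with torch.no_grad():
        p = model(x).softmax(dim=-1)
        p_flip = model(torch.flip(x, dims=[-1])).softmax(dim=-1)
    return (p + p_flip) / 2

def topk_accuracy(probs, targets, k=5):
    """Fraction of samples whose true label is among the k most probable classes."""
    topk = probs.topk(k, dim=-1).indices             # [B, k]
    return (topk == targets[:, None]).any(dim=-1).float().mean().item()
```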

Current Landscape

The image classification landscape in 2025 is mature and bifurcated. For standard benchmarks, the task is effectively solved — ImageNet top-1 has plateaued above 91%, and gains are measured in tenths of a percent. Vision transformers dominate at scale, while ConvNeXt proved CNNs can match them with modern training recipes. The real action is in foundation model representations: CLIP, SigLIP, DINOv2, and InternVL produce features so good that a linear probe rivals full fine-tuning, making the backbone choice matter more than the classification head. The practical question is no longer 'how accurate can we get on ImageNet' but 'which pretrained features transfer best to my specific domain.'
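To make the linear-probe claim concrete, here is a minimal sketch: extract features once with a frozen pretrained backbone (for instance via the timm snippet above), cache them, and fit a single logistic-regression layer. The file names are hypothetical placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical cached arrays of frozen-backbone features and labels
train_feats = np.load("train_feats.npy")     # [N, D]
train_labels = np.load("train_labels.npy")   # [N]
val_feats = np.load("val_feats.npy")
val_labels = np.load("val_labels.npy")

# The "linear probe": one linear classifier on top of frozen features
probe = LogisticRegression(max_iter=1000)
probe.fit(train_feats, train_labels)
print("probe top-1 accuracy:", probe.score(val_feats, val_labels))
```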

Key Challenges

Domain shift between training data (ImageNet, web-scraped) and deployment domains (medical imaging, satellite, industrial inspection) — models that hit 90%+ on benchmarks can drop to 60% on new distributions

Long-tail distributions where rare classes have very few training examples, common in real-world datasets like iNaturalist (8k+ species, some with <10 images)

Calibration — models are often overconfident on wrong predictions, which matters critically in medical and safety applications (a temperature-scaling sketch follows this list)

Computational cost of ViT-Large/Huge models (300M-600M params) vs. deployment constraints on edge devices and mobile phones

Label noise in web-scraped training data (estimated 5-10% noise in ImageNet itself) propagates into learned representations
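On the calibration point: temperature scaling (Guo et al., 2017) is the standard post-hoc fix, fitting one scalar T on held-out logits so that softmax(logits / T) is better calibrated. A minimal sketch, not tied to any particular model:

```python
import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels, lr=0.01, steps=200):
    """Fit a single temperature T on held-out logits by minimizing NLL."""
    log_t = torch.zeros(1, requires_grad=True)   # optimize log T so T stays positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

# Usage: T = fit_temperature(val_logits, val_labels)
#        calibrated_probs = (test_logits / T).softmax(dim=-1)
```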

Quick Recommendations

Best accuracy (compute unlimited)

SigLIP-SO400M + linear probe or InternViT-6B

91.1% top-1 on ImageNet with minimal fine-tuning; SigLIP's sigmoid loss handles noisy data better than softmax-based CLIP

Best accuracy/efficiency tradeoff

ConvNeXt V2-Base or EVA-02-Base

85-86% top-1 at ~90M params, strong transfer to downstream tasks, runs well on a single GPU

Edge deployment / mobile

EfficientNet-B0 or MobileNetV3-Large

77-80% accuracy at 4-5M params, optimized for TFLite/ONNX, <10ms on modern phones

Few-shot / low-data regime

DINOv2-ViT-L + k-NN classifier

Self-supervised features generalize with as few as 5 examples per class, no fine-tuning needed

Open-vocabulary / zero-shot

SigLIP or OpenCLIP ViT-G/14

Classify into arbitrary text-described categories without retraining, 80%+ zero-shot on ImageNet
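As an illustration of the open-vocabulary recipe above, here is a zero-shot classification sketch with the open_clip package; the checkpoint name is one example from its registry and the image path is a placeholder.

```python
import torch
import torch.nn.functional as F
import open_clip
from PIL import Image

# One example checkpoint; SigLIP and larger ViT-G variants are also in the registry
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

class_names = ["golden retriever", "tabby cat", "container ship"]  # any labels you like
text = tokenizer([f"a photo of a {c}" for c in class_names])
image = preprocess(Image.open("example.jpg")).unsqueeze(0)         # placeholder path

with torch.no_grad():
    img_emb = F.normalize(model.encode_image(image), dim=-1)
    txt_emb = F.normalize(model.encode_text(text), dim=-1)
    probs = (100.0 * img_emb @ txt_emb.T).softmax(dim=-1)          # scaled cosine similarity
print(dict(zip(class_names, probs[0].tolist())))
```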

What's Next

The frontier is moving toward open-vocabulary classification (classify into any text-described category), continual learning (adapt to new classes without forgetting old ones), and multimodal classification that uses text, metadata, and images jointly. Foundation models pretrained on billions of image-text pairs are making task-specific classifiers obsolete for many applications. The remaining hard problems are fine-grained recognition under distribution shift and calibrated uncertainty estimation for safety-critical deployments.

Benchmarks & SOTA

ImageNet-1K

ImageNet Large Scale Visual Recognition Challenge 2012

2012 · 20 results

1.28M training images, 50K validation images across 1,000 object classes. The standard benchmark for image classification since 2012.

State of the Art: CoCa (finetuned), Google, 91% top-1 accuracy

ImageNet

ImageNet (ILSVRC)

2009 · 15 results

ImageNet Large Scale Visual Recognition Challenge (ILSVRC): the standard 1,000-class image classification benchmark. Sparked the deep learning revolution from 2010 onward.

State of the Art: SENet, Momenta, 97.75% top-5 accuracy

CIFAR-100

Canadian Institute for Advanced Research 100

2009 · 4 results

60K 32x32 color images in 100 fine-grained classes grouped into 20 superclasses. More challenging than CIFAR-10.

State of the Art: ViT-H/14, Google, 94.55% accuracy

CIFAR-10

Canadian Institute for Advanced Research 10

2009 · 3 results

60K 32x32 color images in 10 classes. Classic small-scale image classification benchmark with 50K training and 10K test images.

State of the Art: DeiT-B Distilled, Meta, 99.1% accuracy

ImageNet-V2

ImageNet-V2 Matched Frequency

2019 · 2 results

10K new test images following ImageNet collection process. Tests model generalization beyond the original test set.

State of the Art: Swin Transformer V2 Large, Microsoft, 84% top-1 accuracy

Met (Metropolitan Museum artworks)

The Met Dataset (Metropolitan Museum of Art dataset)

0 results

The Met (The Met dataset) is a large-scale instance-level recognition dataset built from the Metropolitan Museum of Art Open Access collection. The training set contains ~400k images covering more than 224k classes (each museum exhibit is treated as a distinct class), producing a long-tail / many-singleton distribution that resembles few-shot scenarios. The authors collected ground-truth for the query set from museum visitors (≈1,100 query images) to form the Met queries; additionally a set of out-of-distribution distractor queries is provided (images not related to The Met). Evaluation protocols used include average classification accuracy (ACC) on the Met queries and Global Average Precision (GAP). The dataset was introduced to support instance-level recognition and retrieval research in the artwork domain and to benchmark recognition under distribution shift between studio-like catalog images and in-the-wild visitor photos.

No results tracked yet

ObjectNet

ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models

0 results

ObjectNet is a bias-controlled, real-world out-of-distribution test set for object recognition designed to evaluate robustness of object classification models. By design it controls for common dataset biases (background, rotation, and viewpoint) and was collected via a crowdsourced, highly-automated image-capture and annotation pipeline. The dataset contains roughly 50,000 high-resolution images across 313 object classes (with ~113 classes overlapping ImageNet). ObjectNet is provided primarily as a test set (no paired training set) to measure true generalization; when evaluated, modern object-recognition models showed a large drop in performance (~40–45%) relative to standard benchmarks. The dataset website provides downloads, metadata, and label formats. Sources: NeurIPS 2019 paper (Barbu et al.) and the official ObjectNet site (objectnet.dev).

No results tracked yet

iNaturalist 2021

iNaturalist 2021 (iNat-2021) Challenge Dataset

0 results

iNaturalist 2021 (iNat-2021) is a large-scale fine-grained species recognition benchmark derived from the iNaturalist community observations and released for the FGVC8 / iNat Challenge (2021). The dataset is designed for large-scale, long-tailed image classification of plants/animals/insects with many visually similar classes. The iNat2021 challenge split contains roughly 10,000 species and ≈2.7 million training images (there is also a "mini" version with 50 images per species, ≈500K images). Images were collected and user-verified via iNaturalist, and the benchmark emphasizes real-world class imbalance and fine-grained discrimination. Common uses: supervised image classification, long-tailed / fine-grained recognition, and semi-supervised variants (e.g., Semi-iNat2021). Sources: FGVC8 iNat Challenge 2021 pages and the visipedia iNat competition repository (inat_comp/2021). Note: the original iNaturalist dataset was introduced in Van Horn et al., CVPR 2018 (arXiv:1707.06642); iNaturalist 2021 is a later challenge release built on the iNaturalist platform rather than a separate peer-reviewed dataset paper.

No results tracked yet

GEO-Bench (classification suite)

GEO-Bench: Toward Foundation Models for Earth Monitoring

0 results

GEO-Bench is a curated benchmark suite for Earth-monitoring (geospatial) tasks introduced in Lacoste et al., 2023. The benchmark comprises 12 downstream tasks (six classification and six segmentation tasks) assembled from multiple existing geospatial datasets and adapted to create a standard evaluation protocol for foundation models for Earth observation. The classification “suite” reported in the paper aggregates per-dataset classification tasks and reports mean classification scores across those tasks. GEO-Bench is multimodal in scope (covers optical/RGB, multispectral, SAR and other Earth-observation modalities according to the project resources) and includes code to run evaluations and reproduce results (see the project repository and paper supplement for the detailed list of component datasets and evaluation details). Source: Lacoste et al., “GEO-Bench: Toward Foundation Models for Earth Monitoring” (NeurIPS 2023 / arXiv:2306.03831) and the ServiceNow GEO-Bench GitHub repository.

No results tracked yet

ImageNet V2

ImageNet V2 (ImageNetV2)

0 results

ImageNet V2 (ImageNetV2) is a set of new test splits for the ImageNet (ILSVRC2012) classification benchmark created to measure how well ImageNet classifiers generalize to new data sampled from the same source. The authors followed the original ImageNet collection and labeling protocol and released a pool of candidate Flickr images, the final test splits, and rich metadata (Flickr queries, MTurk annotations). ImageNetV2 contains three curated 10,000-image test sets: "matched-frequency", "top-images", and "threshold-0.7", corresponding to the different image-selection strategies described in the paper. The label space matches ImageNet-2012 (1,000 classes). ImageNetV2 was introduced and evaluated in Recht et al., "Do ImageNet Classifiers Generalize to ImageNet?", and is intended as an independent, distribution-matched testbed to detect adaptive overfitting and measure true generalization of ImageNet models.

No results tracked yet

ImageNet-R

ImageNet-R (ImageNet-Rendition)

0 results

ImageNet-R (ImageNet-Rendition) is a robustness / out-of-distribution evaluation dataset consisting of artistic and non-photorealistic renditions of ImageNet classes. It contains renditions such as art, cartoons, deviantart, graffiti, embroidery, graphics, origami, paintings, patterns, plastic objects, plush objects, sculptures, sketches, tattoos, toys, and video game renderings. The dataset covers 200 ImageNet class WordNet IDs (the same label space as ImageNet) and contains about 30,000 images. It was created to test ImageNet-trained models’ robustness to style/domain shifts and to evaluate performance on non-photorealistic renditions of the same object classes (introduced as part of the datasets in Hendrycks et al., “The Many Faces of Robustness”).

No results tracked yet

ImageNet-S

ImageNet-Sketch (ImageNet-S)

0 results

ImageNet-S (ImageNet-Sketch) is an out-of-domain sketch image dataset aligned to the 1000 ImageNet classes, created to evaluate models' semantic robustness at ImageNet scale. The original release contains roughly 50,000 images (commonly reported as ~50,889 images / ≈50 images per class for the 1000 classes). Images were collected via Google Image queries of the form “sketch of <class>” (searching within a black-and-white color scheme), manually cleaned to remove irrelevant or mislabelled images, and in some cases augmented (flipping/rotations) when fewer than the target number of images were available for a class. The dataset is widely used as an OOD/robustness benchmark for image-classification models. (Sources: original ImageNet-Sketch GitHub, PapersWithCode dataset page, TensorFlow Datasets, Hugging Face dataset cards.)

No results tracked yet

Places205

Places205 (MIT Places Database)

0 results

Places205 (part of the MIT Places Database) is a large scene-centric image dataset for scene recognition / scene classification. The dataset contains 205 scene categories and roughly 2.5 million images for training (the project reports ~2,448,873 images in some listings). Standard splits include a validation set with 100 images per category (20,500 images total) and a test set with 200 images per category (41,000 images total). The dataset was released by the CSAIL Vision group (MIT) and is intended for academic research and educational use (license restricts commercial redistribution of the images). Homepage and download information are provided by the MIT Places project.

No results tracked yet

iNat 2017

iNaturalist 2017 (iNat 2017) - iNaturalist Species Classification and Detection Dataset

0 results

iNaturalist 2017 (iNat 2017) is a large-scale fine-grained species classification dataset released for the iNaturalist 2017 challenge. It contains 5,089 categories (species) with approximately 579,184 training images and 95,986 validation images (total ~675k images). Images were contributed by citizen scientists and exhibit varying image quality, heavy class imbalance (long-tailed distribution), and many visually similar species, making the benchmark challenging for real-world species classification. The original release also included some bounding-box annotations, though most uses are image-level (single-label) classification; test labels were not publicly released by the organizers. Introduced in Van Horn et al., “The iNaturalist Species Classification and Detection Dataset” (arXiv:1707.06642, CVPR 2018).

No results tracked yet

CIFAR-10

CIFAR-10 (Canadian Institute for Advanced Research-10)

0 results

CIFAR-10 is a widely used image classification benchmark introduced as a labeled subset of the Tiny Images collection. It contains 60,000 32x32 color images in 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck), with 6,000 images per class. The standard split has 50,000 training images and 10,000 test images; the original release is organized into five training batches and one test batch (each batch contains 10,000 images). CIFAR-10 was created by Alex Krizhevsky (University of Toronto) and described in the technical report “Learning Multiple Layers of Features from Tiny Images” (2009). The images are drawn from the 80 Million Tiny Images dataset; note that the Tiny Images collection has since been the subject of dataset-level concerns and partial retraction, but CIFAR-10 remains a standard benchmark for small-image classification and transfer experiments. Common usage: training and evaluating image classification models (standard metric: classification accuracy on the 10k test images). Source / dataset homepage: https://www.cs.toronto.edu/~kriz/cifar.html. Canonical Hugging Face dataset page: https://huggingface.co/datasets/uoft-cs/cifar10.

No results tracked yet

iNat 2018

iNaturalist 2018 (iNat 2018) — iNaturalist Species Classification and Detection Dataset

0 results

The iNaturalist 2018 dataset (iNat 2018) is a large-scale, fine-grained species classification (and detection) dataset built from observations on the iNaturalist platform. Introduced by Van Horn et al., it emphasizes real-world challenges: long-tailed / highly imbalanced class distributions, many visually similar species, varied image quality and capture conditions, and global coverage. The dataset contains on the order of 0.8–0.9 million images from several thousand species (the paper reports ~859,000 images from over 5,000 species) and was released together with the FGVC / CVPR 2018 challenge (often referenced as the iNaturalist 2018 competition). It has been widely used as a benchmark for fine-grained and long-tailed image classification; variants/splits for the competition (training/validation/test) and detection labels were also provided for challenge participants. Common mirrors / references: the original CVPR paper and arXiv entry (see arXiv:1707.06642), Kaggle competition pages (iNaturalist 2018 / FGVC5), and dataset builders in TFDS and community uploads on Hugging Face.

No results tracked yet

iNat 2019

iNaturalist 2019 (iNat Challenge 2019, FGVC6)

0 results

iNaturalist 2019 (iNat 2019) is a fine-grained species classification dataset and challenge derived from observations on the citizen-science platform iNaturalist. The 2019 FGVC (iNat Challenge 2019) release was organized as part of FGVC6/CVPR 2019 and focuses on large-scale, real-world species recognition with many visually similar categories and a highly imbalanced class distribution. The FGVC6 challenge page reports that the 2019 split contains 1,010 species with a combined training+validation set of 268,243 images (images verified by multiple users on iNaturalist). The dataset is intended for image classification (species identification) and has been widely used as a fine-grained classification benchmark; papers typically report top-1 classification accuracy when evaluating models trained or fine-tuned on this split. The iNaturalist project more broadly (earlier/larger releases) was introduced in the CVPR 2018 paper “The iNaturalist Species Classification and Detection Dataset” (Van Horn et al.), arXiv:1707.06642 / CVPR 2018.

No results tracked yet

Stanford Cars

Stanford Cars (Cars196)

0 results

The Stanford Cars dataset (also referred to as Cars196) is a fine-grained image classification benchmark of car make/model/year. It contains 16,185 images of cars across ~196 classes (the original FGVC13 paper refers to 197 classes; common dataset distributions and usages report 196 classes). Images are labeled at the car model (often including year) and are commonly provided with a roughly 50/50 train/test split (8,144 training images and 8,041 test images). The dataset was collected and released by Jonathan Krause, Jia Deng, Michael Stark and Li Fei-Fei (Stanford); it is widely used for fine-grained categorization and metric-learning / retrieval experiments and often distributed with metadata (class labels, model/maker/year) and bounding-box annotations.

No results tracked yet

DTD

Describable Textures Dataset (DTD)

0 results

The Describable Textures Dataset (DTD) is a benchmark dataset of textures “in the wild” designed for human-centric texture description and classification. It contains 5,640 images organized into 47 describable texture categories (120 images per category). Images are natural/web images and are annotated with a vocabulary of 47 texture attributes (semantic terms). The dataset provides predefined evaluation splits (DTD R1.0.1 uses a 1/3 train, 1/3 validation, 1/3 test split and the authors provide split files). DTD was introduced in the paper “Describing Textures in the Wild” (Cimpoi et al., CVPR 2014 / arXiv:1311.3618) and is widely used for texture classification and attribute-recognition tasks.

No results tracked yet

Galaxy10

Galaxy10 (Galaxy10 / Galaxy10 DECaLS)

0 results

Galaxy10 is a CIFAR10-like galaxy morphology image classification dataset derived from Galaxy Zoo labels and optical survey imaging. The dataset provides RGB (g, r, i-band) cutouts of galaxies grouped into 10 broad morphology classes (e.g., smooth/round, disk face-on no spiral, edge-on disk, cigar-shaped, etc.). There are multiple published/used variants: the original Galaxy10/Galaxy10 SDSS release (images from the Sloan Digital Sky Survey; common counts reported in documentation are ~21.7k–25.7k images at 69×69 px) and the Galaxy10 DECaLS variant (images replaced/updated with higher-quality DESI Legacy Imaging Surveys / DECaLS images; the Hugging Face mirror of this variant lists ~17.7k images at 256×256 px). Labels originate from Galaxy Zoo volunteer votes. Typical use is supervised image classification (galaxy morphology). Note: some papers that evaluate on “Galaxy10” report using specific training subsets (for example the provided paper reports ~11,000 training samples across 10 classes for their experiments). Primary community resources: the original GitHub and astroNN documentation (henrysky/Galaxy10 and astroNN docs) and a Hugging Face dataset mirror at matthieulel/galaxy10_decals.

No results tracked yet

FGVC-Aircraft

FGVC-Aircraft (Fine-Grained Visual Classification of Aircraft)

0 results

FGVC-Aircraft (Fine-Grained Visual Classification of Aircraft) is a benchmark dataset for fine-grained image classification of aircraft. The dataset contains ~10,000 images organized in a three-level hierarchy (manufacturer / family / variant) covering 100 aircraft models (variants) and multiple manufacturers/families. It was introduced to support fine-grained visual categorization research and includes image-level annotations and evaluation code; it has been widely used in few-shot and fine-grained classification evaluations. The official project page and the original paper (arXiv:1306.5151) provide download links and annotation files. (No additional split information was provided in the paper beyond the benchmark/evaluation protocol.)

No results tracked yet

CUB (CUB-200-2011)

Caltech-UCSD Birds-200-2011 (CUB-200-2011)

0 results

Caltech-UCSD Birds-200-2011 (CUB-200-2011) is a fine-grained image classification dataset of 200 bird species containing 11,788 images. Each image is annotated with a class label, one bounding box, 15 part locations, and 312 binary attributes. The dataset provides standard train/test splits (train: 5,994 images, test: 5,794 images) and is widely used as a benchmark for fine-grained categorization and part localization. Note: images overlap with ImageNet (caution when using ImageNet-pretrained models).

No results tracked yet

ImageNet-1k

0 results

ImageNet-1k is a dataset used for image classification. It contains 1,281,167 training images, 50,000 validation images, and 100,000 test images. Each category in ImageNet-1k is a leaf category, meaning that there are no child nodes below it.

No results tracked yet

Places365

Places365 (Places365-Standard)

0 results

Places365 (Places365-Standard) is a large-scale scene recognition dataset derived from the Places / Places2 collection (a repository of ~10M scene images). The Places365-Standard split contains ~1.8 million training images across 365 scene categories, with 50 images per category in the validation set and 900 images per category in the test set. It was released to support scene (environment) classification and to provide large-scale training data for CNNs; variants include Places365-Standard (core set) and Places365-Challenge (larger competition set). Official materials, pretrained CNNs and category lists are hosted by the MIT CSAIL Places project (places2.csail.mit.edu).

No results tracked yet

CIFAR-100

CIFAR-100 (Canadian Institute For Advanced Research)

0 results

CIFAR-100 is a widely used image classification dataset of 60,000 color images (32x32) in 100 classes, with 600 images per class. There are 50,000 training images and 10,000 test images; each class has 500 training and 100 test images. The 100 "fine" classes are grouped into 20 "coarse" superclasses, and each image has both a fine label (one of 100 classes) and a coarse label (one of 20 superclasses). CIFAR-100 was created by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton as labeled subsets of the 80 Million Tiny Images dataset and is commonly used for benchmarking image-classification models and transfer learning. Canonical dataset page / download and documentation: https://www.cs.toronto.edu/~kriz/cifar.html. A common Hugging Face hosted variant is uoft-cs/cifar100 (https://huggingface.co/datasets/uoft-cs/cifar100). Citation (original dataset page / tech report): Alex Krizhevsky, "Learning multiple layers of features from tiny images" (2009, dataset homepage: https://www.cs.toronto.edu/~kriz/cifar.html).

No results tracked yet

Oxford Flowers-102

Oxford 102 Flower Dataset (Oxford Flowers-102)

0 results

The Oxford 102 Flower Dataset (often called Oxford Flowers-102) is a fine-grained image classification dataset created by the Visual Geometry Group (VGG) at the University of Oxford. It contains 102 flower categories commonly occurring in the United Kingdom. Each class has between 40 and 258 images, for a total of 8,189 images. The images exhibit large variation in scale, pose and illumination, and several classes are visually similar making the task challenging for classifiers. The dataset is split into training, validation and test sets: training and validation each contain 10 images per class (1,020 images each) and the test set contains the remaining 6,149 images (min 20 images per class). The dataset has been widely used for image classification and fine-grained visual categorization research and is available through multiple libraries and mirrors (official VGG homepage, TensorFlow Datasets, PyTorch torchvision and community Hugging Face dataset entries). Original dataset documentation and the authors' paper and thesis are hosted on the VGG (Oxford) website.

No results tracked yet

VTAB (19 tasks)

Visual Task Adaptation Benchmark (VTAB)

0 results

VTAB (Visual Task Adaptation Benchmark) is a benchmark suite of 19 image classification tasks designed to evaluate how well general visual representations adapt to diverse, unseen tasks with limited labeled data. VTAB frames all tasks as classification problems (to provide a consistent API) and emphasizes low-data transfer: the commonly-used VTAB-1k protocol uses 1,000 training examples per task and reports the mean (top-1) accuracy averaged across the 19 tasks; VTAB also supports a full-dataset evaluation scenario. The benchmark tasks are drawn from multiple domains (commonly described as Natural, Specialized and Structured groups) to exercise different aspects of representations. VTAB places one key constraint on pre-training: the evaluation datasets must not be used during pre-training. Public resources: project site and leaderboard (https://google-research.github.io/task_adaptation/), code and data splits on GitHub (https://github.com/google-research/task_adaptation), the OpenReview (ICLR 2020) page for “The Visual Task Adaptation Benchmark” (https://openreview.net/forum?id=BJena3VtwS), and the related arXiv paper “A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark” (arXiv:1910.04867, https://arxiv.org/abs/1910.04867).

No results tracked yet

ImageNet Real

ImageNet ReaL (Reassessed ImageNet Real Labels)

0 results

ImageNet ReaL (often written ImageNet-ReaL) is the set of cleaned-up/reassessed labels for the ImageNet ILSVRC2012 validation split produced by Beyer et al. (2020) to provide a more reliable evaluation benchmark. The authors collected new human annotations for the original 50,000 validation images (the ILSVRC2012 val split), allowed discovery of valid multi-labels and corrected many original labeling errors, and released the reassessed labels and supporting files (e.g. real.json) in the google-research/reassessed-imagenet repository. The reassessed labels are intended to be used in place of (or alongside) the original ImageNet validation labels when reporting model accuracy; evaluations are commonly reported on the validation split.

No results tracked yet

