Who leads the ImageNet-V2 benchmark?

Swin Transformer V2 Large currently leads ImageNet-V2 with a score of 84 on top-1-accuracy.

What is the state-of-the-art score on ImageNet-V2?

The state-of-the-art result on ImageNet-V2 is 84 (top-1-accuracy), achieved by Swin Transformer V2 Large as of 2025.

How many models are tracked on ImageNet-V2?

Codesota tracks 6 models on ImageNet-V2 across 2 metrics.

When was the ImageNet-V2 leaderboard last updated?

The ImageNet-V2 leaderboard on Codesota includes results through 2025, with the earliest tracked result from 2021.

Image Classification · benchmark dataset · 2019 · EN

ImageNet-V2 Matched Frequency.

Name: ImageNet-V2 Matched Frequency Benchmark Results
Creator: Codesota
Published: 2021-01-01
License: https://creativecommons.org/licenses/by/4.0/

10K new test images collected by following the original ImageNet collection process. The benchmark tests how well models generalize beyond the original test set.

Saturated benchmark

Benchmark near ceiling or stagnant — no meaningful SOTA movement in 2+ years

§ 01 · Leaderboard

Best published scores.

6 results indexed across 2 metrics. Shaded row marks current SOTA; ties broken by submission date.

Primary: top-1-accuracy · higher is better
All metrics: accuracy, top-1-accuracy

accuracy

4 rows

#	Model	Org	Submitted	Paper / code	accuracy
01	DINOv3 (7B)	—	Aug 2025	DINOv3 · code	81.40
02	SigLIP 2 (g/16)	—	Feb 2025	SigLIP 2: Multilingual Vision-Language Encoders with Imp… · code	79.80
03	ALIGN	—	Feb 2021	Scaling Up Visual and Vision-Language Representation Lea… · code	70.10
04	AltCLIP	—	Nov 2022	AltCLIP: Altering the Language Encoder in CLIP for Exten… · code	68.20

top-1-accuracy · primary

2 rows

#	Model	Org	Submitted	Paper / code	top-1-accuracy
01	Swin Transformer V2 Large (Open)	Microsoft	Dec 2025	microsoft-research	84
02	ConvNeXt V2 Huge (Open)	Meta	Dec 2025	meta-research	80.50

Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
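The ordering rule described above (sort by score within each metric, break ties by submission date) can be sketched as follows. This is a minimal illustration, not Codesota's implementation: the function name `rank_rows` is hypothetical, the day of each date is a placeholder because the tables publish only month and year, and the assumption that the earlier submission wins a tie is ours, since the page does not say which direction the tie-break runs.

```python
from datetime import date

# Rows copied from the two tables above: (metric, model, submitted, score).
# Only month and year are published, so the day is a placeholder (1).
ROWS = [
    ("accuracy", "DINOv3 (7B)", date(2025, 8, 1), 81.40),
    ("accuracy", "SigLIP 2 (g/16)", date(2025, 2, 1), 79.80),
    ("accuracy", "ALIGN", date(2021, 2, 1), 70.10),
    ("accuracy", "AltCLIP", date(2022, 11, 1), 68.20),
    ("top-1-accuracy", "Swin Transformer V2 Large", date(2025, 12, 1), 84.0),
    ("top-1-accuracy", "ConvNeXt V2 Huge", date(2025, 12, 1), 80.50),
]


def rank_rows(rows):
    """Group rows by metric and sort each group by score, highest first.

    Ties are broken by submission date, earlier first (an assumption).
    """
    ranked = {}
    for metric, model, submitted, score in rows:
        ranked.setdefault(metric, []).append((model, submitted, score))
    for entries in ranked.values():
        entries.sort(key=lambda entry: (-entry[2], entry[1]))
    return ranked


leaderboard = rank_rows(ROWS)
for metric, entries in leaderboard.items():
    for position, (model, submitted, score) in enumerate(entries, start=1):
        print(f"{metric}\t{position:02d}\t{model}\t{score:.2f}")
```

The first entry of each group corresponds to the shaded SOTA row for that metric.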

§ 03 · Progress

1 step of state of the art.

Each row below marks a model that broke the previous record on top-1-accuracy. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · top-1-accuracy

Dec 18, 2025	Swin Transformer V2 Large	Microsoft	84

Fig 3 · SOTA-setting models only. 1 entry spans Dec 2025 → Dec 2025.
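The selection rule for this section (keep only entries that broke the previous record) amounts to a running maximum over results in date order. The sketch below is illustrative only: `sota_steps` is a hypothetical name, and the rule is applied here to the secondary accuracy table from § 01 because it has more rows to filter, whereas the page itself charts only top-1-accuracy. Days are placeholders, as the tables give month and year only.

```python
from datetime import date

# (submitted, model, score) taken from the "accuracy" table in § 01.
# The day of each date is a placeholder; only month and year are published.
ACCURACY_RESULTS = [
    (date(2025, 8, 1), "DINOv3 (7B)", 81.40),
    (date(2025, 2, 1), "SigLIP 2 (g/16)", 79.80),
    (date(2021, 2, 1), "ALIGN", 70.10),
    (date(2022, 11, 1), "AltCLIP", 68.20),
]


def sota_steps(results):
    """Return the entries that strictly beat the best earlier score.

    Results are walked in date order; on equal dates the higher score is
    considered first, so a same-day lower score never counts as a step.
    """
    steps = []
    best = float("-inf")
    for submitted, model, score in sorted(results, key=lambda r: (r[0], -r[2])):
        if score > best:
            steps.append((submitted, model, score))
            best = score
    return steps


steps = sota_steps(ACCURACY_RESULTS)
for submitted, model, score in steps:
    print(f"{submitted:%b %Y}\t{model}\t{score:.2f}")
```

Under these assumptions AltCLIP is filtered out, because its score does not exceed the earlier ALIGN result.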

§ 04 · Literature

4 papers tied to this benchmark.

Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.

DINOv3
Aug 2025 · DINOv3 (7B)

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Feb 2025 · SigLIP 2 (g/16)

AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities
Nov 2022 · AltCLIP

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Feb 2021 · ALIGN

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and — if it takes the top — annotate the step on the progress chart with your name.

What a submission needs

01 A public checkpoint or API endpoint
02 A reproduction script with frozen commit + seed
03 Declared evaluation environment (Python, deps)
04 One row per metric declared by this dataset
05 A contact so we can follow up on discrepancies
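Before submitting, the five requirements above can be checked locally. The sketch below is a hypothetical pre-flight check, not Codesota's submission schema: the `Submission` class, its field names, and `missing_items` are all invented for illustration, and item 04 is read as "one score for each metric listed under All metrics", which is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Metrics declared for this dataset, as listed under "All metrics" in § 01.
DECLARED_METRICS = ("accuracy", "top-1-accuracy")


@dataclass
class Submission:
    """Hypothetical container; field names are illustrative only."""
    checkpoint_or_endpoint: str = ""
    repro_commit: str = ""
    seed: Optional[int] = None
    python_version: str = ""
    dependencies: List[str] = field(default_factory=list)
    scores: Dict[str, float] = field(default_factory=dict)
    contact: str = ""


def missing_items(sub: Submission) -> List[str]:
    """Return the checklist items (01 to 05) that the submission lacks."""
    problems = []
    if not sub.checkpoint_or_endpoint:
        problems.append("01 public checkpoint or API endpoint")
    if not sub.repro_commit or sub.seed is None:
        problems.append("02 reproduction script with frozen commit + seed")
    if not sub.python_version or not sub.dependencies:
        problems.append("03 declared evaluation environment")
    if any(metric not in sub.scores for metric in DECLARED_METRICS):
        problems.append("04 one row per declared metric")
    if not sub.contact:
        problems.append("05 contact")
    return problems


# Placeholder values throughout; none of these refer to a real submission.
example = Submission(
    checkpoint_or_endpoint="https://example.com/checkpoint.pt",
    repro_commit="0123abc",
    seed=0,
    python_version="3.11",
    dependencies=["torch"],
    scores={"top-1-accuracy": 0.0},
    contact="name@example.com",
)
print(missing_items(example))
```

In this example only the top-1-accuracy score is supplied, so the check flags item 04.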
