Codesota · Benchmark · CIFAR-10

CIFAR-10.

60K 32x32 color images in 10 classes. Classic small-scale image classification benchmark with 50K training and 10K test images.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


accuracy

Accuracy is the reported evaluation metric for CIFAR-10. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
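To make the metric concrete, here is a minimal sketch of how an accuracy score could be computed and how a reported percentage translates into misclassified test images. It assumes the leaderboard's "accuracy" means top-1 accuracy on the 10K-image test set described above; the function names `top1_accuracy` and `misclassified_count` are hypothetical and not part of any Codesota tooling.

```python
from typing import Sequence

# Assumption: scores are measured on the 10,000-image CIFAR-10 test set.
TEST_SET_SIZE = 10_000


def top1_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Return top-1 accuracy in percent: the share of predictions equal to the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def misclassified_count(accuracy_percent: float, n_images: int = TEST_SET_SIZE) -> int:
    """Convert a reported accuracy into an approximate number of wrong test images."""
    return round(n_images * (100.0 - accuracy_percent) / 100.0)
```

Under these assumptions, a score of 99.5 corresponds to roughly 50 misclassified test images out of 10,000, while 93.57 corresponds to roughly 643. Near the top of the leaderboard, differences of a few hundredths of a percent therefore amount to only a handful of images.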

Trust tiers for accuracy: verified, paper, vendor, community, unverified
| Rank | Model | Trust | Score (%) | Year | Links | Notes |
|---|---|---|---|---|---|---|
| 01 | ViT-H/14 (JFT-300M) | verified | 99.5 | 2026 | Source | Vision Transformer ViT-H/14, pre-trained on JFT-300M and fine-tuned on CIFAR-10. 99.50 ± 0.06% reported in ViT paper Table 2. This is the published state of the art on CIFAR-10 as of 2025 (PapersWithCode SOTA). Paper: Dosovitskiy et al. 2021, ICLR 2021, arXiv:2010.11929. |
| 02 | AIMv2 ViT-3B/14 448px | unverified | 99.5 | 2024 | Paper, Code | |
| 03 | Vision Transformer (ViT-H/14) | unverified | 99.5 | 2020 | Paper, Code | |
| 04 | ViT-L/16 (JFT-300M) | verified | 99.42 | 2026 | Source | Vision Transformer ViT-L/16, pre-trained on JFT-300M and fine-tuned on CIFAR-10. 99.42% reported in ViT paper Table 2. Paper: Dosovitskiy et al. 2021, ICLR 2021, arXiv:2010.11929. |
| 05 | BiT-L | unverified | 99.37 | 2019 | Paper, Code | |
| 06 | BiT-L (ResNet152x4) | verified | 99.37 | 2026 | Source | Big Transfer (BiT) ResNet-152x4 large upstream variant, pre-trained on JFT-300M and fine-tuned on CIFAR-10. 99.37% reported in BiT paper Table 1. Paper: Kolesnikov et al., ECCV 2020, arXiv:1912.11370. |
| 07 | ViT-H/14 (IN-21K) | verified | 99.27 | 2026 | Source | Vision Transformer ViT-H/14, pre-trained on ImageNet-21K and fine-tuned on CIFAR-10. 99.27% reported in ViT paper Table B1 (appendix). Paper: Dosovitskiy et al. 2021, ICLR 2021, arXiv:2010.11929. |
| 08 | DeiT-B Distilled | unverified | 99.1 | 2025 | Source | Near-SOTA on CIFAR-10 with transfer learning. |
| 09 | deit-b-distilled | paper | 99.1 | 2025 | Source | Near-SOTA on CIFAR-10 with transfer learning. |
| 10 | ViT-L/16 (IN-21K) | verified | 99 | 2026 | Source | Vision Transformer ViT-L/16, pre-trained on ImageNet-21K and fine-tuned on CIFAR-10. 99.0% reported in ViT paper. Paper: Dosovitskiy et al. 2021, arXiv:2010.11929. |
| 11 | ConvNeXt V2 Base | unverified | 98.7 | 2025 | Source | Strong CNN performance on small-scale benchmark. |
| 12 | convnext-v2-base | paper | 98.7 | 2025 | Source | Strong CNN performance on small-scale benchmark. |
| 13 | EfficientNet-B8 (NoisyStudent) | verified | 98.7 | 2026 | Source | NoisyStudent EfficientNet-B8 trained with self-training and noise. 98.7% on CIFAR-10. Paper: Xie et al. 2020, arXiv:1911.04252. |
| 14 | ViT-B/16 (IN-21K) | verified | 98.13 | 2026 | Source | Vision Transformer ViT-B/16, pre-trained on ImageNet-21K and fine-tuned on CIFAR-10. 98.13% reported in ViT paper. Paper: Dosovitskiy et al. 2021, arXiv:2010.11929. |
| 15 | Swin-B | verified | 98 | 2026 | Source | Swin Transformer Base, pre-trained on ImageNet-21K and fine-tuned on CIFAR-10. Paper: Liu et al. 2021, arXiv:2103.14030. |
| 16 | LeJEPA ViT-L (304M) | unverified | 96.5 | 2025 | Paper, Code | |
| 17 | ResNet-50 | paper | 96.01 | 2025 | Source | With Cutout augmentation. |
| 18 | CN-CLIP | unverified | 96 | 2022 | Paper, Code | |
| 19 | ResNet-110 | unverified | 93.57 | 2015 | Paper, Code | |